ntg-context - mailing list for ConTeXt users
From: Thangalin via ntg-context <ntg-context@ntg.nl>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Cc: Thangalin <thangalin@gmail.com>
Subject: Re: String substitution using regular expressions and backreferences
Date: Thu, 25 Aug 2022 12:44:12 -0700
Message-ID: <CAANrE7pw2N_GCgxtgp4B0UhM7oxSUCLG2KZHQi1okWDN0Rc=ug@mail.gmail.com>
In-Reply-To: <20329b8b-6347-2bf8-7d63-9f5ac3d01e8e@gmail.com>


I've attempted to apply Wolfgang's suggestion of using Lua, running the
text of the input document through a replacement pattern built with
lpeg.replacer. The replacement itself works fine; however, in doing so the
XML document structure is converted to text, which means that it is no
longer possible to "flush" the XML for further processing as XML. As a
result, any unresolved XML tags are written verbatim to the PDF:

https://i.stack.imgur.com/9ZFND.png

There are two other issues with this approach. The first is efficiency. The
second is that the processing function would have to be called for every
XML element in order to catch each replacement.

My original post asked about applying regex word substitution in a ConTeXt
way, such as:

\definereplacement[SubstMac][ match={Mc([A-Z].*)}, replace={\Mac \\1} ]
\definereplacement[SubstPostmeridian][ match={[Pp]\\.[Mm]\\.}, replace={\cap{pm}} ]
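
To make the intent of those two definitions concrete, here is a minimal
sketch of the same substitutions in plain Lua, using string.gsub, where %1
plays the role of the \\1 backreference. It is an illustration only: the
names "replacements" and "substitute" are made up, Lua patterns are not full
regular expressions, and the capture is narrowed to a single word (%u%a*)
instead of the greedy [A-Z].* shown above.

```lua
-- Sketch only: plain Lua patterns, not a ConTeXt interface.
-- `replacements` and `substitute` are hypothetical names.
local replacements = {
  -- "Mc" followed by a capitalised word; %1 re-inserts the captured name.
  { match = "Mc(%u%a*)",    replace = "\\Mac %1" },
  -- "p.m." in any capitalisation; "%." matches a literal full stop.
  { match = "[Pp]%.[Mm]%.", replace = "\\cap{pm}" },
}

-- Apply every replacement in order and return the rewritten string.
local function substitute( text )
  for _, r in ipairs( replacements ) do
    -- gsub also returns a match count; only the string is kept.
    text = text:gsub( r.match, r.replace )
  end
  return text
end

print( substitute( "Mr. McAnulty arrived at 5 p.m." ) )
-- Mr. \Mac Anulty arrived at 5 \cap{pm}
```

This only covers the string-level mechanics; the \definereplacement syntax
above is the kind of interface being asked about.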

That seems like the cleanest approach because it would work on top of XML
or any other source document. Nevertheless, here is what I tried, which
partially works:

\startbuffer[main]
<html>
  <p>“Mr. McAnulty, I presume?”</p>
  <p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer
\startxmlsetups xml:xhtml
  \xmlsetsetup{\xmldocument}{*}{-}
  \xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups
\startxmlsetups xml:html
  \startdocument
    \xmlflush{#1}
  \stopdocument
\stopxmlsetups
% Paragraphs are followed by a paragraph break, but only if not nested.
\startxmlsetups xml:p
  \xmlfunction{#1}{p}
  \par
\stopxmlsetups
\startxmlsetups xml:em
  \dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups
\startluacode
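function xml.functions.p( t )
  -- Literal replacement pairs: { search, replace }.
  local rep = { [1] = { "McAnulty", "\\Mac Anulty" } }
  -- The element's content is converted to a string here, so nested
  -- elements such as <em> are no longer available as XML afterwards.
  local x = lpeg.replacer( rep ):match( tostring( xml.text( t ) ) )

  buffers.assign( "p", context( x ) )
  context.getbuffer{ "p" }
end
\stopluacode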
\xmlregistersetup{xml:xhtml}
% Allocate the box registers once; \newbox inside \Mac would claim three
% new registers on every call.
\newbox\MacMBox
\newbox\MacCBox
\newbox\MacKernBox
\def\Mac{%
  % Determine the sizes of 'M' and 'c'.
  \setbox\MacMBox\hbox{M}%
  \setbox\MacCBox\hbox{c}%
  %
  % Cheat to dynamically derive the kerning size by putting Mc in a box.
  %
  \setbox\MacKernBox\hbox{\inframed[offset=\zeropoint, width=fit]{Mc}}%
  \def\MacDelta{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
  \def\MacUWidth{\dimexpr\wd\MacCBox-.75\MacDelta\relax}%
  \def\MacRule{\vrule width \MacUWidth height .04em depth \zeropoint \relax}%
  \def\MacKern{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
  \def\MacHeight{\dimexpr\ht\MacMBox-\ht\MacCBox\relax}%
  %
  % Write Mc, where c has a macron, to the document.
  %
  M{%
    \dontleavehmode{\raisebox{\MacHeight}\hbox{c}}%
    \kern-1.04\MacUWidth
    \MacRule
    \kern.08\MacUWidth
  }%
}%
\xmlprocessbuffer{main}{main}{}

As shown in the screenshot, this doesn't correctly handle nested XML
elements.
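
One way to picture what handling nested elements would require is to rewrite
only the text nodes and leave the element nodes in place. Below is a minimal
sketch in plain Lua over a mock tree. The tree shape (tg for the tag, dt for
the list of children, plain strings for text) is only loosely modelled on
how ConTeXt stores parsed XML, and replacetext and serialize are made-up
helper names, so treat all of it as an assumption, untested inside ConTeXt.

```lua
-- Mock tree: element = { tg = tag, dt = { children } }, text = string.
-- Hypothetical shape and helper names; not a ConTeXt interface.

-- Rewrite text nodes in place, recursing into child elements.
local function replacetext( node, pattern, replacement )
  local dt = node.dt
  for i = 1, #dt do
    local child = dt[i]
    if type( child ) == "string" then
      -- Only the first gsub result (the new string) is stored.
      dt[i] = child:gsub( pattern, replacement )
    else
      replacetext( child, pattern, replacement )
    end
  end
end

-- Turn the mock tree back into markup to show the structure survived.
local function serialize( node )
  if type( node ) == "string" then
    return node
  end
  local parts = { }
  for i, child in ipairs( node.dt ) do
    parts[i] = serialize( child )
  end
  return "<" .. node.tg .. ">" .. table.concat( parts ) .. "</" .. node.tg .. ">"
end

local p = { tg = "p", dt = {
  "Ask McAnulty. ",
  { tg = "em", dt = { "McAnulty is irregular." } },
} }

replacetext( p, "Mc(%u%a*)", "\\Mac %1" )
print( serialize( p ) )
-- <p>Ask \Mac Anulty. <em>\Mac Anulty is irregular.</em></p>
```

Because the <em> node is never turned into a string, it would still be
available for flushing as XML after the replacement.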

Any ideas on what approach to take to perform a string replacement in
ConTeXt?

Thanks again!


> [Your] input is XML which means a lot more can be done than your simple TeX
> based example demonstrates.
>
> Wolfgang

