From: Hans Hagen via ntg-context <ntg-context@ntg.nl>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Cc: Hans Hagen <j.hagen@freedom.nl>
Subject: Re: String substitution using regular expressions and backreferences
Date: Fri, 26 Aug 2022 09:34:08 +0200
Message-ID: <64e5b76d-4443-838f-ce0e-753ac5ea2e0c@freedom.nl>
In-Reply-To: <CAANrE7pw2N_GCgxtgp4B0UhM7oxSUCLG2KZHQi1okWDN0Rc=ug@mail.gmail.com>
On 8/25/2022 9:44 PM, Thangalin via ntg-context wrote:
> I've attempted to apply Wolfgang's subtle suggestion of using Lua to parse
> the input document using a regular expression via lpeg.replacer. The
> replacement itself works fine; however, in doing so the XML document
> structure is converted to text, which means that it is no longer possible
> to "flush" the XML for further processing as XML. The result is that any
> unresolved XML tags are written verbatim to the PDF:
>
> https://i.stack.imgur.com/9ZFND.png
>
> There are two other issues with this approach. First is efficiency. Second
> is that the processing function would have to be called for every XML
> element to capture the replacement.
>
> My original post asked about applying regex word substitution in a ConTeXt
> way, such as:
>
> \definereplacement[SubstMac][ match={Mc([A-Z].*)}, replace={\Mac \\1} ]
> \definereplacement[SubstPostmeridian][ match={[Pp]\\.[Mm]\\.},
> replace={\cap{pm}} ]
>
> That seems like the cleanest approach because it would work on top of XML
> or any other source document. Nevertheless, here is what I tried, which
> partially works:
>
> \startbuffer[main]
> <html>
> <p>“Mr. McAnulty, I presume?”</p>
> <p>Regular text. <em>Irregular text.</em></p>
> </html>
> \stopbuffer
> \startxmlsetups xml:xhtml
> \xmlsetsetup{\xmldocument}{*}{-}
> \xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
> \stopxmlsetups
> \startxmlsetups xml:html
> \startdocument
> \xmlflush{#1}
> \stopdocument
> \stopxmlsetups
> % Paragraphs are followed by a paragraph break, but only if not nested.
> \startxmlsetups xml:p
> \xmlfunction{#1}{p}
> \par
> \stopxmlsetups
> \startxmlsetups xml:em
> \dontleavehmode{\em\xmlflush{#1}}
> \stopxmlsetups
> \startluacode
> function xml.functions.p( t )
> rep = { [1] = { "McAnulty", "\\Mac Anulty" } }
> x = lpeg.replacer( rep ):match( tostring( xml.text( t ) ) )
>
> buffers.assign( "p", context( x ) )
> context.getbuffer{ "p" }
> end
> \stopluacode
> \xmlregistersetup{xml:xhtml}
> \def\Mac{%
> % Determine the sizes of 'M' and 'c'.
> \newbox\MacMBox%
> \setbox\MacMBox\hbox{M}%
> \newbox\MacCBox%
> \setbox\MacCBox\hbox{c}%
> %
> % Cheat to dynamically derive the kerning size by putting Mc in a box.
> %
> \newbox\MacKernBox%
> \setbox\MacKernBox\hbox{\inframed[offset=\zeropoint, width=fit]{Mc}}%
> \def\MacDelta{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
> \def\MacUWidth{\dimexpr\wd\MacCBox-.75\MacDelta\relax}%
> \def\MacRule{\vrule width \MacUWidth height .04em depth \zeropoint \relax}%
> \def\MacKern{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
> \def\MacHeight{\dimexpr\ht\MacMBox-\ht\MacCBox\relax}%
> %
> % Write Mc, where c has a macron, to the document.
> %
> M{%
> \dontleavehmode{\raisebox{\MacHeight}\hbox{c}}%
> \kern-1.04\MacUWidth
> \MacRule
> \kern.08\MacUWidth
> }%
> }%
> \xmlprocessbuffer{main}{main}{}
>
> As shown in the screen shot, this doesn't correctly handle nested XML
> elements.
>
> Any ideas on what approach to take to perform a string replacement in
> ConTeXt?
Best stay at the xml end and replace in the element's strings before they are flushed:
\startbuffer[main]
<html>
<p>“Mr. McAnulty, I presume?”</p>
<p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer
\startxmlsetups xml:xhtml
\xmlsetsetup{\xmldocument}{*}{-}
\xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups
\startxmlsetups xml:html
\xmlflush{#1}
\stopxmlsetups
\startxmlsetups xml:p
\xmlfunction{#1}{p}
\xmlcontext{#1}
\par
\stopxmlsetups
\startxmlsetups xml:em
\dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups
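\startluacode
    -- build the replacer once, outside the function
    local rep = lpeg.replacer { [1] = { "McAnulty", "\\Mac Anulty" } }

    -- replace in the strings that sit directly in the element;
    -- subelements (tables in dt) are left alone
    function xml.functions.p(t)
        local dt = t.dt
        for i=1,#dt do
            local di = dt[i]
            if type(di) == "string" then
                dt[i] = lpeg.match(rep,di)
            end
        end
    end
\stopluacode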
\xmlregistersetup{xml:xhtml}
\startdocument
\xmlprocessbuffer{main}{main}{}
\stopdocument
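
The function above only touches the strings that sit directly in the p element, so text inside a nested <em> is not replaced. Below is a minimal sketch of a recursive variant that runs in plain Lua outside ConTeXt. Its assumptions: the mock tree models only the shape used above (a dt array holding strings and subelement tables), walk is a hypothetical helper name, and string.gsub with the pattern Mc(%u) stands in for lpeg.replacer so that the capture %1 plays the role of the backreference from the original post.

```lua
-- Mock element tree (an assumption): only the 'dt' array of strings and
-- subelement tables is modelled, as in the function above.
local p = {
    tg = "p",
    dt = {
        "Mr. McAnulty met ",
        { tg = "em", dt = { "McBride" } },
        ".",
    },
}

-- Hypothetical helper: replace in every string below 'e', descending
-- into subelements. string.gsub stands in for lpeg.replacer here.
local function walk(e)
    local dt = e.dt
    if not dt then
        return
    end
    for i = 1, #dt do
        local di = dt[i]
        if type(di) == "string" then
            -- %1 is the captured capital; the parentheses drop gsub's count
            dt[i] = (di:gsub("Mc(%u)", "\\Mac %1"))
        elseif type(di) == "table" then
            walk(di)
        end
    end
end

walk(p)

print(p.dt[1])       -- Mr. \Mac Anulty met
print(p.dt[2].dt[1]) -- \Mac Bride
```

Inside ConTeXt the same loop would presumably call lpeg.match(rep,di) on each string and be hooked in with \xmlfunction{#1}{p} as above, followed by \xmlcontext{#1} to flush the result.
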
But this is more fun and probably also more reliable:
\startbuffer[main]
<html>
<p>“Mr. McAnulty, I presume?”</p>
<p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer
\startxmlsetups xml:xhtml
\xmlsetsetup{\xmldocument}{*}{-}
\xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups
\startxmlsetups xml:html
\xmlflush{#1}
\stopxmlsetups
\startxmlsetups xml:p
\xmlcontext{#1}
\par
\stopxmlsetups
\startxmlsetups xml:em
\dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups
\xmlregistersetup{xml:xhtml}
\usemodule[gimmicks] % in latest uploads
\chardef\MacAnulty = \getprivateglyphslot{MacAnulty}
\startsetups [box:mcanulty:\number\MacAnulty]
\Mac Anulty
\stopsetups
\registerboxglyph category {mcanulty} unicode \MacAnulty \relax
\startluacode
    fonts.handlers.otf.addfeature {
        name    = "mcanulty",
        type    = "ligature",
        nocheck = true,
        data    = {
            [fonts.constructors.privateslots.MacAnulty] = {
                "M", "c", "A", "n", "u", "l", "t", "y",
            },
        }
    }
\stopluacode
\definefontfeature[default][default][box=mcanulty,mcanulty=yes]
\startdocument
\xmlprocessbuffer{main}{main}{}
\stopdocument
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------