ntg-context - mailing list for ConTeXt users
From: Hans Hagen via ntg-context <ntg-context@ntg.nl>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Cc: Hans Hagen <j.hagen@freedom.nl>
Subject: Re: String substitution using regular expressions and backreferences
Date: Fri, 26 Aug 2022 09:34:08 +0200
Message-ID: <64e5b76d-4443-838f-ce0e-753ac5ea2e0c@freedom.nl>
In-Reply-To: <CAANrE7pw2N_GCgxtgp4B0UhM7oxSUCLG2KZHQi1okWDN0Rc=ug@mail.gmail.com>

On 8/25/2022 9:44 PM, Thangalin via ntg-context wrote:
> I've attempted to apply Wolfgang's subtle suggestion of using Lua to parse
> the input document using a regular expression via lpeg.replacer. The
> replacement itself works fine; however, in doing so the XML document
> structure is converted to text, which means that it is no longer possible
> to "flush" the XML for further processing as XML. The result is that any
> unresolved XML tags are written verbatim to the PDF:
> 
> https://i.stack.imgur.com/9ZFND.png
> 
> There are two other issues with this approach. First is efficiency. Second
> is that the processing function would have to be called for every XML
> element to capture the replacement.
> 
> My original post asked about applying regex word substitution in a ConTeXt
> way, such as:
> 
> \definereplacement[SubstMac][ match={Mc([A-Z].*)}, replace={\Mac \\1} ]
> \definereplacement[SubstPostmeridian][ match={[Pp]\\.[Mm]\\.},
> replace={\cap{pm}} ]
> 
> That seems like the cleanest approach because it would work on top of XML
> or any other source document. Nevertheless, here is what I tried, which
> partially works:
> 
> \startbuffer[main]
> <html>
>    <p>“Mr. McAnulty, I presume?”</p>
>    <p>Regular text. <em>Irregular text.</em></p>
> </html>
> \stopbuffer
> \startxmlsetups xml:xhtml
>    \xmlsetsetup{\xmldocument}{*}{-}
>    \xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
> \stopxmlsetups
> \startxmlsetups xml:html
>    \startdocument
>      \xmlflush{#1}
>    \stopdocument
> \stopxmlsetups
> % Paragraphs are followed by a paragraph break, but only if not nested.
> \startxmlsetups xml:p
>    \xmlfunction{#1}{p}
>    \par
> \stopxmlsetups
> \startxmlsetups xml:em
>    \dontleavehmode{\em\xmlflush{#1}}
> \stopxmlsetups
> \startluacode
> function xml.functions.p( t )
>    rep = { [1] = { "McAnulty", "\\Mac Anulty" } }
>    x = lpeg.replacer( rep ):match( tostring( xml.text( t ) ) )
> 
>    buffers.assign( "p", context( x ) )
>    context.getbuffer{ "p" }
> end
> \stopluacode
> \xmlregistersetup{xml:xhtml}
> \def\Mac{%
>    % Determine the sizes of 'M' and 'c'.
>    \newbox\MacMBox%
>    \setbox\MacMBox\hbox{M}%
>    \newbox\MacCBox%
>    \setbox\MacCBox\hbox{c}%
>    %
>    % Cheat to dynamically derive the kerning size by putting Mc in a box.
>    %
>    \newbox\MacKernBox%
>    \setbox\MacKernBox\hbox{\inframed[offset=\zeropoint, width=fit]{Mc}}%
>    \def\MacDelta{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
>    \def\MacUWidth{\dimexpr\wd\MacCBox-.75\MacDelta\relax}%
>    \def\MacRule{\vrule width \MacUWidth height .04em depth \zeropoint \relax}%
>    \def\MacKern{\dimexpr\wd\MacKernBox-\wd\MacMBox-\wd\MacCBox\relax}%
>    \def\MacHeight{\dimexpr\ht\MacMBox-\ht\MacCBox\relax}%
>    %
>    % Write Mc, where c has a macron, to the document.
>    %
>    M{%
>      \dontleavehmode{\raisebox{\MacHeight}\hbox{c}}%
>      \kern-1.04\MacUWidth
>      \MacRule
>      \kern.08\MacUWidth
>    }%
> }%
> \xmlprocessbuffer{main}{main}{}
> 
> As shown in the screen shot, this doesn't correctly handle nested XML
> elements.
> 
> Any ideas on what approach to take to perform a string replacement in
> ConTeXt?
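Best stay at the XML end: replace only inside the string entries of the
element's data table (t.dt), so that nested elements stay intact and can
still be flushed: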

\startbuffer[main]
<html>
   <p>“Mr. McAnulty, I presume?”</p>
   <p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer

\startxmlsetups xml:xhtml
   \xmlsetsetup{\xmldocument}{*}{-}
   \xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups

\startxmlsetups xml:html
     \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:p
     \xmlfunction{#1}{p}
     \xmlcontext{#1}
     \par
\stopxmlsetups

\startxmlsetups xml:em
   \dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups

\startluacode
     local rep = lpeg.replacer { [1] = { "McAnulty", "\\Mac Anulty" } }
     function xml.functions.p(t)
         local dt = t.dt
         for i=1,#dt do
             local di = dt[i]
             if type(di) == "string" then
                 dt[i] = lpeg.match(rep,di)
             end
         end
     end
\stopluacode

\xmlregistersetup{xml:xhtml}

\startdocument
     \xmlprocessbuffer{main}{main}{}
\stopdocument
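
The loop over t.dt above can be tried outside ConTeXt. What follows is a minimal sketch in plain Lua, not the code from the reply: the tree is a hand-made mock, Lua patterns via string.gsub stand in for lpeg.replacer (which is a ConTeXt helper), and the names substitute, replace_strings and the recurse flag are made up for illustration. The patterns mirror the match/replace pairs from the original question, with %1 as the backreference.

```lua
-- Mock of the table layout used in the reply: `dt` holds strings (text) and
-- tables (child elements). This is a stand-in, not a real ConTeXt XML tree.
local tree = {
    tg = "p",
    dt = {
        "Mr. McAnulty arrives at 5 p.m. ",
        { tg = "em", dt = { "McBride is late." } },
    },
}

-- Lua patterns instead of lpeg.replacer: %1 refers back to the capture.
-- Note that %u and %a only cover ASCII letters in the default "C" locale.
local function substitute(text)
    text = string.gsub(text, "Mc(%u%a*)", "\\Mac %1")
    text = string.gsub(text, "[Pp]%.[Mm]%.", "\\cap{pm}")
    return text
end

-- Rewrite string entries in place; child elements are left alone unless
-- `recurse` is set, in which case their own string entries are rewritten too.
local function replace_strings(element, recurse)
    local dt = element.dt
    for i = 1, #dt do
        local di = dt[i]
        if type(di) == "string" then
            dt[i] = substitute(di)
        elseif recurse and type(di) == "table" and di.dt then
            replace_strings(di, true)
        end
    end
end

replace_strings(tree, false)
print(tree.dt[1])       -- text directly inside <p>, after substitution
print(tree.dt[2].dt[1]) -- text inside <em>, untouched without recursion
```

Inside ConTeXt the same substitute function could presumably take the place of lpeg.match(rep,di) in xml.functions.p. Whether macros inserted into nested elements are then typeset depends on how the setups for those elements flush their content, so the recursive variant is an assumption to test rather than a given.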

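But this is more fun and probably also more reliable: register a box glyph
in a private slot and let a font ligature feature map the characters of
"McAnulty" onto it, so no Lua function has to touch the XML tree: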

\startbuffer[main]
<html>
   <p>“Mr. McAnulty, I presume?”</p>
   <p>Regular text. <em>Irregular text.</em></p>
</html>
\stopbuffer

\startxmlsetups xml:xhtml
   \xmlsetsetup{\xmldocument}{*}{-}
   \xmlsetsetup{\xmldocument}{html|p|em}{xml:*}
\stopxmlsetups

\startxmlsetups xml:html
     \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:p
     \xmlcontext{#1}
     \par
\stopxmlsetups

\startxmlsetups xml:em
   \dontleavehmode{\em\xmlflush{#1}}
\stopxmlsetups

\xmlregistersetup{xml:xhtml}

\usemodule[gimmicks] % in latest uploads

\chardef\MacAnulty = \getprivateglyphslot{MacAnulty}

\startsetups [box:mcanulty:\number\MacAnulty]
     \Mac Anulty
\stopsetups

\registerboxglyph category {mcanulty} unicode \MacAnulty \relax

\startluacode
     fonts.handlers.otf.addfeature {
         name    = "mcanulty",
         type    = "ligature",
         nocheck = true,
         data    = {
             [fonts.constructors.privateslots.MacAnulty] = {
                 "M", "c", "A", "n", "u", "l", "t", "y",
             },
         }
     }
\stopluacode

\definefontfeature[default][default][box=mcanulty,mcanulty=yes]

\startdocument
     \xmlprocessbuffer{main}{main}{}
\stopdocument
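
Conceptually, a ligature feature replaces a run of characters by one glyph. The following is a rough model of that idea in plain Lua (5.3 or later, for the utf8 library), not a description of how the ConTeXt font handler is implemented: PRIVATE_SLOT is a hypothetical code point standing in for the slot returned by \getprivateglyphslot, and ligate and codepoints are made-up helper names.

```lua
-- Hypothetical private-use code point; the real slot is assigned by ConTeXt.
local PRIVATE_SLOT = 0xF0000

-- Convert a UTF-8 string into a list of code points.
local function codepoints(s)
    local t = { }
    for _, c in utf8.codes(s) do
        t[#t + 1] = c
    end
    return t
end

-- Replace every occurrence of the `components` sequence in `glyphs` by the
-- single code point `slot`; everything else is copied unchanged.
local function ligate(glyphs, components, slot)
    local result, i, n, m = { }, 1, #glyphs, #components
    while i <= n do
        local matched = i + m - 1 <= n
        if matched then
            for j = 1, m do
                if glyphs[i + j - 1] ~= components[j] then
                    matched = false
                    break
                end
            end
        end
        if matched then
            result[#result + 1] = slot
            i = i + m
        else
            result[#result + 1] = glyphs[i]
            i = i + 1
        end
    end
    return result
end

local input  = codepoints("Mr. McAnulty, I presume?")
local output = ligate(input, codepoints("McAnulty"), PRIVATE_SLOT)
print(#input, #output)
```

In this model the substitution works on characters only, which matches the point of the approach above: the XML setups need no replacement function.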

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
