From: Thangalin via ntg-context
Newsgroups: gmane.comp.tex.context
To: mailing list for ConTeXt users
Subject: String substitution using regular expressions and backreferences
Date: Mon, 1 Aug 2022 12:58:53 -0700

Hi list,

I'm looking to perform text replacements, along the lines of:

\definereplacement[SubstPostmeridian][
  match={[Pp].[Mm].},
  replace={\cap{pm}}
]

The \replaceword command doesn't handle periods well. The translate module doesn't seem flexible enough to cover edge cases. Consider the following example document containing both sample inputs and sample outputs:

\starttext
  {\bf Markdown Input}

  Our grandmother clock rang 11 p.m. and we fled.

  Our grandmother clock rang 11 p.m., so we fled.

  Our grandmother clock rang 11 p.m. We fled.

  \blank[big]

  {\bf \ConTeXt{} Output}

  Our grandmother clock rang 11 \cap{pm} and we fled.

  Our grandmother clock rang 11 \cap{pm}, so we fled.

  Our grandmother clock rang 11 \cap{pm}. We fled.
\stoptext

It would be most convenient to write:

% Strip periods from p.m.
\definereplacement[SubstPostmeridianLowercase][
  match={[Pp].[Mm]. ([^:upper:])},
  replace={\cap{pm} \1}
]

% Preserve terminal period for p.m. (e.e. cummings notwithstanding)
\definereplacement[SubstPostmeridianTerminal][
  match={[Pp].[Mm]. ([:upper:])},
  replace={\cap{pm}. \1}
]

% Apply a macron for lowercase 'c' (McAnulty, McGenius, etc.)
% Well, not quite a macron: https://tex.stackexchange.com/q/364024/2148
\definereplacement[SubstMac][
  match={Mc([:upper:]\w)},
  replace={M\macronbelow{c}\1}
]

The \1 may be problematic. Other sigils include $1 and #1, which may also have issues.

Thank you!
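For reference, the three replacements requested in this message can be pinned down with a minimal sketch outside ConTeXt, here using Python's `re` module as a preprocessing step on the Markdown input. This is not a ConTeXt feature and not an implementation of `\definereplacement`. The names `RULES` and `apply_replacements` are hypothetical, `[A-Z]` is an ASCII-only stand-in for the `[:upper:]` class, and the periods are escaped so that they match literally. The terminal rule runs first so that the catch-all rule only sees the remaining cases.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement template).
# Rules are applied in order. [A-Z] is an ASCII-only stand-in for [:upper:].
RULES = [
    # "p.m." followed by a space and a capital ends a sentence: keep one period.
    (re.compile(r"\b[Pp]\.[Mm]\. ([A-Z])"), r"\\cap{pm}. \1"),
    # Any remaining "p.m.": drop both periods.
    (re.compile(r"\b[Pp]\.[Mm]\."), r"\\cap{pm}"),
    # "Mc" followed by a capital and a word character: mark the lowercase c.
    (re.compile(r"\bMc([A-Z]\w)"), r"M\\macronbelow{c}\1"),
]


def apply_replacements(text: str) -> str:
    """Apply every rule to the text, in order, and return the result."""
    for pattern, template in RULES:
        # In a template, "\\" emits one literal backslash and "\1" inserts
        # the first captured group.
        text = pattern.sub(template, text)
    return text


if __name__ == "__main__":
    samples = [
        "Our grandmother clock rang 11 p.m. and we fled.",
        "Our grandmother clock rang 11 p.m., so we fled.",
        "Our grandmother clock rang 11 p.m. We fled.",
        "McAnulty and McGenius",
    ]
    for sample in samples:
        print(apply_replacements(sample))
```

The doubled backslash in each template illustrates the concern about `\1` raised above: the backreference sigil and the TeX escape character collide, so a literal backslash has to be written differently from a group reference. The sketch does not handle "p.m." at the very end of the input, where the catch-all rule would also drop the sentence-ending period.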