From: Bastien DUMONT
Newsgroups: gmane.text.pandoc
Subject: Re: Glossary Filter for MD2Tex
Date: Tue, 18 Oct 2022 22:06:42 +0000
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

Yes, it could! You would have access to the corresponding metadata object in
the AST.

On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos wrote:
> The data is mostly in database format and could be output in the best format
> for the task, but I wanted to make it friendly for other people to use as well.
> Could a YAML metadata block be a solution?
>
> glossary:
>   glossary_lang: grc
>   entries:
>     - headword: ἀγαθός
>       text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
>       match:
>         - γαθέ
>         - γαθοί
>         - κἀγάθ
>         - κἀγαθά
>         - κἀγαθάς
>         - κἀγαθή
>         - κἀγαθήν
>         - κἀγαθαί
>         - κἀγαθοί
>         - κἀγαθος
>     - headword: ἀγαπᾶν
>       transliteration: agapan
>       text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
>       match:
>         - ἀγάπα
>         - ἀγάπαις
>         - ἀγάπη
>         - ἀγάπην
>         - ἀγάπης
>         - ἀγάπῃ
>         - ἀγαπᾶ
>         - ἀγαπᾶν
>         - ἀγαπᾶς
>
> On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:
>
>     No, citeproc receives a data structure produced by pandoc. Pandoc is
>     responsible for the parsing. I think that your script would not be so
>     hard to rewrite in Lua; the main problem is to know if you can achieve
>     your goals this way. If your main concern is portability, then writing a
>     Lua filter with no dependencies certainly is a good solution, provided
>     that you feed it with a Lua data structure (or embed the code responsible
>     for JSON parsing in your script).
>
>     On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos
>     wrote:
>
>         Thank you for the suggestions, Bastien. There is technically no need
>         for regex, as all the forms are spelled out to avoid the need to
>         create ad hoc regex rules for each term. Now that I think about it,
>         the principle is the same as Citeproc's: a tagged inline element will
>         be matched against a lookup table and replaced. I will look at the
>         citeproc code to see if it leads anywhere or if it could be reused in
>         any way.
>
>         On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
>
>             Yes, but it is limited to this utf8 library. For instance, if you
>             perform a regexp search like `string.match('ἀγαθός', '[γδ]')`, it
>             tries to match one of the four bytes inside the square brackets
>             against the string 'ἀγαθός', so it will return the first byte of
>             γ, not γ. To circumvent this limitation, you would be forced to
>             test γ and δ separately. Nevertheless, if you always perform
>             comparisons between whole strings as you currently do in your
>             script, this should not be a problem.
>
>             As for your concern with dependencies, you would most probably
>             have to rely on a JSON library such as lunajson. However, if your
>             JSON files are not supposed to change, you could also convert
>             them to a Lua file using a JSON library and a serialization
>             library, so as to be able to import the resulting Lua data
>             structure directly in your filter.
>
>             On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A.
>             Vasconcelos wrote:
>
>                     As for translating the filter note that Lua can't really
>                     handle UTF-8. There is some rudimentary support for
>                     converting code point number ↔ UTF-8 byte sequences and
>                     for iterating through a string of bytes representing
>                     UTF-8 encoded characters but no concept of chars as
>                     opposed to bytes. This may become a show stopper if you
>                     need to manipulate strings containing UTF-8 text.
>
>                 Thanks, @BPJ, for the explanation. Apparently, Lua 5.3
>                 onwards includes UTF-8 support. Have you seen it? E.g.
>                 [1]https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>
>                     For Ancient Greek you want grc as the language tag.
>
>                 Indeed it is (and that is generally what I use), but ἀγαθός
>                 is just Polytonic Greek, which is not the same as Ancient
>                 Greek.
>
>                 --
>                 You received this message because you are subscribed to the
>                 Google Groups "pandoc-discuss" group.
>                 To unsubscribe from this group and stop receiving emails from
>                 it, send an email to
>                 pandoc-discuss+unsubscribe@googlegroups.com.
>                 To view this discussion on the web visit
>                 [2]https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.
>
> References:
>
> [1] https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
> [2] https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com
> [3] https://groups.google.com/d/msgid/pandoc-discuss/Y07VnbuRsuqUg8US%40localhost
> [4] https://groups.google.com/d/msgid/pandoc-discuss/7072522D-F2FE-4BAC-A575-93426852FCFB%40gmail.com
> [5] https://groups.google.com/d/msgid/pandoc-discuss/Y07ji07FFokQdOR%2B%40localhost
> [6] mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> [7] https://groups.google.com/d/msgid/pandoc-discuss/D4CB4B20-A1D5-49C8-BA96-2E37BA4FB779%40gmail.com?utm_medium=email&utm_source=footer
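
To make the point about the metadata object concrete, here is a minimal
Python sketch of how the glossary could be turned into a lookup table once
pandoc has parsed the YAML block. It is only an illustration, not the filter
discussed in this thread. It assumes the JSON form of the AST (as emitted by
`pandoc -t json`) and assumes that `glossary_lang` and `entries` are nested
under `glossary`. The helper names (`inlines_to_text`, `meta_to_python`,
`build_lookup`, `scalar`) and `SAMPLE_META` are hypothetical.

```python
# Hypothetical helpers for illustration; none of these names come from pandoc.

def inlines_to_text(inlines):
    """Flatten a list of pandoc inline elements to plain text."""
    parts = []
    for inline in inlines:
        tag = inline["t"]
        if tag == "Str":
            parts.append(inline["c"])
        elif tag in ("Space", "SoftBreak"):
            parts.append(" ")
        elif tag in ("Emph", "Strong"):
            parts.append(inlines_to_text(inline["c"]))
        # Other inline types are skipped in this sketch.
    return "".join(parts)


def meta_to_python(node):
    """Convert a pandoc JSON metadata value to plain Python data."""
    tag, content = node["t"], node["c"]
    if tag == "MetaMap":
        return {key: meta_to_python(value) for key, value in content.items()}
    if tag == "MetaList":
        return [meta_to_python(item) for item in content]
    if tag == "MetaInlines":
        return inlines_to_text(content)
    # MetaString and MetaBool already hold a plain value; MetaBlocks is
    # returned unconverted in this sketch.
    return content


def build_lookup(meta):
    """Map every listed form to its headword (assumed glossary layout)."""
    glossary = meta_to_python(meta["glossary"])
    lookup = {}
    for entry in glossary["entries"]:
        for form in entry.get("match", []):
            lookup[form] = entry["headword"]
    return lookup


def scalar(text):
    """Build the MetaInlines node pandoc produces for a one-word YAML scalar."""
    return {"t": "MetaInlines", "c": [{"t": "Str", "c": text}]}


# Hand-written stand-in for the "meta" field of pandoc's JSON output.
SAMPLE_META = {
    "glossary": {"t": "MetaMap", "c": {
        "glossary_lang": scalar("grc"),
        "entries": {"t": "MetaList", "c": [
            {"t": "MetaMap", "c": {
                "headword": scalar("ἀγαπᾶν"),
                "match": {"t": "MetaList",
                          "c": [scalar("ἀγάπη"), scalar("ἀγαπᾶς")]},
            }},
        ]},
    }},
}

lookup = build_lookup(SAMPLE_META)
print(lookup["ἀγάπη"])
```

In a real JSON filter the metadata would come from the document read on
standard input rather than from `SAMPLE_META`. Because the lookup compares
whole strings, it does not run into the byte-level matching problem mentioned
earlier in the thread.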