From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bastien DUMONT
Newsgroups: gmane.text.pandoc
Subject: Re: Glossary Filter for MD2Tex
Date: Wed, 19 Oct 2022 21:28:41 +0000
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="PBU288CvtxbvIz8n"

--PBU288CvtxbvIz8n
Content-Type: text/plain; charset="UTF-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

I think that the attached script could be a good starting point.

On Wednesday 19 October 2022 at 04:50:25PM, Bernardo C.D.A. Vasconcelos wrote:
> I have found this little script that takes me nearly there:
>
> local vars = {}
>
> function Meta(meta)
>   for k, v in pairs(meta) do
>     vars["%" .. k .. "%"] = v
>   end
> end
>
> function Str(elem)
>   if vars[elem.text] then
>     return vars[elem.text]
>   else
>     return elem
>   end
> end
>
> return {
>   { Meta = Meta },
>   { Str = Str }
> }
>
> Instead, we would use: meta.glossary.entries. The crux for me is looping
> through the list of entries, adding all the values of the to_match field
> (a.k.a. known forms) of each entry to vars as keys, with the content of
> some other field (e.g. glslink) as the value. E.g.
> vars[ .. entry.to_match.each .. ] = entry.glslink.
>
> On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:
>
> Yes, it could! You would have access to the corresponding metadata object
> in the AST.
>
> On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos wrote:
>
> The data is mostly in database format and could be output in the best
> format for the task, but I wanted to make it friendly for other people to
> use as well.
> Could a YAML metadata block be a solution?
>
> glossary:
>   glossary_lang: grc
>   entries:
>     - headword: ἀγαθός
>       text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
>       match:
>         - γαθέ
>         - γαθοί
>         - κἀγάθ
>         - κἀγαθά
>         - κἀγαθάς
>         - κἀγαθή
>         - κἀγαθήν
>         - κἀγαθαί
>         - κἀγαθοί
>         - κἀγαθος
>     - headword: ἀγαπᾶν
>       transliteration: agapan
>       text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
>       match:
>         - ἀγάπα
>         - ἀγάπαις
>         - ἀγάπη
>         - ἀγάπην
>         - ἀγάπης
>         - ἀγάπῃ
>         - ἀγαπᾶ
>         - ἀγαπᾶν
>         - ἀγαπᾶς
>
> On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:
>
> No, citeproc receives a data structure produced by pandoc. Pandoc is
> responsible for the parsing. I think that your script would not be so
> hard to rewrite in Lua; the main problem is to know whether you can
> achieve your goals this way. If your main concern is portability, then
> writing a Lua filter with no dependencies is certainly a good solution,
> provided that you feed it with a Lua data structure (or embed the code
> responsible for JSON parsing in your script).
>
> On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos wrote:
>
> Thank you for the suggestions, Bastien.
> There is technically no need for regex, as all the forms are spelled out
> to avoid the need to create ad hoc regex rules for each term. Now that I
> think about it, the principle is the same as Citeproc's: a tagged inline
> element will be matched against a lookup table and replaced. I will look
> at the citeproc code to see if it leads anywhere or if it could be reused
> in any way.
>
> On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
>
> Yes, but it is limited to this utf8 library. For instance, if you perform
> a regexp search like `string.match('ἀγαθός', '[γδ]')`, it tries to match
> one of the four bytes inside the square brackets against the string
> 'ἀγαθός', so it will return the first byte of γ, not γ. To circumvent
> this limitation, you would be forced to test γ and δ separately.
> Nevertheless, if you always perform comparisons between whole strings as
> you currently do in your script, this should not be a problem.
>
> As for your concern with dependencies, you would most probably have to
> rely on a JSON library such as lunajson. However, if your JSON files are
> not supposed to change, you could also convert them to a Lua file using a
> JSON library and a serialization library, so as to be able to import the
> resulting Lua data structure directly in your filter.
>
> On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos wrote:
>
> As for translating the filter note that Lua can't really handle UTF-8.
> There is some rudimentary support for converting codepoint number ↔ UTF-8
> byte sequences and for iterating through a string of bytes representing
> UTF-8 encoded characters but no concept of chars as opposed to bytes.
> This may become a show stopper if you need to manipulate strings
> containing UTF-8 text.
>
> Thanks, @BPJ, for the explanation.
> Apparently, Lua 5.3 onwards includes UTF-8 support. Have you seen it?
> E.g. https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>
> For Ancient Greek you want grc as the language tag.
>
> Indeed it is (and that is generally what I use), but ἀγαθός is just
> Polytonic Greek, which is not the same as Ancient Greek.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/Y1BsCdqttFxOi/pa%40localhost.

--PBU288CvtxbvIz8n
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="tag-greek-words.lua"

-- I suppose that you always use plain strings in the
-- "headword", "transliteration" and "match" fields,
-- so I stringify the corresponding Inlines to be able
-- to more easily insert the values in the LaTeX string.

local stringify = pandoc.utils.stringify
local open_glslink_scd_arg = pandoc.RawInline('latex', '{')
local close_glslink_scd_arg = pandoc.RawInline('latex', '}')

-- I use two tables: one to store the data relative to the headwords,
-- the other to map the forms to the corresponding entries
-- in headwords_data.
-- Since the entries in headwords_data are tables
-- and tables are always passed by reference in Lua,
-- this approach avoids storing the same data redundantly in memory.
local headwords_data = {}
local forms_to_headwords = {}

local function get_glossary_data(meta)
  for _, entry in ipairs(meta.glossary.entries) do
    local headword = stringify(entry.headword)
    headwords_data[headword] = {
      headword = headword,
      text = entry.text,
      -- If the "transliteration" field is missing, Lua will throw an error.
      -- I suppose that this should not happen, but if it can,
      -- replace the following line with the commented one (supposing
      -- that the lonely @ will not cause problems):
      -- transliteration = entry.transliteration and stringify(entry.transliteration) or ''
      transliteration = stringify(entry.transliteration)
    }
    for _, form in ipairs(entry.match) do
      forms_to_headwords[stringify(form)] = headwords_data[headword]
    end
  end
end

local function tag_words(span)
  if span.attributes.lang == 'el' then
    local content = stringify(span.content)
    local word_data = forms_to_headwords[content]
    if word_data then
      local linguistic_tags =
        pandoc.RawInline('latex',
          '\\index{' .. word_data.transliteration .. '@' .. word_data.headword .. '}' ..
          '\\glslink{' .. word_data.transliteration .. '}')
      return {
        linguistic_tags,
        open_glslink_scd_arg,
        span,
        close_glslink_scd_arg
      }
    end
  end
end

return {
  { Meta = get_glossary_data },
  { Span = tag_words }
}

--PBU288CvtxbvIz8n
Content-Type: text/markdown; charset=utf-8
Content-Disposition: attachment; filename="test.md"
Content-Transfer-Encoding: 8bit

---
glossary:
  glossary_lang: grc
  entries:
    - headword: ἀγαθός
      transliteration: agathos
      text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
      match:
        - γαθέ
        - γαθοί
        - κἀγάθ
        - κἀγαθά
        - κἀγαθάς
        - κἀγαθή
        - κἀγαθήν
        - κἀγαθαί
        - κἀγαθοί
        - κἀγαθος
    - headword: ἀγαπᾶν
      transliteration: agapan
      text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
      match:
        - ἀγάπα
        - ἀγάπαις
        - ἀγάπη
        - ἀγάπην
        - ἀγάπης
        - ἀγάπῃ
        - ἀγαπᾶ
        - ἀγαπᾶν
        - ἀγαπᾶς
---

The words [κἀγαθά]{lang=el} and [ἀγαπᾶς]{lang=el}.

--PBU288CvtxbvIz8n--
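
To try the lookup logic of tag-greek-words.lua outside pandoc, here is a minimal sketch in plain Lua. It rests on assumptions that are not in the attached script: a hypothetical `entries` table of plain strings stands in for meta.glossary.entries, and `latex_prefix` is a hypothetical helper. The pandoc-specific parts (stringify, RawInline, the Span filter) are left out, so this only illustrates how every known form ends up pointing at one shared headword table.

```lua
-- Hypothetical stand-in for meta.glossary.entries: plain Lua strings
-- instead of pandoc Inlines, so no stringify call is needed here.
local entries = {
  {
    headword = 'ἀγαθός',
    transliteration = 'agathos',
    match = { 'κἀγαθά', 'κἀγαθοί' },
  },
  {
    headword = 'ἀγαπᾶν',
    transliteration = 'agapan',
    match = { 'ἀγαπᾶς', 'ἀγάπη' },
  },
}

local headwords_data = {}
local forms_to_headwords = {}

for _, entry in ipairs(entries) do
  local data = {
    headword = entry.headword,
    -- Fall back to an empty string if the field is missing.
    transliteration = entry.transliteration or '',
  }
  headwords_data[entry.headword] = data
  for _, form in ipairs(entry.match) do
    -- Every form points at the same table; nothing is copied.
    forms_to_headwords[form] = data
  end
end

-- Hypothetical helper: build the LaTeX prefix for a known form,
-- or return nil for a form that is not in the glossary.
local function latex_prefix(form)
  local data = forms_to_headwords[form]
  if not data then
    return nil
  end
  return '\\index{' .. data.transliteration .. '@' .. data.headword .. '}'
    .. '\\glslink{' .. data.transliteration .. '}'
end

print(latex_prefix('κἀγαθά'))  -- \index{agathos@ἀγαθός}\glslink{agathos}
```

Because only whole strings are used as table keys, the byte-oriented behaviour of Lua patterns discussed in the quoted thread does not come into play here.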