From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Bernardo C.D.A. Vasconcelos"
Newsgroups: gmane.text.pandoc
Subject: Re: Glossary Filter for MD2Tex
Date: Wed, 19 Oct 2022 16:50:25 -0300
References: <88a14108-f2e4-40d0-a98e-5c6f84b8ff41n@googlegroups.com> <3307993F-F813-405F-BFEC-F17FAF27BEA5@gmail.com> <7072522D-F2FE-4BAC-A575-93426852FCFB@gmail.com>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="=_MailMate_2E9185A2-411C-4F76-B87D-127CF30C4D40_="
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
X-Mailer: MailMate (1.13.2r5673)
Xref: news.gmane.io gmane.text.pandoc:31599

--=_MailMate_2E9185A2-411C-4F76-B87D-127CF30C4D40_=
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: quoted-printable

--=_MailMate_2E9185A2-411C-4F76-B87D-127CF30C4D40_=
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I have found this little script that takes me nearly there:

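-- Lookup table: "%key%" -> metadata value
local vars = {}

-- Collect every top-level metadata field as a "%key%" placeholder.
function Meta(meta)
    for k, v in pairs(meta) do
        vars["%" .. k .. "%"] = v
    end
end

-- Replace a word that matches a placeholder; leave other words untouched.
function Str(elem)
    if vars[elem.text] then
        return vars[elem.text]
    else
        return elem
    end
end

-- Two passes: read the metadata first, then rewrite the text.
return {
    { Meta = Meta },
    { Str  = Str  }
}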

Instead, we would use `meta.glossary.entries`. The crux for me is looping through the list of entries and adding every value of each entry's `to_match` field (a.k.a. known forms) to `vars` as a key, with the content of some other field (e.g. `glslink`) as the value. E.g. `vars[ .. entry.to_match.each .. ] = entry.glslink`.
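
The loop described above might look like the following minimal sketch. It is written to run both in plain Lua and inside a pandoc Lua filter. `build_lookup` and `to_text` are hypothetical helper names, the known forms are read from `to_match` with a fallback to `match` (the field name used in the YAML quoted further down), and `glslink` is assumed to hold whatever should replace a matched form.

```lua
-- Sketch only: helper names are hypothetical, not part of pandoc's API.

-- Turn a metadata value into a plain string. Inside pandoc, metadata
-- values are AST objects, so use pandoc.utils.stringify when available;
-- in plain Lua (e.g. for testing) fall back to tostring.
local function to_text(value)
  if pandoc and pandoc.utils then
    return pandoc.utils.stringify(value)
  end
  return tostring(value)
end

-- Map every known form of every entry to that entry's replacement value.
local function build_lookup(entries, value_field)
  local vars = {}
  for _, entry in ipairs(entries or {}) do
    local forms = entry.to_match or entry.match or {}
    for _, form in ipairs(forms) do
      vars[to_text(form)] = entry[value_field]
    end
  end
  return vars
end

-- Plain-Lua demonstration with two made-up entries.
local demo = build_lookup({
  { glslink = "agathos", to_match = { "γαθέ", "κἀγαθά" } },
  { glslink = "agapan",  to_match = { "ἀγάπα", "ἀγαπᾶν" } },
}, "glslink")

print(demo["ἀγαπᾶν"])  --> agapan
```

In a filter, the `Meta` function would assign `vars = build_lookup(meta.glossary.entries, "glslink")` after checking that `meta.glossary` exists, and the `Str` function would look up `elem.text` as in the script above. The demonstration lines at the bottom are meant for a plain `lua` interpreter and would be dropped in a real filter.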

On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:

Yes, it could! You would have access to the corresponding metadata object in the AST.

On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos wrote:

The data is mostly in database format and could be output in the best format
for the task, but I wanted to make it friendly for other people to use as well.
Could a YAML metadata block be a solution?

glossary:
  glossary_lang: grc
  entries:
    - headword: ἀγαθός
      text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
      match:
        - γαθέ
        - γαθοί
        - κἀγάθ
        - κἀγαθά
        - κἀγαθάς
        - κἀγαθή
        - κἀγαθήν
        - κἀγαθαί
        - κἀγαθοί
        - κἀγαθος
    - headword: ἀγαπᾶν
      transliteration: agapan
      text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
      match:
        - ἀγάπα
        - ἀγάπαις
        - ἀγάπη
        - ἀγάπην
        - ἀγάπης
        - ἀγάπῃ
        - ἀγαπᾶ
        - ἀγαπᾶν
        - ἀγαπᾶς

On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:

No, citeproc receives a data structure produced by pandoc; pandoc is
responsible for the parsing. I think your script would not be so hard
to rewrite in Lua; the main question is whether you can achieve your
goals this way. If your main concern is portability, then writing a Lua
filter with no dependencies is certainly a good solution, provided that
you feed it a Lua data structure (or embed the code responsible for
JSON parsing in your script).

On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos wrote:

Thank you for the suggestions, Bastien. There is technically no need for
regex, as all the forms are spelled out to avoid having to create ad hoc
regex rules for each term. Now that I think about it, the principle is the
same as Citeproc's: a tagged inline element will be matched against a
lookup table and replaced. I will look at the citeproc code to see if it
leads anywhere or if it could be reused in any way.

On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:

Yes, but it is limited to this utf8 library. For instance, if you perform a
pattern search like `string.match('ἀγαθός', '[γδ]')`, Lua tries to match
any one of the four bytes inside the square brackets against the string
'ἀγαθός', so it will return the first byte of γ, not γ itself. To
circumvent this limitation, you would be forced to test γ and δ separately.
Nevertheless, if you always perform comparisons between whole strings, as
you currently do in your script, this should not be a problem.
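
The byte-level behaviour described above can be checked in plain Lua 5.3 or later, assuming the source file is saved as UTF-8. The sketch below contrasts the byte-oriented `string.match` with two possible workarounds: testing each alternative separately with a plain-text `string.find`, and decoding code points with the standard `utf8` library. `contains_any` is a hypothetical helper name.

```lua
local word = "ἀγαθός"

-- string.match works on bytes: the class [γδ] is really a set of the
-- four bytes 0xCE 0xB3 0xCE 0xB4, so the result is a single byte.
local hit = string.match(word, "[γδ]")
print(#hit)  --> 1

-- Workaround 1: test each alternative separately as a literal string.
local function contains_any(s, alternatives)
  for _, alt in ipairs(alternatives) do
    -- plain = true: treat alt as literal bytes, not as a pattern.
    local from, to = string.find(s, alt, 1, true)
    if from then
      return s:sub(from, to)
    end
  end
  return nil
end

print(contains_any(word, { "γ", "δ" }))  --> γ

-- Workaround 2: compare decoded code points instead of bytes.
local wanted = {
  [utf8.codepoint("γ")] = true,
  [utf8.codepoint("δ")] = true,
}
local found
for _, cp in utf8.codes(word) do
  if wanted[cp] then
    found = utf8.char(cp)
    break
  end
end
print(found)  --> γ
```

Whole-string comparisons such as `vars[elem.text]` are unaffected, because two identical UTF-8 strings are also identical byte sequences.
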
As for your concern with dependencies, you would most probably have to
rely on a JSON library such as lunajson. However, if your JSON files are
not supposed to change, you could also convert them to a Lua file using
a JSON library and a serialization library, so as to be able to import
the resulting Lua data structure directly in your filter.
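
The one-off conversion suggested above might be sketched as follows. The JSON decoding step is left out (a library such as lunajson would supply the decoded table). `serialize` is a hypothetical minimal serializer that handles only strings, numbers, booleans and nested tables without cycles, and the `glossary` table stands in for already-decoded data.

```lua
-- Minimal serializer sketch (hypothetical helper, not a library API).
local function serialize(value, indent)
  indent = indent or ""
  local t = type(value)
  if t == "string" then
    return string.format("%q", value)
  elseif t == "number" or t == "boolean" then
    return tostring(value)
  elseif t == "table" then
    local inner = indent .. "  "
    local parts = {}
    -- Array part first, in order.
    for _, v in ipairs(value) do
      parts[#parts + 1] = inner .. serialize(v, inner)
    end
    -- Then string keys, sorted so the output is stable.
    local keys = {}
    for k in pairs(value) do
      if type(k) == "string" then keys[#keys + 1] = k end
    end
    table.sort(keys)
    for _, k in ipairs(keys) do
      parts[#parts + 1] =
        string.format("%s[%q] = %s", inner, k, serialize(value[k], inner))
    end
    return "{\n" .. table.concat(parts, ",\n") .. "\n" .. indent .. "}"
  end
  error("cannot serialize a " .. t)
end

-- Stand-in for the table a JSON decoder would return.
local glossary = {
  { headword = "ἀγαθός", match = { "γαθέ", "γαθοί" } },
}
local source = "return " .. serialize(glossary)

-- Round trip: load the generated source and check it rebuilds the table.
local reloaded = load(source)()
print(reloaded[1].match[2])  --> γαθοί
```

Writing `source` to a file (for example `glossary.lua`, a made-up name) would let the filter read it back with `dofile`, so no JSON library is needed when the filter runs.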

On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos wrote:

As for translating the filter, note that Lua can't really handle UTF-8.
There is some rudimentary support for converting codepoint number ↔ UTF-8
byte sequences and for iterating through a string of bytes representing
UTF-8 encoded characters, but no concept of chars as opposed to bytes.
This may become a show stopper if you need to manipulate strings
containing UTF-8 text.

Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards includes
UTF-8 support. Have you seen it? E.g.
[1]https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
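
For reference, this is roughly what the Lua 5.3 `utf8` library offers. The sketch uses only standard-library functions; the word is written with `\u{...}` escapes (an illustrative choice) so that the exact code points are explicit. It shows that the length operator counts bytes while `utf8.len` counts code points, and how code points map to byte offsets.

```lua
-- "ἀγαθός" spelled out as code points: U+1F00 U+03B3 U+03B1 U+03B8 U+03CC U+03C2
local word = "\u{1F00}\u{3B3}\u{3B1}\u{3B8}\u{3CC}\u{3C2}"

print(#word)           --> 13 (bytes)
print(utf8.len(word))  --> 6  (code points)

-- Code point number <-> UTF-8 byte sequence.
print(utf8.codepoint(word, 1))        --> 7936 (U+1F00)
print(utf8.char(0x3B3) == "\u{3B3}")  --> true

-- Iterate over code points together with their starting byte offsets.
local offsets = {}
for pos, cp in utf8.codes(word) do
  offsets[#offsets + 1] = pos
end
print(table.concat(offsets, " "))  --> 1 4 6 8 10 12
```

The library decodes and encodes code points, but the `string` functions still operate on bytes, which is consistent with the limitation described in the reply above.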

For Ancient Greek you want grc as the language tag.

Indeed it is (and that is generally what I use), but ἀγαθός is just
Polytonic Greek, which is not the same as Ancient Greek.
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh4Ykp1iOSErHA@public.gmane.org.
To view this discussion on the web visit [2]https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.


References:

[1] https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
[2] https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com
[3] https://groups.google.com/d/msgid/pandoc-discuss/Y07VnbuRsuqUg8US%40localhost
[4] https://groups.google.com/d/msgid/pandoc-discuss/7072522D-F2FE-4BAC-A575-93426852FCFB%40gmail.com
[5] https://groups.google.com/d/msgid/pandoc-discuss/Y07ji07FFokQdOR%2B%40localhost
[6] mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
[7] https://groups.google.com/d/msgid/pandoc-discuss/D4CB4B20-A1D5-49C8-BA96-2E37BA4FB779%40gmail.com?utm_medium=email&utm_source=footer

--=_MailMate_2E9185A2-411C-4F76-B87D-127CF30C4D40_=--