From: "Bernardo C.D.A. Vasconcelos"
Newsgroups: gmane.text.pandoc
Subject: Re: Glossary Filter for MD2Tex
Date: Tue, 18 Oct 2022 18:43:48 -0300

The data is mostly in database format and could be output in the best format for the task, but I wanted to make it friendly for other people to use as well. Could a YAML metadata block be a solution?

glossary:
  glossary_lang: grc
  entries:
  - headword: ἀγαθός
    text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
    match:
    - γαθέ
    - γαθοί
    - κἀγάθ
    - κἀγαθά
    - κἀγαθάς
    - κἀγαθή
    - κἀγαθήν
    - κἀγαθαί
    - κἀγαθοί
    - κἀγαθος
  - headword: ἀγαπᾶν
    transliteration: agapan
    text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
    match:
    - ἀγάπα
    - ἀγάπαις
    - ἀγάπη
    - ἀγάπην
    - ἀγάπης
    - ἀγάπῃ
    - ἀγαπᾶ
    - ἀγαπᾶν
    - ἀγαπᾶς
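
A filter could flatten this structure into a lookup table keyed by every spelled-out form, so that matching becomes a plain whole-string comparison. The following is only a minimal sketch in plain Lua. It assumes the metadata has already been turned into ordinary Lua tables and strings (in a pandoc Lua filter that would probably mean passing the values through `pandoc.utils.stringify`). The names `build_index` and `lookup` are hypothetical, the forms are abridged, and treating the headword itself as a matchable form is an assumption.

```lua
-- Stand-in for the YAML block above, abridged to a few forms per entry.
local glossary = {
  glossary_lang = "grc",
  entries = {
    {
      headword = "ἀγαθός",
      text = "□ *pt.* bom; □ *en.* good",
      match = { "γαθέ", "κἀγαθά", "κἀγαθοί" },
    },
    {
      headword = "ἀγαπᾶν",
      transliteration = "agapan",
      text = "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;",
      match = { "ἀγάπα", "ἀγάπην", "ἀγαπᾶς" },
    },
  },
}

-- Map every listed form (and, by assumption, the headword) to its entry.
local function build_index(gloss)
  local index = {}
  for _, entry in ipairs(gloss.entries) do
    index[entry.headword] = entry
    for _, form in ipairs(entry.match or {}) do
      index[form] = entry
    end
  end
  return index
end

-- Whole-string lookup: table keys are compared byte for byte, so no
-- pattern matching and no UTF-8 awareness is needed.
local function lookup(index, form)
  return index[form]
end

local index = build_index(glossary)
local hit = lookup(index, "κἀγαθοί")
print(hit and hit.headword or "no match")
```

Because the comparison is byte for byte, the document text and the glossary would need to use the same Unicode normalization form for a form to be found.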

On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:

No, citeproc receives a data structure produced by pandoc. Pandoc is responsible for the parsing. I think that your script would not be so hard to rewrite in Lua; the main problem is to know if you can achieve your goals this way. If your main concern is portability, then writing a Lua filter with no dependencies certainly is a good solution, provided that you feed it with a Lua data structure (or embed the code responsible for JSON parsing in your script).

On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos wrote:

Thank you for the suggestions, Bastien. There is technically no need for regex, as all the forms are spelled out to avoid the need to create ad hoc regex rules for each term. Now that I think about it, the principle is the same as Citeproc's: a tagged inline element will be matched against a lookup table and replaced. I will look at the citeproc code to see if it leads anywhere or if it could be reused in any way.

On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:

Yes, but it is limited to this utf8 library. For instance, if you perform a regexp search like `string.match('ἀγαθός', '[γδ]')`, it tries to match one of the four bytes inside the square brackets against the string 'ἀγαθός', so it will return the first byte of γ, not γ. To circumvent this limitation, you would be forced to test γ and δ separately. Nevertheless, if you always perform comparisons between whole strings, as you currently do in your script, this should not be a problem.
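
The byte-level behaviour described above can be reproduced with a few lines of plain Lua. This is a sketch that assumes the source file is saved as UTF-8; the helper name `find_any` is hypothetical and only illustrates the "test each character separately" workaround.

```lua
local word = "ἀγαθός"

-- '[γδ]' is a set of the four bytes CE B3 CE B4, not of two characters,
-- so the match is a single byte: the lead byte 0xCE of γ.
local m = string.match(word, "[γδ]")
print(#m, string.format("%02X", m:byte()))

-- Workaround: search for each character separately as a plain substring
-- (the fourth argument `true` switches pattern matching off) and keep
-- the earliest hit.
local function find_any(s, chars)
  local best
  for _, ch in ipairs(chars) do
    local pos = string.find(s, ch, 1, true)
    if pos and (not best or pos < best.pos) then
      best = { pos = pos, char = ch }
    end
  end
  return best
end

local hit = find_any(word, { "γ", "δ" })
print(hit.char, hit.pos)
```

The position returned by `find_any` is a byte offset, not a character index, which is usually what `string.sub` and friends expect anyway.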

As for your concern with dependencies, you most probably would have to rely on a JSON library such as lunajson. However, if your JSON files are not supposed to change, you could also convert them to a Lua file using a JSON library and a serialization library, so as to be able to import the resulting Lua data structure directly in your filter.
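
The one-off conversion could look roughly like the sketch below. It leaves the JSON decoding to whichever library is chosen and only shows the serialization half, written by hand instead of using a serialization library. The function `serialize`, the sample data and the file name `glossary_data.lua` are all hypothetical, and `load` with a string argument assumes Lua 5.2 or later.

```lua
-- Serialize strings, numbers, booleans and nested tables as Lua source.
local function serialize(value, indent)
  indent = indent or ""
  local t = type(value)
  if t == "string" then
    return string.format("%q", value)
  elseif t == "number" or t == "boolean" then
    return tostring(value)
  elseif t == "table" then
    local inner = indent .. "  "
    local keys = {}
    for k in pairs(value) do keys[#keys + 1] = k end
    -- Sort keys for stable output: numbers first, then strings.
    table.sort(keys, function(a, b)
      if type(a) == type(b) then return a < b end
      return type(a) == "number"
    end)
    local parts = {}
    for _, k in ipairs(keys) do
      parts[#parts + 1] = inner .. "[" .. serialize(k) .. "] = "
        .. serialize(value[k], inner)
    end
    return "{\n" .. table.concat(parts, ",\n") .. "\n" .. indent .. "}"
  end
  error("cannot serialize a value of type " .. t)
end

-- Hypothetical data, shaped as a JSON library might return it after decoding.
local data = {
  glossary_lang = "grc",
  entries = {
    { headword = "ἀγαθός", match = { "γαθέ", "κἀγαθά" } },
  },
}

-- This string is what would be written once to a file such as
-- glossary_data.lua and later read by the filter with dofile or require.
local source = "return " .. serialize(data) .. "\n"

-- Round trip: loading the generated source gives back the same structure.
local restored = assert(load(source))()
print(restored.entries[1].match[2])
```

With the data stored as a Lua file, the filter itself would need no JSON library at run time.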

On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos wrote:

As for translating the filter, note that Lua can't really handle UTF-8. There is some rudimentary support for converting codepoint number ↔ UTF-8 byte sequences and for iterating through a string of bytes representing UTF-8 encoded characters, but no concept of chars as opposed to bytes. This may become a show stopper if you need to manipulate strings containing UTF-8 text.

Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards includes UTF-8 support. Have you seen it? E.g. https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
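
For reference, the `utf8` library that ships with Lua 5.3 and later covers roughly the operations mentioned in this exchange: counting code points, converting between code points and byte sequences, and iterating over a string. The sketch below uses only functions from that standard library; the helper `usub` is hypothetical and assumes `1 <= i <= j <= utf8.len(s)`. The rest of the `string` library, including patterns, still works on bytes.

```lua
local word = "ἀγαθός"

-- The length operator counts bytes; utf8.len counts code points.
print(#word, utf8.len(word))

-- utf8.codes iterates over byte positions and code points;
-- utf8.char turns a code point back into its UTF-8 byte sequence.
local chars = {}
for _, cp in utf8.codes(word) do
  chars[#chars + 1] = utf8.char(cp)
end
print(table.concat(chars, " "))

-- utf8.offset converts a character index into a byte position, which
-- lets string.sub cut on character boundaries.
local function usub(s, i, j)
  local first = utf8.offset(s, i)
  local after = utf8.offset(s, j + 1)
  return string.sub(s, first, after - 1)
end
print(usub(word, 2, 4))
```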

For Ancient Greek you want grc as the language tag.

Indeed it is (and that is generally what I use), but ἀγαθός is just Polytonic Greek, which is not the same as Ancient Greek.

--
You received this message because you are subscribed to the Google
Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit
https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.
