From: "Bernardo C.D.A. Vasconcelos"
Newsgroups: gmane.text.pandoc
Subject: Re: Glossary Filter for MD2Tex
Date: Tue, 18 Oct 2022 14:16:16 -0300
Message-ID: <7072522D-F2FE-4BAC-A575-93426852FCFB@gmail.com>
References: <88a14108-f2e4-40d0-a98e-5c6f84b8ff41n@googlegroups.com> <3307993F-F813-405F-BFEC-F17FAF27BEA5@gmail.com>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

Thank you for the suggestions, Bastien.
There is technically no need for regex, as all the forms are spelled
out to avoid the need to create ad hoc regex rules for each term. Now
that I think about it, the principle is the same as Citeproc's: a
tagged inline element will be matched against a lookup table and
replaced. I will look at the citeproc code to see if it leads anywhere
or if it could be reused in any way.

On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:

> Yes, but it is limited to this utf8 library. For instance, if you
> perform a regexp search like `string.match('ἀγαθός', '[γδ]')`, it
> tries to match one of the four bytes inside the square brackets
> against the string 'ἀγαθός', so it will return the first byte of γ,
> not γ. To circumvent this limitation, you would be forced to test γ
> and δ separately. Nevertheless, if you always perform comparisons
> between whole strings as you currently do in your script, this should
> not be a problem.
>
> As for your concern with dependencies, you most probably would have to
> rely on a JSON library such as lunajson. However, if your JSON files
> are not supposed to change, you could also convert them to a Lua file
> using a JSON library and a serialization library, so as to be able to
> import the resulting Lua data structure directly in your filter.
>
> On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos
> wrote:
>>> As for translating the filter note that Lua can't really handle
>>> UTF-8. There is some rudimentary support for converting codepoint
>>> number ↔ UTF-8 byte sequences and for iterating through a string of
>>> bytes representing UTF-8 encoded characters but no concept of chars
>>> as opposed to bytes. This may become a show stopper if you need to
>>> manipulate strings containing UTF-8 text.
>>
>> Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards
>> includes UTF-8 support. Have you seen it? E.g.
>> https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>>
>>> For Ancient Greek you want grc as the language tag.
>>
>> Indeed it is (and that is generally what I use), but ἀγαθός is just
>> Polytonic Greek, which is not the same as Ancient Greek.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/7072522D-F2FE-4BAC-A575-93426852FCFB%40gmail.com.
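
For reference, a minimal sketch of the byte-versus-character behaviour discussed in this thread. It assumes Lua 5.3 or later (for the standard `utf8` library) and a UTF-8 encoded source file. The names `find_char`, `wanted` and `glossary`, and the glossary entry itself, are hypothetical and only serve the illustration; this is not the filter from the thread.

```lua
-- Assumes Lua 5.3+ (utf8 library) and a UTF-8 encoded source file.
local word = "ἀγαθός"

-- Byte-oriented matching: the class [γδ] is really four separate bytes,
-- so the result is a single byte rather than a whole character.
local byte_match = string.match(word, "[γδ]")
print(#byte_match)  -- 1

-- Character-aware alternative: walk the string one UTF-8 character at a
-- time with utf8.charpattern and test each character against a set.
local wanted = { ["γ"] = true, ["δ"] = true }

local function find_char(s, set)
  for ch in s:gmatch(utf8.charpattern) do
    if set[ch] then
      return ch
    end
  end
  return nil
end

print(find_char(word, wanted))  -- γ

-- Whole-string lookup (hypothetical glossary entry): Lua compares strings
-- byte for byte, so table keys work without any UTF-8 handling.
local glossary = { ["ἀγαθός"] = "good" }
print(glossary[word])  -- good
```

The last part is why comparing whole strings against a lookup table, as the thread suggests, sidesteps the problem: no pattern classes are involved, only exact equality of byte sequences.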