From: Bastien DUMONT
Newsgroups: gmane.text.pandoc
Subject: Re: Glossary Filter for MD2Tex
Date: Tue, 18 Oct 2022 16:34:37 +0000
Yes, but that support is limited to the utf8 library itself; the functions
of the string library still work on bytes. For instance, if you perform a
pattern search like `string.match('ἀγαθός', '[γδ]')`, Lua tries to match
any one of the four bytes inside the square brackets against the string
'ἀγαθός', so it will return the first byte of γ, not γ itself. To
circumvent this limitation, you would be forced to test γ and δ
separately. Nevertheless, if you always compare whole strings, as you
currently do in your script, this should not be a problem.

As for your concern with dependencies, you would most probably have to
rely on a JSON library such as lunajson. However, if your JSON files are
not supposed to change, you could also convert them to a Lua file using a
JSON library and a serialization library, so that you can import the
resulting Lua data structure directly in your filter.

On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos wrote:
> > As for translating the filter, note that Lua can't really handle UTF-8.
> > There is some rudimentary support for converting codepoint number ↔
> > UTF-8 byte sequences and for iterating through a string of bytes
> > representing UTF-8 encoded characters, but no concept of chars as
> > opposed to bytes. This may become a show stopper if you need to
> > manipulate strings containing UTF-8 text.
>
>
> Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards includes
> UTF-8 support. Have you seen it? E.g.
> https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>
> > For Ancient Greek you want grc as the language tag.
> >
>
> Indeed it is (and that is generally what I use), but ἀγαθός is just
> Polytonic Greek, which is not the same as Ancient Greek.
>
> -- 
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.
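The byte-versus-character pitfall described in the reply above can be reproduced in a few lines. This is a minimal sketch, assuming Lua 5.3 or later (for the built-in `utf8` library and the `\u{...}` string escape) and a source file saved as UTF-8. The helper `first_char_in` is a hypothetical name used only for illustration, not part of any library; it walks the string one code point at a time with `utf8.codes` and compares whole characters, which is the "test γ and δ separately" workaround in loop form.

```lua
-- Sketch (Lua 5.3+, file saved as UTF-8): why a bracket class fails on
-- multi-byte characters, and a character-aware alternative.
local word = "\u{1F00}γαθός"   -- ἀγαθός, with ἀ as precomposed U+1F00 (3 bytes)

-- "[γδ]" is the byte set {0xCE, 0xB3, 0xB4}: string.match returns the
-- first byte of the subject that is in the set, i.e. γ's lead byte.
local naive = string.match(word, "[γδ]")
print(#naive, naive == "\xCE")   -- 1   true

-- Hypothetical helper: return the first character of `s` that appears in
-- `chars` (a list of one-character strings), plus its byte position.
local function first_char_in(s, chars)
  local wanted = {}
  for _, c in ipairs(chars) do wanted[c] = true end
  -- utf8.codes yields (byte position, code point) for each character.
  for pos, cp in utf8.codes(s) do
    local ch = utf8.char(cp)
    if wanted[ch] then return ch, pos end
  end
  return nil
end

local ch, pos = first_char_in(word, { "γ", "δ" })
print(ch, pos)   -- γ   4
```

Because each candidate is looked up as a complete string, no lone lead or continuation byte can ever be returned; this is the same reason whole-string comparisons are unaffected.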
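The JSON-to-Lua conversion suggested in the reply above can be sketched without a separate serialization library, assuming the JSON files have already been decoded into plain Lua tables (for example with a decoder such as lunajson). The function `serialize`, the sample `data` table, and the file name `glossary.lua` are hypothetical. Numbers are written with `tostring`, so floats may lose precision, and NaN or infinity are not handled.

```lua
-- Sketch (Lua 5.3+): write plain Lua data back out as Lua source so a
-- filter can load it with dofile() instead of parsing JSON at run time.
-- `serialize` and `data` are hypothetical names used for illustration.

local function serialize(value, indent)
  indent = indent or ""
  local kind = type(value)
  if kind == "string" then
    return string.format("%q", value)   -- quoted, escaped Lua string literal
  elseif kind == "number" or kind == "boolean" then
    return tostring(value)              -- note: floats may lose precision
  elseif kind == "table" then
    -- Collect and sort keys so the generated file is stable between runs.
    local keys = {}
    for k in pairs(value) do keys[#keys + 1] = k end
    table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)

    local inner = indent .. "  "
    local parts = {}
    for _, k in ipairs(keys) do
      -- Always use the [key] form so any string or number key is valid.
      local key = type(k) == "string" and string.format("[%q]", k)
                                       or "[" .. tostring(k) .. "]"
      parts[#parts + 1] = inner .. key .. " = " .. serialize(value[k], inner)
    end
    return "{\n" .. table.concat(parts, ",\n") .. "\n" .. indent .. "}"
  else
    error("cannot serialize a value of type " .. kind)
  end
end

-- Stand-in for a table decoded from one of the JSON files.
local data = {
  terms = {
    { term = "ἀγαθός", definition = "good" },
  },
}

local source = "return " .. serialize(data) .. "\n"
-- In practice, write `source` once to a file (e.g. glossary.lua) and load
-- it in the filter with dofile("glossary.lua"). Here we only check that
-- the generated source loads back into an equivalent table.
local copy = assert(load(source))()
print(copy.terms[1].term)   -- ἀγαθός
```

With this approach the JSON library is only needed once, at conversion time; the filter itself just calls `dofile`, whose relative paths are resolved against the directory pandoc is run from.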