public inbox archive for pandoc-discuss@googlegroups.com
From: "Bernardo C.D.A. Vasconcelos" <bernardovasconcelos-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Glossary Filter for MD2Tex
Date: Tue, 18 Oct 2022 14:16:16 -0300
Message-ID: <7072522D-F2FE-4BAC-A575-93426852FCFB@gmail.com>
In-Reply-To: <Y07VnbuRsuqUg8US@localhost>

Thank you for the suggestions, Bastien. Technically there is no need for 
regex: all the forms are spelled out precisely so that I do not have to 
write ad hoc regex rules for each term. Now that I think about it, the 
principle is the same as citeproc's: a tagged inline element is matched 
against a lookup table and replaced. I will look at the citeproc code to 
see whether it leads anywhere or could be reused in any way.
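
To make the lookup-table idea concrete, here is a minimal sketch in plain Lua, run outside pandoc. The table contents, the function name `render_term` and the choice of `\gls` as the LaTeX output are all assumptions for illustration, not the actual filter. Because every form is a key of the table, the match is a whole-string comparison and no pattern is involved.

```lua
-- Hypothetical glossary: each spelled-out form maps to an entry id.
local glossary = {
  ["ἀγαθός"] = "agathos",
}

-- Replace a tagged term with a LaTeX glossary command if it is known;
-- otherwise return the term unchanged. (\gls is an assumed target.)
function render_term(term)
  local id = glossary[term]
  if id then
    return "\\gls{" .. id .. "}"
  end
  return term
end

print(render_term("ἀγαθός"))  -- prints \gls{agathos}
```

In a real filter the same lookup would sit inside the function that handles the tagged inline element.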

On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:

> Yes, but UTF-8 support is limited to this utf8 library. For instance, if 
> you perform a regexp search like `string.match('ἀγαθός', '[γδ]')`, it 
> tries to match one of the four bytes inside the square brackets against 
> the string 'ἀγαθός', so it will return the first byte of γ, not 
> γ. To circumvent this limitation, you would be forced to test γ and 
> δ separately. Nevertheless, if you always perform comparisons between 
> whole strings as you currently do in your script, this should not be a 
> problem.
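
The byte-level behaviour described above can be checked with a short sketch, assuming Lua 5.3 or later for the `utf8` library. The helper names `contains_any` and `first_codepoint_in` are hypothetical; the first tests each letter separately as suggested, the second walks the string by code point.

```lua
local word = "ἀγαθός"

-- The set [γδ] holds four bytes (CE B3 CE B4), so the match is a single
-- byte, not the two-byte letter γ.
local m = string.match(word, "[γδ]")
print(#m, m == "γ")  -- prints 1 and false

-- Test each letter separately as a plain (non-pattern) substring.
function contains_any(s, letters)
  for _, letter in ipairs(letters) do
    if string.find(s, letter, 1, true) then
      return letter
    end
  end
  return nil
end

-- Alternative: iterate over code points and check membership in a set.
function first_codepoint_in(s, set)
  for _, cp in utf8.codes(s) do
    local ch = utf8.char(cp)
    if set[ch] then
      return ch
    end
  end
  return nil
end

print(contains_any(word, { "γ", "δ" }))
print(first_codepoint_in(word, { ["γ"] = true, ["δ"] = true }))
```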
>
> As for your concern with dependencies, you would most probably have to 
> rely on a JSON library such as lunajson. However, if your JSON files 
> are not supposed to change, you could also convert them to a Lua file 
> using a JSON library and a serialization library, so as to be able to 
> import the resulting Lua data structure directly in your filter.
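
The serialization half of that suggestion might look like the sketch below. It is an illustration only: `serialize` is a hypothetical helper that handles string values and nested tables with string keys, and the sample `data` table stands in for whatever a JSON decoder would return (the decoding step is omitted here).

```lua
-- Turn a table of strings and nested tables into Lua source text.
-- Assumes all keys are strings and all leaf values are strings.
function serialize(value, indent)
  indent = indent or ""
  if type(value) == "table" then
    local keys = {}
    for k in pairs(value) do
      keys[#keys + 1] = k
    end
    table.sort(keys)  -- stable output order
    local lines = {}
    for _, k in ipairs(keys) do
      lines[#lines + 1] = string.format(
        "%s  [%q] = %s,", indent, k, serialize(value[k], indent .. "  "))
    end
    return "{\n" .. table.concat(lines, "\n") .. "\n" .. indent .. "}"
  end
  return string.format("%q", value)
end

-- Stand-in for decoded JSON data.
local data = { ["ἀγαθός"] = { gloss = "good" } }

-- This text could be written to a .lua file and loaded by the filter.
local source = "return " .. serialize(data)
local reloaded = load(source)()
print(reloaded["ἀγαθός"].gloss)  -- prints good
```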
>
> On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos 
> wrote:
>>> As for translating the filter note that Lua can't really handle UTF-8.
>>> There is some rudimentary support for converting codepoint number ↔ UTF-8
>>> byte sequences and for iterating through a string of bytes representing
>>> UTF-8 encoded characters but no concept of chars as opposed to bytes. This
>>> may become a show stopper if you need to manipulate strings containing
>>> UTF-8 text.
>>
>>
>> Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards includes
>> UTF-8 support. Have you seen it? E.g.
>> https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>>
>>> For Ancient Greek you want grc as the language tag.
>>>
>>
>> Indeed it is (and that is generally what I use), but ἀγαθός is just
>> Polytonic Greek, which is not the same as Ancient Greek.
>>
>> -- 
>> You received this message because you are subscribed to the Google 
>> Groups "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, 
>> send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.



