public inbox archive for pandoc-discuss@googlegroups.com
From: "Bernardo C.D.A. Vasconcelos" <bernardovasconcelos-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Glossary Filter for MD2Tex
Date: Wed, 19 Oct 2022 19:43:36 -0300
Message-ID: <272DFB73-CD83-4A77-B2C5-CCF1AF7B6BF6@gmail.com>
In-Reply-To: <Y1BsCdqttFxOi/pa@localhost>

Bastien, the only work that I was left with is to say thank you very 
much. I did some simple testing, and it seems quite elegant. Do I have 
your permission to share it with others later, giving proper 
attribution?

On 19 Oct 2022, at 18:28, Bastien DUMONT wrote:

> I think that the attached script could be a good starting point.
>
> On Wednesday 19 October 2022 at 04:50:25PM, Bernardo C.D.A. Vasconcelos
> wrote:
>> I have found this little script that takes me nearly there:
>>
>> local vars = {}
>>
>> function Meta(meta)
>>     for k, v in pairs(meta) do
>>         vars["%" .. k .. "%"] = v
>>     end
>> end
>>
>> function Str(elem)
>>     if vars[elem.text] then
>>         return vars[elem.text]
>>     else
>>         return elem
>>     end
>> end
>>
>> return {
>>     { Meta = Meta },
>>     { Str  = Str  }
>> }
>>
>>
>> Instead, we would use meta.glossary.entries. The crux for me is looping
>> through the list of entries and adding every value of each entry's to_match
>> field (its known forms) to vars as a key, with the content of some other
>> field (e.g. glslink) as the value, i.e. vars[form] = entry.glslink for each
>> form in entry.to_match.
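
A minimal sketch of that loop in plain Lua, under stated assumptions: the
entries table below is a hypothetical stand-in for meta.glossary.entries, and
the glslink values are invented for illustration. Inside a pandoc filter the
fields would be Meta values rather than plain strings, so each would probably
need to go through pandoc.utils.stringify first.

```lua
-- Hypothetical stand-in for meta.glossary.entries (plain strings only).
local entries = {
  { glslink = "agathos", to_match = { "κἀγαθά", "κἀγαθοί" } },
  { glslink = "agapan",  to_match = { "ἀγάπη", "ἀγαπᾶν" } },
}

-- Map every known form of every entry to that entry's replacement value.
local function build_vars(list)
  local vars = {}
  for _, entry in ipairs(list) do
    for _, form in ipairs(entry.to_match) do
      vars[form] = entry.glslink
    end
  end
  return vars
end

local vars = build_vars(entries)
print(vars["ἀγάπη"])   --> agapan
```

Keying the table by form keeps the Str handler a single table lookup per word,
exactly as in the script quoted above.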
>>
>> On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:
>>
>>     Yes, it could! You would have access to the corresponding metadata
>>     object in the AST.
>>
>>     On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos
>>     wrote:
>>
>>         The data is mostly in database format and could be output in the
>>         best format for the task, but I wanted to make it friendly for
>>         other people to use as well. Could a YAML metadata block be a
>>         solution?
>>
>>         glossary:
>>           glossary_lang: grc
>>           entries:
>>             - headword: ἀγαθός
>>               text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
>>               match:
>>                 - γαθέ
>>                 - γαθοί
>>                 - κἀγάθ
>>                 - κἀγαθά
>>                 - κἀγαθάς
>>                 - κἀγαθή
>>                 - κἀγαθήν
>>                 - κἀγαθαί
>>                 - κἀγαθοί
>>                 - κἀγαθος
>>             - headword: ἀγαπᾶν
>>               transliteration: agapan
>>               text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
>>               match:
>>                 - ἀγάπα
>>                 - ἀγάπαις
>>                 - ἀγάπη
>>                 - ἀγάπην
>>                 - ἀγάπης
>>                 - ἀγάπῃ
>>                 - ἀγαπᾶ
>>                 - ἀγαπᾶν
>>                 - ἀγαπᾶς
>>
>>         On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:
>>
>>         No, citeproc receives a data structure produced by pandoc. Pandoc
>>         is responsible for the parsing. I think that your script would not
>>         be so hard to rewrite in Lua; the main problem is to know whether
>>         you can achieve your goals this way. If your main concern is
>>         portability, then writing a Lua filter with no dependencies is
>>         certainly a good solution, provided that you feed it with a Lua
>>         data structure (or embed the code responsible for JSON parsing in
>>         your script).
>>
>>         On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A.
>>         Vasconcelos wrote:
>>
>>         Thank you for the suggestions, Bastien. There is technically no
>>         need for regex, as all the forms are spelled out to avoid having
>>         to create ad hoc regex rules for each term. Now that I think about
>>         it, the principle is the same as Citeproc's: a tagged inline
>>         element will be matched against a lookup table and replaced. I
>>         will look at the citeproc code to see if it leads anywhere or if
>>         it could be reused in any way.
>>
>>         On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
>>
>>         Yes, but it is limited to this utf8 library. For instance, if you
>>         perform a regexp search like `string.match('ἀγαθός', '[γδ]')`, it
>>         will try to match one of the four bytes inside the square brackets
>>         against the string 'ἀγαθός', so it will return the first byte of
>>         γ, not γ. To circumvent this limitation, you would be forced to
>>         test γ and δ separately. Nevertheless, if you always perform
>>         comparisons between whole strings as you currently do in your
>>         script, this should not be a problem.
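
The byte-level behaviour described above can be checked with a short sketch,
assuming the source file is saved as UTF-8 and run with a stock Lua 5.3 or
later interpreter. The helper find_any is a hypothetical name introduced only
for this illustration.

```lua
local word = "ἀγαθός"

-- The class [γδ] is read as a set of four single bytes, so the match is
-- one byte long instead of a whole two-byte letter.
local hit = string.match(word, "[γδ]")
print(#hit, #"γ")   -- the match is 1 byte long, the letter itself is 2

-- Testing each letter separately as a plain substring (fourth argument
-- `true` disables pattern matching) returns the complete letter.
local function find_any(s, letters)
  for _, letter in ipairs(letters) do
    local first, last = string.find(s, letter, 1, true)
    if first then
      return string.sub(s, first, last)
    end
  end
  return nil
end

print(find_any(word, { "γ", "δ" }) == "γ")   --> true
```

Note that find_any returns the first letter of the list that occurs anywhere in
the string, not the one that occurs earliest; that is enough to show the
difference but may not suit every use.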
>>
>>         As for your concern with dependencies, you most probably would
>>         have to rely on a JSON library such as lunajson. However, if your
>>         JSON files are not supposed to change, you could also convert them
>>         to a Lua file using a JSON library and a serialization library, so
>>         as to be able to import the resulting Lua data structure directly
>>         in your filter.
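
The conversion step might look roughly like the sketch below. It is an
illustration only: the JSON decoding is left out (the data table stands in for
whatever a JSON library would return), serialize is a hand-written hypothetical
helper rather than a real serialization library, and Lua 5.3 or later is
assumed for load on a string.

```lua
-- Turn strings, numbers, booleans and nested tables into Lua source text.
local function serialize(value, indent)
  indent = indent or ""
  if type(value) == "table" then
    local inner = indent .. "  "
    local parts = {}
    for k, v in pairs(value) do
      local key = type(k) == "number" and "[" .. k .. "]"
        or string.format("[%q]", k)
      parts[#parts + 1] = inner .. key .. " = " .. serialize(v, inner)
    end
    return "{\n" .. table.concat(parts, ",\n") .. "\n" .. indent .. "}"
  elseif type(value) == "string" then
    return string.format("%q", value)
  else
    return tostring(value)
  end
end

-- Stand-in for decoded JSON data (hypothetical content).
local data = {
  { headword = "ἀγαθός", match = { "κἀγαθά", "κἀγαθοί" } },
}

-- This text could be written once to a .lua file and loaded by the filter.
local source = "return " .. serialize(data)
local restored = load(source)()
print(restored[1].headword == data[1].headword)   --> true
```

Once the generated file exists, the filter no longer needs a JSON library at
run time, which is the portability gain discussed above.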
>>
>>         On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A.
>>         Vasconcelos wrote:
>>
>>         As for translating the filter, note that Lua can't really handle
>>         UTF-8. There is some rudimentary support for converting codepoint
>>         number ↔ UTF-8 byte sequences and for iterating through a string
>>         of bytes representing UTF-8 encoded characters but no concept of
>>         chars as opposed to bytes. This may become a show stopper if you
>>         need to manipulate strings containing UTF-8 text.
>>
>>         Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards
>>         includes UTF-8 support. Have you seen it? E.g.
>>         https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>>
>>         For Ancient Greek you want grc as the language tag.
>>
>>         Indeed it is (and that is generally what I use), but ἀγαθός is just
>>         Polytonic Greek, which is not the same as Ancient Greek.
>>
>>         --
>>         You received this message because you are subscribed to the Google
>>         Groups "pandoc-discuss" group.
>>         To unsubscribe from this group and stop receiving emails from it,
>>         send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>         To view this discussion on the web visit
>>         https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.
>>
>

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/272DFB73-CD83-4A77-B2C5-CCF1AF7B6BF6%40gmail.com.



Thread overview: 13+ messages
2022-10-17 18:25 Bernardo C. D. A. Vasconcelos
2022-10-17 18:38   ` BPJ
2022-10-18 15:36       ` Bernardo C.D.A. Vasconcelos
2022-10-18 16:34           ` Bastien DUMONT
2022-10-18 17:16             ` Bernardo C.D.A. Vasconcelos
2022-10-18 17:34                 ` Bastien DUMONT
2022-10-18 21:43                   ` Bernardo C.D.A. Vasconcelos
2022-10-18 22:06                       ` Bastien DUMONT
2022-10-19 19:50                         ` Bernardo C.D.A. Vasconcelos
2022-10-19 21:28                             ` Bastien DUMONT
2022-10-19 22:43                               ` Bernardo C.D.A. Vasconcelos [this message]
2022-10-20  7:16                                   ` Bastien DUMONT
2022-10-18 18:42           ` BPJ
