public inbox archive for pandoc-discuss@googlegroups.com
From: Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Glossary Filter for MD2Tex
Date: Thu, 20 Oct 2022 07:16:20 +0000
Message-ID: <Y1D1xMX37opBqnii@localhost>
In-Reply-To: <272DFB73-CD83-4A77-B2C5-CCF1AF7B6BF6-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Sure! Let's say that it is under the MIT license, like the filters in the official repo.

On Wednesday 19 October 2022 at 07:43:36PM, Bernardo C.D.A. Vasconcelos wrote:
> Bastien, the only work that I was left with is to say thank you very much. I
> did some simple testing, and it seems quite elegant. Do I have your
> permission to share it with others later, giving proper attribution?
> 
> On 19 Oct 2022, at 18:28, Bastien DUMONT wrote:
> 
> > I think that the attached script could be a good starting point.
> > 
> > On Wednesday 19 October 2022 at 04:50:25PM, Bernardo C.D.A. Vasconcelos
> > wrote:
> > > I have found this little script that takes me nearly there:
> > > 
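> > > -- Lookup table: "%key%" placeholder -> metadata value
> > > local vars = {}
> > > 
> > > -- Collect every metadata field as a "%key%" placeholder.
> > > function Meta(meta)
> > >     for k, v in pairs(meta) do
> > >         vars["%" .. k .. "%"] = v
> > >     end
> > > end
> > > 
> > > -- Replace a Str whose text is a known placeholder; leave others as-is.
> > > function Str(elem)
> > >     if vars[elem.text] then
> > >         return vars[elem.text]
> > >     else
> > >         return elem
> > >     end
> > > end
> > > 
> > > -- Two passes: read the metadata first, then walk the Str elements.
> > > return {
> > >     { Meta = Meta },
> > >     { Str  = Str  }
> > > }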
> > > 
> > > 
> > > Instead, we would use meta.glossary.entries. The crux for me is looping
> > > through the list of entries and, for each entry, adding every value of
> > > the to_match field (a.k.a. known forms) to vars as a key, with the
> > > content of some other field (e.g. glslink) as the value. E.g.
> > > vars[ .. entry.to_match.each .. ] = entry.glslink.
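
A minimal sketch of the loop described just above, written as plain Lua so that it can be run outside pandoc. It is not the attached script. The sample entries and the glslink values (agathos, agapan) are made up for illustration; the field names to_match and glslink are taken from the description. In a real filter the entries would come from meta.glossary.entries, and each metadata value would probably need to be turned into a plain string first, for instance with pandoc.utils.stringify.

```lua
-- Hypothetical sample data standing in for meta.glossary.entries.
local entries = {
  { glslink = "agathos", to_match = { "γαθέ", "γαθοί", "κἀγαθά" } },
  { glslink = "agapan",  to_match = { "ἀγάπα", "ἀγάπη", "ἀγαπᾶν" } },
}

-- Build the lookup table: every known form becomes a key whose value
-- is the glslink of the entry it belongs to.
local function build_vars(entry_list)
  local vars = {}
  for _, entry in ipairs(entry_list) do
    for _, form in ipairs(entry.to_match) do
      vars[form] = entry.glslink
    end
  end
  return vars
end

local vars = build_vars(entries)
print(vars["ἀγάπη"])  -- agapan
print(vars["λόγος"])  -- nil (not a known form)
```

The resulting table can then be consulted from a Str function in the same way as vars in the script quoted above.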
> > > 
> > > On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:
> > > 
> > >     Yes, it could! You would have access to the corresponding metadata
> > >     object in the AST.
> > > 
> > >     On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos
> > >     wrote:
> > > 
> > >         The data is mostly in database format and could be output in the
> > >         best format for the task, but I wanted to make it friendly for
> > >         other people to use as well. Could a YAML metadata block be a
> > >         solution?
> > > 
> > >         glossary:
> > >           glossary_lang: grc
> > >           entries:
> > >             - headword: ἀγαθός
> > >               text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
> > >               match:
> > >                 - γαθέ
> > >                 - γαθοί
> > >                 - κἀγάθ
> > >                 - κἀγαθά
> > >                 - κἀγαθάς
> > >                 - κἀγαθή
> > >                 - κἀγαθήν
> > >                 - κἀγαθαί
> > >                 - κἀγαθοί
> > >                 - κἀγαθος
> > >             - headword: ἀγαπᾶν
> > >               transliteration: agapan
> > >               text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
> > >               match:
> > >                 - ἀγάπα
> > >                 - ἀγάπαις
> > >                 - ἀγάπη
> > >                 - ἀγάπην
> > >                 - ἀγάπης
> > >                 - ἀγάπῃ
> > >                 - ἀγαπᾶ
> > >                 - ἀγαπᾶν
> > >                 - ἀγαπᾶς
> > > 
> > >         On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:
> > > 
> > >         No, citeproc receives a data structure produced by pandoc; pandoc
> > >         is responsible for the parsing. I think that your script would
> > >         not be so hard to rewrite in Lua. The main question is whether
> > >         you can achieve your goals this way. If your main concern is
> > >         portability, then writing a Lua filter with no dependencies
> > >         certainly is a good solution, provided that you feed it with a
> > >         Lua data structure (or embed the code responsible for JSON
> > >         parsing in your script).
> > > 
> > >         On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A.
> > >         Vasconcelos wrote:
> > > 
> > >         Thank you for the suggestions, Bastien. There is technically no
> > >         need for regex, as all the forms are spelled out to avoid having
> > >         to create ad hoc regex rules for each term. Now that I think
> > >         about it, the principle is the same as Citeproc's: a tagged
> > >         inline element is matched against a lookup table and replaced.
> > >         I will look at the citeproc code to see if it leads anywhere or
> > >         if it could be reused in any way.
> > > 
> > >         On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
> > > 
> > >         Yes, but it is limited to this utf8 library. For instance, if you
> > >         perform a regexp search like `string.match('ἀγαθός', '[γδ]')`,
> > >         Lua tries to match any one of the four bytes inside the square
> > >         brackets against the string 'ἀγαθός', so it returns the first
> > >         byte of γ, not γ. To circumvent this limitation, you would be
> > >         forced to test γ and δ separately. Nevertheless, if you always
> > >         perform comparisons between whole strings, as you currently do
> > >         in your script, this should not be a problem.
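
To make the byte-versus-character point concrete, here is a small sketch in plain Lua. It assumes the source file is saved as UTF-8; find_any is a hypothetical helper name, not part of any library. It shows the bracket set returning a lone byte, and one way of testing γ and δ separately with a plain-text string.find.

```lua
local word = "ἀγαθός"

-- Byte-oriented match: the set [γδ] is really the four bytes
-- CE B3 CE B4, so the first hit is the lead byte 0xCE on its own.
local hit = string.match(word, "[γδ]")
print(#hit, string.byte(hit))  -- prints 1 and 206 (0xCE)

-- Workaround: look for each candidate character separately with a
-- plain (non-pattern) find and keep the earliest occurrence.
local function find_any(s, chars)
  local best
  for _, ch in ipairs(chars) do
    local pos = string.find(s, ch, 1, true)
    if pos and (not best or pos < best.pos) then
      best = { pos = pos, char = ch }
    end
  end
  return best and best.char
end

print(find_any(word, { "γ", "δ" }))  -- γ
```

Because UTF-8 lead bytes and continuation bytes never overlap, a plain search for the complete byte sequence of γ cannot match in the middle of another character.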
> > > 
> > >         As for your concern with dependencies, you would most probably
> > >         have to rely on a JSON library such as lunajson. However, if your
> > >         JSON files are not supposed to change, you could also convert
> > >         them to a Lua file using a JSON library and a serialization
> > >         library, so as to be able to import the resulting Lua data
> > >         structure directly in your filter.
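
The serialization step could be sketched as follows, assuming the JSON has already been decoded into a Lua table (for instance by lunajson) and contains only tables, strings, numbers and booleans. The serialize function is a hand-rolled illustration rather than an existing library, and the sample data is abridged from the YAML example quoted earlier in the thread.

```lua
-- Turn a value made of tables, strings, numbers and booleans into
-- Lua source text that evaluates back to an equal value.
local function serialize(value, indent)
  indent = indent or ""
  local t = type(value)
  if t == "string" then
    return string.format("%q", value)
  elseif t == "number" or t == "boolean" then
    return tostring(value)
  elseif t == "table" then
    local inner = indent .. "  "
    local lines = {}
    for k, v in pairs(value) do
      lines[#lines + 1] = inner .. "[" .. serialize(k) .. "] = "
        .. serialize(v, inner) .. ","
    end
    return "{\n" .. table.concat(lines, "\n") .. "\n" .. indent .. "}"
  end
  error("cannot serialize a " .. t)
end

-- Hypothetical decoded data.
local data = {
  { headword = "ἀγαθός", match = { "γαθέ", "γαθοί" } },
}

-- This string is what would be written to a .lua file once.
local source = "return " .. serialize(data)

-- Loading it back is what the filter would do (via dofile or require).
local loader = loadstring or load  -- Lua 5.1 vs. 5.2 and later
local copy = loader(source)()
print(copy[1].match[2])  -- γαθοί
```

With the generated file in place, the filter no longer needs a JSON library at run time.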
> > > 
> > >         On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A.
> > >         Vasconcelos wrote:
> > > 
> > >         As for translating the filter, note that Lua can't really handle
> > >         UTF-8. There is some rudimentary support for converting codepoint
> > >         number ↔ UTF-8 byte sequences and for iterating through a string
> > >         of bytes representing UTF-8 encoded characters, but no concept of
> > >         chars as opposed to bytes. This may become a show stopper if you
> > >         need to manipulate strings containing UTF-8 text.
> > > 
> > >         Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards
> > >         includes UTF-8 support. Have you seen it? E.g.
> > >         https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
> > > 
> > >         For Ancient Greek you want grc as the language tag.
> > > 
> > >         Indeed it is (and that is generally what I use), but ἀγαθός is
> > >         just Polytonic Greek, which is not the same as Ancient Greek.
> > > 
> > >         --
> > >         You received this message because you are subscribed to the
> > >         Google
> > >         Groups "pandoc-discuss" group.
> > >         To unsubscribe from this group and stop receiving emails from
> > >         it,
> > >         send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> > >         To view this discussion on the web visit [2]https://
> > >         groups.google.com/d/msgid/pandoc-discuss/
> > >         3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.
> > > 
> > 
> 

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/Y1D1xMX37opBqnii%40localhost.



Thread overview: 13+ messages
2022-10-17 18:25 Bernardo C. D. A. Vasconcelos
     [not found] ` <88a14108-f2e4-40d0-a98e-5c6f84b8ff41n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2022-10-17 18:38   ` BPJ
     [not found]     ` <CADAJKhCVT-PNRsSgr5hU7Zzwaq3fN+CF3SGA5mTLrc2As+R6rw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2022-10-18 15:36       ` Bernardo C.D.A. Vasconcelos
     [not found]         ` <3307993F-F813-405F-BFEC-F17FAF27BEA5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-18 16:34           ` Bastien DUMONT
2022-10-18 17:16             ` Bernardo C.D.A. Vasconcelos
     [not found]               ` <7072522D-F2FE-4BAC-A575-93426852FCFB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-18 17:34                 ` Bastien DUMONT
2022-10-18 21:43                   ` Bernardo C.D.A. Vasconcelos
     [not found]                     ` <D4CB4B20-A1D5-49C8-BA96-2E37BA4FB779-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-18 22:06                       ` Bastien DUMONT
2022-10-19 19:50                         ` Bernardo C.D.A. Vasconcelos
     [not found]                           ` <B93B3CA7-A461-4056-929D-592B578B184F-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-19 21:28                             ` Bastien DUMONT
2022-10-19 22:43                               ` Bernardo C.D.A. Vasconcelos
     [not found]                                 ` <272DFB73-CD83-4A77-B2C5-CCF1AF7B6BF6-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-20  7:16                                   ` Bastien DUMONT [this message]
2022-10-18 18:42           ` BPJ
