public inbox archive for pandoc-discuss@googlegroups.com
From: Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Glossary Filter for MD2Tex
Date: Tue, 18 Oct 2022 17:34:03 +0000
Message-ID: <Y07ji07FFokQdOR+@localhost>
In-Reply-To: <7072522D-F2FE-4BAC-A575-93426852FCFB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

No, citeproc receives a data structure produced by pandoc; pandoc is responsible for the parsing. I think your script would not be hard to rewrite in Lua; the main question is whether you can achieve your goals this way. If your main concern is portability, then a Lua filter with no dependencies is certainly a good solution, provided that you feed it a Lua data structure (or embed the JSON-parsing code in your script).
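
To make the "Lua data structure" option concrete, here is a minimal sketch, not a definitive implementation. The table contents, the class name `gls`, and the `\gls{...}` output (which assumes the LaTeX glossaries package) are all hypothetical placeholders; only `pandoc.utils.stringify`, `pandoc.RawInline` and the `Span` filter hook are pandoc's own. Since every form is spelled out as a whole-string key, the lookup is a plain byte-for-byte comparison and needs no UTF-8-aware matching.

```lua
-- glossary.lua: hypothetical sketch of a dependency-free glossary filter.
-- Each spelled-out form maps to the key of its glossary entry.
-- In practice this table could be generated once from the JSON files.
local glossary = {
  ['ἀγαθός'] = 'agathos',
  ['ἀγαθοῦ'] = 'agathos',
}

-- Whole-string lookup: returns the entry key, or nil if the form is unknown.
local function lookup(form)
  return glossary[form]
end

-- Pandoc calls this for every Span; the class 'gls' is an assumed tag.
function Span(el)
  if el.classes:includes('gls') then
    local key = lookup(pandoc.utils.stringify(el))
    if key then
      return pandoc.RawInline('latex', '\\gls{' .. key .. '}')
    end
  end
  -- Returning nothing leaves the element unchanged.
end
```

With this sketch, a span such as `[ἀγαθός]{.gls}` in the Markdown source would be replaced by `\gls{agathos}` when running `pandoc --lua-filter=glossary.lua -t latex`, and untagged or unknown forms would pass through untouched.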

On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos wrote:
> Thank you for the suggestions, Bastien. There is technically no need for
> regex, as all the forms are spelled out to avoid the need to create ad hoc
> regex rules for each term. Now that I think about it, the principle is the
> same as Citeproc's: a tagged inline element will be matched against a lookup
> table and replaced. I will look at the citeproc code to see if it leads
> anywhere or if it could be reused in any way.
> 
> On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
> 
> > Yes, but it is limited to this utf8 library. For instance, if you perform a
> > pattern search like `string.match('ἀγαθός', '[γδ]')`, it tries to match
> > any of the four bytes inside the square brackets against the string
> > 'ἀγαθός', so it will return the first byte of γ, not γ. To circumvent
> > this limitation, you would be forced to test γ and δ separately.
> > Nevertheless, if you always perform comparisons between whole strings as
> > you currently do in your script, this should not be a problem.
> > 
> > As for your concern with dependencies, you would most probably have to
> > rely on a JSON library such as lunajson. However, if your JSON files are
> > not supposed to change, you could also convert them to a Lua file using
> > a JSON library and a serialization library, so as to be able to import
> > the resulting Lua data structure directly in your filter.
> > 
> > On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos
> > wrote:
> > > > As for translating the filter note that Lua can't really handle UTF-8.
> > > > There is some rudimentary support for converting codepoint number ↔
> > > > UTF-8 byte sequences and for iterating through a string of bytes
> > > > representing UTF-8 encoded characters but no concept of chars as
> > > > opposed to bytes. This may become a show stopper if you need to
> > > > manipulate strings containing UTF-8 text.
> > > 
> > > 
> > > Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards includes
> > > UTF-8 support. Have you seen it? E.g. https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
> > > 
> > > > For Ancient Greek you want grc as the language tag.
> > > > 
> > > 
> > > Indeed it is (and that is generally what I use), but ἀγαθός is just
> > > Polytonic Greek, which is not the same as Ancient Greek.
> > > 
> > > -- 
> > > You received this message because you are subscribed to the Google
> > > Groups "pandoc-discuss" group.
> > > To unsubscribe from this group and stop receiving emails from it,
> > > send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> > > To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/Y07ji07FFokQdOR%2B%40localhost.


