From: "Bernardo C.D.A. Vasconcelos" <bernardovasconcelos-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Glossary Filter for MD2Tex
Date: Wed, 19 Oct 2022 16:50:25 -0300
Message-ID: <B93B3CA7-A461-4056-929D-592B578B184F@gmail.com>
In-Reply-To: <Y08jckNrIpxbW6nR@localhost>
I have found this little script that takes me nearly there:
```
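-- Lookup table: "%key%" -> metadata value
local vars = {}

-- Collect every top-level metadata field as a "%key%" placeholder.
function Meta(meta)
  for k, v in pairs(meta) do
    vars["%" .. k .. "%"] = v
  end
end

-- Replace a Str whose text is a known placeholder.
function Str(elem)
  if vars[elem.text] then
    return vars[elem.text]
  else
    return elem
  end
end

-- Run the Meta pass first so that vars is filled before Str is applied.
return {
  { Meta = Meta },
  { Str = Str }
}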
```
Instead of the top-level metadata keys, we would use
`meta.glossary.entries`. The crux for me is looping through the list of
entries and adding every value of each entry's `to_match` field (its
known forms) to `vars` as a key, with the content of some other field
(e.g. `glslink`) as the value. In pseudocode:
`vars[ .. entry.to_match.each .. ] = entry.glslink`.
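
A minimal sketch of that loop, untested against real data. It assumes each
entry carries a list field `to_match` and a replacement field `glslink`
(the YAML quoted below calls the list `match`, so the names would need
adjusting), and pandoc 2.17 or later for `doc:walk`. `build_lookup` is a
name made up for illustration. A single `Pandoc` function is used here in
place of the two-filter list so that the metadata is read before any `Str`
is visited.

```lua
-- Sketch only: `to_match` and `glslink` are assumed field names.

-- Inside pandoc, metadata values are AST objects, so turn them into plain
-- strings; outside pandoc (e.g. in a plain Lua test) fall back to tostring.
local stringify = pandoc and pandoc.utils.stringify or tostring

-- Build a table mapping every known form to its entry's replacement value.
local function build_lookup(entries)
  local vars = {}
  for _, entry in ipairs(entries or {}) do
    for _, form in ipairs(entry.to_match or {}) do
      vars[stringify(form)] = entry.glslink
    end
  end
  return vars
end

-- Pandoc hook: read the metadata first, then walk the document once.
function Pandoc(doc)
  local glossary = doc.meta.glossary
  local vars = build_lookup(glossary and glossary.entries)
  return doc:walk {
    Str = function(elem)
      -- Returning nil leaves the element unchanged.
      return vars[elem.text]
    end
  }
end
```

With this, a `Str` whose text is exactly one of the known forms would be
replaced by that entry's `glslink` value. A word with punctuation attached
(e.g. `κἀγαθοί,`) is a different `Str` and would not match.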
On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:
> Yes, it could! You would have access to the corresponding metadata
> object in the AST.
>
> On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos
> wrote:
>> The data is mostly in database format and could be output in the best
>> format for the task, but I wanted to make it friendly for other people
>> to use as well. Could a YAML metadata block be a solution?
>>
>> glossary:
>>   glossary_lang: grc
>>   entries:
>>     - headword: ἀγαθός
>>       text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
>>       match:
>>         - γαθέ
>>         - γαθοί
>>         - κἀγάθ
>>         - κἀγαθά
>>         - κἀγαθάς
>>         - κἀγαθή
>>         - κἀγαθήν
>>         - κἀγαθαί
>>         - κἀγαθοί
>>         - κἀγαθος
>>     - headword: ἀγαπᾶν
>>       transliteration: agapan
>>       text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
>>       match:
>>         - ἀγάπα
>>         - ἀγάπαις
>>         - ἀγάπη
>>         - ἀγάπην
>>         - ἀγάπης
>>         - ἀγάπῃ
>>         - ἀγαπᾶ
>>         - ἀγαπᾶν
>>         - ἀγαπᾶς
>>
>> On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:
>>
>>> No, citeproc receives a data structure produced by pandoc. Pandoc is
>>> responsible for the parsing. I think that your script would not be so
>>> hard to rewrite in Lua; the main problem is to know whether you can
>>> achieve your goals this way. If your main concern is portability, then
>>> writing a Lua filter with no dependencies certainly is a good solution,
>>> provided that you feed it with a Lua data structure (or embed the code
>>> responsible for JSON parsing in your script).
>>
>>> On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos
>>> wrote:
>>>
>>>> Thank you for the suggestions, Bastien. There is technically no need
>>>> for regex, as all the forms are spelled out to avoid the need to
>>>> create ad hoc regex rules for each term. Now that I think about it,
>>>> the principle is the same as Citeproc's: a tagged inline element will
>>>> be matched against a lookup table and replaced. I will look at the
>>>> citeproc code to see if it leads anywhere or if it could be reused in
>>>> any way.
>>
>>>> On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
>>>>
>>>>> Yes, but it is limited to this utf8 library. For instance, if you
>>>>> perform a regexp search like `string.match('ἀγαθός', '[γδ]')`, it
>>>>> tries to match one of the four bytes inside the square brackets
>>>>> against the string 'ἀγαθός', so it will return the first byte of γ,
>>>>> not γ. To circumvent this limitation, you would be forced to test γ
>>>>> and δ separately. Nevertheless, if you always perform comparisons
>>>>> between whole strings as you currently do in your script, this
>>>>> should not be a problem.
>>>>>
>>>>> As for your concern with dependencies, you most probably would have
>>>>> to rely on a JSON library such as lunajson. However, if your JSON
>>>>> files are not supposed to change, you could also convert them to a
>>>>> Lua file using a JSON library and a serialization library, so as to
>>>>> be able to import the resulting Lua data structure directly in your
>>>>> filter.
>>
>>>>> On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos
>>>>> wrote:
>>>>>
>>>>>>> As for translating the filter, note that Lua can't really handle
>>>>>>> UTF-8. There is some rudimentary support for converting codepoint
>>>>>>> number ↔ UTF-8 byte sequences and for iterating through a string
>>>>>>> of bytes representing UTF-8 encoded characters, but no concept of
>>>>>>> chars as opposed to bytes. This may become a show stopper if you
>>>>>>> need to manipulate strings containing UTF-8 text.
>>>>>>
>>>>>> Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards
>>>>>> includes UTF-8 support. Have you seen it? E.g.
>>>>>> [1]https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>>>>>>
>>>>>>> For Ancient Greek you want grc as the language tag.
>>>>>>
>>>>>> Indeed it is (and that is generally what I use), but ἀγαθός is just
>>>>>> Polytonic Greek, which is not the same as Ancient Greek.
>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "pandoc-discuss" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to
>>>>>> pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>>>>> To view this discussion on the web visit
>>>>>> [2]https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.
>>
>> References:
>>
>> [1] https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>> [2] https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com
>> [3] https://groups.google.com/d/msgid/pandoc-discuss/Y07VnbuRsuqUg8US%40localhost
>> [4] https://groups.google.com/d/msgid/pandoc-discuss/7072522D-F2FE-4BAC-A575-93426852FCFB%40gmail.com
>> [5] https://groups.google.com/d/msgid/pandoc-discuss/Y07ji07FFokQdOR%2B%40localhost
>> [6] mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>> [7] https://groups.google.com/d/msgid/pandoc-discuss/D4CB4B20-A1D5-49C8-BA96-2E37BA4FB779%40gmail.com?utm_medium=email&utm_source=footer
>