I have found this little script that takes me nearly there:

```
local vars = {}

-- Collect every top-level metadata field as a "%key%" placeholder.
function Meta(meta)
  for k, v in pairs(meta) do
    vars["%" .. k .. "%"] = v
  end
end

-- Replace any Str whose text is a known placeholder.
function Str(elem)
  if vars[elem.text] then
    return vars[elem.text]
  else
    return elem
  end
end

-- Run Meta first so that vars is filled before Str is applied.
return { { Meta = Meta }, { Str = Str } }
```

Instead of looping over `meta`, we would loop over `meta.glossary.entries`. The crux for me is looping through the list of entries and, for each entry, adding every value of its `to_match` field (a.k.a. known forms) to `vars` as a key, with the content of some other field (e.g. `glslink`) as the value. E.g. `vars[ .. entry.to_match.each .. ] = entry.glslink`.

On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:

> Yes, it could! You would have access to the corresponding metadata
> object in the AST.
>
> On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos
> wrote:
>> The data is mostly in database format and could be output in the best
>> format for the task, but I wanted to make it friendly for other people
>> to use as well. Could a YAML metadata block be a solution?
>>
>> glossary:
>>   glossary_lang: grc
>>   entries:
>>     - headword: ἀγαθός
>>       text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
>>       match:
>>         - γαθέ
>>         - γαθοί
>>         - κἀγάθ
>>         - κἀγαθά
>>         - κἀγαθάς
>>         - κἀγαθή
>>         - κἀγαθήν
>>         - κἀγαθαί
>>         - κἀγαθοί
>>         - κἀγαθος
>>     - headword: ἀγαπᾶν
>>       transliteration: agapan
>>       text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
>>       match:
>>         - ἀγάπα
>>         - ἀγάπαις
>>         - ἀγάπη
>>         - ἀγάπην
>>         - ἀγάπης
>>         - ἀγάπῃ
>>         - ἀγαπᾶ
>>         - ἀγαπᾶν
>>         - ἀγαπᾶς
>>
>> On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:
>>
>>     No, citeproc receives a data structure produced by pandoc. Pandoc
>>     is responsible for the parsing. I think that your script would not
>>     be so hard to rewrite in Lua; the main problem is to know whether
>>     you can achieve your goals this way. If your main concern is
>>     portability, then writing a Lua filter with no dependencies
>>     certainly is a good solution, provided that you feed it with a Lua
>>     data structure (or embed the code responsible for JSON parsing in
>>     your script).
>>
>>     On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A.
>>     Vasconcelos wrote:
>>
>>         Thank you for the suggestions, Bastien. There is technically
>>         no need for regex, as all the forms are spelled out to avoid
>>         the need to create ad hoc regex rules for each term. Now that
>>         I think about it, the principle is the same as Citeproc's: a
>>         tagged inline element will be matched against a lookup table
>>         and replaced. I will look at the citeproc code to see if it
>>         leads anywhere or if it could be reused in any way.
>>
>>         On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
>>
>>             Yes, but it is limited to this utf8 library. For instance,
>>             if you perform a regexp search like
>>             `string.match('ἀγαθός', '[γδ]')`, it tries to match one of
>>             the four bytes inside the square brackets against the
>>             string 'ἀγαθός', so it will return the first byte of γ,
>>             not γ. To circumvent this limitation, you would be forced
>>             to test γ and δ separately. Nevertheless, if you always
>>             perform comparisons between whole strings, as you
>>             currently do in your script, this should not be a problem.
>>
>>             As for your concern with dependencies, you most probably
>>             would have to rely on a JSON library such as lunajson.
>>             However, if your JSON files are not supposed to change,
>>             you could also convert them to a Lua file using a JSON
>>             library and a serialization library, so as to be able to
>>             import the resulting Lua data structure directly in your
>>             filter.
>>
>>             On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A.
>>             Vasconcelos wrote:
>>
>>                     As for translating the filter note that Lua can't
>>                     really handle UTF-8. There is some rudimentary
>>                     support for converting codepoint number ↔ UTF-8
>>                     byte sequences and for iterating through a string
>>                     of bytes representing UTF-8 encoded characters but
>>                     no concept of chars as opposed to bytes. This may
>>                     become a show stopper if you need to manipulate
>>                     strings containing UTF-8 text.
>>
>>                 Thanks, @BPJ, for the explanation. Apparently, Lua 5.3
>>                 onwards includes UTF-8 support. Have you seen it? E.g.
>>                 https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>>
>>                     For Ancient Greek you want grc as the language
>>                     tag.
>>
>>                 Indeed it is (and that is generally what I use), but
>>                 ἀγαθός is just Polytonic Greek, which is not the same
>>                 as Ancient Greek.
>>
>>                 --
>>                 You received this message because you are subscribed
>>                 to the Google Groups "pandoc-discuss" group.
>>                 To unsubscribe from this group and stop receiving
>>                 emails from it, send an email to
>>                 pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>                 To view this discussion on the web visit
>>                 https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.
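
The loop asked about at the top of this message (one lookup key per known form of each entry) could be sketched roughly as below. This is only a sketch under several assumptions: the field names `match` and `text` are taken from the YAML example quoted in the thread (the `to_match` and `glslink` names from the message could be passed in instead), the helper `build_lookup` is a hypothetical name, and the keys are the bare forms rather than `%key%` placeholders. It also swaps the two-filter list of the original script for a single `Pandoc` function that calls `doc:walk`, which assumes pandoc 2.17 or later.

```lua
-- Sketch only: the field names come from the YAML example in this thread
-- and may not match the real glossary schema.

-- Inside pandoc, metadata strings arrive as inline lists, so stringify
-- them. Outside pandoc (plain Lua) fall back to tostring, which lets the
-- table-building logic be tried on ordinary Lua tables.
local stringify = (pandoc and pandoc.utils and pandoc.utils.stringify)
  or tostring

-- Map every known form of every entry to that entry's replacement value.
-- forms_field: name of the list of known forms (assumed "match").
-- value_field: name of the field to substitute (assumed "text").
local function build_lookup(entries, forms_field, value_field)
  local lookup = {}
  for _, entry in ipairs(entries or {}) do
    local value = entry[value_field]
    for _, form in ipairs(entry[forms_field] or {}) do
      lookup[stringify(form)] = value
    end
  end
  return lookup
end

-- Read the metadata first, then walk the document, so the lookup table
-- is complete before any Str element is visited.
function Pandoc(doc)
  local glossary = doc.meta.glossary
  if not (glossary and glossary.entries) then
    return doc
  end
  local vars = build_lookup(glossary.entries, "match", "text")
  return doc:walk {
    Str = function(elem)
      -- Whole-string comparison only; returning nil keeps the element.
      return vars[elem.text]
    end
  }
end
```

Because the lookup compares whole strings, the byte-oriented behaviour of Lua patterns discussed in the quoted messages should not come into play here. A `Str` only matches when its text is exactly one of the listed forms, so a form with attached punctuation would be left untouched.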