The data mostly lives in a database and could be exported in whatever format suits the task best, but I wanted to make it friendly for other people to use as well. Could a YAML metadata block be a solution?

```yaml
glossary:
  glossary_lang: grc
  entries:
    - headword: ἀγαθός
      text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
      match:
        - γαθέ
        - γαθοί
        - κἀγάθ
        - κἀγαθά
        - κἀγαθάς
        - κἀγαθή
        - κἀγαθήν
        - κἀγαθαί
        - κἀγαθοί
        - κἀγαθος
    - headword: ἀγαπᾶν
      transliteration: agapan
      text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
      match:
        - ἀγάπα
        - ἀγάπαις
        - ἀγάπη
        - ἀγάπην
        - ἀγάπης
        - ἀγάπῃ
        - ἀγαπᾶ
        - ἀγαπᾶν
        - ἀγαπᾶς
```

On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:

> No, citeproc receives a data structure produced by pandoc. Pandoc is
> responsible for the parsing. I think that your script would not be so
> hard to rewrite in Lua; the main problem is to know whether you can
> achieve your goals this way. If your main concern is portability, then
> writing a Lua filter with no dependencies certainly is a good solution,
> provided that you feed it with a Lua data structure (or embed the code
> responsible for JSON parsing in your script).
>
> On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos
> wrote:
>> Thank you for the suggestions, Bastien. There is technically no need
>> for regex, as all the forms are spelled out to avoid the need to
>> create ad hoc regex rules for each term. Now that I think about it,
>> the principle is the same as Citeproc's: a tagged inline element will
>> be matched against a lookup table and replaced. I will look at the
>> citeproc code to see if it leads anywhere or if it could be reused in
>> any way.
>>
>> On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
>>
>>> Yes, but it is limited to this utf8 library.
>>> For instance, if you perform a regexp search like
>>> `string.match('ἀγαθός', '[γδ]')`, it will try to match one of the
>>> four bytes inside the square brackets against the string 'ἀγαθός',
>>> so it will return the first byte of γ, not γ. To circumvent this
>>> limitation, you would be forced to test γ and δ separately.
>>> Nevertheless, if you always perform comparisons between whole
>>> strings, as you currently do in your script, this should not be a
>>> problem.
>>>
>>> As for your concern with dependencies, you most probably would have
>>> to rely on a JSON library such as lunajson. However, if your JSON
>>> files are not supposed to change, you could also convert them to a
>>> Lua file using a JSON library and a serialization library, so as to
>>> be able to import the resulting Lua data structure directly in your
>>> filter.
>>>
>>> On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A.
>>> Vasconcelos wrote:
>>>>> As for translating the filter, note that Lua can't really handle
>>>>> UTF-8. There is some rudimentary support for converting codepoint
>>>>> number ↔ UTF-8 byte sequences and for iterating through a string
>>>>> of bytes representing UTF-8 encoded characters, but no concept of
>>>>> chars as opposed to bytes. This may become a show stopper if you
>>>>> need to manipulate strings containing UTF-8 text.
>>>>
>>>> Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards
>>>> includes UTF-8 support. Have you seen it? E.g.
>>>> https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>>>>
>>>>> For Ancient Greek you want grc as the language tag.
>>>>
>>>> Indeed it is (and that is generally what I use), but ἀγαθός is
>>>> just Polytonic Greek, which is not the same as Ancient Greek.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "pandoc-discuss" group.
>>>> To unsubscribe from this group and stop receiving emails from it,
>>>> send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.
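
For reference, here is a minimal sketch of the two points discussed in this thread, in plain Lua 5.3 or later with no dependencies. It assumes the glossary has already been converted to a Lua table; the table below is an abridged, hand-written mirror of the YAML block, and `build_index`, `lookup` and `contains_any` are hypothetical names, not part of pandoc or of the original script. It shows that whole-string lookups work on UTF-8 text as they are, while a character class such as `[γδ]` only sees bytes.

```lua
-- Abridged, hand-written mirror of the YAML glossary (an assumption:
-- the real data would be generated from the database or JSON files).
local glossary = {
  glossary_lang = 'grc',
  entries = {
    { headword = 'ἀγαθός',
      text = '□ *pt.* bom; □ *en.* good',
      match = { 'γαθέ', 'γαθοί', 'κἀγαθά' } },
    { headword = 'ἀγαπᾶν',
      transliteration = 'agapan',
      text = '□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;',
      match = { 'ἀγάπα', 'ἀγάπη', 'ἀγαπᾶν' } },
  },
}

-- Build a flat index: every spelled-out form (and the headword itself)
-- points to its entry.
local function build_index(g)
  local index = {}
  for _, entry in ipairs(g.entries) do
    index[entry.headword] = entry
    for _, form in ipairs(entry.match or {}) do
      index[form] = entry
    end
  end
  return index
end

local index = build_index(glossary)

-- Whole-string lookup. Table keys are compared byte for byte, so no
-- UTF-8 awareness is needed here.
local function lookup(form)
  return index[form]
end

-- Character-aware replacement for a class like [γδ]:
-- utf8.charpattern (Lua 5.3+) matches exactly one UTF-8 encoded
-- character, so both strings are walked character by character.
local function contains_any(s, chars)
  local wanted = {}
  for ch in chars:gmatch(utf8.charpattern) do
    wanted[ch] = true
  end
  for ch in s:gmatch(utf8.charpattern) do
    if wanted[ch] then
      return ch
    end
  end
  return nil
end

print(lookup('κἀγαθά').headword)                 -- headword of the matching entry
print(#string.match('ἀγαθός', '[γδ]'))  -- 1: a single byte, not a letter
print(contains_any('ἀγαθός', 'γδ'))     -- γ
```

Because the lookup compares bytes, the forms in the text and in the glossary have to use the same Unicode normalization (precomposed versus combining accents) for a match to succeed.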