The data lives mostly in a database and could be exported in whatever 
format suits the task best, but I also want it to be friendly for other 
people to use. Could a YAML metadata block be a solution?

```yaml
glossary:
   glossary_lang: grc
   entries:
   - headword: ἀγαθός
     text: "□ *pt.* bom;  □ *en.* good; and so on and so forth"
     match:
     - γαθέ
     - γαθοί
     - κἀγάθ
     - κἀγαθά
     - κἀγαθάς
     - κἀγαθή
     - κἀγαθήν
     - κἀγαθαί
     - κἀγαθοί
     - κἀγαθος
   - headword: ἀγαπᾶν
     transliteration: agapan
     text: "□ *pt.* estar satisfeito, gostar;  □ *en.* be satisfied, like;"
     match:
     - ἀγάπα
     - ἀγάπαις
     - ἀγάπη
     - ἀγάπην
     - ἀγάπης
     - ἀγάπῃ
     - ἀγαπᾶ
     - ἀγαπᾶν
     - ἀγαπᾶς
```
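
To make the lookup-table idea concrete, here is a minimal sketch in 
plain Lua, assuming the glossary has already been turned into a Lua 
table shaped like the YAML above (only a few of the forms are copied 
over). The names `build_index` and `lookup` are made up for 
illustration. In an actual pandoc Lua filter the values would arrive as 
metadata and would probably need `pandoc.utils.stringify` first, which 
I have not tried here.

```lua
-- Sketch only: a plain Lua table mirroring part of the YAML block above.
-- Assumption: in a real filter this data would come from the metadata.
local glossary = {
  glossary_lang = "grc",
  entries = {
    {
      headword = "ἀγαθός",
      text = "□ *pt.* bom;  □ *en.* good",
      match = { "γαθέ", "γαθοί", "κἀγαθά" },
    },
    {
      headword = "ἀγαπᾶν",
      transliteration = "agapan",
      text = "□ *pt.* estar satisfeito, gostar;  □ *en.* be satisfied, like;",
      match = { "ἀγάπα", "ἀγάπη", "ἀγαπᾶν" },
    },
  },
}

-- Hypothetical helper: index every spelled-out form by its exact byte
-- string, so that a lookup is a plain table access with no patterns.
local function build_index(entries)
  local index = {}
  for _, entry in ipairs(entries) do
    for _, form in ipairs(entry.match or {}) do
      index[form] = entry
    end
  end
  return index
end

-- Hypothetical helper: whole-string lookup; returns nil when the form
-- is not listed under any entry.
local function lookup(index, word)
  return index[word]
end

local index = build_index(glossary.entries)
local hit = lookup(index, "ἀγάπη")
print(hit and hit.headword or "no entry")
```

Because the comparison is between whole byte strings, the forms in the 
list and the words in the text have to use the same Unicode 
normalization form, otherwise visually identical words will not match.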


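For reference, the byte-versus-character caveat quoted below can be 
reproduced with a few lines of Lua. This is only a sketch, assuming 
Lua 5.3 or later for the `utf8` library; `find_char` is a hypothetical 
helper name, not something pandoc or Lua provides.

```lua
-- Requires Lua 5.3 or later for the utf8 library.
local word = "ἀγαθός"

-- A bracket set in a Lua pattern is a set of bytes, so this matches a
-- single byte of the string rather than a whole Greek letter.
local raw = string.match(word, "[γδ]")
print(#raw)  -- 1

-- Hypothetical helper: walk the string code point by code point and
-- compare whole characters against a set of wanted characters.
local function find_char(s, wanted)
  for _, cp in utf8.codes(s) do
    local ch = utf8.char(cp)
    if wanted[ch] then
      return ch
    end
  end
  return nil
end

local found = find_char(word, { ["γ"] = true, ["δ"] = true })
print(found == "γ")
```

None of this matters for the glossary itself as long as the filter only 
compares whole strings.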

On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:

> No, citeproc receives a data structure produced by pandoc. Pandoc is 
> responsible for the parsing. I think that your script would not be so 
> hard to rewrite in Lua; the main question is whether you can achieve 
> your goals this way. If your main concern is portability, then writing 
> a Lua filter with no dependencies certainly is a good solution, 
> provided that you feed it a Lua data structure (or embed the code 
> responsible for JSON parsing in your script).
>
> On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos 
> wrote:
>> Thank you for the suggestions, Bastien. There is technically no need 
>> for regex, as all the forms are spelled out to avoid having to create 
>> ad hoc regex rules for each term. Now that I think about it, the 
>> principle is the same as Citeproc's: a tagged inline element is 
>> matched against a lookup table and replaced. I will look at the 
>> citeproc code to see if it leads anywhere or if it could be reused in 
>> any way.
>>
>> On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
>>
>>> Yes, but it is limited to this utf8 library. For instance, if you 
>>> perform a regexp search like `string.match('ἀγαθός', '[γδ]')`, it 
>>> tries to match one of the four bytes inside the square brackets 
>>> against the string 'ἀγαθός', so it will return the first byte of γ, 
>>> not γ. To circumvent this limitation, you would be forced to test γ 
>>> and δ separately. Nevertheless, if you always perform comparisons 
>>> between whole strings as you currently do in your script, this 
>>> should not be a problem.
>>>
>>> As for your concern with dependencies, you most probably would have 
>>> to rely on a JSON library such as lunajson. However, if your JSON 
>>> files are not supposed to change, you could also convert them to a 
>>> Lua file using a JSON library and a serialization library, so as to 
>>> be able to import the resulting Lua data structure directly in your 
>>> filter.
>>>
>>> On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. 
>>> Vasconcelos wrote:
>>>>> As for translating the filter, note that Lua can't really handle
>>>>> UTF-8. There is some rudimentary support for converting codepoint
>>>>> number ↔ UTF-8 byte sequences and for iterating through a string
>>>>> of bytes representing UTF-8 encoded characters, but no concept of
>>>>> chars as opposed to bytes. This may become a show stopper if you
>>>>> need to manipulate strings containing UTF-8 text.
>>>>
>>>>
>>>> Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards
>>>> includes UTF-8 support. Have you seen it? E.g.
>>>> https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>>>>
>>>>> For Ancient Greek you want grc as the language tag.
>>>>>
>>>>
>>>> Indeed it is (and that is generally what I use), but ἀγαθός is
>>>> just Polytonic Greek, which is not the same as Ancient Greek.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google
>>>> Groups "pandoc-discuss" group.
>>>> To unsubscribe from this group and stop receiving emails from it,
>>>> send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/D4CB4B20-A1D5-49C8-BA96-2E37BA4FB779%40gmail.com.