I have found this little script that takes me nearly there:

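-- Lookup table: "%key%" -> metadata value
local vars = {}

-- First pass: register every metadata field as a %key% placeholder.
function Meta(meta)
    for k, v in pairs(meta) do
        vars["%" .. k .. "%"] = v
    end
end

-- Second pass: replace any Str whose text is a known placeholder.
function Str(elem)
    if vars[elem.text] then
        return vars[elem.text]
    else
        return elem
    end
end

-- Two separate filters, so that Meta runs before Str.
return {
    { Meta = Meta },
    { Str  = Str  }
}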

Instead of the top-level metadata fields, we would use meta.glossary.entries. The crux for me is looping through the list of entries and adding every value of each entry's to_match field (i.e. its known forms) to vars as a key, with the content of some other field (e.g. glslink) as the value. In pseudocode: for each form in entry.to_match, vars[form] = entry.glslink.
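
The loop described above could look roughly like this. It is only a sketch under a few assumptions: `build_lookup` is a made-up helper name, the field names are passed in as arguments (the YAML sample quoted in this thread calls the list `match`, not `to_match`), and `pandoc.utils.stringify` is used to turn each metadata value into a plain string key. Outside pandoc the sketch falls back to `tostring`, so it can be tried in a plain Lua interpreter.

```lua
-- Inside pandoc, metadata values are AST objects, so flatten them to plain
-- strings. Outside pandoc, fall back to tostring (assumption, for testing).
local stringify = (pandoc and pandoc.utils and pandoc.utils.stringify) or tostring

-- Hypothetical helper: map every known form of every entry to the content
-- of another field of the same entry.
function build_lookup(entries, forms_field, value_field)
    local lookup = {}
    for _, entry in ipairs(entries or {}) do
        for _, form in ipairs(entry[forms_field] or {}) do
            lookup[stringify(form)] = entry[value_field]
        end
    end
    return lookup
end

-- Plain-Lua demonstration with made-up data.
local demo = build_lookup({
    { glslink = "agathos", to_match = { "γαθέ", "κἀγαθά" } },
    { glslink = "agapan",  to_match = { "ἀγάπη" } },
}, "to_match", "glslink")
print(demo["κἀγαθά"])  -- agathos
```

In the filter, Meta would then do something like `vars = build_lookup(meta.glossary.entries, "to_match", "glslink")` after checking that meta.glossary exists, and Str could stay as it is, since the keys no longer carry the % delimiters.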

On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:

Yes, it could! You would have access to the corresponding metadata object in the AST.

On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos wrote:

The data is mostly in database format and could be output in the best format
for the task, but I wanted to make it friendly for other people to use as well.
Could a YAML metadata block be a solution?

glossary:
  glossary_lang: grc
  entries:
    - headword: ἀγαθός
      text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
      match:
        - γαθέ
        - γαθοί
        - κἀγάθ
        - κἀγαθά
        - κἀγαθάς
        - κἀγαθή
        - κἀγαθήν
        - κἀγαθαί
        - κἀγαθοί
        - κἀγαθος
    - headword: ἀγαπᾶν
      transliteration: agapan
      text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
      match:
        - ἀγάπα
        - ἀγάπαις
        - ἀγάπη
        - ἀγάπην
        - ἀγάπης
        - ἀγάπῃ
        - ἀγαπᾶ
        - ἀγαπᾶν
        - ἀγαπᾶς

On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:

No, citeproc receives a data structure produced by pandoc; pandoc is
responsible for the parsing. I think that your script would not be so hard
to rewrite in Lua. The main question is whether you can achieve your goals
this way. If your main concern is portability, then writing a Lua filter
with no dependencies certainly is a good solution, provided that you feed
it with a Lua data structure (or embed the code responsible for JSON
parsing in your script).

On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos wrote:

Thank you for the suggestions, Bastien. There is technically no need for
regex, as all the forms are spelled out to avoid the need to create ad hoc
regex rules for each term. Now that I think about it, the principle is the
same as Citeproc's: a tagged inline element will be matched against a
lookup table and replaced. I will look at the citeproc code to see if it
leads anywhere or if it could be reused in any way.

On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:

Yes, but it is limited to this utf8 library. For instance, if you perform
a regexp search like `string.match('ἀγαθός', '[γδ]')`, it tries to match
any one of the four bytes inside the square brackets against the string
'ἀγαθός', so it will return the first byte of γ, not γ. To circumvent this
limitation, you would be forced to test γ and δ separately. Nevertheless,
if you always perform comparisons between whole strings as you currently
do in your script, this should not be a problem.
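
To make the byte issue concrete, here is a small sketch. `find_any` is a hypothetical helper, not part of Lua or pandoc: it tests each character separately with a plain (non-pattern) `string.find`, which is safe because one valid UTF-8 sequence cannot begin in the middle of another. The sketch assumes the script file itself is saved as UTF-8.

```lua
local word = 'ἀγαθός'

-- The set '[γδ]' is read as four single bytes, so the match is one byte,
-- not a whole character.
local hit = string.match(word, '[γδ]')
print(#hit)  -- 1

-- Hypothetical helper: look for each character separately as a plain
-- substring and return the one that occurs earliest, with its byte position.
function find_any(s, chars)
    local best_pos, best_char
    for _, c in ipairs(chars) do
        local pos = string.find(s, c, 1, true)  -- true = plain text search
        if pos and (not best_pos or pos < best_pos) then
            best_pos, best_char = pos, c
        end
    end
    return best_char, best_pos
end

local ch = find_any(word, { 'γ', 'δ' })
print(ch)  -- γ
```

Whole-string comparisons such as `vars[elem.text]` are unaffected, since two strings are equal exactly when their bytes are.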

As for your concern with dependencies, you most probably would have to
rely on a JSON library such as lunajson. However, if your JSON files are
not supposed to change, you could also convert them to a Lua file using a
JSON library and a serialization library, so as to be able to import the
resulting Lua data structure directly in your filter.
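
For the "convert once to a Lua file" route, a sketch of the serialization half follows. `serialize` is a hypothetical, deliberately minimal helper (strings, numbers, booleans and nested tables only); a real serialization library would cover more cases. The JSON half is only indicated in comments: it assumes lunajson's `decode` function, and the file names are made up.

```lua
-- Hypothetical minimal serializer: turns a Lua value into Lua source text.
function serialize(value, indent)
    indent = indent or ""
    local t = type(value)
    if t == "string" then
        return string.format("%q", value)  -- quoted, escaped Lua string
    elseif t == "number" or t == "boolean" then
        return tostring(value)
    elseif t == "table" then
        local inner = indent .. "  "
        local parts = {}
        for k, v in pairs(value) do
            local key = type(k) == "string"
                and string.format("[%q]", k)
                or "[" .. tostring(k) .. "]"
            parts[#parts + 1] = inner .. key .. " = " .. serialize(v, inner)
        end
        return "{\n" .. table.concat(parts, ",\n") .. "\n" .. indent .. "}"
    end
    error("cannot serialize a " .. t)
end

-- One-off conversion (needs lunajson installed; file names are made up):
--   local lunajson = require("lunajson")
--   local data = lunajson.decode(io.open("glossary.json"):read("a"))
--   io.open("glossary.lua", "w"):write("return " .. serialize(data))
-- The filter could then load the result with dofile("glossary.lua").

-- Round trip on made-up data, without touching the file system.
local source = "return " .. serialize({ headword = "ἀγαθός", match = { "γαθέ" } })
local copy = load(source)()
print(copy.match[1])  -- γαθέ
```

After the one-off conversion the filter itself has no JSON dependency at all, which seems to be the point of the suggestion.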

On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos wrote:

As for translating the filter, note that Lua can't really handle UTF-8.
There is some rudimentary support for converting codepoint number ↔ UTF-8
byte sequences and for iterating through a string of bytes representing
UTF-8 encoded characters, but no concept of chars as opposed to bytes.
This may become a show stopper if you need to manipulate strings
containing UTF-8 text.

Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards includes
UTF-8 support. Have you seen it? E.g.
[1]https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
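
For reference, this is roughly what that library offers in Lua 5.3 and later. `chars` is a hypothetical helper built on `utf8.codes` and `utf8.char`; everything else is standard library. The sketch assumes the file is saved as UTF-8.

```lua
-- Requires Lua 5.3 or later (the utf8 standard library).

-- Hypothetical helper: split a UTF-8 string into a list of one-code-point
-- strings.
function chars(s)
    local out = {}
    for _, codepoint in utf8.codes(s) do
        out[#out + 1] = utf8.char(codepoint)
    end
    return out
end

local s = "γδ"
print(#s)                           -- 4 (bytes)
print(utf8.len(s))                  -- 2 (code points)
print(table.concat(chars(s), " "))  -- γ δ
```

Note that the library counts code points, not what a reader perceives as letters: a vowel typed with separate combining diacritics counts as several code points.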

For Ancient Greek you want grc as the language tag.

Indeed it is (and that is generally what I use), but ἀγαθός is just
Polytonic Greek, which is not the same as Ancient Greek.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh4Ykp1iOSErHA@public.gmane.orgm.
To view this discussion on the web visit [2]https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.

References:

[1] https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
[2] https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com
[3] https://groups.google.com/d/msgid/pandoc-discuss/Y07VnbuRsuqUg8US%40localhost
[4] https://groups.google.com/d/msgid/pandoc-discuss/7072522D-F2FE-4BAC-A575-93426852FCFB%40gmail.com
[5] https://groups.google.com/d/msgid/pandoc-discuss/Y07ji07FFokQdOR%2B%40localhost
[6] mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
[7] https://groups.google.com/d/msgid/pandoc-discuss/D4CB4B20-A1D5-49C8-BA96-2E37BA4FB779%40gmail.com?utm_medium=email&utm_source=footer
