For Ancient Greek you want grc as the language tag (el is Modern Greek).

As for translating the filter, note that Lua can't really handle UTF-8.
There is some rudimentary support for converting between code point numbers
and UTF-8 byte sequences, and for iterating through a string of bytes
representing UTF-8 encoded characters, but strings have no concept of
characters as opposed to bytes. This may become a showstopper if you need
to manipulate strings containing UTF-8 text.
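
To make the byte/character distinction concrete, here is a minimal sketch,
assuming Lua 5.3 or later (where the standard utf8 library exists). The
names word and build_index are made up for illustration; the field names
filter_match, title and transliteration are taken from the JSON quoted
below. It is not a complete pandoc filter, only the string handling.

```lua
-- Assumes Lua 5.3 or later, where the standard utf8 library is available.

-- "ἀγαθός" built from code points so the byte layout is unambiguous.
local word = utf8.char(0x1F00, 0x3B3, 0x3B1, 0x3B8, 0x3CC, 0x3C2)

-- The # operator and the string library count bytes;
-- utf8.len counts code points.
local byte_len = #word            -- 13
local char_len = utf8.len(word)   -- 6

-- string.sub slices bytes: the first three bytes are only the first letter.
local first_bytes = word:sub(1, 3)
-- To get the first three characters, translate a character index into a
-- byte offset with utf8.offset before slicing.
local first_chars = word:sub(1, utf8.offset(word, 4) - 1)

-- utf8.codes iterates over (byte position, code point) pairs.
local positions = {}
for pos, _ in utf8.codes(word) do
  positions[#positions + 1] = pos
end
-- positions now holds 1, 4, 6, 8, 10, 12

-- Exact matching needs no character awareness: string equality and table
-- keys compare bytes, so a lookup table keyed by inflected form works.
-- build_index is a hypothetical helper, not part of pandoc.
local function build_index(entries)
  local index = {}
  for _, entry in ipairs(entries) do
    for _, form in ipairs(entry.filter_match) do
      index[form] = entry
    end
  end
  return index
end

local entries = {
  { title = "ἀγαθός", transliteration = "agathos", filter_match = { word } },
}
local index = build_index(entries)
local hit = index[word]    -- the agathos entry
local miss = index["xyz"]  -- nil
```

A lookup like this only hits when the span text and the glossary forms are
byte-for-byte identical, so both sides need the same Unicode normalization.
Building the table once and indexing it per span may also help with the
speed concern, since it avoids scanning every entry for every span.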

On Mon, 17 Oct 2022 at 20:26, Bernardo C. D. A. Vasconcelos
<bernardovasconcelos@gmail.com> wrote:

> Hello everyone,
>
> I am curious if anyone would be willing to lend me a hand in (or give me
> directions) translating a small script from Ruby to Lua. The idea is this:
> we feed the filter a JSON string with the glossary data. The filter will
> check the JSON for each entry's `filter_match` and tag these accordingly in
> the text, pointing them to the correct glossary entry. It works as it is,
> but it has dependencies (which makes it harder to share), and it seems a
> bit slow (perhaps the logic I am applying is faulty).
>
> *JSON Example*
>
> ```
> {
>   "entries": [
>     {
>     "title": "ἀγαθός",
>     "subtitle": "□ *pt.* bom;  □ *en.* good",
>     "filter_match": ["γαθέ", "γαθοί", "κἀγάθ", "κἀγαθά", "κἀγαθάς",
> "κἀγαθή", "κἀγαθήν", "κἀγαθαί", "κἀγαθοί", "κἀγαθος", "κἀγαθούς",
> "κἀγαθοῖς", "κἀγαθοῦ", "κἀγαθόν", "κἀγαθός", "κἀγαθώ", "κἀγαθῆς",
> "κἀγαθῶν", "κἀγαθῶς", "κἀγαθῷ", "τἀγάθ", "τἀγαθά", "τἀγαθοῦ", "τἀγαθόν",
> "τἀγαθῇ", "τἀγαθῷ", "τὠγαθοῦ", "τὠγαθόν", "ἀγάθ", "ἀγάθων", "ἀγαθά",
> "ἀγαθάν", "ἀγαθάς", "ἀγαθέ", "ἀγαθή", "ἀγαθήν", "ἀγαθαί", "ἀγαθαῖν",
> "ἀγαθαῖς", "ἀγαθαῖσιν", "ἀγαθοί", "ἀγαθούς", "ἀγαθοῖν", "ἀγαθοῖο",
> "ἀγαθοῖς", "ἀγαθοῖσι", "ἀγαθοῖσιν", "ἀγαθοῦ", "ἀγαθόν", "ἀγαθός", "ἀγαθώ",
> "ἀγαθᾶν", "ἀγαθᾶς", "ἀγαθᾷ", "ἀγαθῆισι", "ἀγαθῆισιν", "ἀγαθῆς", "ἀγαθῇ",
> "ἀγαθῇσι", "ἀγαθῇσιν", "ἀγαθῶ", "ἀγαθῶι", "ἀγαθῶν", "ἀγαθῶς", "ἀγαθῷ",
> "ἁγαθή", "ἁγαθαί", "ἁγαθοί", "ἁγαθός", "ὠγαθέ", "ὦγαθ", "ὦγαθε"],
>     "transliteration": "agathos"
>     },
>     {
>     "title": "ἀγαπᾶν",
>     "subtitle": "□ *pt.* estar satisfeito, gostar;  □ *en.* be satisfied, like;",
>     "filter_match": ["ἀγάπα", "ἀγάπαις", "ἀγάπη", "ἀγάπην", "ἀγάπης",
> "ἀγάπῃ", "ἀγαπᾶ", "ἀγαπᾶν", "ἀγαπᾶς", "ἀγαπᾷ", "ἀγαπᾷν", "ἀγαπᾷς", "ἀγαπῇ",
> "ἀγαπῶν"],
>     "transliteration": "agapan"
>     }
>   ]
> }
> ```
> (I am using JSON here just because it seemed to make sense. Perhaps it
> would be interesting if we were pulling this data from the definitions list
> (with extended attributes) in the same document?)
>
> *The Ruby script*
>
> ```
> #!/usr/bin/env ruby
>
> Encoding.default_internal = Encoding::UTF_8
> Encoding.default_external = Encoding::UTF_8
>
> require 'paru/filter'
> require 'json'
>
> # Keys follow the JSON example above: entries, filter_match, title.
> GLOSSARY = JSON.parse(File.read("#{__dir__}/data.json"))['entries']
>
> start_time = Time.now
> log = []
>
> Paru::Filter.run do
>   with 'Span' do |p|
>     next unless p.attr['lang'] == 'el'
>
>     span_content = p.inner_markdown.nil? ? '' : p.inner_markdown.chomp
>     # Take the first glossary entry that lists this exact form.
>     entry = GLOSSARY.find { |g| g['filter_match'].include?(span_content) }
>     next if entry.nil?
>
>     p.inner_markdown =
>       "\\index{#{entry['transliteration']}@#{entry['title']}}" \
>       "\\glslink{#{entry['transliteration']}}{#{span_content}}"
>     log << entry['title']
>   end
> end
>
> # The original wrote to an undefined log_file; stderr is assumed here,
> # since stdout carries the filtered document.
> warn "Paru::Filter took #{Time.now - start_time}s.\n\n"
> warn "#{log.length} total entries (#{log.uniq.length} unique) were tagged:\n" \
>      "#{log.uniq.sort.join("\n")}\n\n"
> ```
>
> So if my markdown input were:
>
> ```
> Lorem, etc. [ἀγαθὸς]{lang=el} is a greek word.
> ```
>
> The LaTeX output would be:
>
> ```
> Lorem, etc. \index{agathos@ἀγαθὸς}\glslink{agathos}{ἀγαθὸς} is a greek word.
> ```
>
> Please note that the glossary headword must be *agathos*, the
> transliterated form, instead of ἀγαθός, due to weird sorting issues with
> LaTeX.
>
> Any input is appreciated.
>
> Bernardo
> https://github.com/bcdavasconcelos
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/88a14108-f2e4-40d0-a98e-5c6f84b8ff41n%40googlegroups.com
> <https://groups.google.com/d/msgid/pandoc-discuss/88a14108-f2e4-40d0-a98e-5c6f84b8ff41n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
