For Ancient Greek you want `grc` as the language tag, not `el`, which denotes Modern Greek.

As for translating the filter, note that Lua has no real notion of UTF-8 text. Its `utf8` library offers rudimentary support for converting between code point numbers and UTF-8 byte sequences, and for iterating over the characters encoded in a byte string, but Lua strings themselves are plain sequences of bytes, with no concept of characters as opposed to bytes. This may become a showstopper if you need to manipulate strings containing UTF-8 text.

On Mon, 17 Oct 2022 at 20:26, Bernardo C. D. A. Vasconcelos <bernardovasconcelos@gmail.com> wrote:

> Hello everyone,
>
> I am curious if anyone would be willing to lend me a hand with (or give me
> directions for) translating a small script from Ruby to Lua. The idea is this:
> we feed the filter a JSON string with the glossary data. The filter will
> check the JSON for each entry's `filter_match` and tag these accordingly in
> the text, pointing them to the correct glossary entry. It works as it is,
> but it has dependencies (which makes it harder to share), and it seems a
> bit slow (perhaps the logic I am applying is faulty).
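
On the speed concern in the paragraph above: if the filter scans every entry's `filter_match` list for each span, the cost grows with the size of the glossary. One possible alternative, sketched here in Ruby to match the script quoted below, is to build a hash from each inflected form to its entry once, so that every span costs a single lookup. The names `build_index`, `SAMPLE` and `INDEX` are hypothetical, and the inline data is a cut-down stand-in for the real `data.json`, not a copy of it.

```ruby
require 'json'

# Hypothetical helper: map every inflected form to its glossary entry once,
# so that each span needs one hash lookup instead of a scan over all entries.
def build_index(entries)
  index = {}
  entries.each do |entry|
    entry.fetch('filter_match', []).each do |form|
      index[form] ||= entry # the first entry that lists a form wins
    end
  end
  index
end

# Cut-down sample in the shape of the JSON example (assumed data, not the real file).
SAMPLE = JSON.parse(<<~EOS)
  {"entries": [
    {"title": "ἀγαθός", "filter_match": ["ἀγαθός", "ἀγαθοῦ"], "transliteration": "agathos"},
    {"title": "ἀγαπᾶν", "filter_match": ["ἀγαπᾶν", "ἀγαπῶν"], "transliteration": "agapan"}
  ]}
EOS

INDEX = build_index(SAMPLE['entries'])

# Look up a form taken from the sample itself; unknown forms simply return nil.
form  = SAMPLE['entries'][0]['filter_match'][1]
entry = INDEX[form]
puts entry['transliteration'] if entry
```

The same idea should carry over to Lua, where a table keyed by the inflected form plays the role of the hash. Since the lookup is an exact match, it only compares bytes.
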
>
> *JSON Example*
>
> ```
> {
>   "entries": [
>     {
>       "title": "ἀγαθός",
>       "subtitle": "□ *pt.* bom; □ *en.* good",
>       "filter_match": ["γαθέ", "γαθοί", "κἀγάθ", "κἀγαθά", "κἀγαθάς",
>         "κἀγαθή", "κἀγαθήν", "κἀγαθαί", "κἀγαθοί", "κἀγαθος", "κἀγαθούς",
>         "κἀγαθοῖς", "κἀγαθοῦ", "κἀγαθόν", "κἀγαθός", "κἀγαθώ", "κἀγαθῆς",
>         "κἀγαθῶν", "κἀγαθῶς", "κἀγαθῷ", "τἀγάθ", "τἀγαθά", "τἀγαθοῦ", "τἀγαθόν",
>         "τἀγαθῇ", "τἀγαθῷ", "τὠγαθοῦ", "τὠγαθόν", "ἀγάθ", "ἀγάθων", "ἀγαθά",
>         "ἀγαθάν", "ἀγαθάς", "ἀγαθέ", "ἀγαθή", "ἀγαθήν", "ἀγαθαί", "ἀγαθαῖν",
>         "ἀγαθαῖς", "ἀγαθαῖσιν", "ἀγαθοί", "ἀγαθούς", "ἀγαθοῖν", "ἀγαθοῖο",
>         "ἀγαθοῖς", "ἀγαθοῖσι", "ἀγαθοῖσιν", "ἀγαθοῦ", "ἀγαθόν", "ἀγαθός", "ἀγαθώ",
>         "ἀγαθᾶν", "ἀγαθᾶς", "ἀγαθᾷ", "ἀγαθῆισι", "ἀγαθῆισιν", "ἀγαθῆς", "ἀγαθῇ",
>         "ἀγαθῇσι", "ἀγαθῇσιν", "ἀγαθῶ", "ἀγαθῶι", "ἀγαθῶν", "ἀγαθῶς", "ἀγαθῷ",
>         "ἁγαθή", "ἁγαθαί", "ἁγαθοί", "ἁγαθός", "ὠγαθέ", "ὦγαθ", "ὦγαθε"],
>       "transliteration": "agathos"
>     },
>     {
>       "title": "ἀγαπᾶν",
>       "subtitle": "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;",
>       "filter_match": ["ἀγάπα", "ἀγάπαις", "ἀγάπη", "ἀγάπην", "ἀγάπης",
>         "ἀγάπῃ", "ἀγαπᾶ", "ἀγαπᾶν", "ἀγαπᾶς", "ἀγαπᾷ", "ἀγαπᾷν", "ἀγαπᾷς", "ἀγαπῇ",
>         "ἀγαπῶν"],
>       "transliteration": "agapan"
>     }
>   ]
> }
> ```
>
> (I am using JSON here just because it seemed to make sense. Perhaps it
> would be interesting to pull this data from a definition list
> (with extended attributes) in the same document instead?)
>
> *The Ruby script*
>
> ```
> #!/usr/bin/env ruby
>
> Encoding.default_internal = Encoding::UTF_8
> Encoding.default_external = Encoding::UTF_8
>
> require 'paru/filter'
> require 'json'
>
> # Keys follow the JSON example above ('entries', 'filter_match', 'title').
> GLOSSARY = JSON.parse(File.read("#{__dir__}/data.json"))['entries']
>
> # Not shown in the original post; assumed definitions so the script runs.
> start_time = Time.now
> log = []
> log_file = $stderr # stdout carries the filtered document, so log elsewhere
>
> Paru::Filter.run do
>   with 'Span' do |p|
>     next unless p.attr['lang'] == 'el'
>
>     span_content = p.inner_markdown.nil? ? '' : p.inner_markdown.chomp
>     # Collect every entry that lists the span's text as an inflected form.
>     result = GLOSSARY.select { |g| g['filter_match'].include?(span_content) }
>
>     next if result.empty?
>
>     p.inner_markdown =
>       "\\index{#{result[0]['transliteration']}@#{result[0]['title']}}\\glslink{#{result[0]['transliteration']}}{#{span_content}}"
>     log << result[0]['title']
>   end
> end
>
> log_file.puts "Paru::Filter took #{Time.now - start_time}s.\n\n"
> log_file.puts "#{log.length} total entries (#{log.uniq.length} unique) were tagged:\n#{log.uniq.sort.join("\n")}\n\n"
> ```
>
> So if my markdown input were:
>
> ```
> Lorem, etc. [ἀγαθὸς]{lang=el} is a greek word.
> ```
>
> The LaTeX output would be:
>
> ```
> Lorem, etc. \index{agathos@ἀγαθὸς}\glslink{agathos}{ἀγαθὸς} is a greek word.
> ```
>
> Please note that the glossary headword must be *agathos*, the
> transliterated form, instead of ἀγαθός, due to sorting issues with
> LaTeX.
>
> Any input is appreciated.
>
> Bernardo
> https://github.com/bcdavasconcelos
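
To make the bytes-versus-characters caveat from the top of this reply concrete, here is a small sketch in Ruby, the language of the quoted script. It shows two things a byte-oriented port has to keep in mind: the character count and the byte count of a Greek word differ, and the same visible letter can be stored as different byte sequences unless both sides are normalized. The helper names `counts` and `same_after_nfc?` are hypothetical, and the strings are built from explicit code points so the numbers do not depend on how this message was encoded.

```ruby
# Hypothetical helper: return [character count, UTF-8 byte count] for a string.
def counts(word)
  [word.length, word.bytesize]
end

# Hypothetical helper: compare two strings after normalizing both to NFC.
def same_after_nfc?(left, right)
  left.unicode_normalize(:nfc) == right.unicode_normalize(:nfc)
end

# ἀγαθός spelled out as code points: U+1F00 takes three bytes, the rest two each.
word = "\u1F00\u03B3\u03B1\u03B8\u03CC\u03C2"
chars, bytes = counts(word)
puts "chars=#{chars} bytes=#{bytes}"

# ά as one precomposed code point versus α plus a combining acute accent.
composed   = "\u03AC"
decomposed = "\u03B1\u0301"
puts "equal as stored: #{composed == decomposed}"
puts "equal after NFC: #{same_after_nfc?(composed, decomposed)}"
```

In Lua, `#s` reports the byte count, while `utf8.len(s)` counts code points. For a filter that only tests whether a span's text equals one of the listed forms, byte comparison should be enough, provided the document and the glossary data use the same normalization form. Slicing or case-folding Greek text is where the byte-only model starts to hurt; pandoc's bundled `pandoc.text` module may help there.
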