Hello everyone,

I am wondering whether anyone would be willing to lend me a hand with (or point me in the right direction for) translating a small script from Ruby to Lua.

The idea is this: we feed the filter a JSON string with the glossary data. The filter checks each entry's `filter_match` list and tags the matching words in the text, pointing them to the correct glossary entry. It works as it is, but it has dependencies (which makes it harder to share), and it seems a bit slow (perhaps the logic I am applying is faulty).

*JSON Example*

```
{
  "entries": [
    {
      "title": "ἀγαθός",
      "subtitle": "□ *pt.* bom; □ *en.* good",
      "filter_match": ["γαθέ", "γαθοί", "κἀγάθ", "κἀγαθά", "κἀγαθάς", "κἀγαθή", "κἀγαθήν", "κἀγαθαί", "κἀγαθοί", "κἀγαθος", "κἀγαθούς", "κἀγαθοῖς", "κἀγαθοῦ", "κἀγαθόν", "κἀγαθός", "κἀγαθώ", "κἀγαθῆς", "κἀγαθῶν", "κἀγαθῶς", "κἀγαθῷ", "τἀγάθ", "τἀγαθά", "τἀγαθοῦ", "τἀγαθόν", "τἀγαθῇ", "τἀγαθῷ", "τὠγαθοῦ", "τὠγαθόν", "ἀγάθ", "ἀγάθων", "ἀγαθά", "ἀγαθάν", "ἀγαθάς", "ἀγαθέ", "ἀγαθή", "ἀγαθήν", "ἀγαθαί", "ἀγαθαῖν", "ἀγαθαῖς", "ἀγαθαῖσιν", "ἀγαθοί", "ἀγαθούς", "ἀγαθοῖν", "ἀγαθοῖο", "ἀγαθοῖς", "ἀγαθοῖσι", "ἀγαθοῖσιν", "ἀγαθοῦ", "ἀγαθόν", "ἀγαθός", "ἀγαθώ", "ἀγαθᾶν", "ἀγαθᾶς", "ἀγαθᾷ", "ἀγαθῆισι", "ἀγαθῆισιν", "ἀγαθῆς", "ἀγαθῇ", "ἀγαθῇσι", "ἀγαθῇσιν", "ἀγαθῶ", "ἀγαθῶι", "ἀγαθῶν", "ἀγαθῶς", "ἀγαθῷ", "ἁγαθή", "ἁγαθαί", "ἁγαθοί", "ἁγαθός", "ὠγαθέ", "ὦγαθ", "ὦγαθε"],
      "transliteration": "agathos"
    },
    {
      "title": "ἀγαπᾶν",
      "subtitle": "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;",
      "filter_match": ["ἀγάπα", "ἀγάπαις", "ἀγάπη", "ἀγάπην", "ἀγάπης", "ἀγάπῃ", "ἀγαπᾶ", "ἀγαπᾶν", "ἀγαπᾶς", "ἀγαπᾷ", "ἀγαπᾷν", "ἀγαπᾷς", "ἀγαπῇ", "ἀγαπῶν"],
      "transliteration": "agapan"
    }
  ]
}
```

(I am using JSON here just because it seemed to make sense. Perhaps it would be better to pull this data from a definition list (with extended attributes) in the same document?)

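
On the speed concern, one possible cause is scanning every entry's `filter_match` list for every word. A minimal sketch, assuming the JSON layout shown above (an `entries` array whose items carry `title`, `filter_match` and `transliteration`), builds a hash from each inflected form to its entry once, so that each lookup afterwards is a single hash access. The helper names `build_index` and `tag` are hypothetical and not part of paru or pandoc; the same idea should carry over to a Lua table.

```ruby
require 'json'

# Build a Hash mapping every inflected form to its glossary entry.
# Assumes the layout of the JSON example: an "entries" array whose
# items have "title", "filter_match" and "transliteration" keys.
def build_index(json_text)
  index = {}
  JSON.parse(json_text)['entries'].each do |entry|
    entry['filter_match'].each do |form|
      index[form] ||= entry # first entry wins if a form is listed twice
    end
  end
  index
end

# Return the LaTeX markup for a word, or nil when the word is not listed.
def tag(index, word)
  entry = index[word]
  return nil unless entry

  key = entry['transliteration']
  "\\index{#{key}@#{entry['title']}}\\glslink{#{key}}{#{word}}"
end

# Shortened sample data, for illustration only.
sample = <<~GLOSSARY
  { "entries": [ { "title": "ἀγαθός",
                   "filter_match": ["ἀγαθός", "ἀγαθοῦ"],
                   "transliteration": "agathos" } ] }
GLOSSARY

index = build_index(sample)
puts tag(index, 'ἀγαθοῦ')
# prints \index{agathos@ἀγαθός}\glslink{agathos}{ἀγαθοῦ}
```

The index is built once per run, so its cost is proportional to the total number of listed forms, while each span lookup no longer depends on the size of the glossary.
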
*The Ruby script*

```
#!/usr/bin/env ruby
Encoding.default_internal = Encoding::UTF_8
Encoding.default_external = Encoding::UTF_8

require 'paru/filter'
require 'json'

# Keys follow the JSON example above: entries, title, filter_match.
GLOSSARY = JSON.parse(File.read("#{__dir__}/data.json"))['entries']

start_time = Time.now
log = []
# Assumption: the log target was not shown in the post; stderr keeps
# the messages out of the JSON that the filter writes to stdout.
log_file = $stderr

Paru::Filter.run do
  with 'Span' do |p|
    next unless p.attr['lang'] == 'el'

    span_content = p.inner_markdown.to_s.chomp
    # Find the first glossary entry that lists this exact form.
    entry = GLOSSARY.find { |g| g['filter_match'].include?(span_content) }
    next unless entry

    key = entry['transliteration']
    p.inner_markdown = "\\index{#{key}@#{entry['title']}}\\glslink{#{key}}{#{span_content}}"
    log << entry['title']
  end
end

log_file.puts "Paru::Filter took #{Time.now - start_time}s.\n\n"
log_file.puts "#{log.length} total entries (#{log.uniq.length} unique) were tagged:\n#{log.uniq.sort.join("\n")}\n\n"
```

So if my markdown input were:

```
Lorem, etc. [ἀγαθός]{lang=el} is a Greek word.
```

The LaTeX output would be:

```
Lorem, etc. \index{agathos@ἀγαθός}\glslink{agathos}{ἀγαθός} is a Greek word.
```

Please note that the glossary entry must be keyed by *agathos*, the transliterated form, rather than by ἀγαθός, due to weird sorting issues with LaTeX.

Any input is appreciated.

Bernardo
https://github.com/bcdavasconcelos