Hello everyone,

Would anyone be willing to help me translate a small pandoc filter from 
Ruby to Lua, or point me in the right direction? The idea is this: we feed 
the filter a JSON file with the glossary data. For each Greek span in the 
text, the filter looks for the span's content in each entry's 
`filter_match` list and, on a match, tags the span so that it points to the 
correct glossary entry. It works as it is, but it depends on the paru gem 
(which makes it harder to share), and it seems a bit slow (perhaps the 
logic I am applying is faulty). 

*JSON Example*

```
{
  "entries": [
    {
    "title": "ἀγαθός",
    "subtitle": "□ *pt.* bom;  □ *en.* good",
    "filter_match": ["γαθέ", "γαθοί", "κἀγάθ", "κἀγαθά", "κἀγαθάς", 
"κἀγαθή", "κἀγαθήν", "κἀγαθαί", "κἀγαθοί", "κἀγαθος", "κἀγαθούς", 
"κἀγαθοῖς", "κἀγαθοῦ", "κἀγαθόν", "κἀγαθός", "κἀγαθώ", "κἀγαθῆς", 
"κἀγαθῶν", "κἀγαθῶς", "κἀγαθῷ", "τἀγάθ", "τἀγαθά", "τἀγαθοῦ", "τἀγαθόν", 
"τἀγαθῇ", "τἀγαθῷ", "τὠγαθοῦ", "τὠγαθόν", "ἀγάθ", "ἀγάθων", "ἀγαθά", 
"ἀγαθάν", "ἀγαθάς", "ἀγαθέ", "ἀγαθή", "ἀγαθήν", "ἀγαθαί", "ἀγαθαῖν", 
"ἀγαθαῖς", "ἀγαθαῖσιν", "ἀγαθοί", "ἀγαθούς", "ἀγαθοῖν", "ἀγαθοῖο", 
"ἀγαθοῖς", "ἀγαθοῖσι", "ἀγαθοῖσιν", "ἀγαθοῦ", "ἀγαθόν", "ἀγαθός", "ἀγαθώ", 
"ἀγαθᾶν", "ἀγαθᾶς", "ἀγαθᾷ", "ἀγαθῆισι", "ἀγαθῆισιν", "ἀγαθῆς", "ἀγαθῇ", 
"ἀγαθῇσι", "ἀγαθῇσιν", "ἀγαθῶ", "ἀγαθῶι", "ἀγαθῶν", "ἀγαθῶς", "ἀγαθῷ", 
"ἁγαθή", "ἁγαθαί", "ἁγαθοί", "ἁγαθός", "ὠγαθέ", "ὦγαθ", "ὦγαθε"], 
    "transliteration": "agathos"
    },
    {
    "title": "ἀγαπᾶν",
    "subtitle": "□ *pt.* estar satisfeito, gostar;  □ *en.* be satisfied, like;",
    "filter_match": ["ἀγάπα", "ἀγάπαις", "ἀγάπη", "ἀγάπην", "ἀγάπης", 
"ἀγάπῃ", "ἀγαπᾶ", "ἀγαπᾶν", "ἀγαπᾶς", "ἀγαπᾷ", "ἀγαπᾷν", "ἀγαπᾷς", "ἀγαπῇ", 
"ἀγαπῶν"], 
    "transliteration": "agapan"
    }
  ]
}
```
(I am using JSON here just because it seemed to make sense. Perhaps it 
would be more interesting to pull this data from a definition list (with 
extended attributes) in the same document?)

*The Ruby script*

```
#!/usr/bin/env ruby

Encoding.default_internal = Encoding::UTF_8
Encoding.default_external = Encoding::UTF_8

require 'paru/filter'
require 'json'

# Glossary entries, loaded once from the JSON file next to this script.
GLOSSARY = JSON.parse(File.read("#{__dir__}/data.json"))['entries']

start_time = Time.now
log = [] # titles of every entry that was tagged

Paru::Filter.run do
  with 'Span' do |p|
    next unless p.attr['lang'] == 'el'

    span_content = p.inner_markdown.to_s.chomp

    # First entry whose list of inflected forms contains the span's text.
    entry = GLOSSARY.find { |g| g['filter_match'].include?(span_content) }
    next if entry.nil?

    p.inner_markdown =
      "\\index{#{entry['transliteration']}@#{entry['title']}}" \
      "\\glslink{#{entry['transliteration']}}{#{span_content}}"
    log << entry['title']
  end
end

# Assumption: the report goes to stderr, since stdout carries the document.
log_file = $stderr
log_file.puts "Paru::Filter took #{Time.now - start_time}s.\n\n"
log_file.puts "#{log.length} total entries (#{log.uniq.length} unique) were tagged:\n#{log.uniq.sort.join("\n")}\n\n"
```
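
On the speed question: one possible cause (this is only a guess) is that 
the lookup scans every entry's list of forms for every span. A minimal 
sketch of an alternative is to build a Hash from each inflected form to its 
entry once, up front, so that each span needs a single lookup. The names 
`build_index` and `tag` and the trimmed inline sample data are hypothetical 
and only for illustration; nothing here depends on paru.

```ruby
require 'json'

# Hypothetical stand-in for data.json, trimmed to a few forms per entry.
SAMPLE = <<~EOS
  {
    "entries": [
      { "title": "ἀγαθός", "transliteration": "agathos",
        "filter_match": ["ἀγαθός", "ἀγαθόν", "ἀγαθοῦ"] },
      { "title": "ἀγαπᾶν", "transliteration": "agapan",
        "filter_match": ["ἀγαπᾶν", "ἀγαπῶν"] }
    ]
  }
EOS

# Map every inflected form to its entry. Built once, before filtering.
# If two entries list the same form, the first entry wins.
def build_index(entries)
  entries.each_with_object({}) do |entry, index|
    entry['filter_match'].each { |form| index[form] ||= entry }
  end
end

# Return the LaTeX replacement for a span's text, or nil if nothing matches.
def tag(index, text)
  entry = index[text]
  return nil if entry.nil?

  key = entry['transliteration']
  "\\index{#{key}@#{entry['title']}}\\glslink{#{key}}{#{text}}"
end

INDEX = build_index(JSON.parse(SAMPLE)['entries'])

puts tag(INDEX, 'ἀγαθόν')
# prints: \index{agathos@ἀγαθός}\glslink{agathos}{ἀγαθόν}
```

The same shape (one table keyed by form, one lookup per span) should carry 
over to a Lua table, which is part of why I would try it before porting.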

So if my markdown input were:

```
Lorem, etc. [ἀγαθός]{lang=el} is a Greek word.
```

The LaTeX output would be:

```
Lorem, etc. \index{agathos@ἀγαθός}\glslink{agathos}{ἀγαθός} is a Greek word.
```

Please note that the glossary label and the index sort key must be 
*agathos*, the transliterated form, instead of ἀγαθός, due to sorting 
issues with Greek in LaTeX.

Any input is appreciated.

Bernardo
https://github.com/bcdavasconcelos

