From: "Bernardo C. D. A. Vasconcelos" <bernardovasconcelos-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Glossary Filter for MD2Tex
Date: Mon, 17 Oct 2022 11:25:13 -0700 (PDT)
Message-ID: <88a14108-f2e4-40d0-a98e-5c6f84b8ff41n@googlegroups.com>
Hello everyone,
I am wondering whether anyone would be willing to lend me a hand with (or
point me in the right direction for) translating a small pandoc filter from
Ruby to Lua. The idea is this: we feed the filter a JSON file with the
glossary data. For each span marked `lang=el`, the filter looks the span's
text up in every entry's `filter_match` list and, on a match, tags the span
so that it points to the correct glossary entry. It works as it is, but it
depends on Ruby and the paru gem (which makes it harder to share), and it
seems a bit slow (perhaps the logic I am applying is faulty).
*JSON Example*
```
{
"entries": [
{
"title": "ἀγαθός",
"subtitle": "□ *pt.* bom; □ *en.* good",
"filter_match": ["γαθέ", "γαθοί", "κἀγάθ", "κἀγαθά", "κἀγαθάς",
"κἀγαθή", "κἀγαθήν", "κἀγαθαί", "κἀγαθοί", "κἀγαθος", "κἀγαθούς",
"κἀγαθοῖς", "κἀγαθοῦ", "κἀγαθόν", "κἀγαθός", "κἀγαθώ", "κἀγαθῆς",
"κἀγαθῶν", "κἀγαθῶς", "κἀγαθῷ", "τἀγάθ", "τἀγαθά", "τἀγαθοῦ", "τἀγαθόν",
"τἀγαθῇ", "τἀγαθῷ", "τὠγαθοῦ", "τὠγαθόν", "ἀγάθ", "ἀγάθων", "ἀγαθά",
"ἀγαθάν", "ἀγαθάς", "ἀγαθέ", "ἀγαθή", "ἀγαθήν", "ἀγαθαί", "ἀγαθαῖν",
"ἀγαθαῖς", "ἀγαθαῖσιν", "ἀγαθοί", "ἀγαθούς", "ἀγαθοῖν", "ἀγαθοῖο",
"ἀγαθοῖς", "ἀγαθοῖσι", "ἀγαθοῖσιν", "ἀγαθοῦ", "ἀγαθόν", "ἀγαθός", "ἀγαθώ",
"ἀγαθᾶν", "ἀγαθᾶς", "ἀγαθᾷ", "ἀγαθῆισι", "ἀγαθῆισιν", "ἀγαθῆς", "ἀγαθῇ",
"ἀγαθῇσι", "ἀγαθῇσιν", "ἀγαθῶ", "ἀγαθῶι", "ἀγαθῶν", "ἀγαθῶς", "ἀγαθῷ",
"ἁγαθή", "ἁγαθαί", "ἁγαθοί", "ἁγαθός", "ὠγαθέ", "ὦγαθ", "ὦγαθε"],
"transliteration": "agathos"
},
{
"title": "ἀγαπᾶν",
"subtitle": "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;",
"filter_match": ["ἀγάπα", "ἀγάπαις", "ἀγάπη", "ἀγάπην", "ἀγάπης",
"ἀγάπῃ", "ἀγαπᾶ", "ἀγαπᾶν", "ἀγαπᾶς", "ἀγαπᾷ", "ἀγαπᾷν", "ἀγαπᾷς", "ἀγαπῇ",
"ἀγαπῶν"],
"transliteration": "agapan"
}
]
}
```
(I am using JSON here just because it seemed to make sense. Perhaps it
would be more interesting to pull this data from a definition list
(with extended attributes) in the same document?)
*The Ruby script*
```
#!/usr/bin/env ruby
Encoding.default_internal = Encoding::UTF_8
Encoding.default_external = Encoding::UTF_8
require 'paru/filter'
require 'json'

# Keys follow the JSON example above ("entries", "filter_match", "title").
GLOSSARY = JSON.parse(File.read("#{__dir__}/data.json"))['entries']

start_time = Time.now
log = []

Paru::Filter.run do
  with 'Span' do |p|
    next unless p.attr['lang'] == 'el'

    span_content = p.inner_markdown.to_s.chomp
    next if span_content.empty?

    # First glossary entry whose inflected forms include this span's text.
    entry = GLOSSARY.find { |g| g['filter_match'].include?(span_content) }
    next if entry.nil?

    p.inner_markdown =
      "\\index{#{entry['transliteration']}@#{entry['title']}}" \
      "\\glslink{#{entry['transliteration']}}{#{span_content}}"
    log << entry['title']
  end
end

# stdout carries the filtered document, so the report goes to stderr.
warn "Paru::Filter took #{Time.now - start_time}s.\n\n"
warn "#{log.length} total entries (#{log.uniq.length} unique) were " \
     "tagged:\n#{log.uniq.sort.join("\n")}\n\n"
```
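One possible reason for the slowness is that every span triggers a scan over
all glossary entries and all of their forms. A minimal sketch of an
alternative, assuming the key names from the JSON example: build a Hash from
each inflected form to its entry once, then do a single lookup per span. The
helper name `build_form_index` and the abbreviated sample data are
hypothetical.

```ruby
require 'json'

# Map every inflected form to its glossary entry, so that each span costs
# one hash lookup instead of a scan over all entries.
def build_form_index(entries)
  index = {}
  entries.each do |entry|
    entry['filter_match'].each do |form|
      index[form] ||= entry # if a form is listed twice, the first entry wins
    end
  end
  index
end

# Abbreviated sample data (hypothetical), shaped like the JSON example.
sample = JSON.parse(<<~SAMPLE)['entries']
  {
    "entries": [
      { "title": "ἀγαθός", "filter_match": ["ἀγαθός", "ἀγαθόν"],
        "transliteration": "agathos" },
      { "title": "ἀγαπᾶν", "filter_match": ["ἀγάπη", "ἀγαπᾶν"],
        "transliteration": "agapan" }
    ]
  }
SAMPLE

form_index = build_form_index(sample)
entry = form_index['ἀγάπη']
puts entry['transliteration'] if entry
```

In the filter, the index would be built once before `Paru::Filter.run`, and
the per-span search would become `form_index[span_content]`.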
So if my markdown input were:
```
Lorem, etc. [ἀγαθὸς]{lang=el} is a greek word.
```
The LaTeX output would be:
```
Lorem, etc. \index{agathos@ἀγαθὸς}\glslink{agathos}{ἀγαθὸς} is a greek word.
```
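One detail worth checking: the span in this example carries a grave accent
(ἀγαθὸς), while the `filter_match` list in the JSON example appears to
contain only the acute form (ἀγαθός), so an exact string comparison may not
match. A minimal sketch of one way around this, assuming it is acceptable to
treat a grave as a variant of the acute: fold both sides before comparing.
The helper name `fold_grave` is hypothetical; `String#unicode_normalize` is
part of core Ruby.

```ruby
# Fold a polytonic Greek form for lookup: decompose it, turn every combining
# grave (U+0300) into a combining acute (U+0301), then recompose, so that
# equivalent encodings of the same accented letter compare equal.
def fold_grave(form)
  form.unicode_normalize(:nfd)
      .tr("\u0300", "\u0301")
      .unicode_normalize(:nfc)
end

# Hypothetical usage: fold the glossary forms once, and the span text on lookup.
forms = ['ἀγαθός', 'ἀγαθόν'].map { |f| fold_grave(f) }
puts forms.include?(fold_grave('ἀγαθὸς'))
```

The text written back into the span can stay unfolded, so the printed word
keeps its original accent.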
Please note that the glossary label (and the index sort key) must be
*agathos*, the transliterated form, instead of ἀγαθός, due to sorting issues
with Greek in LaTeX.
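The `key@display` form inside `\index` is what carries this: the index is
sorted on the part before `@` and prints the part after it. A small sketch of
the string building in isolation, assuming the key names from the JSON
example (the helper name `glossary_markup` is hypothetical):

```ruby
# Build the LaTeX for one tagged span.
# `entry` is a glossary Hash shaped like the JSON example;
# `text` is the span's content as it should appear in print.
def glossary_markup(entry, text)
  key = entry['transliteration'] # sort key and glossary label
  "\\index{#{key}@#{entry['title']}}\\glslink{#{key}}{#{text}}"
end

entry = { 'title' => 'ἀγαθός', 'transliteration' => 'agathos' }
puts glossary_markup(entry, 'ἀγαθόν')
```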
Any input is appreciated.
Bernardo
https://github.com/bcdavasconcelos