public inbox archive for pandoc-discuss@googlegroups.com
* Glossary Filter for MD2Tex
@ 2022-10-17 18:25 Bernardo C. D. A. Vasconcelos
       [not found] ` <88a14108-f2e4-40d0-a98e-5c6f84b8ff41n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Bernardo C. D. A. Vasconcelos @ 2022-10-17 18:25 UTC (permalink / raw)
  To: pandoc-discuss



Hello everyone,

I am curious if anyone would be willing to lend me a hand in (or give me 
directions) translating a small script from Ruby to Lua. The idea is this: 
we feed the filter a JSON string with the glossary data. The filter will 
check the JSON for each entry's `filter_match` and tag these accordingly in 
the text, pointing them to the correct glossary entry. It works as it is, 
but it has dependencies (which makes it harder to share), and it seems a 
bit slow (perhaps the logic I am applying is faulty). 

*JSON Example*

```
{
  "entries": [
    {
    "title": "ἀγαθός",
    "subtitle": "□ *pt.* bom;  □ *en.* good",
    "filter_match": ["γαθέ", "γαθοί", "κἀγάθ", "κἀγαθά", "κἀγαθάς", 
"κἀγαθή", "κἀγαθήν", "κἀγαθαί", "κἀγαθοί", "κἀγαθος", "κἀγαθούς", 
"κἀγαθοῖς", "κἀγαθοῦ", "κἀγαθόν", "κἀγαθός", "κἀγαθώ", "κἀγαθῆς", 
"κἀγαθῶν", "κἀγαθῶς", "κἀγαθῷ", "τἀγάθ", "τἀγαθά", "τἀγαθοῦ", "τἀγαθόν", 
"τἀγαθῇ", "τἀγαθῷ", "τὠγαθοῦ", "τὠγαθόν", "ἀγάθ", "ἀγάθων", "ἀγαθά", 
"ἀγαθάν", "ἀγαθάς", "ἀγαθέ", "ἀγαθή", "ἀγαθήν", "ἀγαθαί", "ἀγαθαῖν", 
"ἀγαθαῖς", "ἀγαθαῖσιν", "ἀγαθοί", "ἀγαθούς", "ἀγαθοῖν", "ἀγαθοῖο", 
"ἀγαθοῖς", "ἀγαθοῖσι", "ἀγαθοῖσιν", "ἀγαθοῦ", "ἀγαθόν", "ἀγαθός", "ἀγαθώ", 
"ἀγαθᾶν", "ἀγαθᾶς", "ἀγαθᾷ", "ἀγαθῆισι", "ἀγαθῆισιν", "ἀγαθῆς", "ἀγαθῇ", 
"ἀγαθῇσι", "ἀγαθῇσιν", "ἀγαθῶ", "ἀγαθῶι", "ἀγαθῶν", "ἀγαθῶς", "ἀγαθῷ", 
"ἁγαθή", "ἁγαθαί", "ἁγαθοί", "ἁγαθός", "ὠγαθέ", "ὦγαθ", "ὦγαθε"], 
    "transliteration": "agathos"
    },
    {
    "title": "ἀγαπᾶν",
    "subtitle": "□ *pt.* estar satisfeito, gostar;  □ *en.* be satisfied, like;",
    "filter_match": ["ἀγάπα", "ἀγάπαις", "ἀγάπη", "ἀγάπην", "ἀγάπης", 
"ἀγάπῃ", "ἀγαπᾶ", "ἀγαπᾶν", "ἀγαπᾶς", "ἀγαπᾷ", "ἀγαπᾷν", "ἀγαπᾷς", "ἀγαπῇ", 
"ἀγαπῶν"], 
    "transliteration": "agapan"
    }
  ]
}
```
(I am using JSON here just because it seemed to make sense. Perhaps it 
would be interesting to pull this data from a definition list 
(with extended attributes) in the same document?)

*The Ruby script*

```
#!/usr/bin/env ruby

Encoding.default_internal = Encoding::UTF_8
Encoding.default_external = Encoding::UTF_8

require 'paru/filter'
require 'json'

# Keys follow the JSON example above: "entries", "title", "filter_match".
GLOSSARY = JSON.parse(File.read("#{__dir__}/data.json"))['entries']

start_time = Time.now
log = []
# Assumption: the original does not show where log_file points; stderr keeps
# the log out of the filtered document on stdout.
log_file = $stderr

Paru::Filter.run do
  with 'Span' do |p|
    next unless p.attr['lang'] == 'el'

    span_content = p.inner_markdown.to_s.chomp
    # Take the first entry that lists this exact form.
    entry = GLOSSARY.find { |g| g['filter_match'].include?(span_content) }
    next if entry.nil?

    p.inner_markdown = "\\index{#{entry['transliteration']}@#{entry['title']}}\\glslink{#{entry['transliteration']}}{#{span_content}}"
    log << entry['title']
  end
end

log_file.puts "Paru::Filter took #{Time.now - start_time}s.\n\n"
log_file.puts "#{log.length} total entries (#{log.uniq.length} unique) were tagged:\n#{log.uniq.sort.join("\n")}\n\n"
```

So if my markdown input were:

```
Lorem, etc. [ἀγαθὸς]{lang=el} is a greek word.
```

The LaTeX output would be:

```
Lorem, etc. \index{agathos@ἀγαθὸς}\glslink{agathos}{ἀγαθὸς} is a greek word.
```

Please note that the glossary headword must be *agathos*, the 
transliterated form, instead of ἀγαθός, due to weird sorting issues with 
LaTeX.
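
A minimal sketch of the lookup step, assuming the JSON layout shown above. It builds one hash from every inflected form to its entry, so each span costs a single hash lookup instead of a scan over all entries, which may help if the scan is what makes the filter slow. The helper names `build_index` and `tag` and the abridged sample data are hypothetical, not part of the original script.

```ruby
require 'json'

# Two abridged entries in the same shape as the JSON example (sample data).
GLOSSARY_JSON = <<~GLOSSARY
  {
    "entries": [
      { "title": "ἀγαθός", "filter_match": ["ἀγαθός", "ἀγαθοῦ", "ἀγαθόν"], "transliteration": "agathos" },
      { "title": "ἀγαπᾶν", "filter_match": ["ἀγαπᾶν", "ἀγάπη"], "transliteration": "agapan" }
    ]
  }
GLOSSARY

# Map every inflected form to its entry once. On duplicate forms the first
# entry wins, which mirrors taking result[0] in a linear search.
def build_index(entries)
  entries.each_with_object({}) do |entry, index|
    entry['filter_match'].each { |form| index[form] ||= entry }
  end
end

# Return the LaTeX replacement for a span's text, or nil when nothing matches.
# In \index{key@text}, the transliteration is the sort key and the headword
# is what gets printed.
def tag(index, text)
  entry = index[text]
  return nil if entry.nil?

  "\\index{#{entry['transliteration']}@#{entry['title']}}" \
    "\\glslink{#{entry['transliteration']}}{#{text}}"
end

INDEX = build_index(JSON.parse(GLOSSARY_JSON)['entries'])
puts tag(INDEX, INDEX.keys.first)
```

The lookup is an exact string comparison, so a form typed with a different accent or a different Unicode normalization than the one in `filter_match` will not match.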

Any input is appreciated.

Bernardo
https://github.com/bcdavasconcelos


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/88a14108-f2e4-40d0-a98e-5c6f84b8ff41n%40googlegroups.com.


* Re: Glossary Filter for MD2Tex
       [not found] ` <88a14108-f2e4-40d0-a98e-5c6f84b8ff41n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-10-17 18:38   ` BPJ
       [not found]     ` <CADAJKhCVT-PNRsSgr5hU7Zzwaq3fN+CF3SGA5mTLrc2As+R6rw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: BPJ @ 2022-10-17 18:38 UTC (permalink / raw)
  To: pandoc-discuss


For Ancient Greek you want grc as the language tag.

As for translating the filter, note that Lua can't really handle UTF-8.
There is some rudimentary support for converting codepoint numbers ↔ UTF-8
byte sequences and for iterating through a string of bytes representing
UTF-8 encoded characters, but no concept of chars as opposed to bytes. This
may become a show stopper if you need to manipulate strings containing
UTF-8 text.


* Re: Glossary Filter for MD2Tex
       [not found]     ` <CADAJKhCVT-PNRsSgr5hU7Zzwaq3fN+CF3SGA5mTLrc2As+R6rw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2022-10-18 15:36       ` Bernardo C.D.A. Vasconcelos
       [not found]         ` <3307993F-F813-405F-BFEC-F17FAF27BEA5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Bernardo C.D.A. Vasconcelos @ 2022-10-18 15:36 UTC (permalink / raw)
  To: pandoc-discuss

> As for translating the filter note that Lua can't really handle UTF-8.
> There is some rudimentary support for converting codepoint number ↔ UTF-8
> byte sequences and for iterating through a string of bytes representing
> UTF-8 encoded characters but no concept of chars as opposed to bytes. This
> may become a show stopper if you need to manipulate strings containing
> UTF-8 text.


Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards includes 
UTF-8 support. Have you seen it? E.g. 
https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm

> For Ancient Greek you want grc as the language tag.
>

Indeed it is (and that is generally what I use), but ἀγαθός is 
just Polytonic Greek, which is not the same as Ancient Greek.


* Re: Glossary Filter for MD2Tex
       [not found]         ` <3307993F-F813-405F-BFEC-F17FAF27BEA5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-10-18 16:34           ` Bastien DUMONT
  2022-10-18 17:16             ` Bernardo C.D.A. Vasconcelos
  2022-10-18 18:42           ` BPJ
  1 sibling, 1 reply; 13+ messages in thread
From: Bastien DUMONT @ 2022-10-18 16:34 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Yes, but it is limited to this utf8 library. For instance, if you perform a pattern search like `string.match('ἀγαθός', '[γδ]')`, Lua tries to match any one of the four bytes inside the square brackets against the string 'ἀγαθός', so it returns the first byte of γ, not γ itself. To circumvent this limitation, you would have to test γ and δ separately. Nevertheless, if you always compare whole strings, as you currently do in your script, this should not be a problem.
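
The byte arithmetic behind this can be checked outside Lua. The following is a small Ruby sketch that only emulates a byte-oriented character class; it is not Lua's pattern engine, and the helper names `first_class_byte` and `first_class_char` are hypothetical. The strings are NFC-normalised first so that the byte values do not depend on how the text was typed.

```ruby
# NFC-normalise so the byte values below are deterministic.
WORD = 'ἀγαθός'.unicode_normalize(:nfc)
SET  = 'γδ'.unicode_normalize(:nfc)

# To a byte-oriented matcher, the class [γδ] is just the bytes of "γδ".
SET_BYTES = SET.bytes.uniq

# Emulate the byte-wise match: the first byte of the word found in the class.
def first_class_byte(word, set_bytes)
  word.bytes.find { |b| set_bytes.include?(b) }
end

# Character-aware alternative: compare whole characters instead of bytes.
def first_class_char(word, set)
  word.each_char.find { |c| set.include?(c) }
end

p SET.bytes                          # → [206, 179, 206, 180]
p first_class_byte(WORD, SET_BYTES)  # → 206, the lead byte of γ only
p first_class_char(WORD, SET)        # → "γ"
```

Both letters share the lead byte 206 (0xCE), which is why a byte-wise class stops on half a character, while a comparison of whole strings is unaffected.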

As for your concern about dependencies, you would most probably have to rely on a JSON library such as lunajson. However, if your JSON files are not supposed to change, you could also convert them once to a Lua file using a JSON library and a serialization library, so that the filter can import the resulting Lua data structure directly.
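
A rough sketch of such a one-off conversion, written in Ruby since the existing script already uses it. The hand-written serializer below stands in for "a serialization library"; `lua_string` and `to_lua` are hypothetical helper names, and only strings, numbers, booleans, arrays, and objects are handled.

```ruby
require 'json'

# Escape a Ruby string as a double-quoted Lua string literal.
def lua_string(str)
  escaped = str.gsub(/[\\"\n\r]/) do |ch|
    case ch
    when "\n" then '\\n'
    when "\r" then '\\r'
    else "\\#{ch}"
    end
  end
  "\"#{escaped}\""
end

# Serialize parsed JSON (Hash, Array, String, number, boolean, nil)
# as the source text of a Lua table constructor.
def to_lua(value, indent = '')
  inner = indent + '  '
  case value
  when Hash
    fields = value.map { |k, v| "#{inner}[#{lua_string(k.to_s)}] = #{to_lua(v, inner)}" }
    "{\n#{fields.join(",\n")}\n#{indent}}"
  when Array
    items = value.map { |v| "#{inner}#{to_lua(v, inner)}" }
    "{\n#{items.join(",\n")}\n#{indent}}"
  when String then lua_string(value)
  when nil then 'nil'
  else value.to_s # numbers and booleans are written the same way in Lua
  end
end

data = JSON.parse('{"entries": [{"title": "ἀγαθός", "transliteration": "agathos"}]}')
puts "return #{to_lua(data)}"
```

Saved to a file, the printed text is a Lua chunk that returns the table, so a filter could load it with `dofile` or `require` without needing a JSON parser at run time.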


* Re: Glossary Filter for MD2Tex
  2022-10-18 16:34           ` Bastien DUMONT
@ 2022-10-18 17:16             ` Bernardo C.D.A. Vasconcelos
       [not found]               ` <7072522D-F2FE-4BAC-A575-93426852FCFB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Bernardo C.D.A. Vasconcelos @ 2022-10-18 17:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Thank you for the suggestions, Bastien. There is technically no need for 
regex, as all the forms are spelled out to avoid having to create ad 
hoc regex rules for each term. Now that I think about it, the principle 
is the same as Citeproc's: a tagged inline element is matched 
against a lookup table and replaced. I will look at the citeproc code to 
see if it leads anywhere or if it could be reused in any way.


* Re: Glossary Filter for MD2Tex
       [not found]               ` <7072522D-F2FE-4BAC-A575-93426852FCFB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-10-18 17:34                 ` Bastien DUMONT
  2022-10-18 21:43                   ` Bernardo C.D.A. Vasconcelos
  0 siblings, 1 reply; 13+ messages in thread
From: Bastien DUMONT @ 2022-10-18 17:34 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

No, citeproc receives a data structure produced by pandoc; pandoc is responsible for the parsing. I think that your script would not be so hard to rewrite in Lua. The main question is whether you can achieve your goals this way. If your main concern is portability, then writing a Lua filter with no dependencies is certainly a good solution, provided that you feed it a Lua data structure (or embed the code responsible for JSON parsing in your script).


* Re: Glossary Filter for MD2Tex
       [not found]         ` <3307993F-F813-405F-BFEC-F17FAF27BEA5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2022-10-18 16:34           ` Bastien DUMONT
@ 2022-10-18 18:42           ` BPJ
  1 sibling, 0 replies; 13+ messages in thread
From: BPJ @ 2022-10-18 18:42 UTC (permalink / raw)
  To: pandoc-discuss


On Tue, 18 Oct 2022 at 17:36, Bernardo C.D.A. Vasconcelos <
bernardovasconcelos-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> > As for translating the filter note that Lua can't really handle UTF-8.
> > There is some rudimentary support for converting codepoint number ↔ UTF-8
> > byte sequences and for iterating through a string of bytes representing
> > UTF-8 encoded characters but no concept of chars as opposed to bytes. This
> > may become a show stopper if you need to manipulate strings containing
> > UTF-8 text.
>
>
> Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards includes
> UTF-8 support. Have you seen it? E.g.
>
> https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm


Yes, that is what I meant. It's very, very basic. Notably, pattern matching
is still entirely byte-oriented, except for the pattern `utf8.charpattern`,
which will match the bytes of any UTF-8 character. Pandoc adds some
UTF-8-oriented functions, notably case-changing functions, in the
`pandoc.text` library, but that is all.




* Re: Glossary Filter for MD2Tex
  2022-10-18 17:34                 ` Bastien DUMONT
@ 2022-10-18 21:43                   ` Bernardo C.D.A. Vasconcelos
       [not found]                     ` <D4CB4B20-A1D5-49C8-BA96-2E37BA4FB779-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Bernardo C.D.A. Vasconcelos @ 2022-10-18 21:43 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw



The data is mostly in database format and could be output in the best 
format for the task, but I wanted to make it friendly for other people 
to use as well. Could a YAML metadata block be a solution?

```yaml
glossary:
   glossary_lang: grc
   entries:
   - headword: ἀγαθός
     transliteration: agathos
     text: "□ *pt.* bom;  □ *en.* good; and so on and so forth"
     match:
     - γαθέ
     - γαθοί
     - κἀγάθ
     - κἀγαθά
     - κἀγαθάς
     - κἀγαθή
     - κἀγαθήν
     - κἀγαθαί
     - κἀγαθοί
     - κἀγαθος
   - headword: ἀγαπᾶν
     transliteration: agapan
     text: "□ *pt.* estar satisfeito, gostar;  □ *en.* be satisfied, like;"
     match:
     - ἀγάπα
     - ἀγάπαις
     - ἀγάπη
     - ἀγάπην
     - ἀγάπης
     - ἀγάπῃ
     - ἀγαπᾶ
     - ἀγαπᾶν
     - ἀγαπᾶς
```
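
Because the filter needs a `transliteration` for every `\index` sort key, it may be worth validating the entries when they are loaded. The sketch below is only an illustration under assumptions: it reads the YAML directly with Ruby's standard `yaml` library, whereas a pandoc filter would receive the same data through the document metadata, and `glossary_problems` and the abridged sample are hypothetical.

```ruby
require 'yaml'

# Keys the filter relies on for each glossary entry.
REQUIRED_KEYS = %w[headword transliteration match].freeze

# Return a list of human-readable problems; an empty list means every
# entry carries the keys the filter needs.
def glossary_problems(metadata)
  entries = metadata.dig('glossary', 'entries') || []
  entries.each_with_index.flat_map do |entry, i|
    missing = REQUIRED_KEYS.reject { |key| entry.key?(key) }
    missing.map { |key| "entry #{i + 1} (#{entry['headword'] || '?'}): missing #{key}" }
  end
end

# Abridged sample in the same shape as the metadata block above;
# the first entry deliberately lacks a transliteration.
SAMPLE = <<~DOC
  glossary:
    glossary_lang: grc
    entries:
    - headword: ἀγαθός
      match:
      - ἀγαθός
    - headword: ἀγαπᾶν
      transliteration: agapan
      match:
      - ἀγαπᾶν
DOC

puts glossary_problems(YAML.safe_load(SAMPLE))
```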




* Re: Glossary Filter for MD2Tex
       [not found]                     ` <D4CB4B20-A1D5-49C8-BA96-2E37BA4FB779-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-10-18 22:06                       ` Bastien DUMONT
  2022-10-19 19:50                         ` Bernardo C.D.A. Vasconcelos
  0 siblings, 1 reply; 13+ messages in thread
From: Bastien DUMONT @ 2022-10-18 22:06 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Yes, it could! You would have access to the corresponding metadata object in the AST.
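
A minimal sketch of what reading that metadata object might look like, assuming the field names of the YAML block quoted below (`glossary`, `glossary_lang`, `entries`, `headword`). The helper name `read_glossary` is hypothetical, and the `tostring` fallback is only there so that the snippet also runs in a plain Lua interpreter; under pandoc the values are AST objects and go through `pandoc.utils.stringify`.

```lua
-- Sketch only: field names follow the YAML proposed in this thread;
-- read_glossary is a hypothetical helper name.
-- Under pandoc, metadata scalars are AST objects and need stringify;
-- the tostring fallback only lets the snippet run outside pandoc.
local stringify = (pandoc and pandoc.utils.stringify) or tostring

local function read_glossary(meta)
  local glossary = meta.glossary
  if not glossary or not glossary.entries then
    return nil  -- the document carries no glossary block
  end
  local headwords = {}
  for _, entry in ipairs(glossary.entries) do
    headwords[#headwords + 1] = stringify(entry.headword)
  end
  return {
    lang = glossary.glossary_lang and stringify(glossary.glossary_lang),
    headwords = headwords,
  }
end

-- Plain-table stand-in for the parsed metadata block:
local meta = {
  glossary = {
    glossary_lang = "grc",
    entries = {
      { headword = "ἀγαθός" },
      { headword = "ἀγαπᾶν" },
    },
  },
}

local g = read_glossary(meta)
-- Prints the language tag and the number of headwords found.
print(g.lang, #g.headwords)
```

In an actual filter, `read_glossary` would presumably be called from the `Meta` handler, so that the data is available before the inline elements are visited.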

On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos wrote:
> The data is mostly in database format and could be output in the best format
> for the task, but I wanted to make it friendly for other people to use as well.
> Could a YAML metadata block be a solution?
> 
> glossary:
>   glossary_lang: grc
>   entries:
>   - headword: ἀγαθός
>     text: "□ *pt.* bom;  □ *en.* good; and so on and so forth"
>     match:
>     - γαθέ
>     - γαθοί
>     - κἀγάθ
>     - κἀγαθά
>     - κἀγαθάς
>     - κἀγαθή
>     - κἀγαθήν
>     - κἀγαθαί
>     - κἀγαθοί
>     - κἀγαθος
>   - headword: ἀγαπᾶν
>     transliteration: agapan
>     text: "□ *pt.* estar satisfeito, gostar;  □ *en.* be satisfied, like;"
>     match:
>     - ἀγάπα
>     - ἀγάπαις
>     - ἀγάπη
>     - ἀγάπην
>     - ἀγάπης
>     - ἀγάπῃ
>     - ἀγαπᾶ
>     - ἀγαπᾶν
>     - ἀγαπᾶς


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Glossary Filter for MD2Tex
  2022-10-18 22:06                       ` Bastien DUMONT
@ 2022-10-19 19:50                         ` Bernardo C.D.A. Vasconcelos
       [not found]                           ` <B93B3CA7-A461-4056-929D-592B578B184F-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Bernardo C.D.A. Vasconcelos @ 2022-10-19 19:50 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 10330 bytes --]


I have found this little script that takes me nearly there:

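```
-- Lookup table: "%key%" placeholder -> metadata value.
local vars = {}

-- First pass: collect every top-level metadata field.
function Meta(meta)
    for k, v in pairs(meta) do
        vars["%" .. k .. "%"] = v
    end
end

-- Second pass: replace any Str whose text is a known placeholder.
function Str(elem)
    if vars[elem.text] then
        return vars[elem.text]
    else
        return elem
    end
end

-- Run Meta before Str so that vars is filled before the text is scanned.
return {
    { Meta = Meta },
    { Str  = Str  }
}
```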

Instead of iterating over all of `meta`, we would use 
`meta.glossary.entries`. The crux for me is looping through the list of 
entries and, for each entry, adding every value of its `match` field 
(a.k.a. the known forms) to `vars` as a key, with the content of some 
other field (e.g. `glslink`) as the value. In pseudocode: for each 
`form` in `entry.match`, `vars[form] = entry.glslink`.
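
The loop described above could be sketched roughly as follows. This is plain Lua over an ordinary table that stands in for `meta.glossary.entries`; the list of forms is called `match` as in the YAML proposed earlier in the thread, while the `glslink` field and the function name `build_lookup` are assumptions made for illustration. Inside pandoc, each value would first need to go through `pandoc.utils.stringify`, since metadata values are AST objects rather than plain strings.

```lua
-- Sketch only: `entries` mimics meta.glossary.entries after stringification;
-- the glslink field and the name build_lookup are assumptions.
local function build_lookup(entries)
  local vars = {}
  for _, entry in ipairs(entries) do
    -- Every known form becomes a key pointing at the entry's glslink value.
    for _, form in ipairs(entry.match or {}) do
      vars[form] = entry.glslink
    end
  end
  return vars
end

local entries = {
  { headword = "ἀγαθός", glslink = "agathos", match = { "ἀγαθά", "ἀγαθόν" } },
  { headword = "ἀγαπᾶν", glslink = "agapan",  match = { "ἀγάπη", "ἀγαπᾶν" } },
}

local vars = build_lookup(entries)
print(vars["ἀγάπη"])  -- agapan
```

Because the keys are whole strings, the lookup is a plain byte-for-byte comparison, so Lua's lack of character-level UTF-8 handling should not matter here.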



On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:

> Yes, it could! You would have access to the corresponding metadata 
> object in the AST.

[-- Attachment #2: Type: text/html, Size: 12665 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Glossary Filter for MD2Tex
       [not found]                           ` <B93B3CA7-A461-4056-929D-592B578B184F-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-10-19 21:28                             ` Bastien DUMONT
  2022-10-19 22:43                               ` Bernardo C.D.A. Vasconcelos
  0 siblings, 1 reply; 13+ messages in thread
From: Bastien DUMONT @ 2022-10-19 21:28 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 11846 bytes --]

I think that the attached script could be a good starting point.

On Wednesday 19 October 2022 at 04:50:25PM, Bernardo C.D.A. Vasconcelos wrote:
> I have found this little script that takes me nearly there:
> 
> local vars = {}
> 
> function Meta(meta)
>     for k, v in pairs(meta) do
>         vars["%" .. k .. "%"] = v
>     end
> end
> 
> function Str(elem)
>     if vars[elem.text] then
>         return vars[elem.text]
>     else
>         return elem
>     end
> end
> 
> return {
>     { Meta = Meta },
>     { Str  = Str  }
> }
> 
> 
> Instead, we would use: meta.glossary.entries. The crux for me is looping
> through the list of entries, adding all the values of the to_match field
> (a.k.a. known forms) (of each entry) to vars as a key with the content of some
> other field (e.g. glslink) as value. E.g. vars[ .. entry.to_match.each .. ] =
> entry.glslink.

[-- Attachment #2: tag-greek-words.lua --]
[-- Type: text/plain, Size: 2123 bytes --]

-- I suppose that you always use plain strings in the
-- "headword", "transliteration" and "match" fields,
-- so I stringify the corresponding Inlines to be able
-- to more easily insert the values in the LaTeX string.
local stringify = pandoc.utils.stringify

local open_glslink_scd_arg = pandoc.RawInline('latex', '{')
local close_glslink_scd_arg = pandoc.RawInline('latex', '}')

-- I use two tables: one to store the data relative to the headwords,
-- the other to map the forms to the corresponding entries
-- in headwords_data.
-- Since the entries in headwords_data are tables
-- and tables are always passed by reference in Lua,
-- this approach avoids a lot of redundant copies in memory.
local headwords_data = {}
local forms_to_headwords = {}

local function get_glossary_data(meta)
  for _, entry in ipairs(meta.glossary.entries) do
    local headword = stringify(entry.headword) 
    headwords_data[headword] = {
      headword = headword,
      text = entry.text,
      transliteration = stringify(entry.transliteration)
    }
    for _, form in ipairs(entry.match) do
      forms_to_headwords[stringify(form)] = headwords_data[headword]
    end
  end
end

local function tag_words(span)
  if span.attributes.lang == 'el' then
    local content = stringify(span.content)
    local word_data = forms_to_headwords[content]
    if word_data then
      -- If the "transliteration" field is missing, Lua will throw an error.
      -- I suppose that this should not happen, but if it can be so,
      -- uncomment the following line (supposing that the lonely @
      -- will not cause problems):
      -- word_data.transliteration = word_data.transliteration or ''
      local linguistic_tags =
        pandoc.RawInline('latex',
                         '\\index{' .. word_data.transliteration ..
                         '@' .. word_data.headword .. '}' ..
                         '\\glslink{' .. word_data.transliteration .. '}')
      -- The span itself becomes the second argument of \glslink.
      return { linguistic_tags, open_glslink_scd_arg, span, close_glslink_scd_arg }
    end
  end
end

return {
  { Meta = get_glossary_data },
  { Span = tag_words }
}
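
The two-table idea described in the comments above can also be tried outside
pandoc. What follows is only a minimal sketch in plain Lua, not part of the
attached filter: the sample entries and the helper names build_lookup and
latex_prefix are hypothetical, and pandoc.utils.stringify is left out on the
assumption that the fields are already plain strings.

```lua
-- Sketch only: plain Lua, no pandoc required.
-- The entries and the helper names below are hypothetical.
local entries = {
  { headword = 'ἀγαθός', transliteration = 'agathos',
    match = { 'γαθέ', 'κἀγαθά' } },
  { headword = 'ἀγαπᾶν', transliteration = 'agapan',
    match = { 'ἀγάπη', 'ἀγαπᾶς' } },
}

-- Map every inflected form to the single table describing its headword.
local function build_lookup(list)
  local forms_to_headwords = {}
  for _, entry in ipairs(list) do
    local data = {
      headword = entry.headword,
      -- Tolerate a missing field instead of failing on concatenation later.
      transliteration = entry.transliteration or '',
    }
    for _, form in ipairs(entry.match or {}) do
      forms_to_headwords[form] = data -- stores a reference, not a copy
    end
  end
  return forms_to_headwords
end

-- Build the LaTeX string that precedes the tagged word.
local function latex_prefix(data)
  return '\\index{' .. data.transliteration .. '@' .. data.headword .. '}'
      .. '\\glslink{' .. data.transliteration .. '}'
end

local lookup = build_lookup(entries)
print(latex_prefix(lookup['κἀγαθά']))
print(lookup['γαθέ'] == lookup['κἀγαθά']) -- both forms share one table
print(lookup['λόγος'])                    -- a form that is not listed
```

Because every form of a headword points at the same table, adding more forms
costs one table slot each rather than a copy of the entry data.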

[-- Attachment #3: test.md --]
[-- Type: text/markdown, Size: 777 bytes --]

---
glossary:
  glossary_lang: grc
  entries:
  - headword: ἀγαθός
    transliteration: agathos
    text: "□ *pt.* bom;  □ *en.* good; and so on and so forth"
    match:
    - γαθέ
    - γαθοί
    - κἀγάθ
    - κἀγαθά
    - κἀγαθάς
    - κἀγαθή
    - κἀγαθήν
    - κἀγαθαί
    - κἀγαθοί
    - κἀγαθος
  - headword: ἀγαπᾶν
    transliteration: agapan
    text: "□ *pt.* estar satisfeito, gostar;  □ *en.* be satisfied, like;"
    match:
    - ἀγάπα
    - ἀγάπαις
    - ἀγάπη
    - ἀγάπην
    - ἀγάπης
    - ἀγάπῃ
    - ἀγαπᾶ
    - ἀγαπᾶν
    - ἀγαπᾶς
---

The words [κἀγαθά]{lang=el} and [ἀγαπᾶς]{lang=el}.


* Re: Glossary Filter for MD2Tex
  2022-10-19 21:28                             ` Bastien DUMONT
@ 2022-10-19 22:43                               ` Bernardo C.D.A. Vasconcelos
       [not found]                                 ` <272DFB73-CD83-4A77-B2C5-CCF1AF7B6BF6-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Bernardo C.D.A. Vasconcelos @ 2022-10-19 22:43 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Bastien, the only work that I was left with is to say thank you very 
much. I did some simple testing, and it seems quite elegant. Do I have 
your permission to share it with others later, giving proper 
attribution?

On 19 Oct 2022, at 18:28, Bastien DUMONT wrote:

> I think that the attached script could be a good starting point.
>
> On Wednesday 19 October 2022 at 04:50:25PM, Bernardo C.D.A. Vasconcelos wrote:
>> I have found this little script that takes me nearly there:
>>
>> local vars = {}
>>
>> function Meta(meta)
>>     for k, v in pairs(meta) do
>>         vars["%" .. k .. "%"] = v
>>     end
>> end
>>
>> function Str(elem)
>>     if vars[elem.text] then
>>         return vars[elem.text]
>>     else
>>         return elem
>>     end
>> end
>>
>> return {
>>     { Meta = Meta },
>>     { Str  = Str  }
>> }
>>
>>
>> Instead, we would use: meta.glossary.entries. The crux for me is looping
>> through the list of entries, adding all the values of the to_match field
>> (a.k.a. known forms) (of each entry) to vars as a key with the content of
>> some other field (e.g. glslink) as value. E.g.
>> vars[ .. entry.to_match.each .. ] = entry.glslink.
>>
>> On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:
>>
>>     Yes, it could! You would have access to the corresponding metadata
>>     object in the AST.
>>
>>     On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos
>>     wrote:
>>
>>         The data is mostly in database format and could be output in the
>>         best format for the task, but I wanted to make it friendly for
>>         other people to use as well.
>>         Could a YAML metadata block be a solution?
>>
>>         glossary:
>>           glossary_lang: grc
>>           entries:
>>           - headword: ἀγαθός
>>             text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
>>             match:
>>             - γαθέ
>>             - γαθοί
>>             - κἀγάθ
>>             - κἀγαθά
>>             - κἀγαθάς
>>             - κἀγαθή
>>             - κἀγαθήν
>>             - κἀγαθαί
>>             - κἀγαθοί
>>             - κἀγαθος
>>           - headword: ἀγαπᾶν
>>             transliteration: agapan
>>             text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
>>             match:
>>             - ἀγάπα
>>             - ἀγάπαις
>>             - ἀγάπη
>>             - ἀγάπην
>>             - ἀγάπης
>>             - ἀγάπῃ
>>             - ἀγαπᾶ
>>             - ἀγαπᾶν
>>             - ἀγαπᾶς
>>
>>         On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:
>>
>>         No, citeproc receives a data structure produced by pandoc. Pandoc
>>         is responsible for the parsing. I think that your script would not
>>         be so hard to rewrite in Lua; the main problem is to know if you
>>         can achieve your goals this way. If your main concern is
>>         portability, then writing a Lua filter with no dependencies
>>         certainly is a good solution, provided that you feed it with a Lua
>>         data structure (or embed the code responsible for JSON parsing in
>>         your script).
>>
>>         On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A.
>>         Vasconcelos wrote:
>>
>>         Thank you for the suggestions, Bastien. There is technically no
>>         need for regex, as all the forms are spelled out to avoid the need
>>         to create ad hoc regex rules for each term. Now that I think about
>>         it, the principle is the same as Citeproc's: a tagged inline
>>         element will be matched against a lookup table and replaced. I
>>         will look at the citeproc code to see if it leads anywhere or if
>>         it could be reused in any way.
>>
>>         On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
>>
>>         Yes, but it is limited to this utf8 library. For instance, if you
>>         perform a regexp search like `string.match('ἀγαθός', '[γδ]')`, it
>>         tries to match one of the four bytes inside the square brackets
>>         against the string 'ἀγαθός', so it will return the first byte of
>>         γ, not γ. To circumvent this limitation, you would be forced to
>>         test γ and δ separately. Nevertheless, if you always perform
>>         comparisons between whole strings as you currently do in your
>>         script, this should not be a problem.
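
The byte-level behaviour described in the paragraph above can be checked
with a few lines of plain Lua. This is only a sketch, assuming Lua 5.3 or
later (for the utf8 library) and a UTF-8 encoded source file; the helper
name contains_any is hypothetical.

```lua
-- Sketch only: needs Lua 5.3+ and a UTF-8 encoded source file.
local word = 'ἀγαθός'

-- A character class is made of bytes, so this matches a single byte.
local hit = string.match(word, '[γδ]')
print(#hit) -- length in bytes of what was matched

-- Hypothetical helper: test each letter separately with a plain
-- (non-pattern) search, which compares whole byte sequences.
local function contains_any(s, letters)
  for _, letter in ipairs(letters) do
    if string.find(s, letter, 1, true) then
      return letter
    end
  end
  return nil
end
print(contains_any(word, { 'γ', 'δ' }))

-- utf8.charpattern matches one whole encoded character at a time.
local count = 0
for _ in string.gmatch(word, utf8.charpattern) do
  count = count + 1
end
print(count, #word) -- characters versus bytes
```

Whole-string comparisons, such as the table lookups used in the filter, never
hit this problem because they compare complete byte sequences.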
>>
>>         As for your concern with dependencies, you most probably would
>>         have to rely on a JSON library such as lunajson. However, if your
>>         JSON files are not supposed to change, you could also convert them
>>         to a Lua file using a JSON library and a serialization library, so
>>         as to be able to import the resulting Lua data structure directly
>>         in your filter.
>>
>>         On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A.
>>         Vasconcelos wrote:
>>
>>         As for translating the filter note that Lua can't really handle
>>         UTF-8. There is some rudimentary support for converting codepoint
>>         number ↔ UTF-8 byte sequences and for iterating through a string
>>         of bytes representing UTF-8 encoded characters but no concept of
>>         chars as opposed to bytes. This may become a show stopper if you
>>         need to manipulate strings containing UTF-8 text.
>>
>>         Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards
>>         includes UTF-8 support. Have you seen it? E.g.
>>         https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>>
>>         For Ancient Greek you want grc as the language tag.
>>
>>         Indeed it is (and that is generally what I use), but ἀγαθός is
>>         just Polytonic Greek, which is not the same as Ancient Greek.




* Re: Glossary Filter for MD2Tex
       [not found]                                 ` <272DFB73-CD83-4A77-B2C5-CCF1AF7B6BF6-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-10-20  7:16                                   ` Bastien DUMONT
  0 siblings, 0 replies; 13+ messages in thread
From: Bastien DUMONT @ 2022-10-20  7:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Sure! Let's say that it is under the MIT license, like the filters in the official repo.

On Wednesday 19 October 2022 at 07:43:36PM, Bernardo C.D.A. Vasconcelos wrote:
> Bastien, the only work that I was left with is to say thank you very much. I
> did some simple testing, and it seems quite elegant. Do I have your
> permission to share it with others later, giving proper attribution?




end of thread, other threads:[~2022-10-20  7:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
2022-10-17 18:25 Glossary Filter for MD2Tex Bernardo C. D. A. Vasconcelos
     [not found] ` <88a14108-f2e4-40d0-a98e-5c6f84b8ff41n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2022-10-17 18:38   ` BPJ
     [not found]     ` <CADAJKhCVT-PNRsSgr5hU7Zzwaq3fN+CF3SGA5mTLrc2As+R6rw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2022-10-18 15:36       ` Bernardo C.D.A. Vasconcelos
     [not found]         ` <3307993F-F813-405F-BFEC-F17FAF27BEA5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-18 16:34           ` Bastien DUMONT
2022-10-18 17:16             ` Bernardo C.D.A. Vasconcelos
     [not found]               ` <7072522D-F2FE-4BAC-A575-93426852FCFB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-18 17:34                 ` Bastien DUMONT
2022-10-18 21:43                   ` Bernardo C.D.A. Vasconcelos
     [not found]                     ` <D4CB4B20-A1D5-49C8-BA96-2E37BA4FB779-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-18 22:06                       ` Bastien DUMONT
2022-10-19 19:50                         ` Bernardo C.D.A. Vasconcelos
     [not found]                           ` <B93B3CA7-A461-4056-929D-592B578B184F-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-19 21:28                             ` Bastien DUMONT
2022-10-19 22:43                               ` Bernardo C.D.A. Vasconcelos
     [not found]                                 ` <272DFB73-CD83-4A77-B2C5-CCF1AF7B6BF6-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-10-20  7:16                                   ` Bastien DUMONT
2022-10-18 18:42           ` BPJ
