I think that the attached script could be a good starting point.

On Wednesday 19 October 2022 at 04:50:25PM, Bernardo C.D.A. Vasconcelos wrote:
> I have found this little script that takes me nearly there:
>
>     local vars = {}
>
>     function Meta(meta)
>       for k, v in pairs(meta) do
>         vars["%" .. k .. "%"] = v
>       end
>     end
>
>     function Str(elem)
>       if vars[elem.text] then
>         return vars[elem.text]
>       else
>         return elem
>       end
>     end
>
>     return {
>       { Meta = Meta },
>       { Str = Str }
>     }
>
> Instead, we would use: meta.glossary.entries. The crux for me is looping
> through the list of entries, adding all the values of the to_match field
> (a.k.a. known forms) of each entry to vars as a key, with the content of
> some other field (e.g. glslink) as value. E.g.
> vars[ .. entry.to_match.each .. ] = entry.glslink.
>
> On 18 Oct 2022, at 19:06, Bastien DUMONT wrote:
>
> Yes, it could! You would have access to the corresponding metadata object
> in the AST.
>
> On Tuesday 18 October 2022 at 06:43:48PM, Bernardo C.D.A. Vasconcelos wrote:
>
> The data is mostly in database format and could be output in the best
> format for the task, but I wanted to make it friendly for other people to
> use as well. Could a YAML metadata block be a solution?
>
>     glossary:
>       glossary_lang: grc
>       entries:
>         - headword: ἀγαθός
>           text: "□ *pt.* bom; □ *en.* good; and so on and so forth"
>           match:
>             - γαθέ
>             - γαθοί
>             - κἀγάθ
>             - κἀγαθά
>             - κἀγαθάς
>             - κἀγαθή
>             - κἀγαθήν
>             - κἀγαθαί
>             - κἀγαθοί
>             - κἀγαθος
>         - headword: ἀγαπᾶν
>           transliteration: agapan
>           text: "□ *pt.* estar satisfeito, gostar; □ *en.* be satisfied, like;"
>           match:
>             - ἀγάπα
>             - ἀγάπαις
>             - ἀγάπη
>             - ἀγάπην
>             - ἀγάπης
>             - ἀγάπῃ
>             - ἀγαπᾶ
>             - ἀγαπᾶν
>             - ἀγαπᾶς
>
> On 18 Oct 2022, at 14:34, Bastien DUMONT wrote:
>
> No, citeproc receives a data structure produced by pandoc. Pandoc is
> responsible for the parsing.
> I think that your script would not be so hard to rewrite in Lua; the main
> problem is to know whether you can achieve your goals this way. If your
> main concern is portability, then writing a Lua filter with no dependencies
> certainly is a good solution, provided that you feed it with a Lua data
> structure (or embed the code responsible for JSON parsing in your script).
>
> On Tuesday 18 October 2022 at 02:16:16PM, Bernardo C.D.A. Vasconcelos wrote:
>
> Thank you for the suggestions, Bastien. There is technically no need for
> regex, as all the forms are spelled out to avoid the need to create ad hoc
> regex rules for each term. Now that I think about it, the principle is the
> same as Citeproc's: a tagged inline element will be matched against a
> lookup table and replaced. I will look at the citeproc code to see if it
> leads anywhere or if it could be reused in any way.
>
> On 18 Oct 2022, at 13:34, Bastien DUMONT wrote:
>
> Yes, but it is limited to this utf8 library. For instance, if you perform a
> regexp search like `string.match('ἀγαθός', '[γδ]')`, it tries to match one
> of the four bytes inside the square brackets against the string 'ἀγαθός',
> so it will return the first byte of γ, not γ. To circumvent this
> limitation, you would be forced to test γ and δ separately. Nevertheless,
> if you always perform comparisons between whole strings as you currently do
> in your script, this should not be a problem.
>
> As for your concern with dependencies, you most probably would have to rely
> on a JSON library such as lunajson. However, if your JSON files are not
> supposed to change, you could also convert them to a Lua file using a JSON
> library and a serialization library, so as to be able to import the
> resulting Lua data structure directly in your filter.
>
> On Tuesday 18 October 2022 at 12:36:03PM, Bernardo C.D.A. Vasconcelos wrote:
>
> > As for translating the filter, note that Lua can't really handle UTF-8.
> > There is some rudimentary support for converting codepoint number ↔ UTF-8
> > byte sequences and for iterating through a string of bytes representing
> > UTF-8 encoded characters but no concept of chars as opposed to bytes.
> > This may become a show stopper if you need to manipulate strings
> > containing UTF-8 text.
>
> Thanks, @BPJ, for the explanation. Apparently, Lua 5.3 onwards includes
> UTF-8 support. Have you seen it? E.g.
> https://q-syshelp.qsc.com/Content/Control_Scripting/Lua_5.3_Reference_Manual/Standard_Libraries/4_-_Basic_UTF-8_Support.htm
>
> > For Ancient Greek you want grc as the language tag.
>
> Indeed it is (and that is generally what I use), but ἀγαθός is just
> Polytonic Greek, which is not the same as Ancient Greek.
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/3307993F-F813-405F-BFEC-F17FAF27BEA5%40gmail.com.
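To illustrate the byte-versus-character point made in the thread, here is a minimal sketch, assuming Lua 5.3 or later (for the standard `utf8` library) and UTF-8 encoded source text. The helper name `find_any_char` is hypothetical and not from the thread; it tests each candidate character separately with a plain-text search, along the lines of the suggestion to test γ and δ one at a time.

```lua
-- Assumes Lua 5.3+ and a UTF-8 encoded source file.
local word = 'ἀγαθός'

-- '#' counts bytes, while utf8.len counts code points.
local byte_count = #word
local char_count = utf8.len(word)

-- A character class is built from bytes, so '[γδ]' matches one single byte
-- of the subject string, not a whole character.
local class_match = string.match(word, '[γδ]')

-- Hypothetical helper: look for each whole character separately, using a
-- plain (non-pattern) search so that no byte gets a special meaning.
local function find_any_char(s, chars)
  for _, c in ipairs(chars) do
    local start = string.find(s, c, 1, true)
    if start then
      return c, start  -- the character found and its byte offset
    end
  end
  return nil
end

local found, pos = find_any_char(word, { 'γ', 'δ' })
print(byte_count, char_count, #class_match, found, pos)
```

Whole-string comparisons, such as the table lookup `vars[elem.text]` in the quoted script, are not affected by any of this, since two equal UTF-8 strings are also equal byte for byte.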
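The loop described in the thread (every known form of each entry becomes a key in `vars`) could be sketched as follows. This is only a sketch under stated assumptions: it works on a plain Lua table shaped like the YAML example, and it uses the `match` key from that example (the message text calls it `to_match`). Inside a pandoc filter, `meta.glossary.entries` would hold pandoc metadata values rather than plain strings, so each form would probably need converting first, for example with `pandoc.utils.stringify`. The function name `build_lookup` and the use of `headword` as the stored value are assumptions; the thread mentions a `glslink` field that the sample data does not define.

```lua
-- Hypothetical helper: map every known form to a value taken from its entry.
-- `entries` is a list of tables with a `match` list, as in the YAML example.
local function build_lookup(entries, value_field)
  local vars = {}
  for _, entry in ipairs(entries) do
    for _, form in ipairs(entry.match or {}) do
      -- The first entry that lists a form wins; later duplicates are ignored.
      if vars[form] == nil then
        vars[form] = entry[value_field]
      end
    end
  end
  return vars
end

-- Plain-table stand-in for meta.glossary.entries (abridged from the thread).
local entries = {
  { headword = 'ἀγαθός',
    match = { 'γαθέ', 'γαθοί', 'κἀγάθ' } },
  { headword = 'ἀγαπᾶν', transliteration = 'agapan',
    match = { 'ἀγάπα', 'ἀγάπη', 'ἀγαπᾶν' } },
}

local vars = build_lookup(entries, 'headword')
print(vars['γαθοί'], vars['ἀγάπη'], vars['λόγος'])
```

With a table like this, the `Str` function from the quoted script could stay as it is, since it only checks whether `elem.text` is a key of `vars`.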