It's an upstream bug in LaTeX::ToUnicode. As far as I know I had never run into it, because all the .bib files I had written myself or downloaded from the libraries I use had used `\"{a}` rather than `\"a`, and the braced form doesn't hit the bug. I have located the bug and am working on a patch. Thanks for discovering this! (There are a lot of unattended bugs, though. Do you want me to send you the patch when it is ready?)

On Sun, 3 July 2022 at 07:42, Paulo Ney de Souza wrote:

> I got interested in another aspect of the posting -- the program "cleanbib.pl" by Benct.
>
> I installed it in Ubuntu, and found that it does not process perfectly valid TeX code, like characters that end with a space or have one in the middle, and that it processes \c{e} but not the comma-accent on any of the other vowels...
>
> I prepared the torture test below to show the problems:
>
> @Book{hobbit,
>   title = {Les \oe uf de la serpente},
>   address = {Bla\v zi\'c},
>   publisher = {\c{a} \c{e} \c{i} \c{o} \c{u}},
> }
>
> and above all, how does this compare to:
>
> https://ctan.org/tex-archive/support/bibtexperllibs/LaTeX-ToUnicode
>
> Paulo Ney
>
> On Sat, Jul 2, 2022 at 1:03 PM BPJ wrote:
>
>> string.gsub() optionally takes the maximum number of substitutions as a fourth argument, and you can reinsert capture groups in the replacement, so this should be fairly robust:
>>
>> ```lua
>> string.gsub(title, '%:(%s)', '.%1', 1)
>> ```
>>
>> On Fri, 1 July 2022 at 18:44, John Carter Wood wrote:
>>
>>> Ah, of course, biblical references. Religious history is one of my fields, how could I miss that?
>>>
>>> Looking forward to trying this out!
>>>
>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 18:41:02 UTC+2:
>>>
>>>> A slightly more reliable version:
>>>>
>>>> ```lua
>>>> local stringify = pandoc.utils.stringify
>>>>
>>>> function Meta(m)
>>>>   if m.references ~= nil then
>>>>     for _, el in ipairs(m.references) do
>>>>       -- print(stringify(el.title))
>>>>       -- The extra parentheses keep only gsub's first return value (the string).
>>>>       el.title = pandoc.Str((string.gsub(stringify(el.title), ': ', '. ')))
>>>>       -- print(el.title)
>>>>     end
>>>>   end
>>>>   return m
>>>> end
>>>> ```
>>>>
>>>> (This won’t replace colons in biblical references, e.g. Gen 1:1)
>>>>
>>>> You can test with this file:
>>>>
>>>> ```markdown
>>>> ---
>>>> references:
>>>> - type: book
>>>>   id: doe
>>>>   author:
>>>>   - family: Doe
>>>>     given: Jane
>>>>   issued:
>>>>     date-parts:
>>>>     - - 2022
>>>>   title: 'A book: with a subtitle and a reference to Gen 1:1, but that is not a problem'
>>>>   publisher: 'Whatever press'
>>>> lang: de-De
>>>> ...
>>>>
>>>> test [@doe]
>>>> ```
>>>>
>>>> The filter itself does not cover capitalization. For some reason, pandoc or citeproc applies title-case transformation here. I don’t think it should though.
>>>>
>>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org *On behalf of* John Carter Wood
>>>> *Sent:* Friday, 1 July 2022 18:24
>>>> *To:* pandoc-discuss
>>>> *Subject:* Re: Changing colons to full-stops in titles
>>>>
>>>> That's very interesting, thanks! I'll try it out when I get a chance in the coming days.
>>>>
>>>> I have thought about this issue of false positives while thinking about the option of some kind of filter. But...I think they would be very rare. I have a hard time thinking of a title with a colon in it that shouldn't -- in this case -- be turned into a dot.
>>>> At least, I don't have anything in my 1,200 references where I can see that that wouldn't apply.
>>>>
>>>> Although, of course, I'm sure there are some out there...
>>>>
>>>> Just a question: would this also ensure that the first word after the dot is capitalised? Or does that open a new series of problems? :-)
>>>>
>>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 18:17:02 UTC+2:
>>>>
>>>> Here’s a very simple and absolutely unreliable version of a filter. This will replace every colon in a title with a period.
>>>>
>>>> ```lua
>>>> local stringify = pandoc.utils.stringify
>>>>
>>>> function Meta(m)
>>>>   if m.references ~= nil then
>>>>     for _, el in ipairs(m.references) do
>>>>       print(stringify(el.title))
>>>>       -- The extra parentheses keep only gsub's first return value (the string).
>>>>       el.title = pandoc.Str((string.gsub(stringify(el.title), ':', '.')))
>>>>       print(el.title)
>>>>     end
>>>>   end
>>>>   return m
>>>> end
>>>> ```
>>>>
>>>> The question is how this can be made robust enough to avoid false positives.
>>>>
>>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org *On behalf of* John Carter Wood
>>>> *Sent:* Friday, 1 July 2022 17:52
>>>> *To:* pandoc-discuss
>>>> *Subject:* Re: Changing colons to full-stops in titles
>>>>
>>>> Thanks for the suggestions, a couple of which are kind of stretching my knowledge of these things, but I see where they're going.
>>>>
>>>> As to JGM's question: I am using a CSL JSON bibliography, so my titles are in a single field. ("title":"Science and religion: new perspectives on the dialogue")
>>>>
>>>> The issue is that *most* of the journals / publishers I publish in use, as here, the colon. *Some* (mainly German) styles want the period. If I were solely interested in either one, I could choose and just enter the relevant punctuation in the title field.
>>>> However, I want to continue saving my bibliographic entries with a colon (because that's the most standard one for me), but have the option of automatically converting them to a period for those cases where I need to. If that makes sense.
>>>>
>>>> Thus, going through denis's options:
>>>>
>>>> 1. I have switched to JSON bibliographies from bibtex/biblatex as they seemed to offer more flexibility (I was running into issues with the strange archival references I have to make in my field, and JSON seemed to work better in that regard). So this seems not to apply.
>>>>
>>>> 2. Seems not to apply, as I have a single title field.
>>>>
>>>> 3. Sounds really interesting, and I use BBT, though it also sounds like I would have to create a separate bibliography file from my Zotero database for those publishers/styles that require the dot. This is not *too* onerous, as it would at least be automated.
>>>>
>>>> 4. Having a filter that I could simply apply (as part of a pandoc command, say) or not apply as relevant seems like the most flexible / efficient solution. I don't know Lua, but if this is one possible way, then I could use it as a (hopefully fairly simple?) way into learning it.
>>>>
>>>> Does this help to clarify my situation?
>>>>
>>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 17:34:55 UTC+2:
>>>>
>>>> Yes, that’s a known issue...
>>>>
>>>> There are a couple of possible solutions:
>>>>
>>>> 1. Use biblatex databases and patch pandoc so it will concatenate title and subtitle fields using periods (line 667 of https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Citeproc/BibTeX.hs).
>>>>
>>>> 2. I think pandoc’s citeproc will just treat every unknown variable as a string variable (see https://github.com/jgm/citeproc/blob/3f94424db469c804cf2dac2d22dc7a18b614f43e/src/Citeproc/Types.hs#L1054 and https://github.com/jgm/citeproc/blob/3f94424db469c804cf2dac2d22dc7a18b614f43e/src/Citeproc/Types.hs#L901), so you should be able to use «subtitle» in styles. (This will give you warnings when using the style with Zotero and it won’t work reliably across implementations, but anyway ...)
>>>>
>>>> 3. If you’re using Zotero, you can leverage Zotero BBT’s postscript feature to manipulate the JSON after exporting. E.g., this one:
>>>>
>>>>     if (Translator.BetterCSL && item.title) {
>>>>       reference.title = reference.title.replace(/ : /g, '. ')
>>>>     }
>>>>
>>>> Not bullet-proof, but simple. You will want to choose a better separator, maybe a double-bar or so.
>>>>
>>>> 4. Doing this with Lua should also be possible...
>>>>
>>>> The question is: do you have the subtitle in a distinct field or is it just in the title field?
>>>>
>>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org *On behalf of* John Carter Wood
>>>> *Sent:* Friday, 1 July 2022 16:39
>>>> *To:* pandoc-discuss
>>>> *Subject:* Changing colons to full-stops in titles
>>>>
>>>> I have one final (for now...) issue in setting up a CSL file (which I use with pandoc/citeproc and references in a JSON file).
>>>>
>>>> I'm not sure whether this is a CSL issue or whether it's an issue that can be solved by using a filter (or some other solution) in pandoc, but I thought there might be some people here who might have faced a similar issue.
>>>>
>>>> The house style here (German-based publisher) wants a *full-stop/period* between main title and subtitle in citations / bibliographies; the US/UK standard is a *colon* between main title and subtitle. And reference managers like Zotero -- IIUC -- save titles as single fields (at least they are in my version of Zotero). So it doesn't seem like it is possible to control what delimiter is used between them via CSL.
>>>>
>>>> I have found various discussions of relevant title/subtitle division issues -- some going back quite a few years -- in forums on Zotero:
>>>>
>>>> https://forums.zotero.org/discussion/8077/separate-fields-for-title-and-subtitle/
>>>>
>>>> ...and CSL:
>>>>
>>>> https://discourse.citationstyles.org/t/handling-main-sub-title-splits-citeproc-js/1563/11
>>>>
>>>> However, these were in part discussions among developers about *possible* changes, and I'm not sure of the current status of this issue or whether there is a way to handle it.
>>>>
>>>> Would it be possible to automate turning colons in titles into full-stops by using a filter? If so, is there such a filter already around? Can this be done via CSL?
>>>>
>>>> Or is this, as of now, impossible?
>>>>
>>>> (Or is there a really simple solution that I have, as usual, overlooked...)
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/78df697a-50f5-46d0-b0b8-29a2cbc9509an%40googlegroups.com.
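The ideas raised in this thread (replace only a colon that is followed by whitespace, stop after the first match, leave references such as Gen 1:1 alone, and capitalise the word after the new period) can be sketched as one standalone JavaScript function. This is only an illustration under assumptions: the name `colonToPeriod` is hypothetical and belongs to neither BBT nor pandoc, and how it would be wired into a BBT postscript is not verified here.

```javascript
// Hypothetical helper; the name is an assumption, not a BBT or pandoc API.
// Replaces only the FIRST colon that is followed by whitespace, so a
// reference like "Gen 1:1" (no whitespace after the colon) is untouched,
// and upper-cases the first non-space character after the new period.
function colonToPeriod(title) {
  // No "g" flag: String.prototype.replace then changes the first match only.
  return title.replace(/:(\s+)(\S)/, (_match, space, next) => '.' + space + next.toUpperCase());
}

console.log(colonToPeriod('Science and religion: new perspectives on the dialogue'));
// Science and religion. New perspectives on the dialogue
```

Limiting the replacement to the first match mirrors the fourth argument to `string.gsub()` suggested above; a title with several "colon plus space" sequences keeps all but the first.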