I got interested in another aspect of the posting -- the program
"cleanbib.pl" by Benct. I installed it on Ubuntu and found that it does not
process perfectly valid TeX code, such as character commands that end with
or contain a space (as in \oe uf or \v z), and that it processes \c{e} but
not the same accent on any of the other vowels...

I prepared the torture test below to show the problems:

@Book{hobbit,
  title = {Les \oe uf de la serpente},
  address = {Bla\v zi\'c},
  publisher = {\c{a} \c{e} \c{i} \c{o} \c{u}},
}

Above all, how does this compare to:

https://ctan.org/tex-archive/support/bibtexperllibs/LaTeX-ToUnicode

Paulo Ney

On Sat, Jul 2, 2022 at 1:03 PM BPJ wrote:

> string.gsub() optionally takes the maximum number of substitutions as a
> fourth argument, and you can reinsert capture groups in the replacement, so
> this should be fairly robust:
>
> ```lua
> string.gsub(title, '%:(%s)', '.%1', 1)
> ```
>
> On Fri, 1 July 2022 at 18:44, John Carter Wood wrote:
>
>> Ah, of course, biblical references. Religious history is one of my
>> fields, how could I miss that?
>>
>> Looking forward to trying this out!
>>
>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 18:41:02 UTC+2:
>>
>>> A slightly more reliable version:
>>>
>>> ```lua
>>> local stringify = pandoc.utils.stringify
>>>
>>> function Meta(m)
>>>   if m.references ~= nil then
>>>     for _, el in ipairs(m.references) do
>>>       -- print(stringify(el.title))
>>>       el.title = pandoc.Str(string.gsub(stringify(el.title), ': ', '. '))
>>>       -- print(el.title)
>>>     end
>>>   end
>>>   return m
>>> end
>>> ```
>>>
>>> (This won’t replace colons in biblical references, e.g. Gen 1:1)
>>>
>>> You can test with this file:
>>>
>>> ```markdown
>>> ---
>>> references:
>>> - type: book
>>>   id: doe
>>>   author:
>>>   - family: Doe
>>>     given: Jane
>>>   issued:
>>>     date-parts:
>>>     - - 2022
>>>   title: 'A book: with a subtitle and a reference to Gen 1:1, but that is not a problem'
>>>   publisher: 'Whatever press'
>>> lang: de-De
>>> ...
>>>
>>> test [@doe]
>>> ```
>>>
>>> The filter itself does not cover capitalization. For some reason,
>>> pandoc or citeproc applies title-case transformation here. I don’t think
>>> it should though.
>>>
>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org *On behalf of* John Carter Wood
>>> *Sent:* Friday, 1 July 2022 18:24
>>> *To:* pandoc-discuss
>>> *Subject:* Re: Changing colons to full-stops in titles
>>>
>>> That's very interesting, thanks! I'll try it out when I get a chance in
>>> the coming days.
>>>
>>> I have thought about this issue of false positives while thinking about
>>> the option of some kind of filter. But...I think they would be very rare.
>>> I have a hard time thinking of a title with a colon in it that shouldn't
>>> -- in this case -- be turned into a dot. At least, I don't have anything
>>> in my 1,200 references where I can see that that wouldn't apply.
>>>
>>> Although, of course, I'm sure there are some out there...
>>>
>>> Just a question: would this also ensure that the first word after the
>>> dot is capitalised? Or does that open a new series of problems? :-)
>>>
>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 18:17:02 UTC+2:
>>>
>>> Here’s a very simple and absolutely unreliable version of a filter. This
>>> will replace every colon in a title with a period.
>>>
>>> ```lua
>>> local stringify = pandoc.utils.stringify
>>>
>>> function Meta(m)
>>>   if m.references ~= nil then
>>>     for _, el in ipairs(m.references) do
>>>       print(stringify(el.title))
>>>       el.title = pandoc.Str(string.gsub(stringify(el.title), ':', '.'))
>>>       print(el.title)
>>>     end
>>>   end
>>>   return m
>>> end
>>> ```
>>>
>>> Question is how this can be made robust enough to avoid false positives.
>>>
>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org *On behalf of* John Carter Wood
>>> *Sent:* Friday, 1 July 2022 17:52
>>> *To:* pandoc-discuss
>>> *Subject:* Re: Changing colons to full-stops in titles
>>>
>>> Thanks for the suggestions, a couple of which are kind of stretching my
>>> knowledge of these things, but I see where they're going.
>>>
>>> As to JGM's question: I am using a CSL json bibliography, so my titles
>>> are in a single field. ("title":"Science and religion: new perspectives on
>>> the dialogue")
>>>
>>> The issue is that *most* of the journals / publishers I publish in use,
>>> as here, the colon. *Some* (mainly German) styles want the period. If I
>>> were solely interested in either one, I could choose and just enter the
>>> relevant punctuation in the title field. However, I want to continue
>>> saving my bibliographic entries with a colon (because that's the most
>>> standard one for me), but have the option of automatically converting them
>>> to a period for those cases where I need to. If that makes sense.
>>>
>>> Thus: going through denis's options:
>>>
>>> 1. I have switched to json bibliographies from bibtex/biblatex as they
>>> seemed to offer more flexibility (I was running into issues with the
>>> strange archival references I have to make in my field, and JSON seemed to
>>> work better in that regard). So this seems to not apply.
>>>
>>> 2. Seems to not apply, as I have a single title field.
>>>
>>> 3. Sounds really interesting, and I use BBT, though it also sounds like
>>> I would here have to create a separate bibliography file from my Zotero
>>> database for those publishers/styles that require the dot. This is not
>>> *too* onerous, as it would at least be automated.
>>>
>>> 4. Having a filter that I could simply apply (as part of a pandoc
>>> command, say) or not apply as relevant seems like the most flexible /
>>> efficient solution. I don't know lua, but if this is one possible way,
>>> then I could use it as a (hopefully fairly simple?) way into learning it.
>>>
>>> Does this help to clarify my situation?
>>>
>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 17:34:55 UTC+2:
>>>
>>> Yes, that’s a known issue...
>>>
>>> There are a couple of possible solutions:
>>>
>>> 1. use biblatex databases and patch pandoc so it will concat title and
>>> subtitle fields using periods. (line 667
>>> https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Citeproc/BibTeX.hs
>>> )
>>>
>>> 2. I think pandoc’s citeproc will just treat every unknown variable as a
>>> string variable (see
>>> https://github.com/jgm/citeproc/blob/3f94424db469c804cf2dac2d22dc7a18b614f43e/src/Citeproc/Types.hs#L1054
>>> and
>>> https://github.com/jgm/citeproc/blob/3f94424db469c804cf2dac2d22dc7a18b614f43e/src/Citeproc/Types.hs#L901),
>>> so you should be able to use «subtitle» in styles. (This will give you
>>> warnings when using the style with Zotero and it won’t work reliably
>>> across implementations, but anyway ...)
>>>
>>> 3. if you’re using Zotero, you can leverage Zotero BBT’s postscript
>>> feature to manipulate the JSON after exporting.
>>>
>>> E.g., this one:
>>>
>>>     if (Translator.BetterCSL && item.title) {
>>>       reference.title = reference.title.replace(/ : /g, '. ')
>>>     }
>>>
>>> Not bullet-proof, but simple. You will want to choose a better
>>> separator, maybe a double-bar or so.
>>>
>>> 4. Doing this with lua should also be possible...
>>>
>>> The question is: do you have the subtitle in a distinct field or is it
>>> just in the title field?
>>>
>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org *On behalf of* John Carter Wood
>>> *Sent:* Friday, 1 July 2022 16:39
>>> *To:* pandoc-discuss
>>> *Subject:* Changing colons to full-stops in titles
>>>
>>> I have one final (for now...) issue in setting up a CSL file (which I
>>> use with pandoc/citeproc and references in a json file).
>>>
>>> I'm not sure whether this is a CSL issue or whether it's an issue that
>>> can be solved via using a filter (or some other solution) in pandoc, but I
>>> thought there might be some people here who might have faced a similar
>>> issue.
>>>
>>> The house style here (German-based publisher) wants a
>>> *full-stop/period* between main title and subtitle in citations /
>>> bibliographies; US/UK standard is a *colon* between main title and
>>> subtitle. And reference managers like Zotero -- IIUC -- save titles as
>>> single fields (at least they are in my version of Zotero). So it doesn't
>>> seem like it is possible to control what delimiter is used between them
>>> via CSL.
>>>
>>> I have found various discussions of relevant title/subtitle division
>>> issues -- some going back quite a few years -- in forums on Zotero:
>>>
>>> https://forums.zotero.org/discussion/8077/separate-fields-for-title-and-subtitle/
>>>
>>> ...and CSL:
>>>
>>> https://discourse.citationstyles.org/t/handling-main-sub-title-splits-citeproc-js/1563/11
>>>
>>> However, these were in part discussions among developers about
>>> *possible* changes, and I'm not sure of the current status of this
>>> issue or whether there is a way to handle it.
>>>
>>> Would it be possible to automate turning colons in titles into
>>> full-stops via using a filter? If so, is there such a filter already
>>> around? Can this be done via CSL?
>>>
>>> Or is this, as of now, impossible?
>>>
>>> (Or is there a real simple solution that I have, as usual,
>>> overlooked...)
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "pandoc-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/pandoc-discuss/78df697a-50f5-46d0-b0b8-29a2cbc9509an%40googlegroups.com.
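
For reference, the filter variants quoted in this thread can be combined into one minimal sketch: BPJ's idea (capture the whitespace after the colon, limit the substitution to one) plus the capitalisation John asked about. The helper name `colon_to_period` is hypothetical, the capitalisation only handles ASCII letters, and the `Meta` hook simply mirrors the structure of Denis's filter. None of this has been checked against a real bibliography or against citeproc's own title-casing.

```lua
-- Replace the first "colon + whitespace" in a title with "period + whitespace"
-- and upper-case the ASCII letter that follows, if there is one.
-- Colons not followed by whitespace (e.g. "Gen 1:1") are left untouched.
-- NOTE: colon_to_period is a hypothetical helper name, not part of pandoc.
local function colon_to_period(title)
  -- The outer parentheses drop gsub's second return value (the match count).
  return (title:gsub(':(%s+)(%a?)', function(space, letter)
    return '.' .. space .. letter:upper()
  end, 1))
end

-- Hook in the style of the filters quoted above; pandoc only calls this
-- when the file is run as a Lua filter, so plain Lua can still load it.
function Meta(m)
  if m.references ~= nil then
    for _, el in ipairs(m.references) do
      if el.title ~= nil then
        local old = pandoc.utils.stringify(el.title)
        el.title = pandoc.Str(colon_to_period(old))
      end
    end
  end
  return m
end
```

With John's example title, `colon_to_period("Science and religion: new perspectives on the dialogue")` gives `Science and religion. New perspectives on the dialogue`, while `Gen 1:1` is returned unchanged. Only the first colon followed by whitespace is rewritten, so a title with a second such colon would still need manual attention.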