On Sun, 3 July 2022 at 18:55, Paulo Ney de Souza wrote:

> On Sun, Jul 3, 2022 at 5:15 AM BPJ wrote:
>
>> It's an upstream bug in LaTeX::ToUnicode.
>
> On LaTeX::ToUnicode? I thought you only used BibTeX::Parser.

Which uses LaTeX::ToUnicode in its `cleaned_*` methods.

>> I had just never run into it AFAIK, because all the .bib files I had
>> written myself or downloaded from the libraries I use had used `\"{a}`,
>> which doesn't hit the bug, rather than `\"a`. I have located the bug and am
>> working on a patch. Thanks for discovering this! (There are a lot of
>> unattended bugs though. Do you want me to send you the patch when it is
>> ready?)
>
> I know of some bugs on BibTeX::Parser (and none on LaTeX::ToUnicode).

There are some on the old CPAN RT tracker. I don't know if they are de facto fixed.

> It would be nice to have all of them listed on the issues of the project
> page:
>
> https://github.com/borisveytsman/BibTeXPerlLibs/issues

Thanks for the link. It is missing on MetaCPAN. One of the old bugs complains about sloppy packaging.

> especially if you are producing a patch.

Well, it will be listed when I submit a PR!

> Paulo Ney
>
>> On Sun, 3 July 2022 at 07:42, Paulo Ney de Souza wrote:
>>
>>> I got interested in another aspect of the posting -- the program
>>> "cleanbib.pl" by Benct.
>>>
>>> I installed it in Ubuntu, and found out it does not process perfectly
>>> valid TeX code like characters that end with or have a space in the middle, or
>>> that it processes \c{e}, but not the comma-accent on any of the other vowels...
>>>
>>> I prepared the torture test below to show the problems:
>>>
>>> @Book{hobbit,
>>>   title = {Les \oe uf de la serpente},
>>>   address = {Bla\v zi\'c},
>>>   publisher = {\c{a} \c{e} \c{i} \c{o} \c{u}},
>>> }
>>>
>>> and above all, how does this compare to:
>>>
>>> https://ctan.org/tex-archive/support/bibtexperllibs/LaTeX-ToUnicode
>>>
>>> Paulo Ney
>>>
>>> On Sat, Jul 2, 2022 at 1:03 PM BPJ wrote:
>>>
>>>> string.gsub() optionally takes the maximum number of substitutions as a
>>>> fourth argument, and you can reinsert capture groups in the replacement, so
>>>> this should be fairly robust:
>>>>
>>>> ```lua
>>>> string.gsub(title, '%:(%s)', '.%1', 1)
>>>> ```
>>>>
>>>> On Fri, 1 July 2022 at 18:44, John Carter Wood wrote:
>>>>
>>>>> Ah, of course, biblical references. Religious history is one of my
>>>>> fields, how could I miss that?
>>>>>
>>>>> Looking forward to trying this out!
>>>>>
>>>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 18:41:02 UTC+2:
>>>>>
>>>>>> A slightly more reliable version:
>>>>>>
>>>>>> ```lua
>>>>>> local stringify = pandoc.utils.stringify
>>>>>>
>>>>>> function Meta(m)
>>>>>>   if m.references ~= nil then
>>>>>>     for _, el in ipairs(m.references) do
>>>>>>       -- print(stringify(el.title))
>>>>>>       -- Only "colon + space" is replaced; the extra parentheses drop
>>>>>>       -- the substitution count that string.gsub also returns.
>>>>>>       el.title = pandoc.Str((string.gsub(stringify(el.title), ': ', '. ')))
>>>>>>       -- print(el.title)
>>>>>>     end
>>>>>>   end
>>>>>>   return m
>>>>>> end
>>>>>> ```
>>>>>>
>>>>>> (This won’t replace colons in biblical references, e.g. Gen 1:1)
>>>>>>
>>>>>> You can test with this file:
>>>>>>
>>>>>> ```markdown
>>>>>> ---
>>>>>> references:
>>>>>> - type: book
>>>>>>   id: doe
>>>>>>   author:
>>>>>>   - family: Doe
>>>>>>     given: Jane
>>>>>>   issued:
>>>>>>     date-parts:
>>>>>>     - - 2022
>>>>>>   title: 'A book: with a subtitle and a reference to Gen 1:1, but that is not a problem'
>>>>>>   publisher: 'Whatever press'
>>>>>> lang: de-De
>>>>>> ...
>>>>>>
>>>>>> test [@doe]
>>>>>> ```
>>>>>>
>>>>>> The filter itself does not cover capitalization. For some reason,
>>>>>> pandoc or citeproc applies title-case transformation here. I don’t think it
>>>>>> should though.
>>>>>>
>>>>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org *On behalf of* John Carter Wood
>>>>>> *Sent:* Friday, 1 July 2022 18:24
>>>>>> *To:* pandoc-discuss
>>>>>> *Subject:* Re: Changing colons to full-stops in titles
>>>>>>
>>>>>> That's very interesting, thanks! I'll try it out when I get a chance
>>>>>> in the coming days.
>>>>>>
>>>>>> I have thought about this issue of false positives while thinking
>>>>>> about the option of some kind of filter. But...I think they would be very
>>>>>> rare. I have a hard time thinking of a title with a colon in it that
>>>>>> shouldn't -- in this case -- be turned into a dot. At least, I don't
>>>>>> have anything in my 1,200 references where I can see that that wouldn't
>>>>>> apply.
>>>>>>
>>>>>> Although, of course, I'm sure there are some out there...
>>>>>>
>>>>>> Just a question: would this also ensure that the first word after the
>>>>>> dot is capitalised? Or does that open a new series of problems? :-)
>>>>>>
>>>>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 18:17:02 UTC+2:
>>>>>>
>>>>>> Here’s a very simple and absolutely unreliable version of a filter.
>>>>>> This will replace every colon in a title with a period.
>>>>>>
>>>>>> ```lua
>>>>>> local stringify = pandoc.utils.stringify
>>>>>>
>>>>>> function Meta(m)
>>>>>>   if m.references ~= nil then
>>>>>>     for _, el in ipairs(m.references) do
>>>>>>       print(stringify(el.title))
>>>>>>       -- Parentheses keep only the string result of string.gsub.
>>>>>>       el.title = pandoc.Str((string.gsub(stringify(el.title), ':', '.')))
>>>>>>       print(el.title)
>>>>>>     end
>>>>>>   end
>>>>>>   return m
>>>>>> end
>>>>>> ```
>>>>>>
>>>>>> Question is how this can be made robust enough to avoid false
>>>>>> positives.
>>>>>>
>>>>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org *On behalf of* John Carter Wood
>>>>>> *Sent:* Friday, 1 July 2022 17:52
>>>>>> *To:* pandoc-discuss
>>>>>> *Subject:* Re: Changing colons to full-stops in titles
>>>>>>
>>>>>> Thanks for the suggestions, a couple of which are kind of stretching
>>>>>> my knowledge of these things, but I see where they're going.
>>>>>>
>>>>>> As to JGM's question: I am using a CSL json bibliography, so my
>>>>>> titles are in a single field. ("title":"Science and religion: new
>>>>>> perspectives on the dialogue")
>>>>>>
>>>>>> The issue is that *most* of the journals / publishers I publish in
>>>>>> use, as here, the colon. *Some* (mainly German) styles want the period. If
>>>>>> I were solely interested in either one, I could choose and just enter the
>>>>>> relevant punctuation in the title field. However, I want to continue saving
>>>>>> my bibliographic entries with a colon (because that's the most standard one
>>>>>> for me), but have the option of automatically converting them to a period
>>>>>> for those cases where I need to. If that makes sense.
>>>>>>
>>>>>> Thus: going through denis's options:
>>>>>>
>>>>>> 1. I have switched to json bibliographies from bibtex/biblatex as
>>>>>> they seemed to offer more flexibility (I was running into issues with the
>>>>>> strange archival references I have to make in my field, and JSON seemed to
>>>>>> work better in that regard). So this seems to not apply.
>>>>>>
>>>>>> 2. Seems to not apply, as I have a single title field.
>>>>>>
>>>>>> 3. Sounds really interesting, and I use BBT, though it also sounds
>>>>>> like I would here have to create a separate bibliography file from my
>>>>>> Zotero database for those publishers/styles that require the dot. This is
>>>>>> not *too* onerous, as it would at least be automated.
>>>>>>
>>>>>> 4. Having a filter that I could simply apply (as part of a pandoc
>>>>>> command, say) or not apply as relevant seems like the most flexible /
>>>>>> efficient solution. I don't know lua, but if this is one possible way, then
>>>>>> I could use it as a (hopefully fairly simple?) way into learning it.
>>>>>>
>>>>>> Does this help to clarify my situation?
>>>>>>
>>>>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 17:34:55 UTC+2:
>>>>>>
>>>>>> Yes, that’s a known issue...
>>>>>>
>>>>>> There are a couple of possible solutions:
>>>>>>
>>>>>> 1. Use biblatex databases and patch pandoc so it will concatenate title
>>>>>> and subtitle fields using periods (line 667 of
>>>>>> https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Citeproc/BibTeX.hs).
>>>>>>
>>>>>> 2. I think pandoc’s citeproc will just treat every unknown variable
>>>>>> as a string variable (see
>>>>>> https://github.com/jgm/citeproc/blob/3f94424db469c804cf2dac2d22dc7a18b614f43e/src/Citeproc/Types.hs#L1054
>>>>>> and
>>>>>> https://github.com/jgm/citeproc/blob/3f94424db469c804cf2dac2d22dc7a18b614f43e/src/Citeproc/Types.hs#L901),
>>>>>> so you should be able to use «subtitle» in styles. (This will give you
>>>>>> warnings when using the style with Zotero and it won’t work reliably across
>>>>>> implementations, but anyway ...)
>>>>>>
>>>>>> 3. If you’re using Zotero, you can leverage Zotero BBT’s postscript
>>>>>> feature to manipulate the JSON after exporting.
>>>>>>
>>>>>> E.g., this one:
>>>>>>
>>>>>>     if (Translator.BetterCSL && item.title) {
>>>>>>       reference.title = reference.title.replace(/ : /g, '. ')
>>>>>>     }
>>>>>>
>>>>>> Not bullet-proof, but simple. You will want to choose a better
>>>>>> separator, maybe a double-bar or so.
>>>>>>
>>>>>> 4. Doing this with Lua should also be possible...
>>>>>>
>>>>>> The question is: do you have the subtitle in a distinct field or is
>>>>>> it just in the title field?
>>>>>>
>>>>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org *On behalf of* John Carter Wood
>>>>>> *Sent:* Friday, 1 July 2022 16:39
>>>>>> *To:* pandoc-discuss
>>>>>> *Subject:* Changing colons to full-stops in titles
>>>>>>
>>>>>> I have one final (for now...) issue in setting up a CSL file (which I
>>>>>> use with pandoc/citeproc and references in a json file).
>>>>>>
>>>>>> I'm not sure whether this is a CSL issue or whether it's an issue
>>>>>> that can be solved by using a filter (or some other solution) in pandoc,
>>>>>> but I thought there might be some people here who might have faced a
>>>>>> similar issue.
>>>>>>
>>>>>> The house style here (German-based publisher) wants a *full-stop/period*
>>>>>> between main title and subtitle in citations / bibliographies;
>>>>>> US/UK standard is a *colon* between main title and subtitle. And
>>>>>> reference managers like Zotero -- IIUC -- save titles as single fields (at
>>>>>> least they are in my version of Zotero). So it doesn't seem like it is
>>>>>> possible to control what delimiter is used between them via CSL.
>>>>>>
>>>>>> I have found various discussions of relevant title/subtitle division
>>>>>> issues -- some going back quite a few years -- in forums on Zotero:
>>>>>>
>>>>>> https://forums.zotero.org/discussion/8077/separate-fields-for-title-and-subtitle/
>>>>>>
>>>>>> ...and CSL:
>>>>>>
>>>>>> https://discourse.citationstyles.org/t/handling-main-sub-title-splits-citeproc-js/1563/11
>>>>>>
>>>>>> However, these were in part discussions among developers about
>>>>>> *possible* changes, and I'm not sure of the current status of this
>>>>>> issue or whether there is a way to handle it.
>>>>>>
>>>>>> Would it be possible to automate turning colons in titles into
>>>>>> full-stops by using a filter? If so, is there such a filter already around?
>>>>>> Can this be done via CSL?
>>>>>>
>>>>>> Or is this, as of now, impossible?
>>>>>>
>>>>>> (Or is there a really simple solution that I have, as usual,
>>>>>> overlooked...)
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "pandoc-discuss" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/pandoc-discuss/78df697a-50f5-46d0-b0b8-29a2cbc9509an%40googlegroups.com.
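
On the capitalisation question raised earlier in the thread, here is a rough sketch that combines the `string.gsub` suggestion (require whitespace after the colon, stop after one substitution) with upper-casing the first letter of the subtitle. The helper name `colon_to_period` is made up for illustration and is not part of pandoc. It relies on plain `string.upper`, which only handles ASCII letters, so a subtitle starting with e.g. `ü` would keep its lower-case letter. Like the filters above, it goes through `stringify`, so any inline formatting in the title is flattened. Treat it as a starting point under those assumptions, not a tested solution.

```lua
-- Hypothetical helper (not a pandoc function): turn the first
-- "colon + whitespace" in a title into a period and capitalise the
-- ASCII letter that follows, if there is one.
local function colon_to_period(title)
  -- Captures: the whitespace after the colon, and an optional letter.
  -- The fourth argument (1) limits gsub to a single substitution.
  local result = title:gsub(':(%s+)(%a?)', function(space, letter)
    return '.' .. space .. letter:upper()
  end, 1)
  -- Assigning to a single local drops gsub's second return value (the count).
  return result
end

-- Same shape as the Meta filters above; only runs inside pandoc.
function Meta(m)
  if m.references ~= nil then
    for _, el in ipairs(m.references) do
      if el.title ~= nil then
        el.title = pandoc.Str(colon_to_period(pandoc.utils.stringify(el.title)))
      end
    end
  end
  return m
end
```

Because a colon only matches when whitespace follows it, "Gen 1:1" is left alone, and only the first title/subtitle break is changed. As with the earlier versions, this only sees references that are present in the document metadata (as in the test file above), and the filter would need to run before citeproc for the change to show up in the bibliography.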