public inbox archive for pandoc-discuss@googlegroups.com
From: Paulo Ney de Souza <pauloney-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Changing colons to full-stops in titles
Date: Sat, 2 Jul 2022 22:41:44 -0700
Message-ID: <CAFVhNZMyj_GZ=Ao_1qR2rwnAAYAaQ=Maf880cGLRv7yD_ianpQ@mail.gmail.com>
In-Reply-To: <CADAJKhAU66TxJKMZdDM-KVabJpmKUVo5xyuAAN03F2b89jv9Ow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

I got interested in another aspect of the posting: the program
"cleanbib.pl" by Benct.

I installed it on Ubuntu and found that it does not process perfectly valid
TeX code, such as characters whose TeX encoding ends in a space or has a
space in the middle, and that it processes \c{e} but not the comma-accent on
any of the other vowels...

I prepared the torture test below to show the problems:

@Book{hobbit,
  title    = {Les \oe uf  de la serpente},
  address = {Bla\v zi\'c},
  publisher = {\c{a} \c{e} \c{i} \c{o} \c{u}},
}

and above all, how does this compare to:

    https://ctan.org/tex-archive/support/bibtexperllibs/LaTeX-ToUnicode
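
To make the expected behaviour concrete, here is a minimal Lua sketch of the kind of conversion the torture test exercises. It is not how cleanbib.pl or LaTeX::ToUnicode work internally; the names `tex_to_unicode`, `accent`, `combining` and `symbols` are made up for illustration, and the tables cover only the commands in the test entry. It emits a base letter followed by a combining mark, so the output is not NFC-normalized.

```lua
-- Hypothetical lookup tables: only the commands used in the torture test.
-- Values are UTF-8 bytes written as decimal escapes.
local combining = {
  c     = "\204\167", -- U+0327 COMBINING CEDILLA
  v     = "\204\140", -- U+030C COMBINING CARON
  ["'"] = "\204\129", -- U+0301 COMBINING ACUTE ACCENT
}
local symbols = {
  oe = "\197\147",    -- U+0153 LATIN SMALL LIGATURE OE
}

-- Append the combining mark to the base letter; returning nil makes
-- gsub keep the original text for commands that are not in the table.
local function accent(name, base)
  local mark = combining[name]
  if mark then return base .. mark end
  return nil
end

local function tex_to_unicode(s)
  -- %f[%A] ends the control word at the first non-letter, as TeX does.
  s = s:gsub("\\(%a+)%f[%A]%s*{(%a)}", accent) -- \c{a}
  s = s:gsub("\\(%a+)%f[%A]%s+(%a)", accent)   -- \v z
  s = s:gsub("\\(%p){(%a)}", accent)           -- \'{c}
  s = s:gsub("\\(%p)(%a)", accent)             -- \'c
  -- Control words that are characters themselves; TeX skips the spaces
  -- that follow a control word, so they are consumed here as well.
  s = s:gsub("\\(%a+)%f[%A]%s*", function(name) return symbols[name] end)
  return s
end

print(tex_to_unicode("Les \\oe uf  de la serpente"))
print(tex_to_unicode("Bla\\v zi\\'c"))
print(tex_to_unicode("\\c{a} \\c{e} \\c{i} \\c{o} \\c{u}"))
```

A converter that handles the test entry should treat all five vowels in the publisher field the same way, and should accept both the braced and the space-separated argument forms.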

Paulo Ney


On Sat, Jul 2, 2022 at 1:03 PM BPJ <bpj-J3H7GcXPSITLoDKTGw+V6w@public.gmane.org> wrote:

> string.gsub() optionally takes the maximum number of substitutions as a
> fourth argument, and you can reinsert capture groups in the replacement, so
> this should be fairly robust:
>
> ``````lua
> string.gsub(title, '%:(%s)', '.%1', 1)
> ``````
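
A quick check of that call in plain Lua, outside pandoc, using the test title from earlier in the thread plus an invented one with two colon-space sequences. `%:` is just an escaped colon, `(%s)` captures the whitespace that follows it, and `%1` puts that whitespace back; the final `1` stops after the first match. The variable names and the second sample title are made up for illustration.

```lua
local title = "A book: with a subtitle and a reference to Gen 1:1, but that is not a problem"

-- gsub returns the new string and the number of substitutions made.
local fixed, n = string.gsub(title, '%:(%s)', '.%1', 1)
print(fixed) -- the colon in "Gen 1:1" has no whitespace after it, so it never matches
print(n)

-- Invented title with two colon-space sequences: the limit decides
-- whether only the first or both are replaced.
local nested = "Main title: subtitle: sub-subtitle"
local first_only = string.gsub(nested, '%:(%s)', '.%1', 1)
local every = string.gsub(nested, '%:(%s)', '.%1')
print(first_only)
print(every)
```

Without the limit, a second colon followed by a space later in the title would also become a period, which is the case the fourth argument guards against.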
>
>
> On Fri, 1 July 2022 at 18:44, John Carter Wood <woodjo-ZOsAvrTRSvuEhhMi0yms2Q@public.gmane.org> wrote:
>
>> Ah, of course, biblical references. Religious history is one of my
>> fields, how could I miss that?
>>
>> Looking forward to trying this out!
>>
>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 18:41:02 UTC+2:
>>
>>> A slightly more reliable version:
>>>
>>>
>>>
>>> ```lua
>>> local stringify = pandoc.utils.stringify
>>>
>>> function Meta(m)
>>>   if m.references ~= nil then
>>>     for _, el in ipairs(m.references) do
>>>       -- print(stringify(el.title))
>>>       -- Replace "colon + space" only. The extra parentheses keep just
>>>       -- gsub's first result (the new string), not the substitution count.
>>>       el.title = pandoc.Str((string.gsub(stringify(el.title), ': ', '. ')))
>>>       -- print(el.title)
>>>     end
>>>   end
>>>   return m
>>> end
>>> ```
>>>
>>>
>>>
>>> (This won’t replace colons in biblical references, e.g. Gen 1:1)
>>>
>>>
>>>
>>> You can test with this file:
>>>
>>>
>>>
>>> ```markdown
>>> ---
>>> references:
>>> - type: book
>>>   id: doe
>>>   author:
>>>   - family: Doe
>>>     given: Jane
>>>   issued:
>>>     date-parts:
>>>     - - 2022
>>>   title: 'A book: with a subtitle and a reference to Gen 1:1, but that is not a problem'
>>>   publisher: 'Whatever press'
>>>   lang: de-De
>>> ...
>>>
>>> test [@doe]
>>> ```
>>>
>>>
>>>
>>> The filter itself does not cover capitalization. For some reason,
>>> pandoc or citeproc applies title-case transformation here. I don’t think it
>>> should though.
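
On the capitalization question, one possible way to handle it in the filter itself is to capture the letter after the colon and uppercase it in the replacement. This is only a sketch: `colon_to_period` is a hypothetical helper name, and Lua's `%a` and `string.upper` only know ASCII letters in the default C locale, so a subtitle starting with, say, `ä` would get the period but keep its lowercase letter.

```lua
-- Hypothetical helper: turn the first "colon + whitespace" into
-- "period + whitespace" and uppercase the ASCII letter that follows, if any.
local function colon_to_period(title)
  local result = title:gsub(':(%s+)(%a?)', function(space, letter)
    return '.' .. space .. letter:upper()
  end, 1)
  return result
end

print(colon_to_period('Science and religion: new perspectives on the dialogue'))
print(colon_to_period('A book: with a reference to Gen 1:1'))
```

In the filter above, this would stand in for the `string.gsub` call, for example `el.title = pandoc.Str(colon_to_period(stringify(el.title)))`. Whether citeproc re-cases the title afterwards is a separate matter that this sketch does not address.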
>>>
>>>
>>>
>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> *On
>>> behalf of* John Carter Wood
>>> *Sent:* Friday, 1 July 2022 18:24
>>> *To:* pandoc-discuss <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
>>> *Subject:* Re: Changing colons to full-stops in titles
>>>
>>>
>>>
>>> That's very interesting, thanks! I'll try it out when I get a chance in
>>> the coming days.
>>>
>>> I have thought about this issue of false positives while thinking about
>>> the option of some kind of filter. But... I think they would be very rare. I
>>> have a hard time thinking of a title with a colon in it that shouldn't,
>>> in this case, be turned into a dot. At least, I don't see anything in
>>> my 1,200 references where that wouldn't apply.
>>>
>>> Although, of course, I'm sure there are some out there...
>>>
>>> Just a question: would this also ensure that the first word after the
>>> dot is capitalised? Or does that open a new series of problems? :-)
>>>
>>>
>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 18:17:02 UTC+2:
>>>
>>> Here’s a very simple and absolutely unreliable version of a filter. This
>>> will replace every colon in a title with a period.
>>>
>>>
>>>
>>> ```lua
>>> local stringify = pandoc.utils.stringify
>>>
>>> function Meta(m)
>>>   if m.references ~= nil then
>>>     for _, el in ipairs(m.references) do
>>>       print(stringify(el.title))
>>>       el.title = pandoc.Str(string.gsub(stringify(el.title), ':', '.'))
>>>       print(el.title)
>>>     end
>>>   end
>>>   return m
>>> end
>>> ```
>>>
>>>
>>>
>>> Question is how this can be made robust enough to avoid false positives.
>>>
>>>
>>>
>>>
>>>
>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> *On
>>> behalf of* John Carter Wood
>>> *Sent:* Friday, 1 July 2022 17:52
>>> *To:* pandoc-discuss <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
>>> *Subject:* Re: Changing colons to full-stops in titles
>>>
>>>
>>>
>>> Thanks for the suggestions, a couple of which are kind of stretching my
>>> knowledge of these things, but I see where they're going.
>>>
>>> As to JGM's question: I am using a CSL json bibliography, so my titles
>>> are in a single field. ("title":"Science and religion: new perspectives on
>>> the dialogue")
>>>
>>> The issue is that *most* of the journals / publishers I publish in use,
>>> as here, the colon. *Some* (mainly German) styles want the period. If I
>>> were solely interested in either one, I could choose and just enter the
>>> relevant punctuation in the title field. However, I want to continue saving
>>> my bibliographic entries with a colon (because that's the most standard one
>>> for me), but have the option of automatically converting them to a period
>>> for those cases where I need to. If that makes sense.
>>>
>>> Thus: going through denis's options:
>>>
>>> 1. I have switched to json bibliographies from bibtex/biblatex as they
>>> seemed to offer more flexibility (I was running into issues with the strange
>>> archival references I have to make in my field, and JSON seemed to work
>>> better in that regard). So this seems to not apply.
>>>
>>> 2. Seems to not apply, as I have a single title field
>>>
>>> 3. Sounds really interesting, and I use BBT, though it also sounds like
>>> I would here have to create a separate bibliography file from my Zotero
>>> database for those publishers/styles that require the dot. This is not
>>> *too* onerous, as it would at least be automated.
>>>
>>> 4. Having a filter that I could simply apply (as part of a pandoc
>>> command, say) or not apply as relevant seems like the most flexible /
>>> efficient solution. I don't know lua, but if this is one possible way, then
>>> I could use it as a (hopefully fairly simple?) way into learning it.
>>>
>>>
>>>
>>> Does this help to clarify my situation?
>>>
>>>
>>>
>>> denis...-NSENcxR/0n0@public.gmane.org wrote on Friday, 1 July 2022 at 17:34:55 UTC+2:
>>>
>>> Yes, that’s a known issue...
>>>
>>> There are a couple of possible solutions:
>>>
>>>
>>>
>>> 1. use biblatex databases and patch pandoc so it will concat title and
>>> subtitle fields using periods. (line 667
>>> https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Citeproc/BibTeX.hs
>>> )
>>>
>>>
>>>
>>> 2. I think pandoc’s citeproc will just treat every unknown variable as a
>>> string variable (see
>>> https://github.com/jgm/citeproc/blob/3f94424db469c804cf2dac2d22dc7a18b614f43e/src/Citeproc/Types.hs#L1054
>>> and
>>> https://github.com/jgm/citeproc/blob/3f94424db469c804cf2dac2d22dc7a18b614f43e/src/Citeproc/Types.hs#L901),
>>> so you should be able to use «subtitle» in styles. (This will give you
>>> warnings when using the style with Zotero and it won’t work reliably across
>>> implementations, but anyway ...)
>>>
>>>
>>>
>>> 3. if you’re using Zotero, you can leverage Zotero BBT’s postscript
>>> feature to manipulate the JSON after exporting.
>>>
>>> E.g., this one :
>>>
>>> if (Translator.BetterCSL && item.title) {
>>>   reference.title = reference.title.replace(/ : /g, '. ')
>>> }
>>>
>>> Not bullet-proof, but simple. You will want to choose a better
>>> separator, maybe a double-bar or so.
>>>
>>>
>>>
>>> 4. Doing this with Lua should also be possible...
>>>
>>>
>>>
>>> The question is: do you have the subtitle in a distinct field or is it
>>> just in the title field?
>>>
>>>
>>>
>>> *From:* pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> *On
>>> behalf of* John Carter Wood
>>> *Sent:* Friday, 1 July 2022 16:39
>>> *To:* pandoc-discuss <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
>>> *Subject:* Changing colons to full-stops in titles
>>>
>>>
>>>
>>> I have one final (for now...) issue in setting up a CSL file (which I
>>> use with pandoc/citeproc and references in a json file).
>>>
>>>
>>>
>>> I'm not sure whether this is a CSL issue or whether it's an issue that
>>> can be solved via using a filter (or some other solution) in pandoc, but I
>>> thought there might be some people here who might have faced a similar
>>> issue.
>>>
>>>
>>>
>>> The house style here (a German-based publisher) wants a *full-stop/period*
>>> between main title and subtitle in citations / bibliographies; the US/UK
>>> standard is a *colon* between main title and subtitle. And reference
>>> managers like Zotero -- IIUC -- save titles as single fields (at least they
>>> are in my version of Zotero). So it doesn't seem like it is possible to
>>> control what delimiter is used between them via CSL.
>>>
>>>
>>> I have found various discussions of relevant title/subtitle division
>>> issues -- some going back quite a few years -- in forums on Zotero:
>>>
>>>
>>> https://forums.zotero.org/discussion/8077/separate-fields-for-title-and-subtitle/
>>>
>>> ...and CSL:
>>>
>>>
>>> https://discourse.citationstyles.org/t/handling-main-sub-title-splits-citeproc-js/1563/11
>>>
>>>
>>>
>>> However, these were in part discussions among developers about
>>> *possible* changes, and I'm not sure of the current status of this
>>> issue or whether there is a way to handle it.
>>>
>>> Would it be possible to automate turning colons in titles into
>>> full-stops via using a filter? If so is there such a filter already around?
>>> Can this be done via CSL?
>>>
>>>
>>>
>>> Or is this, as of now, impossible?
>>>
>>> (Or is there a real simple solution that I have, as usual,
>>> overlooked...)
>>>
>>> --
>>>
>>> You received this message because you are subscribed to the Google
>>> Groups "pandoc-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/pandoc-discuss/78df697a-50f5-46d0-b0b8-29a2cbc9509an%40googlegroups.com
>>> <https://groups.google.com/d/msgid/pandoc-discuss/78df697a-50f5-46d0-b0b8-29a2cbc9509an%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFVhNZMyj_GZ%3DAo_1qR2rwnAAYAaQ%3DMaf880cGLRv7yD_ianpQ%40mail.gmail.com.
