public inbox archive for pandoc-discuss@googlegroups.com
From: Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Error compiling with icu support / possible workaround?
Date: Thu, 8 Apr 2021 09:12:33 +0200
Message-ID: <YG6s4b/U9A+ab6qs@localhost>
In-Reply-To: <m2o8epo6p8.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>

That would be the most complete and flexible option, but implementing the regional subvariations may be enough. Most people have no precise idea of the sorting rules, beyond expecting letters with diacritics to come after their counterparts without them, so letting the library enforce the official rules for the locale they choose makes sense. That said, more options would be even better, if you have the motivation to implement them!

As for French, the 2012 edition of the Petit Robert sorts these words in the order cote, côte, coté, côté.
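
To illustrate why the "backwards 2" setting discussed in the quoted messages below produces that dictionary order, here is a minimal Python sketch. It is not how unicode-collation or ICU works internally: the function name `sort_key` and the two-level key are my own simplification, assuming that base letters form the primary level and that the code point of a combining mark can stand in for a secondary weight.

```python
import unicodedata

def sort_key(word, backwards_secondary=False):
    """Toy two-level collation key: base letters first, accents second.

    Assumption: at most one combining mark per letter matters, and its
    code point is good enough as a stand-in for a real secondary weight.
    """
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            if secondary:
                # Attach the mark to the base letter that precedes it.
                secondary[-1] = ord(ch)
        else:
            primary.append(ch)
            secondary.append(0)  # 0 means "no accent on this letter"
    if backwards_secondary:
        # "backwards 2": compare the accent level from the end of the word.
        secondary.reverse()
    return (primary, secondary)

words = ["côté", "coté", "côte", "cote"]
forward = sorted(words, key=sort_key)
backward = sorted(words, key=lambda w: sort_key(w, backwards_secondary=True))
print(forward)
print(backward)
```

With the accent level compared from the end of the word, an accent on the final "e" outweighs one on the "o", which gives the Petit Robert order; compared from the start, "coté" and "côte" swap places.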

On Wednesday 07 April 2021 at 07:23:15PM, John MacFarlane wrote:
> 
> On second thought, leaving it as an option makes a lot of sense.
> We wouldn't want to force fr-FR to be sorted contrary to the French
> Academy's official dictionary...
> 
> The question is really how to pass this kind of option through
> pandoc/citeproc, if it's going to be user-selectable for fr.
> It looks like there's a BCP 47 key "kb" corresponding to "backwards 2",
> https://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options
> so maybe one just says fr-FR-u-kb-true or (canonical equivalent
> according to 3.2.1) fr-FR-u-kb
> 
> For alternative collations for a language we could do the same,
> e.g. es-ES-u-co-traditional.
> 
> Parsing and representing these complex language tags is starting
> to get pretty complicated!
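
As a rough idea of what the parsing involves, here is a hedged Python sketch that pulls the "u" extension out of a tag. The function name `parse_unicode_extension` and its return shape are hypothetical, not pandoc's or citeproc's API. It assumes the usual subtag shapes (keys are two characters long, attributes and types are longer, and any one-character subtag starts a new extension), does no validation, and ignores private-use edge cases.

```python
def parse_unicode_extension(tag):
    """Split a BCP 47 tag into (base, attributes, keywords) for the 'u' extension.

    Hypothetical helper for illustration only; no validation is performed.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "u" not in lowered[1:]:
        return tag, [], {}
    start = lowered.index("u", 1)
    base = "-".join(subtags[:start])
    attributes, keywords = [], {}
    key = None
    for sub in lowered[start + 1:]:
        if len(sub) == 1:
            break  # another singleton: the 'u' extension ends here
        if len(sub) == 2:
            key = sub  # a two-character subtag starts a new keyword
            keywords[key] = []
        elif key is None:
            attributes.append(sub)  # subtags before the first key are attributes
        else:
            keywords[key].append(sub)  # type subtags belonging to the current key
    # A key with no type is treated as shorthand for the type "true".
    return base, attributes, {k: "-".join(v) or "true" for k, v in keywords.items()}

print(parse_unicode_extension("fr-FR-u-kb"))
print(parse_unicode_extension("es-ES-u-co-traditional"))
print(parse_unicode_extension("de-DE-u-attr-co-phonebk"))
```

Under these assumptions, fr-FR-u-kb and fr-FR-u-kb-true parse to the same keywords, so a caller would only need to check whether "kb" maps to "true" and what "co" maps to, falling back to the default collation when neither is present.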
> 
> John MacFarlane <fiddlosopher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> 
> > I note that data/collation/fr_CA.xml has
> >
> >    [backwards 2]
> >
> > and data/collation/fr.xml does not.
> >
> > 'backwards 2' says to sort the second-level collation elements
> > backwards; that's what the "French accents" option does.  So that
> > explains the perl script's behavior; it is faithfully following
> > the locales, which specify this for Canadian French but not
> > European French.
> >
> > My parser for collation files currently does nothing with the
> > `[backwards 2]`, but maybe it's something I should implement.
> >
> > "'Nick Bart' via pandoc-discuss"
> > <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> writes:
> >
> >> Bastien, BPJ - many thanks, that’s helpful. Still, the main practical question,
> >> I guess, is whether the default sort order the "new" pandoc generates for French
> >> - either with or without the "optFrenchAccents" modification - is acceptable
> >> from the point of view of a native speaker of French or not, and if not, what
> >> you would suggest instead.
> >>
> >>
> >> As to multiple collations, I commented earlier:
> >>
> >>> ... I tend to think that the default collation (which usually seems to follow
> >>> the most recent rules for a given language) would usually be sufficient.
> >>
> >> That being said, it seems that most of the information (in
> >> https://github.com/jgm/unicode-collation/tree/main/data) and, I assume,
> >> infrastructure for supporting different collation systems for a given language is
> >> in place already, so the following might be worth a try:
> >>
> >> pandoc is relying on IETF BCP 47 language tags anyway
> >> [https://tools.ietf.org/rfc/bcp/bcp47.txt].
> >>
> >> A number of locale attributes contained in the Common Locale Data Repository
> >> (CLDR), including those pertaining to collation, can be expressed as extensions
> >> to "simple" language tags of the form "en-US".
> >>
> >> IETF BCP 47 Extension U (Unicode Locale) is described in RFC 6067
> >> [https://tools.ietf.org/html/rfc6067]. Relevant quote:
> >>
> >>>    For example, the language tag "de-DE-u-attr-co-phonebk" consists of:
> >>>
> >>>    o  The base language tag "de-DE" (German as used in Germany), exactly as
> >>>    defined by [BCP47] using subtags from the IANA Language Subtag Registry.
> >>>
> >>>    o  The singleton 'u', identifying this extension.
> >>>
> >>>    o  The attribute 'attr', which is an example for illustration (no
> >>>    attributes were defined at the time this document was published).
> >>>
> >>>    o  The keyword 'co-phonebk', consisting of the key 'co' (Collation) and the
> >>>    type 'phonebk' (Phonebook collation order).
> >>
> >> On IETF BCP 47 extensions, see also
> >> https://www.w3.org/International/articles/language-tags/#extension.
> >>
> >> So if this does not appear too difficult, it might provide a lot of additional
> >> flexibility if pandoc were to support the particular subset of "Extension U"
> >> strings pertaining to collation, i.e., those starting with "-u-co-" in pandoc's
> >> "lang" metadata field, or command line argument. (In the absence of such a string,
> >> pandoc should of course use the default collation order.)
> >>
> >> -- 
> >> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> >> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/fkSA06gm5QfCBknaCRunOSZwTsdOX6DMRGx0IQVOs9yszm16IeaCsTwX_cV-nhZ1kQ0LDEkxylV4IKJzSuiZbkjx3HSyD2NLgJTkW9DQB6U%3D%40protonmail.com.
> 


