public inbox archive for pandoc-discuss@googlegroups.com
From: John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
To: 'Nick Bart' via pandoc-discuss
	<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>,
	"pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org"
	<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Error compiling with icu support / possible workaround?
Date: Wed, 07 Apr 2021 19:23:15 -0700
Message-ID: <m2o8epo6p8.fsf@MacBook-Pro.hsd1.ca.comcast.net>
In-Reply-To: <m2wntdo8m2.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>


On second thought, leaving it as an option makes a lot of sense.
We wouldn't want to force fr-FR to be sorted contrary to the French
Academy's official dictionary...
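
To make concrete what the option changes, here is a toy illustration in
Python of the "French accents" / `[backwards 2]` behaviour described in the
quoted message below. It is only a sketch under simplifying assumptions: it
is not the Unicode Collation Algorithm and not how unicode-collation or
citeproc work; the function name and the use of combining-mark code points
as "secondary weights" are made up for the example.

```python
import unicodedata


def toy_sort_key(word, backwards_secondary=False):
    """Toy two-level sort key (hypothetical helper, NOT real UCA).

    Primary level: base letters, ignoring accents.
    Secondary level: the accents on each letter, compared left to right,
    or right to left when backwards_secondary is True.
    """
    primary, secondary = [], []
    # Work on precomposed characters so each letter carries its own accents.
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        primary.append(decomposed[0].lower())
        # Assumption: combining-mark code points stand in for real weights.
        secondary.append(tuple(ord(c) for c in decomposed[1:]))
    if backwards_secondary:
        secondary.reverse()
    return (primary, secondary)


words = ["côté", "coté", "côte", "cote"]

forward = sorted(words, key=toy_sort_key)
backward = sorted(words, key=lambda w: toy_sort_key(w, backwards_secondary=True))

print(forward)   # ['cote', 'coté', 'côte', 'côté']
print(backward)  # ['cote', 'côte', 'coté', 'côté']
```

With the secondary level read backwards, the last accent difference in the
word decides, which is why "côte" moves ahead of "coté".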

The question is really how to pass this kind of option through
pandoc/citeproc, if it's going to be user-selectable for fr.
It looks like there's a BCP 47 key "kb" corresponding to "backwards 2",
https://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options
so maybe one just says fr-FR-u-kb-true or (canonical equivalent
according to 3.2.1) fr-FR-u-kb

For alternative collations for a language we could do the same,
e.g. es-ES-u-co-traditional.
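
A rough sketch in Python of pulling these keywords out of a tag might look
like the following. It is deliberately simplified and not pandoc's or
citeproc's actual parser: the function name is hypothetical, subtags are
not validated, everything from a private-use "x" singleton onward is
ignored, and a key with no type is treated as "true", as suggested above
for "kb".

```python
def parse_unicode_extension(tag):
    """Return (base_tag, attributes, keywords) for a BCP 47 tag.

    Hypothetical helper for illustration only; it does no validation.
    """
    subtags = tag.split("-")
    i = 0
    # Everything before the first one-character singleton is the base tag.
    while i < len(subtags) and len(subtags[i]) != 1:
        i += 1
    base = "-".join(subtags[:i])

    attributes, keywords = [], {}
    while i < len(subtags):
        singleton = subtags[i].lower()
        if singleton == "x":
            break  # private use: ignore the rest in this sketch
        i += 1
        start = i
        while i < len(subtags) and len(subtags[i]) != 1:
            i += 1
        if singleton != "u":
            continue  # some other extension; skip it
        body = [s.lower() for s in subtags[start:i]]
        j = 0
        # Attributes come first; keys are exactly two characters long.
        while j < len(body) and len(body[j]) != 2:
            attributes.append(body[j])
            j += 1
        while j < len(body):
            key = body[j]
            j += 1
            values = []
            while j < len(body) and len(body[j]) != 2:
                values.append(body[j])
                j += 1
            # Assumption: an omitted type means "true".
            keywords[key] = "-".join(values) or "true"
    return base, attributes, keywords


print(parse_unicode_extension("fr-FR-u-kb"))
print(parse_unicode_extension("es-ES-u-co-traditional"))
```

Under these assumptions, "fr-FR-u-kb" and "fr-FR-u-kb-true" give the same
result, and the RFC 6067 example "de-DE-u-attr-co-phonebk" yields the
attribute "attr" plus the keyword co=phonebk.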

Parsing and representing these complex language tags is starting
to get pretty complicated!

John MacFarlane <fiddlosopher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I note that data/collation/fr_CA.xml has
>
>    [backwards 2]
>
> and data/collation/fr.xml does not.
>
> 'backwards 2' says to sort the second-level collation elements
> backwards; that's what the "French accents" option does.  So that
> explains the perl script's behavior; it is faithfully following
> the locales, which specify this for Canadian French but not
> European French.
>
> My parser for collation files currently does nothing with the
> `[backwards 2]`, but maybe it's something I should implement.
>
> "'Nick Bart' via pandoc-discuss"
> <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> writes:
>
>> Bastien, BPJ - many thanks, that’s helpful. Still, the main practical question,
>> I guess, is whether the default sort order the "new" pandoc generates for French
>> - either with or without the "optFrenchAccents" modification - is acceptable
>> from the point of view of a native speaker of French or not, and if not, what
>> you would suggest instead.
>>
>>
>> As to multiple collations, I commented earlier:
>>
>>> ... I tend to think that the default collation (which usually seems to follow
>>> the most recent rules for a given language) would usually be sufficient.
>>
>> That being said, it seems that most of the information (in
>> https://github.com/jgm/unicode-collation/tree/main/data) and, I assume,
>> infrastructure for supporting different collation systems for a given language is
>> in place already, so the following might be worth a try:
>>
>> pandoc is relying on IETF BCP 47 language tags anyway
>> [https://tools.ietf.org/rfc/bcp/bcp47.txt].
>>
>> A number of locale attributes contained in the Common Locale Data Repository
>> (CLDR), including those pertaining to collation, can be expressed as extensions
>> to "simple" language tags of the form "en-US".
>>
>> IETF BCP 47 Extension U (Unicode Locale) is described in RFC 6067
>> [https://tools.ietf.org/html/rfc6067]. Relevant quote:
>>
>>>    For example, the language tag "de-DE-u-attr-co-phonebk" consists of:
>>>
>>>    o  The base language tag "de-DE" (German as used in Germany), exactly as
>>>    defined by [BCP47] using subtags from the IANA Language Subtag Registry.
>>>
>>>    o  The singleton 'u', identifying this extension.
>>>
>>>    o  The attribute 'attr', which is an example for illustration (no
>>>    attributes were defined at the time this document was published).
>>>
>>>    o  The keyword 'co-phonebk', consisting of the key 'co' (Collation) and the
>>>    type 'phonebk' (Phonebook collation order).
>>
>> On IETF BCP 47 extensions, see also
>> https://www.w3.org/International/articles/language-tags/#extension.
>>
>> So if this does not appear too difficult, it might provide a lot of additional
>> flexibility if pandoc were to support the particular subset of "Extension U"
>> strings pertaining to collation, i.e., those starting with "-u-co-" in pandoc's
>> "lang" metadata field, or command line argument. (In the absence of such a string,
>> pandoc should of course use the default collation order.)
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/fkSA06gm5QfCBknaCRunOSZwTsdOX6DMRGx0IQVOs9yszm16IeaCsTwX_cV-nhZ1kQ0LDEkxylV4IKJzSuiZbkjx3HSyD2NLgJTkW9DQB6U%3D%40protonmail.com.


