Error compiling with icu support / possible workaround?

public inbox archive for pandoc-discuss@googlegroups.com

* Error compiling with icu support / possible workaround?
@ 2021-03-21 13:04 'Nick Bart' via pandoc-discuss
  2021-03-22  5:55 ` John MacFarlane
  2021-03-22  5:59 ` John MacFarlane
  0 siblings, 2 replies; 34+ messages in thread
From: 'Nick Bart' via pandoc-discuss @ 2021-03-21 13:04 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I am trying to compile pandoc with icu support on macOS (with the latest pandoc dev version and the latest icu4c installed via Homebrew), as described in https://pandoc.org/installing.html, using stack. The build fails with a number of messages concerning text-icu (the last one being fatal):

```
text-icu > /private/var/folders/tr/2vzllytd31v6tb7hs_j7n2hm0000gs/T/stack-59bd324a08690294/text-icu-0.7.0.1/cbits/text_icu.c:308:43: error:
text-icu > error: use of undeclared identifier 'TRUE'
text-icu > return u_strCompareIter(iter1, iter2, TRUE);
text-icu > ^
text-icu > |
text-icu > 308 | return u_strCompareIter(iter1, iter2, TRUE);
text-icu > | ^
text-icu > 3 warnings and 1 error generated.
text-icu > `gcc' failed in phase `C Compiler'. (Exit code: 1)
```

(all but identical to what has been reported at https://github.com/haskell/text-icu/issues/49)

The error appears to be caused by a recent change in icu4c, which seems to have dropped its custom definitions of TRUE and FALSE (see https://github.com/haskell/text-icu/issues/49).

While there is an open pull request intended to fix this issue (https://github.com/haskell/text-icu/pull/48), it is not clear whether text-icu has an active maintainer at the moment, so it is uncertain when that pull request will make it into an official text-icu release.

An unofficial fork of text-icu claims to have fixed the issue (https://github.com/WorldSEnder/text-icu/commit/7657227a7ca8ad13be86db5c190b806774a5fd6b).

I wonder if anyone could indicate how to tweak the pandoc install command to use, for the time being, the WorldSEnder/text-icu fork rather than the official package - or whether there is anything else I could try to fix this issue on the pandoc side. (I tried downgrading icu4c via Homebrew, but apparently no formulae for earlier versions are available.)

As an aside, while I fully understand the wish not to include a huge external C library by default, I feel that pandoc’s default sorting algorithm, currently based on “i;unicode-casemap” (RFC 5051), is somewhat below par. In particular, it does not even comply with mainstream English-language rules as far as accented characters are concerned. The Chicago Manual of Style (17e, 2017, 16.67) unambiguously states: “Words beginning with or including accented letters are alphabetized as though they were unaccented.” One of their examples gives the sort order “Ubeda – Über – Ubina”. Without icu support, pandoc incorrectly sorts this as “Ubeda – Ubina – Über”.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/RvYr4eCyVE6BDciI-E4L0xJD_y60Nj2t6NJYdYcKeSmXBsMd6SZUZWu4Pm_pPwgfdTrGVS1_afWf2R5fi7hhWMKjewo3yakQKUhPv6Sj5JQ%3D%40protonmail.com.


* Re: Error compiling with icu support / possible workaround?
  2021-03-21 13:04 Error compiling with icu support / possible workaround? 'Nick Bart' via pandoc-discuss
@ 2021-03-22  5:55 ` John MacFarlane
       [not found]   ` <m25z1jpw9n.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  2021-03-22  5:59 ` John MacFarlane
  1 sibling, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-03-22  5:55 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

"'Nick Bart' via pandoc-discuss"
<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> writes:

> An unofficial fork of text-icu claims to have fixed the issue (https://github.com/WorldSEnder/text-icu/commit/7657227a7ca8ad13be86db5c190b806774a5fd6b).
>
> I wonder if anyone could indicate how to tweak the pandoc install command to include, for the time being, the WorldSEnder/text-icu fork rather than the official one - or whether there is anything else I could try to fix this issue on the pandoc side. (I tried downgrading icu4c via homebrew, but apparenty no formulae for earlier versions are available.)

Replace stack.yaml with this:


``` stack.yaml
flags:
  pandoc:
    trypandoc: false
    embed_data_files: true
  QuickCheck:
    old-random: false
  citeproc:
    icu: true
packages:
- '.'
extra-deps:
- hslua-1.3.0
- hslua-module-path-0.1.0
- jira-wiki-markup-1.3.4
- skylighting-core-0.10.5
- skylighting-0.10.5
- doclayout-0.3.0.2
- citeproc-0.3.0.9
- texmath-0.12.2
- random-1.2.0
- git: https://github.com/WorldSEnder/text-icu
  commit: 7657227a7ca8ad13be86db5c190b806774a5fd6b
ghc-options:
   "$locals": -fhide-source-paths -Wno-missing-home-modules
resolver: lts-17.5
nix:
  packages: [zlib]
```

Then run stack install.

> As an aside, while I fully understand the wish not to include a huge external C library by default, I feel that pandoc’s default sorting algorithm, currently based on “i;unicode-casemap” (RFC 5051), is somewhat below par. In particular, it does not even comply with mainstream English-language rules as far as accented characters are concerned. The Chicago Manual of Style (17e, 2017, 16.67) unambiguously states: “Words beginning with or including accented letters are alphabetized as though they were unaccented.” One of their examples gives the sort order “Ubeda – Über – Ubina”. Without icu support, pandoc incorrectly sorts this as “Ubeda – Ubina – Über”.

Yes. I agree.  Actually, if we just need special treatment for
English locales, then I don't think it should be too hard.  We
can use the Haskell unicode-transforms library (already a
dependency of pandoc) to normalize the text and then remove
accents:

Prelude Data.Text.Normalize Data.Text Data.Char> Data.Text.filter (not . isMark) $ normalize NFD "dérégler"
"deregler"

We could sort on the result of that transform.
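
A minimal sketch of sorting on that transform, assuming the unicode-transforms and text packages are available. The helper names stripAccents and sortUnaccented are made up for illustration and are not part of pandoc or citeproc; case is left untouched here.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isMark)
import Data.List (sortOn)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Normalize (NormalizationMode (NFD), normalize)

-- | Decompose to NFD, then drop combining marks (hypothetical helper).
stripAccents :: Text -> Text
stripAccents = T.filter (not . isMark) . normalize NFD

-- | Sort by the accent-free key; the original strings are returned.
sortUnaccented :: [Text] -> [Text]
sortUnaccented = sortOn stripAccents

main :: IO ()
main = mapM_ TIO.putStrLn (sortUnaccented ["Ubina", "Über", "Ubeda"])
-- prints Ubeda, Über, Ubina (one per line), the order CMOS asks for
```

Because the key is only used for comparison, the accented spelling is preserved in the output.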

(This method would also affect non-Western scripts, though, and
I don't know what the rules around those are...)

For non-English locales, would we want to fall back to RFC 5051?

I'm not sure what all the relevant rules are; if it's not too
terribly complicated, I wonder if a pure Haskell library could
be cooked up.  It's a shame that there's no way to do proper
Unicode collation in Haskell without the difficult icu4c
dependency.


* Re: Error compiling with icu support / possible workaround?
  2021-03-21 13:04 Error compiling with icu support / possible workaround? 'Nick Bart' via pandoc-discuss
  2021-03-22  5:55 ` John MacFarlane
@ 2021-03-22  5:59 ` John MacFarlane
       [not found]   ` <m235wnpw3l.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-03-22  5:59 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Further note on this:  we should compile with icu support in
our binary releases -- at least on macos and linux, where I
know how to do it.  Maybe someone can help me figure out
how to do it on Windows.


* Re: Error compiling with icu support / possible workaround?
       [not found]   ` <m235wnpw3l.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-03-22  6:08     ` John MacFarlane
       [not found]       ` <m2wntzoh3n.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-03-22  6:08 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


On second thought, I'd be reluctant to depend for a release
on an unreleased fork.  The last release of text-icu was in 2015!
The fork is quite old too.

It would be worth seeing if someone in the Haskell world could
take over this library, which seems pretty important.
Posting on the Haskell reddit might be one way to get attention
to the issue.





* Re: Error compiling with icu support / possible workaround?
       [not found]       ` <m2wntzoh3n.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-03-22 14:29         ` 'Nick Bart' via pandoc-discuss
  2021-04-17 23:19           ` John MacFarlane
  0 siblings, 1 reply; 34+ messages in thread
From: 'Nick Bart' via pandoc-discuss @ 2021-03-22 14:29 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

With a stack.yaml file modified according to your suggestions, I succeeded in building pandoc with icu support. Many thanks.

As to non-icu approaches:

For English locales, removing accents before sorting would be an improvement, and that actually seems to be all that is required to comply with the CMOS’s rules.

A few other languages might benefit from this approach, too - but as far as I can see this would be limited to Dutch, Portuguese, and German (where, in addition to removing accents, “ß” would have to be transformed to “ss” before sorting). Caveat: I have only checked https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions - which may or may not be authoritative, and only covers languages using an “extended Latin alphabet”.

Other languages’ rules typically seem to be much more involved, and removing accents before sorting might actually make things worse than relying on “i;unicode-casemap” (RFC 5051). One example is Spanish, where “ñ” should definitely be sorted as a distinct letter *after* “n”. So, yes, if icu4c is not used, it seems to make sense to fall back to RFC 5051 for those languages where we are not reasonably sure that removing accents before sorting is useful.
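
The per-language choice described above could be sketched roughly as follows, assuming the unicode-transforms and text packages. All names (unaccentedKey, casemapKey, sortKeyFor, sortFor) are hypothetical, the list of language codes only reflects the languages mentioned in this message, and casemapKey is merely a rough stand-in for RFC 5051 (it upper-cases the whole string rather than applying the RFC's per-character titlecase mapping).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isMark)
import Data.List (sortOn)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Normalize (NormalizationMode (NFD, NFKD), normalize)

-- | Lower-case, decompose, and drop combining marks.
unaccentedKey :: Text -> Text
unaccentedKey = T.filter (not . isMark) . normalize NFD . T.toLower

-- | Rough stand-in for "i;unicode-casemap": upper-case and fully
-- decompose, keeping the combining marks so accented letters stay distinct.
casemapKey :: Text -> Text
casemapKey = normalize NFKD . T.toUpper

-- | Pick a sort key by language code (hypothetical dispatch).
sortKeyFor :: Text -> Text -> Text
sortKeyFor lang
  | lang == "de" = unaccentedKey . T.replace "ß" "ss" . T.toLower
  | lang `elem` ["en", "nl", "pt"] = unaccentedKey
  | otherwise = casemapKey

sortFor :: Text -> [Text] -> [Text]
sortFor lang = sortOn (sortKeyFor lang)

main :: IO ()
main = do
  TIO.putStrLn (T.intercalate ", " (sortFor "en" ["Onush", "Oñate"]))  -- Oñate, Onush
  TIO.putStrLn (T.intercalate ", " (sortFor "es" ["Oñate", "Onush"]))  -- Onush, Oñate
  TIO.putStrLn (T.intercalate ", " (sortFor "de" ["Straße", "Strand"])) -- Strand, Straße
```

For Spanish, the fallback key happens to place “Oñate” after “Onush” because the combining tilde compares higher than any ASCII letter; this is a side effect of code point order, not an implementation of the Spanish alphabet.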

Still, my conclusion is that for most languages listed in https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions - let alone those using non-Latin alphabets - “i;unicode-casemap” (RFC 5051) just won’t be adequate.

The only readily available and robust solution - short of reimplementing parts of it in Haskell - seems to be icu4c. I for one wouldn’t mind at all if you decided to include it in the pandoc binaries by default.

As to fixes to the official text-icu branch, it seems we’re getting at least some attention already, see https://github.com/haskell/text-icu/issues/49#issuecomment-804097813. Let’s see ...


* Re: Error compiling with icu support / possible workaround?
       [not found]   ` <m25z1jpw9n.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-03-22 20:29     ` jcr
       [not found]       ` <5035db2e-16b9-4923-8e38-d95b81d27840n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: jcr @ 2021-03-22 20:29 UTC (permalink / raw)
  To: pandoc-discuss



I'm not an expert in this, but I believe a pure Haskell solution would mean
implementing the Unicode Collation Algorithm
<https://unicode.org/reports/tr10/>. The Unicode Common Locale Data
Repository <http://cldr.unicode.org/> contains the per-locale settings that
configure the algorithm to sort according to the locale's rules. This is
what ICU does.
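
To illustrate the multi-level idea behind the Unicode Collation Algorithm, here is a toy sketch using only base. It is not the real algorithm: the weight table (toyDecompose) is hand-written for three letters, whereas a real implementation takes its weights from the Unicode data files and handles many more cases. All names are hypothetical.

```haskell
import Data.Char (isUpper, ord, toLower)
import Data.List (sortOn)

-- | Toy decomposition table: base letter plus a secondary (accent) weight.
toyDecompose :: Char -> (Char, Int)
toyDecompose 'é' = ('e', 1)
toyDecompose 'ñ' = ('n', 1)
toyDecompose 'ü' = ('u', 1)
toyDecompose c   = (c, 0)

-- | Three-level key: all primary weights (base letters) are compared
-- first, then all secondary weights (accents), then tertiary (case).
toySortKey :: String -> ([Int], [Int], [Int])
toySortKey s = (map primary s, map secondary s, map tertiary s)
  where
    primary    = ord . fst . toyDecompose . toLower
    secondary  = snd . toyDecompose . toLower
    tertiary c = if isUpper c then 1 else 0

toySort :: [String] -> [String]
toySort = sortOn toySortKey

main :: IO ()
main = mapM_ putStrLn (toySort ["Ubina", "Über", "uber", "Ubeda", "Uber"])
-- prints Ubeda, uber, Uber, Über, Ubina (one per line)
```

Accents only break ties between words whose base letters are identical, which is why “Über” lands before “Ubina”. A locale tailoring would change the table, for example by giving “ñ” its own primary weight after “n” for Spanish.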



* Re: Error compiling with icu support / possible workaround?
       [not found]       ` <5035db2e-16b9-4923-8e38-d95b81d27840n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-03-23 19:04         ` John MacFarlane
       [not found]           ` <m2o8f9ofmw.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-03-23 19:04 UTC (permalink / raw)
  To: jcr, pandoc-discuss


Just a note: I've started working on a library that does this.

The basics are mostly working now (about 4x slower than text-icu but not
too bad).  But I haven't yet implemented the locale-sensitive
sorting hints.





* Re: Error compiling with icu support / possible workaround?
       [not found]           ` <m2o8f9ofmw.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-03-23 19:53             ` 'Nick Bart' via pandoc-discuss
  2021-03-25 19:45               ` John MacFarlane
  0 siblings, 1 reply; 34+ messages in thread
From: 'Nick Bart' via pandoc-discuss @ 2021-03-23 19:53 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Amazing. I'll be happy to test anything you come up with.



* Re: Error compiling with icu support / possible workaround?
  2021-03-23 19:53             ` 'Nick Bart' via pandoc-discuss
@ 2021-03-25 19:45               ` John MacFarlane
       [not found]                 ` <m2pmznm2zk.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-03-25 19:45 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I'm making good progress on this. I have the locale-specific
tailorings mostly working now, I think.  Everything is compiled
into the library using Template Haskell and parsers that extract
the data directly from the Unicode data files.  There's even
a quasiquoter you can use to define custom tailorings in
your source file, using the same syntax that the Unicode
CLDR files use, e.g.

    myCollation = applyTailoring [tailoring|&n<m<<M|] rootCollation

There are still a lot of little things to chase down, and there
are lots of details in the spec that aren't supported yet and
maybe never will be. But it should come close enough to be useful.

I hope to have a version you can start playing around with in
a week or two.

* Re: Error compiling with icu support / possible workaround?
       [not found]                 ` <m2pmznm2zk.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-04 18:52                   ` John MacFarlane
       [not found]                     ` <m2sg457ugn.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-04-04 18:52 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Update: I've put the library (unicode-collation) on GitHub:
https://github.com/jgm/unicode-collation

I also have a branch of citeproc that uses this:
https://github.com/jgm/citeproc/tree/unicode-collation

And a branch of pandoc that uses this citeproc:
https://github.com/jgm/pandoc/tree/unicode-collation

Still ironing out a few kinks involving date sorting,
but if you want to start testing on this branch, it
would be helpful.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
       [not found]                     ` <m2sg457ugn.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-05 23:17                       ` John MacFarlane
       [not found]                         ` <m21rbos4nd.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-04-05 23:17 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Here are a couple tests I just added to the unicode-collation branch.
Let me know if the output seems right.

```
% pandoc --citeproc -t plain
---
lang: en-US
csl: command/apa.csl
references:
- id: a1
  type: book
  author:
  - family: Ubina
    given: A. John
  issued: 1985
- id: a2
  type: book
  author:
  - family: Über
    given: Aglaia
  issued: 1996
- id: a3
  type: book
  author:
  - family: Oñate
    given: José
  issued: 1985
- id: a4
  type: book
  author:
  - family: Onush
    given: Frank
  issued: 2002
- id: a5
  type: book
  author:
  - family: O'Neil
    given: Timothy
  issued: 2010
---

[@a1;@a2;@a3;@a4;@a5]
^D
(O’Neil, 2010; Oñate, 1985; Onush, 2002; Über, 1996; Ubina, 1985)

O’Neil, T. (2010).

Oñate, J. (1985).

Onush, F. (2002).

Über, A. (1996).

Ubina, A. J. (1985).
```

```
% pandoc --citeproc -t plain
---
lang: es
csl: command/apa.csl
references:
- id: a1
  type: book
  author:
  - family: Ubina
    given: A. John
  issued: 1985
- id: a2
  type: book
  author:
  - family: Über
    given: Aglaia
  issued: 1996
- id: a3
  type: book
  author:
  - family: Oñate
    given: José
  issued: 1985
- id: a4
  type: book
  author:
  - family: Onush
    given: Frank
  issued: 2002
- id: a5
  type: book
  author:
  - family: O'Neil
    given: Timothy
  issued: 2010
---

[@a1;@a2;@a3;@a4;@a5]
^D
(O’Neil, 2010; Onush, 2002; Oñate, 1985; Über, 1996; Ubina, 1985)

O’Neil, T. (2010).

Onush, F. (2002).

Oñate, J. (1985).

Über, A. (1996).

Ubina, A. J. (1985).
```
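
The only difference between the two outputs is where Oñate lands: before Onush with `lang: en-US`, after it with `lang: es`. One way to picture this is that Spanish treats ñ as a letter of its own between n and o, while elsewhere the tilde is just an accent used as a tie-breaker. The Python sketch below is a toy model of that idea, not what citeproc or unicode-collation actually do. `sort_key` and `extra_letters` are hypothetical names, and it assumes the apostrophe is an ordinary character that sorts before letters, which is chosen to match the output above.

```python
import unicodedata

def sort_key(name, extra_letters=None):
    """Two-level key: base letters first, accents as tie-breaker.

    extra_letters maps a precomposed letter to the base letter it follows,
    e.g. {"ñ": "n"} makes ñ a letter of its own between n and o.
    """
    extra_letters = extra_letters or {}
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFC", name.lower()):
        if ch in extra_letters:
            # Separate letter: sorts right after its base at the primary level.
            primary.append((ord(extra_letters[ch]), 1))
            secondary.append(())
        else:
            base, *marks = unicodedata.normalize("NFD", ch)
            primary.append((ord(base), 0))
            secondary.append(tuple(ord(m) for m in marks))
    return primary, secondary

names = ["Ubina", "Über", "Oñate", "Onush", "O'Neil"]
print(sorted(names, key=sort_key))
# ["O'Neil", 'Oñate', 'Onush', 'Über', 'Ubina']
print(sorted(names, key=lambda n: sort_key(n, {"ñ": "n"})))
# ["O'Neil", 'Onush', 'Oñate', 'Über', 'Ubina']
```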



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
       [not found]                         ` <m21rbos4nd.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-06  9:21                           ` 'Nick Bart' via pandoc-discuss
  2021-04-06 16:18                             ` John MacFarlane
  0 siblings, 1 reply; 34+ messages in thread
From: 'Nick Bart' via pandoc-discuss @ 2021-04-06  9:21 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

The two examples from your latest post look ok to me - and, if further confirmation for "es" should be needed, the ICU4C Demo at https://icu4c-demos.unicode.org/icu-bin/collation.html generates the same sort order with any of the three "es" variants they offer.

What doesn’t look right so far is sorting according to French rules: https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions claims "For French, the last accent in a given word determines the order.[13] For example, in French, the following four words would be sorted this way: cote < côte < coté < côté."

https://icu4c-demos.unicode.org/icu-bin/collation.html (which for some reason offers "fr-CA" only) generates the same sort order ("cote < côte < coté < côté").

However, using the "new" pandoc branch with the following example:

```
pandoc -C -t plain << EOT

Expected:
cote
côte
coté
côté

---
nocite: '@*'
lang: fr
references:
- id: cote
  author: cote
- id: côte
  author: côte
- id: coté
  author: coté
- id: côté
  author: côté
...
EOT
```

I get this sort order instead:

```
Expected: cote côte coté côté

cote. s. d.

coté. s. d.

côte. s. d.

côté. s. d.
```



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
  2021-04-06  9:21                           ` 'Nick Bart' via pandoc-discuss
@ 2021-04-06 16:18                             ` John MacFarlane
       [not found]                               ` <m27dlfqtd1.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-04-06 16:18 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I've added optFrenchAccents to unicode-collation, and the latest
pandoc in the unicode-collation branch fixes the issue you
identified with French accents.
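
As a rough illustration of what a "French accents" option changes, the Python sketch below builds a two-level key (base letters, then accents) and optionally compares the accent level from the end of the word. This is a toy model, not the code in unicode-collation; `accent_key` and its `backwards` flag are hypothetical names, and real collation weights are more involved than raw code points.

```python
import unicodedata

def accent_key(word, backwards=False):
    """Key of (base letters, per-letter accent marks).

    With backwards=True the accent level is compared starting from the
    last letter, which is what the French rule asks for.
    """
    letters, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            accents[-1] += (ord(ch),)  # attach the mark to the preceding letter
        else:
            letters.append(ch)
            accents.append(())
    if backwards:
        accents.reverse()
    return letters, accents

words = ["côté", "cote", "côte", "coté"]
print(sorted(words, key=accent_key))
# ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=lambda w: accent_key(w, backwards=True)))
# ['cote', 'côte', 'coté', 'côté']
```

The first line reproduces the order reported for the branch before the change, the second the order given by the ICU demo for fr-CA.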

Currently I enable this whenever lang is "fr" -- but I don't
know if that's right; maybe some French-speaking countries
don't do this?

More testing welcome, esp. with non-Latin alphabets -- I don't
have a good stock of samples for those.

One further point:  unicode-collation supports multiple
collations for a given language (e.g. es vs. es/traditional,
which sorts some letter combinations differently).  I don't
know whether it's worth providing a way for pandoc and citeproc
to use these, or if we can always use the default collation for
a language.


"'Nick Bart' via pandoc-discuss"
<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> writes:

> The two examples from your latest post look ok to me - and, if further confirmation for "es" should be needed, the ICU4C Demo at https://icu4c-demos.unicode.org/icu-bin/collation.html generates the same sort order with any of the three "es" variants they offer.
>
> What doesn’t look right so far is sorting according to French rules: https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions claims "For French, the last accent in a given word determines the order.[13] For example, in French, the following four words would be sorted this way: cote < côte < coté < côté."
>
> https://icu4c-demos.unicode.org/icu-bin/collation.html (which for some reason offers "fr-CA" only) generates the same sort order ("cote < côte < coté < côté").
>
> However, using the "new" pandoc branch with the following example:
>
> ```
> pandoc -C -t plain << EOT
>
> Expected:
> cote
> côte
> coté
> côté
>
> ---
> nocite: '@*'
> lang: fr
> references:
> - id: cote
>   author: cote
> - id: côte
>   author: côte
> - id: coté
>   author: coté
> - id: côté
>   author: côté
> ...
> EOT
> ```
>
> I get this sort order instead:
>
> ```
> Expected: cote côte coté côté
>
> cote. s. d.
>
> coté. s. d.
>
> côte. s. d.
>
> côté. s. d.
> ```


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
       [not found]                               ` <m27dlfqtd1.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-06 16:42                                 ` 'Nick Bart' via pandoc-discuss
  2021-04-06 18:14                                   ` Bastien DUMONT
  2021-04-07  9:35                                   ` 'Nick Bart' via pandoc-discuss
  0 siblings, 2 replies; 34+ messages in thread
From: 'Nick Bart' via pandoc-discuss @ 2021-04-06 16:42 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Concerning French, I checked a few more sources, and some of them seem to hold different views on French collation: https://fr.wikipedia.org/wiki/Alphabet_fran%C3%A7ais states that diacritics should be disregarded when sorting, except in Quebec French, where accented characters are to appear after their unaccented counterparts. No "last syllable" rule is mentioned at all. In addition, in a printed French dictionary, Le Nouveau Petit Robert (1994), I couldn’t find any explicit rules on sorting, but entries are ordered "cote < coté < côte < côté". Hopefully some native speakers of French will chime in here.

As to supporting multiple collations, I tend to think that the default collation (which usually seems to follow the most recent rules for a given language) would usually be sufficient.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
  2021-04-06 16:42                                 ` 'Nick Bart' via pandoc-discuss
@ 2021-04-06 18:14                                   ` Bastien DUMONT
  2021-04-06 23:38                                     ` John MacFarlane
  2021-04-07  9:35                                   ` 'Nick Bart' via pandoc-discuss
  1 sibling, 1 reply; 34+ messages in thread
From: Bastien DUMONT @ 2021-04-06 18:14 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss

Hi,

Honestly, these are such subtleties that, as a native French speaker, I have no precise ideas about it. I would say that accents are only a secondary criterion for sorting (cote < côte < coteau). Actually the Wikipedia page about the French alphabet agrees with that: "diacritics and ligatures are taken into account only at a third level, after the second level (case). [...] In Quebec French diacritics are considered more important than case." (I hope my translation is not too bad.) Unfortunately they give no reference. As for the "last syllable" rule, I have never heard of it, but the French Academy's dictionary online has cote < côte < coté < côté (https://www.dictionnaire-academie.fr/article/A9C4445?history=2). Anyway I guess that it rarely applies. I will check a recent Robert whenever possible (maybe tomorrow): they introduced a lot of changes in 2010.

The French Association for Normalization produced a norm in 1969 about proper names' sorting, but it is behind a paywall and I am not sure that it is really in use.

Cheers,

Bastien

On Tuesday 06 April 2021 at 04:42:40PM, 'Nick Bart' via pandoc-discuss wrote:
> Concerning French, I checked a few more sources, and some of them seem to hold different views on French collation: https://fr.wikipedia.org/wiki/Alphabet_fran%C3%A7ais states that diacritics should be disregarded when sorting, except in Quebec French, where accented characters are to appear after their unaccented counterparts. No "last syllable" rule is mentioned at all. In addition, in a printed French dictionary, Le Nouveau Petit Robert (1994), I couldn’t find any explicit rules on sorting, but entries are ordered "cote < coté < côte < côté". Hopefully some native speakers of French will chime in here.
> 
> As to supporting multiple collations, I tend to think that the default collation (which usually seems to follow the most recent rules for a given language) would usually be sufficient.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
  2021-04-06 18:14                                   ` Bastien DUMONT
@ 2021-04-06 23:38                                     ` John MacFarlane
       [not found]                                       ` <m2h7kjoueo.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-04-06 23:38 UTC (permalink / raw)
  To: Bastien DUMONT, 'Nick Bart' via pandoc-discuss


I just checked my 2006 Le Robert Micro: it has

cote < côte < côté

coté appears as a subheading of cote, so I'm not sure it's
clear from this how it is to be ordered.  Not inconsistent
with the French Academy anyway.

Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org> writes:

> Hi,
>
> Honestly, these are such subtleties that, as a native French speaker, I have no precise ideas about it. I would say that accents are only a secondary criterion for sorting (cote < côte < coteau). Actually the Wikipedia page about the French alphabet agrees with that: "diacritics and ligatures are taken into account only at a third level, after the second level (case). [...] In Quebec French diacritics are considered more important than case." (I hope my translation is not too bad.) Unfortunately they give no reference. As for the "last syllable" rule, I have never heard of it, but the French Academy's dictionary online has cote < côte < coté < côté (https://www.dictionnaire-academie.fr/article/A9C4445?history=2). Anyway I guess that it rarely applies. I will check a recent Robert whenever possible (maybe tomorrow): they introduced a lot of changes in 2010.
>
> The French Association for Normalization produced a norm in 1969 about proper names' sorting, but it is behind a paywall and I am not sure that it is really in use.
>
> Cheers,
>
> Bastien
>
> On Tuesday 06 April 2021 at 04:42:40PM, 'Nick Bart' via pandoc-discuss wrote:
>> Concerning French, I checked a few more sources, and some of them seem to hold different views on French collation: https://fr.wikipedia.org/wiki/Alphabet_fran%C3%A7ais states that diacritics should be disregarded when sorting, except in Quebec French, where accented characters are to appear after their unaccented counterparts. No "last syllable" rule is mentioned at all. In addition, in a printed French dictionary, Le Nouveau Petit Robert (1994), I couldn’t find any explicit rules on sorting, but entries are ordered "cote < coté < côte < côté". Hopefully some native speakers of French will chime in here.
>>
>> As to supporting multiple collations, I tend to think that the default collation (which usually seems to follow the most recent rules for a given language) would usually be sufficient.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
       [not found]                                       ` <m2h7kjoueo.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-07  7:52                                         ` BPJ
       [not found]                                           ` <CADAJKhBpFS7Mq7NriLc8wexqwwLsEy+9OmBiNWbPaMgYKy8jbw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: BPJ @ 2021-04-07  7:52 UTC (permalink / raw)
  To: pandoc-discuss; +Cc: Bastien DUMONT


[-- Attachment #1.1: Type: text/plain, Size: 5481 bytes --]

I tried this out with the latest Unicode::Collate::Locale

<
https://metacpan.org/pod/release/SADAHIRO/Unicode-Collate-1.29/Collate/Locale.pm
>

I used all of fr_FR, fr_CA, fr_BE and fr_CH, with both Normalization Form C and
Normalization Form D, and it turns out that fr_CA actually is different!

Locale: fr_FR; getlocale: default
Normalization: NFC
Sorted: cote coté côte côté
Normalization: NFD
Sorted: cote coté côte côté
Locale: fr_CA; getlocale: fr_CA
Normalization: NFC
Sorted: cote côte coté côté
Normalization: NFD
Sorted: cote côte coté côté
Locale: fr_BE; getlocale: default
Normalization: NFC
Sorted: cote coté côte côté
Normalization: NFD
Sorted: cote coté côte côté
Locale: fr_CH; getlocale: default
Normalization: NFC
Sorted: cote coté côte côté
Normalization: NFD
Sorted: cote coté côte côté
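
The `getlocale` values above are what a simple fallback lookup would give: try the full locale name, then drop trailing subtags, and use the default collation if nothing matches. The Python sketch below shows that lookup in isolation. It is an assumption about the general technique, not a description of how Unicode::Collate::Locale is implemented; `resolve_locale` is a made-up name and the `available` set is hypothetical.

```python
def resolve_locale(requested, available):
    """Return the most specific available locale, or 'default'."""
    parts = requested.replace("-", "_").split("_")
    while parts:
        candidate = "_".join(parts)
        if candidate in available:
            return candidate
        parts.pop()  # drop the last subtag and try again
    return "default"

# Hypothetical set of locales that ship their own tailoring.
available = {"fr_CA", "es", "de"}

for loc in ["fr_FR", "fr_CA", "fr_BE", "fr_CH"]:
    print(loc, "->", resolve_locale(loc, available))
# fr_FR -> default
# fr_CA -> fr_CA
# fr_BE -> default
# fr_CH -> default
```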

If you want to try the script you will need to install the Unicode::Collate
CPAN distribution first, and perl if you are not on a Unixy system. See:

<http://www.cpan.org/modules/INSTALL.html>

<https://www.perl.org/get.html>

I recommend Strawberry Perl on Windows.

On Wed, 7 Apr 2021 at 01:39, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:

>
> I just checked my 2006 Le Robert Micro: it has
>
> cote < côte < côté
>
> coté appears as a subheading of cote, so I'm not sure it's
> clear from this how it is to be ordered.  Not inconsistent
> with the French Academy anyway.
>
> Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org> writes:
>
> > Hi,
> >
> > Honestly, these are such subtleties that, as a native French speaker, I
> have no precise ideas about it. I would say that accents are only a
> secondary criterion for sorting (cote < côte < coteau). Actually the
> Wikipedia page about the French alphabet agrees with that: "diacritics and
> ligatures are taken into account only at a third level, after the second
> level (case). [...] In Quebec French diacritics are considered more
> important than case." (I hope my translation is not too bad.) Unfortunately
> they give no reference. As for the "last syllable" rule, I have never heard
> of it, but the French Academy's dictionary online has cote < côte < coté <
> côté (https://www.dictionnaire-academie.fr/article/A9C4445?history=2).
> Anyway I guess that it rarely applies. I will check a recent Robert
> whenever possible (maybe tomorrow): they introduced a lot of changes in
> 2010.
> >
> > The French Association for Normalization produced a norm in 1969 about
> proper names' sorting, but it is behind a paywall and I am not sure that it
> is really in use.
> >
> > Cheers,
> >
> > Bastien
> >
> > On Tuesday 06 April 2021 at 04:42:40PM, 'Nick Bart' via pandoc-discuss
> wrote:
> >> Concerning French, I checked a few more sources, and some of them seem
> to hold different views on French collation:
> https://fr.wikipedia.org/wiki/Alphabet_fran%C3%A7ais states that
> diacritics should be disregarded when sorting, except in Quebec French,
> where accented characters are to appear after their unaccented
> counterparts. No "last syllable" rule is mentioned at all. In addition, in
> a printed French dictionary, Le Nouveau Petit Robert (1994), I couldn’t
> find any explicit rules on sorting, but entries are ordered "cote < coté <
> côte < côté". Hopefully some native speakers of French will chime in here.
> >>
> >> As to supporting multiple collations, I tend to think that the default
> collation (which usually seems to follow the most recent rules for a given
> language) would usually be sufficient.

[-- Attachment #1.2: Type: text/html, Size: 8282 bytes --]

[-- Attachment #2: french-sorting.pl --]
[-- Type: text/x-perl, Size: 705 bytes --]

#!/usr/bin/env perl

use 5.014;
# use utf8;
use utf8::all;
use strict;
use warnings;
use warnings FATAL => 'utf8';
use autodie;

# use open qw[ :utf8 :std ];

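use Unicode::Collate::Locale;
use Unicode::Normalize qw[ NFC NFD ];

# The same four words in both normalization forms. Normalizing explicitly
# keeps the two lists distinct even if an editor or mail client has
# re-normalized the literals.
my @words = qw[ côté cote côte coté ];
my @nfc   = map { NFC($_) } @words;
my @nfd   = map { NFD($_) } @words;

my @locales = qw[ fr_FR fr_CA fr_BE fr_CH ];

my @norms = (
  [ NFC => \@nfc ],
  [ NFD => \@nfd ],
);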

for my $locale ( @locales ) {
  my $coll = Unicode::Collate::Locale->new(locale => $locale);
  say sprintf "Locale: $locale; getlocale: %s", $coll->getlocale;
  for my $norm ( @norms ) {
    my($name, $words) = @$norm;
    say "Normalization: $name";
    my @sorted = $coll->sort(@$words);
    say "Sorted: @sorted";
  }
}
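
The two word lists in the script are meant to differ only in normalization form, which is invisible once the text is displayed. For readers who want to see the difference, here is a small Python check using only the standard library. It is separate from the Perl script and simply shows that NFC and NFD spell the same word with different code points.

```python
import unicodedata

word = "côté"
nfc = unicodedata.normalize("NFC", word)  # precomposed: ô and é are one code point each
nfd = unicodedata.normalize("NFD", word)  # decomposed: base letter + combining mark

print(len(nfc), len(nfd))
# 4 6
print([f"U+{ord(c):04X}" for c in nfd])
# ['U+0063', 'U+006F', 'U+0302', 'U+0074', 'U+0065', 'U+0301']
print(nfc == nfd, unicodedata.normalize("NFC", nfd) == nfc)
# False True
```

A collator that follows the Unicode Collation Algorithm is expected to treat both forms as equivalent, which matches the identical NFC and NFD results reported above.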


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
  2021-04-06 16:42                                 ` 'Nick Bart' via pandoc-discuss
  2021-04-06 18:14                                   ` Bastien DUMONT
@ 2021-04-07  9:35                                   ` 'Nick Bart' via pandoc-discuss
  2021-04-07 10:02                                     ` Bastien DUMONT
                                                       ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: 'Nick Bart' via pandoc-discuss @ 2021-04-07  9:35 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Bastien, BPJ - many thanks, that’s helpful. Still, the main practical question,
I guess, is whether the default sort order the "new" pandoc generates for French
- either with or without the "optFrenchAccents" modification - is acceptable
from the point of view of a native speaker of French or not, and if not, what
you would suggest instead.


As to multiple collations, I commented earlier:

> ... I tend to think that the default collation (which usually seems to follow
> the most recent rules for a given language) would usually be sufficient.

That being said, it seems that most of the information (in
https://github.com/jgm/unicode-collation/tree/main/data) and, I assume,
infrastructure for supporting different collation systems for a given language is
in place already, so the following might be worth a try:

pandoc is relying on IETF BCP 47 language tags anyway
[https://tools.ietf.org/rfc/bcp/bcp47.txt].

A number of locale attributes contained in the Common Locale Data Repository
(CLDR), including those pertaining to collation, can be expressed as extensions
to "simple" language tags of the form "en-US".

IETF BCP 47 Extension U (Unicode Locale) is described in RFC 6067
[https://tools.ietf.org/html/rfc6067]. Relevant quote:

>    For example, the language tag "de-DE-u-attr-co-phonebk" consists of:
>
>    o  The base language tag "de-DE" (German as used in Germany), exactly as
>    defined by [BCP47] using subtags from the IANA Language Subtag Registry.
>
>    o  The singleton 'u', identifying this extension.
>
>    o  The attribute 'attr', which is an example for illustration (no
>    attributes were defined at the time this document was published).
>
>    o  The keyword 'co-phonebk', consisting of the key 'co' (Collation) and the
>    type 'phonebk' (Phonebook collation order).

On IETF BCP 47 extensions, see also
https://www.w3.org/International/articles/language-tags/#extension.

So if this does not appear too difficult, it might provide a lot of additional
flexibility if pandoc were to support the particular subset of "Extension U"
strings pertaining to collation, i.e., those starting with "-u-co-" in pandoc's
"lang" metadata field, or command line argument. (In the absence of such a string,
pandoc should of course use the default collation order.)
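
To give an idea of how little parsing this would need, here is a Python sketch that pulls the base tag and the `co` type out of a language tag. It is only an illustration of the RFC 6067 structure quoted above, not pandoc code: `collation_from_tag` is a hypothetical name, and the sketch assumes a well-formed tag and ignores grandfathered tags.

```python
def collation_from_tag(tag):
    """Return (base_tag, collation_type) for a BCP 47 language tag.

    collation_type is None when there is no '-u-...-co-<type>' keyword.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    # Singletons (one-character subtags) introduce extensions.
    singletons = [i for i, s in enumerate(lowered) if len(s) == 1]
    base_end = singletons[0] if singletons else len(subtags)
    base = "-".join(subtags[:base_end])
    collation = None
    for start in singletons:
        if lowered[start] == "x":
            break  # everything after 'x' is private use
        if lowered[start] != "u":
            continue
        i = start + 1
        # Walk the 'u' extension up to the next singleton.
        while i < len(lowered) and len(lowered[i]) > 1:
            # Keys are two characters long; their types are three or more.
            if (lowered[i] == "co" and i + 1 < len(lowered)
                    and len(lowered[i + 1]) >= 3):
                collation = lowered[i + 1]
            i += 1
    return base, collation

print(collation_from_tag("de-DE-u-attr-co-phonebk"))
# ('de-DE', 'phonebk')
print(collation_from_tag("en-US"))
# ('en-US', None)
```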



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
       [not found]                                           ` <CADAJKhBpFS7Mq7NriLc8wexqwwLsEy+9OmBiNWbPaMgYKy8jbw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2021-04-07  9:37                                             ` BPJ
  0 siblings, 0 replies; 34+ messages in thread
From: BPJ @ 2021-04-07  9:37 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 6072 bytes --]

I updated my script to be configurable so that you can try various locales,
normalization forms and lists of words with
perl/Unicode::Collate::Locale/Unicode::Normalize.

Info on required CPAN modules/perl version are in a comment at the top of
the file.

After installing the requirements use the --help option for usage
instructions.


On Wed, 7 Apr 2021 at 09:52, BPJ <bpj-J3H7GcXPSITLoDKTGw+V6w@public.gmane.org> wrote:

> I tried this out with the latest Unicode::Collate::Locale
>
> <
> https://metacpan.org/pod/release/SADAHIRO/Unicode-Collate-1.29/Collate/Locale.pm
> >
>
> I used all of fr_FR, fr_CA, fr_BE and fr_CH, with both Normalization Form C and
> Normalization Form D, and it turns out that fr_CA actually is different!
>
> Locale: fr_FR; getlocale: default
> Normalization: NFC
> Sorted: cote coté côte côté
> Normalization: NFD
> Sorted: cote coté côte côté
> Locale: fr_CA; getlocale: fr_CA
> Normalization: NFC
> Sorted: cote côte coté côté
> Normalization: NFD
> Sorted: cote côte coté côté
> Locale: fr_BE; getlocale: default
> Normalization: NFC
> Sorted: cote coté côte côté
> Normalization: NFD
> Sorted: cote coté côte côté
> Locale: fr_CH; getlocale: default
> Normalization: NFC
> Sorted: cote coté côte côté
> Normalization: NFD
> Sorted: cote coté côte côté
>
> If you want to try the script you will need to install the
> Unicode::Collate CPAN distribution first, and perl if you are not on a
> Unixy system. See:
>
> <http://www.cpan.org/modules/INSTALL.html>
>
> <https://www.perl.org/get.html>
>
> I recommend Strawberry Perl on Windows.
>
> On Wed, 7 Apr 2021 at 01:39, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:
>
>>
>> I just checked my 2006 Le Robert Micro: it has
>>
>> cote < côte < côté
>>
>> coté appears as a subheading of cote, so I'm not sure it's
>> clear from this how it is to be ordered.  Not inconsistent
>> with the French Academy anyway.
>>
>> Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org> writes:
>>
>> > Hi,
>> >
>> > Honestly, these are such subtleties that, as a native French speaker, I
>> have no precise ideas about it. I would say that accents are only a
>> secondary criterion for sorting (cote < côte < coteau). Actually the
>> Wikipedia page about the French alphabet agrees with that: "diacritics and
>> ligatures are taken into account only at a third level, after the second
>> level (case). [...] In Quebec French diacritics are considered more
>> important than case." (I hope my translation is not too bad.) Unfortunately
>> they give no reference. As for the "last syllable" rule, I have never heard
>> of it, but the French Academy's dictionary online has cote < côte < coté <
>> côté (https://www.dictionnaire-academie.fr/article/A9C4445?history=2).
>> Anyway I guess that it rarely applies. I will check a recent Robert
>> whenever possible (maybe tomorrow): they introduced a lot of changes in
>> 2010.
>> >
>> > The French Association for Normalization produced a norm in 1969 about
>> proper names' sorting, but it is behind a paywall and I am not sure that it
>> is really in use.
>> >
>> > Cheers,
>> >
>> > Bastien
>> >
>> > On Tuesday 06 April 2021 at 04:42:40PM, 'Nick Bart' via pandoc-discuss
>> wrote:
>> >> Concerning French, I checked a few more sources, and some of them seem
>> to hold different views on French collation:
>> https://fr.wikipedia.org/wiki/Alphabet_fran%C3%A7ais states that
>> diacritics should be disregarded when sorting, except in Quebec French,
>> where accented characters are to appear after their unaccented
>> counterparts. No "last syllable" rule is mentioned at all. In addition, in
>> a printed French dictionary, Le Nouveau Petit Robert (1994), I couldn’t
>> find any explicit rules on sorting, but entries are ordered "cote < coté <
>> côte < côté". Hopefully some native speakers of French will chime in here.
>> >>
>> >> As to supporting multiple collations, I tend to think that the default
>> collation (which usually seems to follow the most recent rules for a given
>> language) would usually be sufficient.
>

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhDZHQYcZQog7i3DiwFG%3D2T3WeefE_w3hUbfrq0o1FEiYQ%40mail.gmail.com.

[-- Attachment #1.2: Type: text/html, Size: 9411 bytes --]

[-- Attachment #2: try-locale-sorting.pl --]
[-- Type: text/x-perl, Size: 2664 bytes --]

#!/usr/bin/env perl

# Try out sorting according to various locales with Unicode::Collate::Locale and normalization forms with Unicode::Normalize.
#
# Requires the following CPAN modules to be installed:
#
# utf8::all
#
# Unicode::Collate::Locale
#
# Unicode::Normalize
#
# Path::Tiny
#
# Getopt::Long::Descriptive
#
# See: 
# <http://www.cpan.org/modules/INSTALL.html>
#
# Also requires perl 5.10.1 or later.
#
# If you are on a Unixy system you probably have a new enough perl installed.
# Otherwise see:
# <https://www.perl.org/get.html>
#
# On Windows I would recommend Strawberry Perl.
#
# This software is copyright (c) 2021 by Benct Philip Jonsson.
#
# This is free software; you can redistribute it and/or modify it under
# the same terms as the Perl 5 programming language system itself.
#
# http://dev.perl.org/licenses/
#

use 5.010001;
# use utf8;
use utf8::all;
use strict;
use warnings;
use warnings FATAL => 'utf8';
use autodie;

# use open qw[ :utf8 :std ];

use Unicode::Collate::Locale;
use Unicode::Normalize qw[normalize];
use Path::Tiny qw[path];
use Getopt::Long::Descriptive;

my($opt,$usage) = describe_options(
  '%c %o',
  [ 'locale|l=s@', 'A locale to try like "fr" or "fr-CA". Repeatable.',
    +{ required => 1 },
  ],
  [ 'normalize|n=s@',
    'A Unicode Normalization Form according to Unicode::Normalize to apply like NFC or NFD. For unnormalized say -n 0 (zero). Repeatable. Default: NFC.',
    +{ default => ['NFC'] },
  ],
  [ 'input|i=s', 'Name of text file with lines to sort. Assumed to be UTF-8 encoded.',
    + { required => 1 },
  ],
  [ 'output|o=s', 'Name of output file to print to. Optional. Default: stdout.',
  ],
  [ 'help|h', 'Print help text and exit.',
    +{ shortcircuit => 1 },
  ],
  +{
    show_defaults => 0,
    getopt_conf => [qw(no_auto_abbrev no_bundling no_ignore_case)],
  },
);

if ( $opt->help ) {
  say "$0: try out sorting according to various locales with Unicode::Collate::Locale and normalization forms with Unicode::Normalize.";
  print $usage->text;
  exit;
}

my $locales = $opt->locale;
my $norms   = $opt->normalize;
my $in      = $opt->input;
my $out     = $opt->output;

my $fh = $out ? path($out)->openw_utf8 : \*STDOUT;

select $fh;

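# Read the input lines without their trailing newlines.
my @lines = path($in)->lines_utf8( { chomp => 1 } );

for my $locale ( @$locales ) {
  my $coll = Unicode::Collate::Locale->new(locale => $locale);
  # Pass $locale as an argument rather than interpolating it into the format.
  printf "Locale: %s; getlocale: %s\n\n", $locale, $coll->getlocale;
  for my $norm ( @$norms ) {
    print "Normalization: $norm\n\n";
    # -n 0 means: sort the lines without normalizing them first.
    my @normed = $norm ? (map { normalize $norm, $_ } @lines) : @lines;
    my @sorted = $coll->sort(@normed);
    # Print one sorted entry per line.
    print "Sorted:\n\n", join("\n", @sorted), "\n\n";
  }
}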

select STDOUT;

close $fh;

exit;


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
  2021-04-07  9:35                                   ` 'Nick Bart' via pandoc-discuss
@ 2021-04-07 10:02                                     ` Bastien DUMONT
  2021-04-07 12:32                                     ` BPJ
  2021-04-08  1:41                                     ` John MacFarlane
  2 siblings, 0 replies; 34+ messages in thread
From: Bastien DUMONT @ 2021-04-07 10:02 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss

Just to be clear, now we have "cote côte coté côté" with optFrenchAccents and "cote coté côte côté" without? I'm perplexed. I will ask people who are stricter about such rules than I am.

On Wednesday 07 April 2021 at 09:35:22AM, 'Nick Bart' via pandoc-discuss wrote:
> Bastien, BPJ - many thanks, that’s helpful. Still, the main practical question,
> I guess, is whether the default sort order the "new" pandoc generates for French
> - either with or without the "optFrenchAccents" modification - is acceptable
> from the point of view of a native speaker of French or not, and if not, what
> you would suggest instead.
> 
> 
> As to multiple collations, I commented earlier:
> 
> > ... I tend to think that the default collation (which usually seems to follow
> > the most recent rules for a given language) would usually be sufficient.
> 
> That being said, it seems that most of the information (in
> https://github.com/jgm/unicode-collation/tree/main/data) and, I assume,
> infrastructure for supporting different collation systems for a given language is
> in place already, so the following might be worth a try:
> 
> pandoc is relying on IETF BCP 47 language tags anyway
> [https://tools.ietf.org/rfc/bcp/bcp47.txt].
> 
> A number of locale attributes contained in the Common Locale Data Repository
> (CLDR), including those pertaining to collation, can be expressed as extensions
> to "simple" language tags of the form "en-US".
> 
> IETF BCP 47 Extension U (Unicode Locale) is described in RFC 6067
> [https://tools.ietf.org/html/rfc6067]. Relevant quote:
> 
> >    For example, the language tag "de-DE-u-attr-co-phonebk" consists of:
> >
> >    o  The base language tag "de-DE" (German as used in Germany), exactly as
> >    defined by [BCP47] using subtags from the IANA Language Subtag Registry.
> >
> >    o  The singleton 'u', identifying this extension.
> >
> >    o  The attribute 'attr', which is an example for illustration (no
> >    attributes were defined at the time this document was published).
> >
> >    o  The keyword 'co-phonebk', consisting of the key 'co' (Collation) and the
> >    type 'phonebk' (Phonebook collation order).
> 
> On IETF BCP 47 extensions, see also
> https://www.w3.org/International/articles/language-tags/#extension.
> 
> So if this does not appear too difficult, it might provide a lot of additional
> flexibility if pandoc were to support the particular subset of "Extension U"
> strings pertaining to collation, i.e., those starting with "-u-co-" in pandoc's
> "lang" metadata field, or command line argument. (In the absence of such a string,
> pandoc should of course use the default collation order.)



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
  2021-04-07  9:35                                   ` 'Nick Bart' via pandoc-discuss
  2021-04-07 10:02                                     ` Bastien DUMONT
@ 2021-04-07 12:32                                     ` BPJ
  2021-04-08  1:41                                     ` John MacFarlane
  2 siblings, 0 replies; 34+ messages in thread
From: BPJ @ 2021-04-07 12:32 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 3857 bytes --]

Nick, John, check out the documentation and code for
Unicode::Collate::Locale which I linked for a lucid description of what
seems to be the official syntax for locale tags, and an algorithm to handle
them, progressively falling back to something more general depending on
what is (not) included in the tag.
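
To make the idea of progressive fallback concrete, here is a minimal Python sketch. It is not the algorithm Unicode::Collate::Locale actually uses: the candidate order, the function names (fallback_chain, resolve) and the set of available tailorings are all assumptions made up for illustration.

```python
def fallback_chain(language, region=None, collation=None):
    """Return lookup candidates, from most specific to least specific.

    The order used here is an assumption for illustration only.
    """
    candidates = []
    if region and collation:
        candidates.append(f"{language}-{region}-u-co-{collation}")
    if collation:
        candidates.append(f"{language}-u-co-{collation}")
    if region:
        candidates.append(f"{language}-{region}")
    candidates.append(language)
    candidates.append("root")
    return candidates


def resolve(available, language, region=None, collation=None):
    """Pick the first candidate that has a tailoring available."""
    for candidate in fallback_chain(language, region, collation):
        if candidate in available:
            return candidate
    return None


# Hypothetical set of tailorings shipped with a collation library.
available = {"root", "de", "de-u-co-phonebk", "fr", "fr-CA"}

print(resolve(available, "de", "DE", "phonebk"))  # de-u-co-phonebk
print(resolve(available, "fr", "CA"))             # fr-CA
print(resolve(available, "sv", "SE"))             # root
```

With a scheme like this, a tag with a region that has no tailoring of its own ends up using the language-level rules, and an unknown language ends up at the root collation.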



[-- Attachment #2: Type: text/html, Size: 5480 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
  2021-04-07  9:35                                   ` 'Nick Bart' via pandoc-discuss
  2021-04-07 10:02                                     ` Bastien DUMONT
  2021-04-07 12:32                                     ` BPJ
@ 2021-04-08  1:41                                     ` John MacFarlane
       [not found]                                       ` <m2wntdo8m2.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  2 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-04-08  1:41 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I note that data/collation/fr_CA.xml has

   [backwards 2]

and data/collation/fr.xml does not.

'backwards 2' says to sort the second-level collation elements
backwards; that's what the "French accents" option does.  So that
explains the perl script's behavior; it is faithfully following
the locales, which specify this for Canadian French but not
European French.

My parser for collation files currently does nothing with the
`[backwards 2]`, but maybe it's something I should implement.
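
As a rough illustration of what sorting the second level backwards means, here is a toy Python sketch. It is not the Unicode Collation Algorithm and not what unicode-collation does internally: the key function (toy_sort_key), its weights (raw code points of combining marks) and the per-letter grouping of accents are simplifying assumptions. It only shows how reversing the accent level changes the order of the four words discussed in this thread.

```python
import unicodedata


def toy_sort_key(word, backwards_secondary=False):
    """Build a two-level key: base letters first, then accents.

    Toy model only: accents are the combining marks left after NFD
    decomposition, weighted by their code point.
    """
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and secondary:
            # Attach the accent to the preceding base letter.
            secondary[-1] += (ord(ch),)
        else:
            primary.append(ch.lower())
            secondary.append(())
    if backwards_secondary:
        # "backwards 2": compare accents starting from the end of the word.
        secondary.reverse()
    return primary, secondary


words = ["côté", "cote", "coté", "côte"]

forward = sorted(words, key=toy_sort_key)
backward = sorted(words, key=lambda w: toy_sort_key(w, backwards_secondary=True))

print(forward)   # ['cote', 'coté', 'côte', 'côté']
print(backward)  # ['cote', 'côte', 'coté', 'côté']
```

In the forward case the first accent difference from the left decides; in the backwards case the last accent difference decides, which is why "côte" moves ahead of "coté".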



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
       [not found]                                       ` <m2wntdo8m2.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-08  2:23                                         ` John MacFarlane
       [not found]                                           ` <m2o8epo6p8.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-04-08  2:23 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


On second thought, leaving it as an option makes a lot of sense.
We wouldn't want to force fr-FR to be sorted contrary to the French
Academy's official dictionary...

The question is really how to pass this kind of option through
pandoc/citeproc, if it's going to be user-selectable for fr.
It looks like there's a BCP 47 key "kb" corresponding to "backwards 2",
https://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options
so maybe one just says fr-FR-u-kb-true or (canonical equivalent
according to 3.2.1) fr-FR-u-kb

For alternative collations for a language we could do the same,
e.g. es-ES-u-co-traditional.

Parsing and representing these complex language tags is starting
to get pretty complicated!
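
For illustration, a minimal Python sketch of pulling the "u" extension out of such a tag. It is not pandoc's or unicode-collation's parser; the function name parse_u_extension is made up, and the sketch makes simplifying assumptions: it classifies subtags by length only (1 = singleton, 2 = key, longer = attribute or type), does not validate anything, and ignores private-use ("x") sequences.

```python
def parse_u_extension(tag):
    """Split a language tag into (base, attributes, keywords).

    Simplified sketch: a key without a type is treated as "true",
    as with "kb" vs. "kb-true".
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "u" not in lowered:
        return tag, [], {}
    start = lowered.index("u")
    base = "-".join(subtags[:start])
    attributes, keywords, key = [], {}, None
    for sub in lowered[start + 1:]:
        if len(sub) == 1:
            break  # another singleton starts a different extension
        if len(sub) == 2:
            key = sub  # a new key such as "co" or "kb"
            keywords[key] = []
        elif key is None:
            attributes.append(sub)  # attributes come before the first key
        else:
            keywords[key].append(sub)  # type subtag(s) for the current key
    return base, attributes, {k: "-".join(v) or "true" for k, v in keywords.items()}


print(parse_u_extension("de-DE-u-attr-co-phonebk"))
# ('de-DE', ['attr'], {'co': 'phonebk'})
print(parse_u_extension("fr-FR-u-kb"))
# ('fr-FR', [], {'kb': 'true'})
```

Representing the result as a base tag plus a key/value mapping keeps the collation options separate from the language and region, so a caller can look up "co" or "kb" without re-parsing the string.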




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
       [not found]                                           ` <m2o8epo6p8.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-08  7:12                                             ` Bastien DUMONT
  2021-04-09 15:34                                             ` John MacFarlane
  1 sibling, 0 replies; 34+ messages in thread
From: Bastien DUMONT @ 2021-04-08  7:12 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

It would be the most complete and flexible option, but implementing the regional subvariations may be enough. Most people don't have a precise idea about sorting rules, except that letters with diacritics should be placed after their counterparts without diacritics, so letting the library enforce the official rules for the locale they choose makes sense. That said, more options would be best, if you have the motivation to do it!

As for French, the 2012 edition of the Petit Robert has cote < côte < coté < côté.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
       [not found]                                           ` <m2o8epo6p8.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  2021-04-08  7:12                                             ` Bastien DUMONT
@ 2021-04-09 15:34                                             ` John MacFarlane
  1 sibling, 0 replies; 34+ messages in thread
From: John MacFarlane @ 2021-04-09 15:34 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I've got the complex language tags working now.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Error compiling with icu support / possible workaround?
  2021-03-22 14:29         ` 'Nick Bart' via pandoc-discuss
@ 2021-04-17 23:19           ` John MacFarlane
       [not found]             ` <m2eef8ebyx.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-04-17 23:19 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Pandoc master branch now links to unicode-collation-enabled citeproc!



* Re: Error compiling with icu support / possible workaround?
       [not found]             ` <m2eef8ebyx.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-19  9:54               ` 'Nick Bart' via pandoc-discuss
  2021-04-19 11:10                 ` Bastien DUMONT
  2021-04-19 16:16                 ` John MacFarlane
  0 siblings, 2 replies; 34+ messages in thread
From: 'Nick Bart' via pandoc-discuss @ 2021-04-19  9:54 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Great news. pandoc and citeproc supporting Unicode root collation alone is a big step forward, and getting collation for most, if not all other locales right is even better.

I have been testing a few more locales (my current list includes cy da de de-DE de-u-co-phonebk de-DE-u-co-phonebk de-AT-u-co-phonebk nl es es-u-co-search es-u-co-standard es-u-co-trad et fi fi-u-co-phonebk fo fr fr-CA fr-u-kb fr-FR-u-kb hu is nb nn pl sv sv-u-co-reformed), and all these look ok to me. (Caveat: My tests have been rather superficial so far, usually just a few lines with one or two letters focusing on those bits that diverge from root [e.g., sv: Aa / Ab / B / C / Va / Wa / Vz / Wz / X / Y / Z / Å / Ä / Ö], and are still mainly based on info from https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions alone.)

I’ll try to continue testing other locales as well, but there are areas such as CJK that I’ll have to leave to others entirely.

In addition, a few questions:

It seems that "xx-XX" tags can be used instead of "xx" in all contexts, even if not listed in README.md. For example, sv-SV-u-co-reformed falls back to sv-u-co-reformed, and de-DE-u-co-phonebk to de-u-co-phonebk. Is this supposed to be true across the board?

Some collations such as fr-u-kb seem to be gone now (the most recent unicode-collate --verbose tells me fr-u-kb falls back to Tailoring: ROOT). So it seems if you want "cote < côte < coté < côté" (following the "last syllable" rule), you’ll have to use fr-CA. A (maybe hypothetical) question: What about side effects of fr-CA, e.g., suppose you want the "last syllable" collation, but, assuming these are set automatically via the lang tag as well, a non-Canadian date format, currency symbol, etc. (conceivably when using latex commands such as \today in a pandoc -> latex/babel-or-polyglossia workflow)?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a5FISVkGHOzOebu7thoi1x6PhsdcORhT6yS5hlSVCdI4B_vWSH5AKyWM_AHW-88lz5EaYaQs8uAjmk8xG2aJqLFNUCvH41c7yCUSzklMaQk%3D%40protonmail.com.


* Re: Error compiling with icu support / possible workaround?
  2021-04-19  9:54               ` 'Nick Bart' via pandoc-discuss
@ 2021-04-19 11:10                 ` Bastien DUMONT
  2021-04-19 12:56                   ` 'Nick Bart' via pandoc-discuss
  2021-04-19 16:16                 ` John MacFarlane
  1 sibling, 1 reply; 34+ messages in thread
From: Bastien DUMONT @ 2021-04-19 11:10 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss

Doesn't fr-FR enforce the last syllable rule?

Le Monday 19 April 2021 à 09:54:07AM, 'Nick Bart' via pandoc-discuss a écrit :
> Great news. pandoc and citeproc supporting Unicode root collation alone is a big step forward, and getting collation for most, if not all other locales right is even better.
> 
> I have been testing a few more locales (my current list includes cy da de de-DE de-u-co-phonebk de-DE-u-co-phonebk de-AT-u-co-phonebk nl es es-u-co-search es-u-co-standard es-u-co-trad et fi fi-u-co-phonebk fo fr fr-CA fr-u-kb fr-FR-u-kb hu is nb nn pl sv sv-u-co-reformed), and all these look ok to me. (Caveat: My tests have been rather superficial so far, usually just a few lines with one or two letters focusing on those bits that diverge from root [e.g., sv: Aa / Ab / B / C / Va / Wa / Vz / Wz / X / Y / Z / Å / Ä / Ö], and are still mainly based on info from https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions alone.)
> 
> I’ll try to continue testing other locales as well, but there are areas such as CJK that I’ll have to leave to others entirely.
> 
> In addition, a few questions:
> 
> It seems that "xx-XX" tags can be used instead of "xx" in all contexts, even if not listed in README.md. For example, sv-SV-u-co-reformed falls back to sv-u-co-reformed, and de-DE-u-co-phonebk to de-u-co-phonebk. Is this supposed to be true across the board?
> 
> Some collations such as fr-u-kb seem to be gone now (the most recent unicode-collate --verbose tells me fr-u-kb falls back to Tailoring: ROOT). So it seems if you want "cote < côte < coté < côté" (following the "last syllable" rule), you’ll have to use fr-CA. A (maybe hypothetical) question: What about side effects of fr-CA, e.g., suppose you want the "last syllable" collation, but, assuming these are set automatically via the lang tag as well, a non-Canadian date format, currency symbol, etc. (conceivably when using latex commands such as \today in a pandoc -> latex/babel-or-polyglossia workflow)?
> 

* Re: Error compiling with icu support / possible workaround?
  2021-04-19 11:10                 ` Bastien DUMONT
@ 2021-04-19 12:56                   ` 'Nick Bart' via pandoc-discuss
  2021-04-19 13:16                     ` Bastien DUMONT
  0 siblings, 1 reply; 34+ messages in thread
From: 'Nick Bart' via pandoc-discuss @ 2021-04-19 12:56 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

> Doesn't fr-FR enforce the last syllable rule?

No. fr-FR falls back to the Unicode root collation, just like fr, and generates:

cote
coté
côte
côté



* Re: Error compiling with icu support / possible workaround?
  2021-04-19 12:56                   ` 'Nick Bart' via pandoc-discuss
@ 2021-04-19 13:16                     ` Bastien DUMONT
  2021-04-19 16:19                       ` John MacFarlane
  0 siblings, 1 reply; 34+ messages in thread
From: Bastien DUMONT @ 2021-04-19 13:16 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss

It should!

Le Monday 19 April 2021 à 12:56:35PM, 'Nick Bart' via pandoc-discuss a écrit :
> > Doesn't fr-FR enforce the last syllable rule?
> 
> No. fr-FR falls back to the Unicode root collation, just like fr, and generates:
> 
> cote
> coté
> côte
> côté
> 
> 

* Re: Error compiling with icu support / possible workaround?
  2021-04-19  9:54               ` 'Nick Bart' via pandoc-discuss
  2021-04-19 11:10                 ` Bastien DUMONT
@ 2021-04-19 16:16                 ` John MacFarlane
       [not found]                   ` <m235vmdzbh.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  1 sibling, 1 reply; 34+ messages in thread
From: John MacFarlane @ 2021-04-19 16:16 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

"'Nick Bart' via pandoc-discuss"
<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> writes:

> It seems that "xx-XX" tags can be used instead of "xx" in all contexts, even if not listed in README.md. For example, sv-SV-u-co-reformed falls back to sv-u-co-reformed, and de-DE-u-co-phonebk to de-u-co-phonebk. Is this supposed to be true across the board?

Yes -- we have a fallback algorithm.  In general xx-YY will match
xx if there is no tailoring defined for xx-YY specifically.
(Similarly for scripts and collations.  Thus, xx-Cyrl will match
xx if xx-Cyrl is not specifically defined, and xx-u-co-phonebk
will match xx if a phonebk collation is not defined for xx.
But the other way around is not possible:  if you request
de, it won't match de-u-co-phonebk, even though there is no
de tailoring defined...this was the bug you reported earlier.)
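
The fallback described above can be sketched roughly as follows. This is only an illustration in Python, not the actual unicode-collation code: the names Tag, fallback_chain and match_tailoring are made up here, and the order in which components are dropped (region first, then script, collation last) is an assumption. The one property taken from the description is that a request can only lose components during matching, never gain them.

```python
from typing import NamedTuple, Optional

class Tag(NamedTuple):
    # Hypothetical, simplified view of a language tag.
    lang: str
    script: Optional[str] = None
    region: Optional[str] = None
    collation: Optional[str] = None

def fallback_chain(tag):
    """List candidates from most to least specific; components are only dropped."""
    seen, chain = set(), []
    # Assumed priority: keep the requested collation as long as possible.
    for collation in (tag.collation, None):
        for script, region in ((tag.script, tag.region), (tag.script, None),
                               (None, tag.region), (None, None)):
            cand = Tag(tag.lang, script, region, collation)
            if cand not in seen:
                seen.add(cand)
                chain.append(cand)
    return chain

def match_tailoring(tag, defined):
    """Return the first defined tailoring in the chain, or None for root."""
    for cand in fallback_chain(tag):
        if cand in defined:
            return cand
    return None

# Toy set of defined tailorings, chosen to mirror the examples in this thread.
defined = {Tag("de", collation="phonebk"), Tag("sv"), Tag("sv", collation="reformed")}

print(match_tailoring(Tag("de", region="DE", collation="phonebk"), defined))
print(match_tailoring(Tag("de"), defined))
```

With this toy data, de-DE-u-co-phonebk resolves to the de phonebk tailoring, while plain de resolves to root, because the chain never adds a collation that was not requested.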

> Some collations such as fr-u-kb seem to be gone now (the most recent unicode-collate --verbose tells me fr-u-kb falls back to Tailoring: ROOT).

That only refers to the tailoring itself, not to the collator
options set by -u-kb.  So, in this example fr-u-kb still works
(sets backwards accent sorting), but the verbose output only
tells you about the tailoring match.

> So it seems if you want "cote < côte < coté < côté" (following the "last syllable" rule), you’ll have to use fr-CA.

No, you can still use fr-u-kb


* Re: Error compiling with icu support / possible workaround?
  2021-04-19 13:16                     ` Bastien DUMONT
@ 2021-04-19 16:19                       ` John MacFarlane
  0 siblings, 0 replies; 34+ messages in thread
From: John MacFarlane @ 2021-04-19 16:19 UTC (permalink / raw)
  To: Bastien DUMONT, 'Nick Bart' via pandoc-discuss


Here we're following the CLDR collation definitions, which define
[backwards 2] for fr-CA but not for fr.

However, you can specify fr-u-kb or fr-u-kb-true to get that behavior.
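
To illustrate why fr-u-kb and fr-u-kb-true come out the same, here is a minimal Python sketch of reading the keywords of a -u- extension. It is not how pandoc or unicode-collation parses tags; parse_u_extension is a hypothetical helper, it ignores attributes and private-use subtags, and it assumes the tag is well-formed. The one rule it relies on is that a key with no value is read as "true".

```python
def parse_u_extension(tag):
    """Split a tag into (base subtags, {key: value}) for its -u- extension."""
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    if "u" not in lowered:
        return parts, {}
    idx = lowered.index("u")
    base, ext = parts[:idx], lowered[idx + 1:]
    keywords, key = {}, None
    for sub in ext:
        if len(sub) == 1:
            break                      # another singleton starts a new extension
        if len(sub) == 2:
            key = sub                  # two-character subtags are keys
            keywords[key] = []
        elif key is not None:
            keywords[key].append(sub)  # longer subtags belong to the current key
    # A key without a value is treated as "true".
    return base, {k: "-".join(v) if v else "true" for k, v in keywords.items()}

print(parse_u_extension("fr-u-kb"))
print(parse_u_extension("fr-FR-u-kb-true"))
print(parse_u_extension("es-ES-u-co-traditional"))
```

Both French tags yield the keyword mapping {"kb": "true"}, which is the collator option; the base subtags are what the tailoring lookup would use.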

Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org> writes:

> It should!
>
> Le Monday 19 April 2021 à 12:56:35PM, 'Nick Bart' via pandoc-discuss a écrit :
>> > Doesn't fr-FR enforce the last syllable rule?
>> 
>> No. fr-FR falls back to the Unicode root collation, just like fr, and generates:
>> 
>> cote
>> coté
>> côte
>> côté
>> 
>> 

* Re: Error compiling with icu support / possible workaround?
       [not found]                   ` <m235vmdzbh.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-19 16:31                     ` 'Nick Bart' via pandoc-discuss
  2021-04-19 18:08                       ` John MacFarlane
  0 siblings, 1 reply; 34+ messages in thread
From: 'Nick Bart' via pandoc-discuss @ 2021-04-19 16:31 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

>> Some collations such as fr-u-kb seem to be gone now (the most recent unicode-collate --verbose tells me fr-u-kb falls back to Tailoring: ROOT).

> That only refers to the tailoring itself, not to the collator
> options set by -u-kb.  So, in this example fr-u-kb still works
> (sets backwards accent sorting), but the verbose output only
> tells you about the tailoring match.

>> So it seems if you want "cote < côte < coté < côté" (following the "last syllable" rule), you’ll have to use fr-CA.

> No, you can still use fr-u-kb

I see, sorry for not having looked any further than the "Tailoring: ROOT" message. The collation itself is indeed as expected in this case, "cote < côte < coté < côté".


* Re: Error compiling with icu support / possible workaround?
  2021-04-19 16:31                     ` 'Nick Bart' via pandoc-discuss
@ 2021-04-19 18:08                       ` John MacFarlane
  0 siblings, 0 replies; 34+ messages in thread
From: John MacFarlane @ 2021-04-19 18:08 UTC (permalink / raw)
  To: 'Nick Bart' via pandoc-discuss,
	pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I'm adding more verbose output, so now you can get:

% unicode-collate --verbose fr-CA-u-ka-shifted-true < french-accents.md
Options:
  Tailoring:          fr-CA
  Variable weighting: NonIgnorable
  French accents:     True
  Upper before lower: False
  Normalize:          True
cote
  0063 006F 0074 0065
  [1FD6 213C 21F7 2007 | 0020 0020 0020 0020 | 0002 0002 0002 0002]
côte
  0063 006F 0302 0074 0065
  [1FD6 213C 21F7 2007 | 0020 0020 0027 0020 0020 | 0002 0002 0002 0002 0002]
coté
  0063 006F 0074 0065 0301
  [1FD6 213C 21F7 2007 | 0024 0020 0020 0020 0020 | 0002 0002 0002 0002 0002]
côté
  0063 006F 0302 0074 0065 0301
  [1FD6 213C 21F7 2007 | 0024 0020 0020 0027 0020 0020 | 0002 0002 0002 0002 0002 0002]
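
The effect of "French accents: True" on the second group of weights can be reproduced with a small Python sketch. This is a simplified illustration, not the library's algorithm: the weight table is read off the output above, sort_key is a made-up helper, and tertiary weights are left out. With backwards=True the secondary weights are reversed before comparison, which is the pattern visible in the keys above.

```python
import unicodedata

# (primary, secondary) weights per code point, taken from the verbose output.
# The combining circumflex (0302) and acute (0301) carry no primary weight.
WEIGHTS = {
    "c": (0x1FD6, 0x0020),
    "o": (0x213C, 0x0020),
    "t": (0x21F7, 0x0020),
    "e": (0x2007, 0x0020),
    "\u0302": (None, 0x0027),
    "\u0301": (None, 0x0024),
}

def sort_key(word, backwards=False):
    """Build a (primary, secondary) key; reverse secondaries if backwards."""
    decomposed = unicodedata.normalize("NFD", word)
    primary = [WEIGHTS[ch][0] for ch in decomposed if WEIGHTS[ch][0] is not None]
    secondary = [WEIGHTS[ch][1] for ch in decomposed]
    if backwards:
        secondary.reverse()
    return (primary, secondary)

words = ["côté", "coté", "côte", "cote"]
forward = sorted(words, key=sort_key)
backward = sorted(words, key=lambda w: sort_key(w, backwards=True))
print(forward)
print(backward)
```

Since all four words share the same primary weights, the order is decided at the secondary level: compared forwards, the first accent from the left wins; compared backwards, the last accent wins, which gives the "last syllable" order.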


"'Nick Bart' via pandoc-discuss"
<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> writes:

>>> Some collations such as fr-u-kb seem to be gone now (the most recent unicode-collate --verbose tells me fr-u-kb falls back to Tailoring: ROOT).
>
>> That only refers to the tailoring itself, not to the collator
>> options set by -u-kb.  So, in this example fr-u-kb still works
>> (sets backwards accent sorting), but the verbose output only
>> tells you about the tailoring match.
>
>>> So it seems if you want "cote < côte < coté < côté" (following the "last syllable" rule), you’ll have to use fr-CA.
>
>> No, you can still use fr-u-kb
>
> I see, sorry for not having looked any further than the "Tailoring: ROOT" message. The collation itself is indeed as expected in this case, "cote < côte < coté < côté".
>

end of thread, other threads:[~2021-04-19 18:08 UTC | newest]

Thread overview: 34+ messages
2021-03-21 13:04 Error compiling with icu support / possible workaround? 'Nick Bart' via pandoc-discuss
2021-03-22  5:55 ` John MacFarlane
     [not found]   ` <m25z1jpw9n.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-03-22 20:29     ` jcr
     [not found]       ` <5035db2e-16b9-4923-8e38-d95b81d27840n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2021-03-23 19:04         ` John MacFarlane
     [not found]           ` <m2o8f9ofmw.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-03-23 19:53             ` 'Nick Bart' via pandoc-discuss
2021-03-25 19:45               ` John MacFarlane
     [not found]                 ` <m2pmznm2zk.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-04-04 18:52                   ` John MacFarlane
     [not found]                     ` <m2sg457ugn.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-04-05 23:17                       ` John MacFarlane
     [not found]                         ` <m21rbos4nd.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-04-06  9:21                           ` 'Nick Bart' via pandoc-discuss
2021-04-06 16:18                             ` John MacFarlane
     [not found]                               ` <m27dlfqtd1.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-04-06 16:42                                 ` 'Nick Bart' via pandoc-discuss
2021-04-06 18:14                                   ` Bastien DUMONT
2021-04-06 23:38                                     ` John MacFarlane
     [not found]                                       ` <m2h7kjoueo.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-04-07  7:52                                         ` BPJ
     [not found]                                           ` <CADAJKhBpFS7Mq7NriLc8wexqwwLsEy+9OmBiNWbPaMgYKy8jbw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2021-04-07  9:37                                             ` BPJ
2021-04-07  9:35                                   ` 'Nick Bart' via pandoc-discuss
2021-04-07 10:02                                     ` Bastien DUMONT
2021-04-07 12:32                                     ` BPJ
2021-04-08  1:41                                     ` John MacFarlane
     [not found]                                       ` <m2wntdo8m2.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-04-08  2:23                                         ` John MacFarlane
     [not found]                                           ` <m2o8epo6p8.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-04-08  7:12                                             ` Bastien DUMONT
2021-04-09 15:34                                             ` John MacFarlane
2021-03-22  5:59 ` John MacFarlane
     [not found]   ` <m235wnpw3l.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-03-22  6:08     ` John MacFarlane
     [not found]       ` <m2wntzoh3n.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-03-22 14:29         ` 'Nick Bart' via pandoc-discuss
2021-04-17 23:19           ` John MacFarlane
     [not found]             ` <m2eef8ebyx.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-04-19  9:54               ` 'Nick Bart' via pandoc-discuss
2021-04-19 11:10                 ` Bastien DUMONT
2021-04-19 12:56                   ` 'Nick Bart' via pandoc-discuss
2021-04-19 13:16                     ` Bastien DUMONT
2021-04-19 16:19                       ` John MacFarlane
2021-04-19 16:16                 ` John MacFarlane
     [not found]                   ` <m235vmdzbh.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-04-19 16:31                     ` 'Nick Bart' via pandoc-discuss
2021-04-19 18:08                       ` John MacFarlane
