I updated my script to be configurable, so that you can try various locales, normalization forms and word lists with perl, Unicode::Collate::Locale and Unicode::Normalize.

Information on the required CPAN modules and perl version is in a comment at the top of the file.

After installing the requirements, use the --help option for usage instructions.


On Wed 7 Apr 2021 at 09:52, BPJ <bpj-J3H7GcXPSITLoDKTGw+V6w@public.gmane.org> wrote:
I tried this out with the latest Unicode::Collate::Locale

<https://metacpan.org/pod/release/SADAHIRO/Unicode-Collate-1.29/Collate/Locale.pm>

with all of fr_FR, fr_CA, fr_BE and fr_CH, and with both Normalization Form C and Normalization Form D. It turns out that fr_CA actually is different!

Locale: fr_FR; getlocale: default
Normalization: NFC
Sorted: cote coté côte côté
Normalization: NFD
Sorted: cote coté côte côté
Locale: fr_CA; getlocale: fr_CA
Normalization: NFC
Sorted: cote côte coté côté
Normalization: NFD
Sorted: cote côte coté côté
Locale: fr_BE; getlocale: default
Normalization: NFC
Sorted: cote coté côte côté
Normalization: NFD
Sorted: cote coté côte côté
Locale: fr_CH; getlocale: default
Normalization: NFC
Sorted: cote coté côte côté
Normalization: NFD
Sorted: cote coté côte côté
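
The fr_CA result above is consistent with what is usually called backward accent ordering: accents only break ties between words with the same base letters, and the tie is decided starting from the end of the word rather than the beginning. The following is a minimal Python sketch of that idea, not the perl script mentioned above and not the Unicode Collation Algorithm itself. The names collation_key, sort_words and backward_accents are made up for illustration, and ranking accents by their code point is a simplifying assumption.

```python
import unicodedata


def collation_key(word, backward_accents=False):
    """Build a simplified two-level sort key (illustration only, not the UCA).

    Level 1: the base letters, case-folded, with all accents ignored.
    Level 2: the combining marks attached to each base letter, used only
             to break ties between words whose base letters are equal.
    """
    primary = []    # one entry per base letter
    secondary = []  # one tuple of combining-mark code points per base letter
    # NFD splits e.g. "é" into "e" + U+0301, so NFC and NFD input behave alike.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and secondary:
            # Attach the mark to the base letter that precedes it.
            secondary[-1] += (ord(ch),)
        else:
            primary.append(ch.casefold())
            secondary.append(())
    if backward_accents:
        # Assumed fr_CA-style rule: compare accents from the end of the word.
        secondary.reverse()
    return (primary, secondary)


def sort_words(words, backward_accents=False):
    """Return the words sorted with the simplified key above."""
    return sorted(words, key=lambda w: collation_key(w, backward_accents))


if __name__ == "__main__":
    words = ["côté", "coté", "côte", "cote"]
    print(" ".join(sort_words(words)))                         # cote coté côte côté
    print(" ".join(sort_words(words, backward_accents=True)))  # cote côte coté côté
```

Because the key is built from the NFD decomposition, precomposed and decomposed input sort the same way, which matches the identical NFC and NFD lines in the output above.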

If you want to try the script, you will need to install the Unicode::Collate CPAN distribution first, and perl itself if you are not on a Unix-like system. See:

<http://www.cpan.org/modules/INSTALL.html>

<https://www.perl.org/get.html>

I recommend Strawberry Perl on Windows.

On Wed 7 Apr 2021 at 01:39, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:

I just checked my 2006 Le Robert Micro: it has

cote < côte < côté

coté appears as a subheading of cote, so I'm not sure it's
clear from this how it is to be ordered.  Not inconsistent
with the French Academy anyway.

Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org> writes:

> Hi,
>
> Honestly, these are such subtleties that, as a native French speaker, I have no precise idea about them. I would say that accents are only a secondary criterion for sorting (cote < côte < coteau). Actually, the Wikipedia page about the French alphabet agrees with that: "diacritics and ligatures are taken into account only at a third level, after the second level (case). [...] In Quebec French diacritics are considered more important than case." (I hope my translation is not too bad.) Unfortunately, they give no reference. As for the "last syllable" rule, I have never heard of it, but the French Academy's online dictionary has cote < côte < coté < côté (https://www.dictionnaire-academie.fr/article/A9C4445?history=2). Anyway, I guess that it rarely applies. I will check a recent Robert whenever possible (maybe tomorrow): they introduced a lot of changes in 2010.
>
> The French Association for Normalization (AFNOR) published a standard in 1969 on the sorting of proper names, but it is behind a paywall and I am not sure that it is really in use.
>
> Cheers,
>
> Bastien
>
> On Tuesday 06 April 2021 at 04:42:40PM, 'Nick Bart' via pandoc-discuss wrote:
>> Concerning French, I checked a few more sources, and some of them seem to hold different views on French collation: https://fr.wikipedia.org/wiki/Alphabet_fran%C3%A7ais states that diacritics should be disregarded when sorting, except in Quebec French, where accented characters are to appear after their unaccented counterparts. No "last syllable" rule is mentioned at all. In addition, in a printed French dictionary, Le Nouveau Petit Robert (1994), I couldn’t find any explicit rules on sorting, but entries are ordered "cote < coté < côte < côté". Hopefully some native speakers of French will chime in here.
>>
>> As to supporting multiple collations, I tend to think that the default collation (which usually seems to follow the most recent rules for a given language) would usually be sufficient.
>>
>> --
>> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/lIJvVkf_iXceir6oyQVnvHDTXlTIgech_5Trj2TRBY6uBZ_AnU8ghvMV6not9E_QSwG0BhZJUnHprUcIN8UlAKrUw7DzQF5-ZpIki3TC74Q%3D%40protonmail.com.
>
