From: BPJ <melroch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Error compiling with icu support / possible workaround?
Date: Wed, 7 Apr 2021 11:37:27 +0200
Message-ID: <CADAJKhDZHQYcZQog7i3DiwFG=2T3WeefE_w3hUbfrq0o1FEiYQ@mail.gmail.com>
In-Reply-To: <CADAJKhBpFS7Mq7NriLc8wexqwwLsEy+9OmBiNWbPaMgYKy8jbw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
I updated my script to make it configurable, so that you can try various
locales, normalization forms and word lists with Perl's
Unicode::Collate::Locale and Unicode::Normalize.
The required CPAN modules and the minimum Perl version are listed in a
comment at the top of the file.
After installing the requirements, run the script with the --help option for
usage instructions.
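
For anyone who only wants to see the effect, here is a minimal sketch that
uses nothing but Unicode::Collate and Unicode::Normalize, which ship with
recent perls. It is not the attached script. It assumes that the fr_CA
difference comes from comparing accents from the end of the word, and it
emulates that with the "backwards" option of Unicode::Collate instead of
loading the locale data. It also checks that the NFC and NFD forms of a word
collate as equal. The word list and variable names are only for illustration.

```perl
use strict;
use warnings;
use utf8;    # the word list below is UTF-8 source text
use Unicode::Collate;
use Unicode::Normalize qw(NFC NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Illustrative word list: the four words discussed in this thread.
my @words = ( 'côté', 'coté', 'côte', 'cote' );

# Default collator: accents (level 2) are compared from left to right.
my $forward = Unicode::Collate->new;

# Assumption: emulate the fr_CA behaviour by comparing the level-2
# weights from the end of the string instead of using the locale data.
my $backward = Unicode::Collate->new( backwards => 2 );

my @forward_sorted  = $forward->sort(@words);
my @backward_sorted = $backward->sort(@words);

print "forward accents:  @forward_sorted\n";
print "backward accents: @backward_sorted\n";

# The NFC and NFD forms differ in length but should collate as equal,
# because the collator normalizes its input by default.
my ( $nfc, $nfd ) = ( NFC('côté'), NFD('côté') );
my $same = $forward->eq( $nfc, $nfd ) ? 'yes' : 'no';
printf "NFC length %d, NFD length %d, equal under collation: %s\n",
    length $nfc, length $nfd, $same;
```

If the assumption holds, the first line should match the fr_FR order in the
results quoted below and the second line the fr_CA order.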
On Wed, 7 Apr 2021 at 09:52, BPJ <bpj-J3H7GcXPSITLoDKTGw+V6w@public.gmane.org> wrote:
> I tried this out with the latest Unicode::Collate::Locale
>
> <https://metacpan.org/pod/release/SADAHIRO/Unicode-Collate-1.29/Collate/Locale.pm>
>
> with all of fr_FR, fr_CA, fr_BE and fr_CH, and with both Normalization Form C
> and Normalization Form D, and it turns out that fr_CA actually is different!
>
> Locale: fr_FR; getlocale: default
> Normalization: NFC
> Sorted: cote coté côte côté
> Normalization: NFD
> Sorted: cote coté côte côté
> Locale: fr_CA; getlocale: fr_CA
> Normalization: NFC
> Sorted: cote côte coté côté
> Normalization: NFD
> Sorted: cote côte coté côté
> Locale: fr_BE; getlocale: default
> Normalization: NFC
> Sorted: cote coté côte côté
> Normalization: NFD
> Sorted: cote coté côte côté
> Locale: fr_CH; getlocale: default
> Normalization: NFC
> Sorted: cote coté côte côté
> Normalization: NFD
> Sorted: cote coté côte côté
>
> If you want to try the script you will need to install the
> Unicode::Collate CPAN distribution first, and perl if you are not on a
> Unixy system. See:
>
> <http://www.cpan.org/modules/INSTALL.html>
>
> <https://www.perl.org/get.html>
>
> I recommend Strawberry Perl on Windows.
>
> On Wed, 7 Apr 2021 at 01:39, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:
>
>>
>> I just checked my 2006 Le Robert Micro: it has
>>
>> cote < côte < côté
>>
>> coté appears as a subheading of cote, so I'm not sure it's
>> clear from this how it is to be ordered. Not inconsistent
>> with the French Academy anyway.
>>
>> Bastien DUMONT <bastien.dumont-VwIFZPTo/vqsTnJN9+BGXg@public.gmane.org> writes:
>>
>> > Hi,
>> >
>> > Honestly, these are such subtleties that, as a native French speaker, I
>> > have no precise idea about them. I would say that accents are only a
>> > secondary criterion for sorting (cote < côte < coteau). Actually the
>> > Wikipedia page about the French alphabet agrees with that: "diacritics and
>> > ligatures are taken into account only at a third level, after the second
>> > level (case). [...] In Quebec French diacritics are considered more
>> > important than case." (I hope my translation is not too bad.) Unfortunately
>> > they give no reference. As for the "last syllable" rule, I have never heard
>> > of it, but the French Academy's online dictionary has cote < côte < coté <
>> > côté (https://www.dictionnaire-academie.fr/article/A9C4445?history=2).
>> > Anyway I guess that it rarely applies. I will check a recent Robert
>> > whenever possible (maybe tomorrow): they introduced a lot of changes in
>> > 2010.
>> >
>> > The French Association for Normalization produced a standard in 1969 on
>> > sorting proper names, but it is behind a paywall and I am not sure that it
>> > is really in use.
>> >
>> > Cheers,
>> >
>> > Bastien
>> >
>> > On Tuesday 06 April 2021 at 04:42:40PM, 'Nick Bart' via pandoc-discuss
>> > wrote:
>> >> Concerning French, I checked a few more sources, and some of them seem
>> >> to hold different views on French collation:
>> >> https://fr.wikipedia.org/wiki/Alphabet_fran%C3%A7ais states that
>> >> diacritics should be disregarded when sorting, except in Quebec French,
>> >> where accented characters are to appear after their unaccented
>> >> counterparts. No "last syllable" rule is mentioned at all. In addition, in
>> >> a printed French dictionary, Le Nouveau Petit Robert (1994), I couldn’t
>> >> find any explicit rules on sorting, but entries are ordered "cote < coté <
>> >> côte < côté". Hopefully some native speakers of French will chime in here.
>> >>
>> >> As to supporting multiple collations, I tend to think that the default
>> >> collation (which usually seems to follow the most recent rules for a given
>> >> language) would usually be sufficient.
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> Groups "pandoc-discuss" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> >> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/pandoc-discuss/lIJvVkf_iXceir6oyQVnvHDTXlTIgech_5Trj2TRBY6uBZ_AnU8ghvMV6not9E_QSwG0BhZJUnHprUcIN8UlAKrUw7DzQF5-ZpIki3TC74Q%3D%40protonmail.com
>> .
>>
>
[-- Attachment #2: try-locale-sorting.pl --]
[-- Type: text/x-perl, Size: 2664 bytes --]
#!/usr/bin/env perl
# Try out sorting according to various locales with Unicode::Collate::Locale and normalization forms with Unicode::Normalize.
#
# Requires the following CPAN modules to be installed:
#
# utf8::all
#
# Unicode::Collate::Locale
#
# Unicode::Normalize
#
# Path::Tiny
#
# Getopt::Long::Descriptive
#
# See:
# <http://www.cpan.org/modules/INSTALL.html>
#
# Also requires perl 5.10.1 or later.
#
# If you are on a Unixy system you probably have a new enough perl installed.
# Otherwise see:
# <https://www.perl.org/get.html>
#
# On Windows I would recommend Strawberry Perl.
#
# This software is copyright (c) 2021 by Benct Philip Jonsson.
#
# This is free software; you can redistribute it and/or modify it under
# the same terms as the Perl 5 programming language system itself.
#
# http://dev.perl.org/licenses/
#
use 5.010001;
# use utf8;
use utf8::all;
use strict;
use warnings;
use warnings FATAL => 'utf8';
use autodie;
# use open qw[ :utf8 :std ];
use Unicode::Collate::Locale;
use Unicode::Normalize qw[normalize];
use Path::Tiny qw[path];
use Getopt::Long::Descriptive;
my($opt,$usage) = describe_options(
  '%c %o',
  [ 'locale|l=s@', 'A locale to try like "fr" or "fr-CA". Repeatable.',
    +{ required => 1 },
  ],
  [ 'normalize|n=s@',
    'A Unicode Normalization Form according to Unicode::Normalize to apply like NFC or NFD. For unnormalized say -n 0 (zero). Repeatable. Default: NFC.',
    +{ default => ['NFC'] },
  ],
  [ 'input|i=s', 'Name of text file with lines to sort. Assumed to be UTF-8 encoded.',
    +{ required => 1 },
  ],
  [ 'output|o=s', 'Name of output file to print to. Optional. Default: stdout.',
  ],
  [ 'help|h', 'Print help text and exit.',
    +{ shortcircuit => 1 },
  ],
  +{
    show_defaults => 0,
    getopt_conf => [qw(no_auto_abbrev no_bundling no_ignore_case)],
  },
);

if ( $opt->help ) {
  say "$0: try out sorting according to various locales with Unicode::Collate::Locale and normalization forms with Unicode::Normalize.";
  print $usage->text;
  exit;
}

my $locales = $opt->locale;
my $norms = $opt->normalize;
my $in = $opt->input;
my $out = $opt->output;

# Print to the output file if one was given, otherwise to stdout.
my $fh = $out ? path($out)->openw_utf8 : \*STDOUT;
select $fh;

my @lines = path($in)->lines_utf8;

for my $locale ( @$locales ) {
  my $coll = Unicode::Collate::Locale->new(locale => $locale);
  printf "Locale: $locale; getlocale: %s\n\n", $coll->getlocale;
  for my $norm ( @$norms ) {
    print "Normalization: $norm\n\n";
    # A false value such as 0 means: sort the lines without normalizing them.
    my @normed = $norm ? (map { normalize $norm, $_ } @lines) : @lines;
    my @sorted = $coll->sort(@normed);
    print "Sorted:\n\n@sorted\n\n";
  }
}

select STDOUT;
close $fh;
exit;