From: jcr <ffi.appdev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Error compiling with icu support / possible workaround?
Date: Mon, 22 Mar 2021 13:29:56 -0700 (PDT)
Message-ID: <5035db2e-16b9-4923-8e38-d95b81d27840n@googlegroups.com>
In-Reply-To: <m25z1jpw9n.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
I'm not an expert in this, but I believe a pure Haskell solution would mean
implementing the Unicode Collation Algorithm
<https://unicode.org/reports/tr10/>. The Unicode Common Locale Data
Repository <http://cldr.unicode.org/> contains the per-locale settings to
configure the algorithm to sort according to the locale's rules. This is
what ICU does.
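To make the multi-level idea concrete, here is a toy sketch, not the Unicode Collation Algorithm itself and not what ICU does internally. It assumes the unicode-transforms and text packages; `toyKey` and `toySort` are names I made up for illustration. Base letters are compared first, combining marks only break ties, and the raw text is a last resort. The real algorithm uses weight tables, and CLDR tailors those tables per locale.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Char (isMark)
import Data.List (sortOn)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Normalize (NormalizationMode (NFD), normalize)

-- | Hypothetical toy key: (base letters, combining marks, original text).
-- Tuples compare left to right, so accents matter only when the
-- base letters are identical. This is an assumption-laden stand-in
-- for UCA weight levels, not an implementation of them.
toyKey :: Text -> (Text, Text, Text)
toyKey t = (base, marks, t)
  where
    d     = normalize NFD t                         -- canonical decomposition
    base  = T.toCaseFold (T.filter (not . isMark) d) -- "primary" level
    marks = T.filter isMark d                        -- "secondary" level

-- | Sort by the toy key.
toySort :: [Text] -> [Text]
toySort = sortOn toyKey

main :: IO ()
main = mapM_ TIO.putStrLn (toySort ["Über", "Ubina", "Ubeda", "Uber"])
```

With this key, "Uber" and "Über" sort next to each other between "Ubeda" and "Ubina", with the unaccented form first. Locale-specific rules (for example, treating an accented letter as a separate letter of the alphabet) are exactly what this sketch cannot express and what the CLDR data provides.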
On Monday, March 22, 2021 at 6:56:04 AM UTC+1 John MacFarlane wrote:
> "'Nick Bart' via pandoc-discuss"
> <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> writes:
>
> > An unofficial fork of text-icu claims to have fixed the issue (
> https://github.com/WorldSEnder/text-icu/commit/7657227a7ca8ad13be86db5c190b806774a5fd6b
> ).
> >
> > I wonder if anyone could indicate how to tweak the pandoc install
> command to include, for the time being, the WorldSEnder/text-icu fork
> rather than the official one - or whether there is anything else I could
> try to fix this issue on the pandoc side. (I tried downgrading icu4c via
> homebrew, but apparently no formulae for earlier versions are available.)
>
> Replace stack.yaml with this:
>
>
> ``` stack.yaml
> flags:
>   pandoc:
>     trypandoc: false
>     embed_data_files: true
>   QuickCheck:
>     old-random: false
>   citeproc:
>     icu: true
> packages:
> - '.'
> extra-deps:
> - hslua-1.3.0
> - hslua-module-path-0.1.0
> - jira-wiki-markup-1.3.4
> - skylighting-core-0.10.5
> - skylighting-0.10.5
> - doclayout-0.3.0.2
> - citeproc-0.3.0.9
> - texmath-0.12.2
> - random-1.2.0
> - git: https://github.com/WorldSEnder/text-icu
>   commit: 7657227a7ca8ad13be86db5c190b806774a5fd6b
> ghc-options:
>   "$locals": -fhide-source-paths -Wno-missing-home-modules
> resolver: lts-17.5
> nix:
>   packages: [zlib]
> ```
>
> Then stack install.
>
> > As an aside, while I fully understand the wish not to have to include a
> huge external C library by default, I feel that pandoc’s default sorting
> algorithm, currently based on “i;unicode-casemap” (RFC 5051), is somewhat
> below par. In particular, it does not even comply with mainstream
> English-language rules as far as accented characters are concerned. The
> Chicago Manual of Style (17e, 2017, 16.67) unambiguously states: “Words
> beginning with or including accented letters are alphabetized as though
> they were unaccented.” One of their examples gives the sort order “Ubeda –
> Über – Ubina”. Without icu support, pandoc incorrectly sorts this as “Ubeda
> – Ubina – Über”.
>
> Yes. I agree. Actually, if we just need special treatment for
> English locales, then I don't think it should be too hard. We
> can use the Haskell unicode-transforms library (already a
> dependency of pandoc) to normalize the text and then remove
> accents:
>
> Prelude Data.Text.Normalize Data.Text Data.Char> Data.Text.filter (not . isMark) $ normalize NFD "dérégler"
> "deregler"
>
> We could sort on the result of that transform.
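A minimal sketch of what sorting on that transform might look like, assuming the unicode-transforms and text packages. The names `stripAccents`, `accentKey` and `sortUnaccented` are hypothetical; this is not pandoc's or citeproc's actual code. The case folding is my own addition, so that "über" and "Über" get the same key.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Char (isMark)
import Data.List (sortOn)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Normalize (NormalizationMode (NFD), normalize)

-- | Hypothetical helper: decompose to NFD, then drop combining marks.
stripAccents :: Text -> Text
stripAccents = T.filter (not . isMark) . normalize NFD

-- | Hypothetical sort key: accent-stripped and case-folded (assumption).
accentKey :: Text -> Text
accentKey = T.toCaseFold . stripAccents

-- | Sort as though the words were unaccented. sortOn is stable, so
-- words that differ only in accents keep their input order.
sortUnaccented :: [Text] -> [Text]
sortUnaccented = sortOn accentKey

main :: IO ()
main = mapM_ TIO.putStrLn (sortUnaccented ["Ubina", "Über", "Ubeda"])
```

On the Chicago example this gives Ubeda, Über, Ubina. Note that the key also strips marks from non-Latin scripts, which is the caveat raised in the quoted text just below.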
>
> (This method would also affect non-Western scripts, though, and
> I don't know what the rules around those are...)
>
> For non-English locales, would we want to fall back to RFC 5051?
>
> I'm not sure what all the relevant rules are; if it's not too
> terribly complicated, I wonder if a pure Haskell library could
> be cooked up. It's a shame that there's no way to do proper
> Unicode collation in Haskell without the difficult icu4c
> dependency.
>