public inbox archive for pandoc-discuss@googlegroups.com
From: John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org>
To: jcr <ffi.appdev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	pandoc-discuss
	<pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Error compiling with icu support / possible workaround?
Date: Tue, 23 Mar 2021 12:04:55 -0700
Message-ID: <m2o8f9ofmw.fsf@MacBook-Pro.hsd1.ca.comcast.net>
In-Reply-To: <5035db2e-16b9-4923-8e38-d95b81d27840n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>


Just a note: I've started working on a library that does this.

The basics are mostly working now (about 4x slower than text-icu but not
too bad).  But I haven't yet implemented the locale-sensitive
sorting hints.



jcr <ffi.appdev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I'm not an expert in this, but I believe a pure Haskell solution would mean
> implementing the Unicode Collation Algorithm 
> <https://unicode.org/reports/tr10/>. The Unicode Common Locale Data 
> Repository <http://cldr.unicode.org/> contains the per-locale settings to 
> configure the algorithm to sort according to the locale's rules. This is 
> what ICU does.
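
To make the multi-level idea behind the Unicode Collation Algorithm concrete, here is a toy Haskell sketch. It is not UTS #10 (no DUCET weight table, no contractions, no CLDR tailorings) and not the library mentioned above; it only shows how comparing base letters first, then accents, then case yields dictionary-like order. It assumes the text and unicode-transforms packages; `ToyKey`, `toyKey`, and `toySort` are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Toy sketch of the multi-level comparison used by the UCA.
-- NOT UTS #10: no DUCET weights, no contractions, no CLDR tailorings.
-- Assumes the text and unicode-transforms packages.
import Data.Char (isMark, isUpper, toLower)
import Data.List (sortOn)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (NFD), normalize)

-- | The derived Ord compares the fields in order: accents only matter
-- when base letters tie, and case only matters when both of those tie.
data ToyKey = ToyKey
  { primary   :: String   -- base letters, lowercased, accents removed
  , secondary :: [String] -- combining marks attached to each base letter
  , tertiary  :: [Bool]   -- whether each base letter is uppercase
  } deriving (Eq, Ord, Show)

toyKey :: T.Text -> ToyKey
toyKey t = ToyKey
  { primary   = map (toLower . T.head) clusters
  , secondary = map (T.unpack . T.tail) clusters
  , tertiary  = map (isUpper . T.head) clusters
  }
  where
    -- NFD splits "Ü" into 'U' + U+0308; group each base char with its marks.
    clusters = T.groupBy (\_ c -> isMark c) (normalize NFD t)

toySort :: [T.Text] -> [T.Text]
toySort = sortOn toyKey
```

For example, `toySort ["Ubina", "Über", "über", "Ubeda"]` gives `["Ubeda", "über", "Über", "Ubina"]`: accents and case only break ties among words whose base letters are identical.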
>
> On Monday, March 22, 2021 at 6:56:04 AM UTC+1 John MacFarlane wrote:
>
>> "'Nick Bart' via pandoc-discuss"
>> <pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> writes:
>>
>> > An unofficial fork of text-icu claims to have fixed the issue
>> > (https://github.com/WorldSEnder/text-icu/commit/7657227a7ca8ad13be86db5c190b806774a5fd6b).
>> >
>> > I wonder if anyone could indicate how to tweak the pandoc install
>> > command to include, for the time being, the WorldSEnder/text-icu fork
>> > rather than the official one, or whether there is anything else I could
>> > try to fix this issue on the pandoc side. (I tried downgrading icu4c via
>> > homebrew, but apparently no formulae for earlier versions are available.)
>>
>> Replace stack.yaml with this:
>>
>>
>> ``` stack.yaml
>> flags:
>>   pandoc:
>>     trypandoc: false
>>     embed_data_files: true
>>   QuickCheck:
>>     old-random: false
>>   citeproc:
>>     icu: true
>> packages:
>> - '.'
>> extra-deps:
>> - hslua-1.3.0
>> - hslua-module-path-0.1.0
>> - jira-wiki-markup-1.3.4
>> - skylighting-core-0.10.5
>> - skylighting-0.10.5
>> - doclayout-0.3.0.2
>> - citeproc-0.3.0.9
>> - texmath-0.12.2
>> - random-1.2.0
>> - git: https://github.com/WorldSEnder/text-icu
>>   commit: 7657227a7ca8ad13be86db5c190b806774a5fd6b
>> ghc-options:
>>   "$locals": -fhide-source-paths -Wno-missing-home-modules
>> resolver: lts-17.5
>> nix:
>>   packages: [zlib]
>> ```
>>
>> Then run `stack install`.
>>
>> > As an aside, while I fully understand the wish not to have to include a
>> > huge external C library by default, I feel that pandoc’s default sorting
>> > algorithm, currently based on “i;unicode-casemap” (RFC 5051), is somewhat
>> > below par. In particular, it does not even comply with mainstream
>> > English-language rules as far as accented characters are concerned. The
>> > Chicago Manual of Style (17e, 2017, 16.67) unambiguously states: “Words
>> > beginning with or including accented letters are alphabetized as though
>> > they were unaccented.” One of their examples gives the sort order “Ubeda –
>> > Über – Ubina”. Without icu support, pandoc incorrectly sorts these as
>> > “Ubeda – Ubina – Über”.
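
Why RFC 5051 puts "Über" last can be seen from a rough approximation of the "i;unicode-casemap" key: titlecase each character, decompose with NFKD, then compare code points. Decomposition turns "Über" into "U" followed by U+0308 COMBINING DIAERESIS, and U+0308 has a higher code point than any ASCII letter, so "Über" lands after "Ubina". This is a sketch under that assumption, not citeproc's actual code; `casemapKey` and `sortCasemap` are hypothetical names, and it needs the text and unicode-transforms packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Approximation of the RFC 5051 "i;unicode-casemap" ordering:
-- titlecase each character (simple mapping), decompose with NFKD,
-- then compare code points. Assumes text and unicode-transforms.
import Data.Char (toTitle)
import Data.List (sortOn)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (NFKD), normalize)

-- | Comparing Strings by Char is code point order, which is the same
-- order as comparing the UTF-8 octets.
casemapKey :: T.Text -> String
casemapKey = T.unpack . normalize NFKD . T.map toTitle

sortCasemap :: [T.Text] -> [T.Text]
sortCasemap = sortOn casemapKey
```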
>>
>> Yes. I agree. Actually, if we just need special treatment for
>> English locales, then I don't think it should be too hard. We
>> can use the Haskell unicode-transforms library (already a
>> dependency of pandoc) to normalize the text and then remove
>> accents:
>>
>> Prelude Data.Text.Normalize Data.Text Data.Char> Data.Text.filter (not . isMark) $ normalize NFD "dérégler"
>> "deregler"
>>
>> We could sort on the result of that transform.
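
A minimal sketch of that idea, assuming the same unicode-transforms and text packages as the GHCi session; `stripAccents` and `sortIgnoringAccents` are hypothetical names, not existing pandoc or citeproc functions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sort key built as in the GHCi session: NFD-normalize, then drop
-- combining marks. Assumes the text and unicode-transforms packages.
import Data.Char (isMark)
import Data.List (sortOn)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (NFD), normalize)

-- | "dérégler" -> "deregler", "Über" -> "Uber".
stripAccents :: T.Text -> T.Text
stripAccents = T.filter (not . isMark) . normalize NFD

-- | Alphabetize as though accented letters were unaccented.
-- sortOn is stable, so words differing only in accents keep input order.
-- Case is not folded here, so "Zebra" still sorts before "apple".
sortIgnoringAccents :: [T.Text] -> [T.Text]
sortIgnoringAccents = sortOn stripAccents
```

With this key, `["Ubina", "Über", "Ubeda"]` sorts as Ubeda, Über, Ubina, matching the Chicago example, whereas plain code point order gives Ubeda, Ubina, Über. Folding case as well (for instance with `T.toCaseFold`) would be a natural next step.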
>>
>> (This method would also affect non-Western scripts, though, and
>> I don't know what the rules around those are...)
>>
>> For non-English locales, would we want to fall back to RFC 5051?
>>
>> I'm not sure what all the relevant rules are; if it's not too
>> terribly complicated, I wonder if a pure Haskell library could
>> be cooked up. It's a shame that there's no way to do proper
>> Unicode collation in Haskell without the difficult icu4c
>> dependency.
>>
>
> -- 
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/5035db2e-16b9-4923-8e38-d95b81d27840n%40googlegroups.com.



