From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: Error compiling with icu support / possible workaround?
Date: Tue, 23 Mar 2021 12:04:55 -0700
References: <5035db2e-16b9-4923-8e38-d95b81d27840n@googlegroups.com>
To: jcr, pandoc-discuss

Just a note: I've started working on a library that does this.
The basics are mostly working now (about 4x slower than text-icu
but not too bad). But I haven't yet implemented the
locale-sensitive sorting hints.

jcr writes:

> I'm not an expert in this, but I believe a pure Haskell solution means
> implementing the Unicode Collation Algorithm. The Unicode Common Locale
> Data Repository contains the per-locale settings to configure the
> algorithm to sort according to the locale's rules. This is what ICU does.
>
> On Monday, March 22, 2021 at 6:56:04 AM UTC+1 John MacFarlane wrote:
>
>> "'Nick Bart' via pandoc-discuss" writes:
>>
>> > An unofficial fork of text-icu claims to have fixed the issue
>> > (https://github.com/WorldSEnder/text-icu/commit/7657227a7ca8ad13be86db5c190b806774a5fd6b).
>> >
>> > I wonder if anyone could indicate how to tweak the pandoc install
>> > command to include, for the time being, the WorldSEnder/text-icu fork
>> > rather than the official one - or whether there is anything else I could
>> > try to fix this issue on the pandoc side. (I tried downgrading icu4c via
>> > homebrew, but apparently no formulae for earlier versions are available.)
>>
>> Replace stack.yaml with this:
>>
>> ``` stack.yaml
>> flags:
>>   pandoc:
>>     trypandoc: false
>>     embed_data_files: true
>>   QuickCheck:
>>     old-random: false
>>   citeproc:
>>     icu: true
>> packages:
>> - '.'
>> extra-deps:
>> - hslua-1.3.0
>> - hslua-module-path-0.1.0
>> - jira-wiki-markup-1.3.4
>> - skylighting-core-0.10.5
>> - skylighting-0.10.5
>> - doclayout-0.3.0.2
>> - citeproc-0.3.0.9
>> - texmath-0.12.2
>> - random-1.2.0
>> - git: https://github.com/WorldSEnder/text-icu
>>   commit: 7657227a7ca8ad13be86db5c190b806774a5fd6b
>> ghc-options:
>>   "$locals": -fhide-source-paths -Wno-missing-home-modules
>> resolver: lts-17.5
>> nix:
>>   packages: [zlib]
>> ```
>>
>> Then stack install.
>>
>> > As an aside, while I fully understand the wish not to have to include a
>> > huge external C library by default, I feel that pandoc’s default sorting
>> > algorithm, currently based on “i;unicode-casemap” (RFC 5051), is somewhat
>> > below par. In particular, it does not even comply with mainstream
>> > English-language rules as far as accented characters are concerned. The
>> > Chicago Manual of Style (17e, 2017, 16.67) unambiguously states: “Words
>> > beginning with or including accented letters are alphabetized as though
>> > they were unaccented.” One of their examples gives the sort order
>> > “Ubeda – Über – Ubina”. Without icu support, pandoc incorrectly sorts
>> > this as “Ubeda – Ubina – Über”.
>>
>> Yes. I agree.
>> Actually, if we just need special treatment for
>> English locales, then I don't think it should be too hard. We
>> can use the Haskell unicode-transforms library (already a
>> dependency of pandoc) to normalize the text and then remove
>> accents:
>>
>> Prelude Data.Text.Normalize Data.Text Data.Char> Data.Text.filter (not . isMark) $ normalize NFD "dérégler"
>> "deregler"
>>
>> We could sort on the result of that transform.
>>
>> (This method would also affect non-Western scripts, though, and
>> I don't know what the rules around those are...)
>>
>> For non-English locales, would we want to fall back to RFC 5051?
>>
>> I'm not sure what all the relevant rules are; if it's not too
>> terribly complicated, I wonder if a pure Haskell library could
>> be cooked up. It's a shame that there's no way to do proper
>> unicode collation in Haskell without the difficult icu4c
>> dependency.
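
For illustration only, the accent-stripping idea from the quoted message
(normalize to NFD, drop combining marks, sort on the result) might look
roughly like the sketch below. It assumes the text and unicode-transforms
packages are available. The names stripAccents and sortUnaccented are made
up for this example and are not part of pandoc or citeproc. This is not the
Unicode Collation Algorithm and it ignores locale-specific rules.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Char (isMark)
import Data.List (sortOn)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Normalize (NormalizationMode (NFD), normalize)
import System.IO (hSetEncoding, stdout, utf8)

-- | Hypothetical helper: decompose to NFD so that accents become separate
-- combining marks, then drop every character in a Unicode "Mark" category.
stripAccents :: Text -> Text
stripAccents = T.filter (not . isMark) . normalize NFD

-- | Hypothetical helper: sort by the unaccented form. The original strings
-- are returned unchanged; only the comparison key is stripped. sortOn is
-- stable, so strings with equal keys keep their input order.
sortUnaccented :: [Text] -> [Text]
sortUnaccented = sortOn stripAccents

main :: IO ()
main = do
  hSetEncoding stdout utf8  -- avoid encoding errors in non-UTF-8 locales
  mapM_ TIO.putStrLn (sortUnaccented ["Ubina", "Über", "Ubeda"])
  -- prints Ubeda, Über, Ubina (one per line)
```

As noted in the quoted message, stripping marks this way also affects
non-Western scripts, so a key like this is at best a stopgap for English
locales.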