From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: Error compiling with icu support / possible workaround?
Date: Sun, 21 Mar 2021 22:55:48 -0700
To: 'Nick Bart' via pandoc-discuss

"'Nick Bart' via pandoc-discuss" writes:

> An unofficial fork of text-icu claims to have fixed the issue (https://github.com/WorldSEnder/text-icu/commit/7657227a7ca8ad13be86db5c190b806774a5fd6b).
>
> I wonder if anyone could indicate how to tweak the pandoc install command to include, for the time being, the WorldSEnder/text-icu fork rather than the official one - or whether there is anything else I could try to fix this issue on the pandoc side. (I tried downgrading icu4c via homebrew, but apparently no formulae for earlier versions are available.)

Replace stack.yaml with this:

``` stack.yaml
flags:
  pandoc:
    trypandoc: false
    embed_data_files: true
  QuickCheck:
    old-random: false
  citeproc:
    icu: true
packages:
- '.'
extra-deps:
- hslua-1.3.0
- hslua-module-path-0.1.0
- jira-wiki-markup-1.3.4
- skylighting-core-0.10.5
- skylighting-0.10.5
- doclayout-0.3.0.2
- citeproc-0.3.0.9
- texmath-0.12.2
- random-1.2.0
- git: https://github.com/WorldSEnder/text-icu
  commit: 7657227a7ca8ad13be86db5c190b806774a5fd6b
ghc-options:
  "$locals": -fhide-source-paths -Wno-missing-home-modules
resolver: lts-17.5
nix:
  packages: [zlib]
```

Then stack install.

> As an aside, while I fully understand the wish not to have to include a huge external C library by default, I feel that pandoc’s default sorting algorithm, currently based on “i;unicode-casemap” (RFC 5051), is somewhat below par. In particular, it does not even comply with mainstream English-language rules as far as accented characters are concerned. The Chicago Manual of Style (17e, 2017, 16.67) unambiguously states: “Words beginning with or including accented letters are alphabetized as though they were unaccented.” One of their examples gives the sort order “Ubeda – Über – Ubina”. Without icu support, pandoc incorrectly sorts this as “Ubeda – Ubina – Über”.

Yes. I agree. Actually, if we just need special treatment for
English locales, then I don't think it should be too hard.
We can use the Haskell unicode-transforms library (already a
dependency of pandoc) to normalize the text and then remove
accents:

    Prelude Data.Text.Normalize Data.Text Data.Char> Data.Text.filter (not . isMark) $ normalize NFD "dérégler"
    "deregler"

We could sort on the result of that transform. (This method
would also affect non-Western scripts, though, and I don't know
what the rules around those are...)

For non-English locales, would we want to fall back to RFC 5051?
I'm not sure what all the relevant rules are; if it's not too
terribly complicated, I wonder if a pure Haskell library could be
cooked up.
It's a shame that there's no way to do proper unicode collation in Haskell without the difficult icu4 dependency.
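
A minimal sketch of the "sort on the result of that transform" idea, assuming the unicode-transforms and text packages are available. The names stripAccents and sortIgnoringAccents are hypothetical and are not part of pandoc or citeproc. The comparison is still case-sensitive and applies no locale-specific tailoring, so this only illustrates the accent-stripping step:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Char (isMark)
import Data.List (sortOn)
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (NFD), normalize)

-- Hypothetical helper: decompose to NFD so that accents become separate
-- combining marks, then drop every mark character.
stripAccents :: T.Text -> T.Text
stripAccents = T.filter (not . isMark) . normalize NFD

-- Hypothetical helper: order the original strings by their
-- accent-stripped keys (plain code-point comparison on the keys).
sortIgnoringAccents :: [T.Text] -> [T.Text]
sortIgnoringAccents = sortOn stripAccents

main :: IO ()
main = print (sortIgnoringAccents ["Ubina", "Über", "Ubeda"])
```

With the Chicago Manual example, the keys are "Ubina", "Uber" and "Ubeda", so "Über" is placed between "Ubeda" and "Ubina" instead of after both.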