From: "'Nick Bart' via pandoc-discuss"
Newsgroups: gmane.text.pandoc
Subject: Re: Error compiling with icu support / possible workaround?
Date: Mon, 22 Mar 2021 14:29:28 +0000

With a stack.yaml file modified according to your suggestions, I succeeded in building pandoc with icu support. Many thanks.
As to non-icu approaches:

For English locales, removing accents before sorting would be an improvement, and that actually seems to be all that is required to comply with the rules of the Chicago Manual of Style (CMOS).

A few other languages might benefit from this approach, too, but as far as I can see this would be limited to Dutch, Portuguese, and German (where, in addition to removing accents, “ß” would have to be transformed to “ss” before sorting). Caveat: I have only checked https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions, which may or may not be authoritative and only covers languages using an “extended Latin alphabet”.

Other languages’ rules typically seem to be much more involved, and removing accents before sorting might actually make things worse than relying on “i;unicode-casemap” (RFC 5051). One example is Spanish, where “ñ” should definitely be sorted as a distinct letter *after* “n”. So, yes, if icu4c is not used, falling back to RFC 5051 for those languages where we are not reasonably sure that removing accents before sorting helps seems to make sense.

Still, my conclusion is that for most languages listed in https://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions, let alone those using non-Latin alphabets, “i;unicode-casemap” (RFC 5051) just won’t be adequate.

The only readily available and robust solution, short of reimplementing parts of it in Haskell, seems to be icu4c. I for one wouldn’t mind at all if you decided to include it in the pandoc binaries by default.

As for fixes to the official text-icu branch, it seems we’re already getting at least some attention; see https://github.com/haskell/text-icu/issues/49#issuecomment-804097813. Let’s see ...
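To make the Spanish point concrete, here is a small Python sketch. It is only an illustration: it is not pandoc's or citeproc's code, and the function names strip_accents_key and casemap_key are hypothetical. The first key removes accents after casefolding (casefolding also turns German “ß” into “ss”). The second is a rough approximation of RFC 5051's i;unicode-casemap (titlecase each character, apply NFKD, compare the UTF-8 bytes), assuming that Python's own case mappings are close enough to the RFC's for this example.

```python
import unicodedata


def strip_accents_key(s: str) -> str:
    """Accent-insensitive key: casefold (which also maps German 'ß' to 'ss'),
    decompose to NFD, then drop combining marks (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def casemap_key(s: str) -> bytes:
    """Rough approximation of RFC 5051 i;unicode-casemap: titlecase each
    character, apply NFKD, and compare the resulting UTF-8 bytes.
    Caveat (assumption): Python's str.title() applies full case mappings
    (e.g. 'ß' -> 'Ss'), which may differ from the per-character titlecase
    property the RFC refers to."""
    titled = "".join(ch.title() for ch in s)
    return unicodedata.normalize("NFKD", titled).encode("utf-8")


spanish = ["oso", "ñu", "nube", "nada"]

# Accent stripping treats 'ñ' as a plain 'n', so "ñu" lands among the n-words
print(sorted(spanish, key=strip_accents_key))  # -> ['nada', 'ñu', 'nube', 'oso']

# NFKD leaves a combining tilde (UTF-8 bytes 0xCC 0x83) that sorts after every
# ASCII letter, so "ñu" follows the n-words and precedes "oso"
print(sorted(spanish, key=casemap_key))  # -> ['nada', 'nube', 'ñu', 'oso']
```

In this simple case the casemap-style key happens to give the Spanish order, while accent stripping does not. Neither key is a real collation for any language; that is what icu4c provides.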