From mboxrd@z Thu Jan 1 00:00:00 1970
From: "'Nick Bart' via pandoc-discuss"
Newsgroups: gmane.text.pandoc
Subject: Re: Transliterated and original titles/names in citations
Date: Sun, 08 Sep 2019 06:37:59 +0000
References: <0c05fcec-fbb7-aed6-c1ec-e84610bcdd96@gmail.com>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: "pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org"
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="b1_4884062c6799427a866a65c5e475bdca"
Xref: news.gmane.org gmane.text.pandoc:23406

This is a multi-part message in MIME format.
--b1_4884062c6799427a866a65c5e475bdca
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On sorting/collation, from https://pandoc.org/installing.html (note that you’ll have to compile pandoc-citeproc yourself; it’s not contained in the precompiled packages):

> By default pandoc-citeproc uses the "i;unicode-casemap" method to sort
> bibliography entries (RFC 5051). If you would like to use the
> locale-sensitive unicode collation algorithm instead, specify the
> `unicode_collation` flag:
>
>     cabal install pandoc-citeproc -funicode_collation
>
> Note that this requires the text-icu library, which in turn depends on
> the C library icu4c. Installation directions vary by platform. Here is
> how it might work on macOS with Homebrew:
>
>     brew install icu4c
>     cabal install --extra-lib-dirs=/usr/local/Cellar/icu4c/51.1/lib \
>       --extra-include-dirs=/usr/local/Cellar/icu4c/51.1/include \
>       -funicode_collation text-icu pandoc-citeproc

Some of these paths no longer seem to be accurate, and I’m using stack rather than cabal, so my current incantation on macOS is:

```
stack install pandoc-citeproc \
  --flag "pandoc-citeproc:unicode_collation" \
  --extra-lib-dirs=/usr/local/opt/icu4c/lib \
  --extra-include-dirs=/usr/local/opt/icu4c/include
```

I’ve been using this for quite a while now, and the resulting collation in, e.g., Danish, German, and Spanish seems to be correct.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, September 7, 2019 10:10 AM, BPJ wrote:

> Oops, forgot the link:
>
> https://metacpan.org/pod/Unicode::Collate
>
> On Sat, 7 Sep 2019 at 12:05, BPJ wrote:
>
>> I just realized two things which make matters much worse:
>>
>> 1. Not all publications accept the same transliteration schemes. Just by surveying one author's references to his own works in one bibliography I find that his surname, Яблонский, can be transliterated in five different ways (although two predominate)! So I'll need both a `transliterated title` field and a `transliterated authors` field with (in each item) a mapping of alternative transliterations. Even Icelandic needs to be transliterated sometimes, e.g. Þórður becoming Thórdur (with data loss!)
>>
>> 2. Sorting. Latin letters like _č, š, ž_ need to sort as _c, s, z_, and _Þ_ probably must sometimes sort like _Th_ and sometimes after _z_! This sometimes needs tailored, locale-dependent sorting! Accented letters can ideally be handled by entering things in NFC and hoping that sort algorithms ignore combining marks, but then e.g. in Scandinavian languages _ö_ sorts not as _o_ but at the end of the alphabet (ideally _þ, æ, ø, å, ä, ö_ go at the end of the alphabet in that order, but often _æ/ä, ø/ö_ are conflated either before or after _å_!). Anyway, it seems CSL has no customizable sort key field. I know how to handle these things myself with [Unicode::Collate][], but that at least means some postprocessing of as yet unknown complexity.
>>
>> On Wed, 4 Sep 2019 at 09:33, BPJ wrote:
>>
>>> Does anyone know how to handle transliterated titles and names in
>>> citations, when you want to include both the transliteration and the
>>> original? Does CSL have any fields for that?
>>>
>>> TIA,
>>>
>>> /bpj
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhDm8bibSjJfrM6W69qM_j1N9tPHEgRwaic4bZmrsB1CVw%40mail.gmail.com.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/fIV6s9kzbISHgHoeDhgEEA9GgrDoyN6BgQZqqDi3ZqPGCKYi77R0CfaRAzDizlcsyySuTW-hUJ09r490q5YnNTuTK0A-Ag8aRrD7Fca9gfM%3D%40protonmail.com.
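
As a rough illustration of the tailored sorting discussed in the quoted message, here is a minimal Python sketch. It is not how pandoc-citeproc, ICU, or Unicode::Collate implement collation (those follow the multi-level Unicode Collation Algorithm); it only shows the idea of a locale-specific sort key. The names `ALPHABET`, `strip_marks`, and `sort_key` are hypothetical, the letter order is simply the "ideal" Scandinavian order mentioned in the thread, and the sample names are made up.

```python
import unicodedata

# Assumption: one possible tailoring, taken from the order mentioned in the
# thread, with þ æ ø å ä ö placed after z. Real locales differ in detail.
ALPHABET = "abcdefghijklmnopqrstuvwxyzþæøåäö"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def strip_marks(ch: str) -> str:
    """Return ch without combining marks, e.g. 'č' -> 'c'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def sort_key(text: str) -> tuple:
    """Single-level sort key: letters listed in ALPHABET keep their own rank,
    any other accented letter sorts as its base letter."""
    key = []
    for ch in unicodedata.normalize("NFC", text).lower():
        if ch not in RANK:
            ch = strip_marks(ch)
        # Anything still unknown (spaces, digits, ...) sorts before 'a'.
        key.extend(RANK.get(c, -1) for c in ch)
    return tuple(key)


names = ["Öberg", "Zetterberg", "Čapek", "Olsson", "Ågren", "Carlsson"]

# Plain code point order puts Å, Ö and Č after Z:
print(sorted(names))
# ['Carlsson', 'Olsson', 'Zetterberg', 'Ågren', 'Öberg', 'Čapek']

# Tailored order sorts Č with C and keeps Å, Ö at the end of the alphabet:
print(sorted(names, key=sort_key))
# ['Čapek', 'Carlsson', 'Olsson', 'Zetterberg', 'Ågren', 'Öberg']
```

A key like this ignores case and has no tie-breaking levels, so for real bibliographies a proper collation library is still the better choice; the sketch is only meant to make the difference between code point order and tailored order concrete.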
--b1_4884062c6799427a866a65c5e475bdca--