From: Denis Maier
Newsgroups: gmane.text.pandoc
Subject: Re: Transliterated and original titles/names in citations
Date: Sat, 7 Sep 2019 14:41:11 -0700 (PDT)

Regarding point 1, you could look into how Juris-M handles language variants: https://juris-m.readthedocs.io/en/latest/dev-sync-simplification.html
It looks quite solid. Basically, a variant field is named like this:
field--language-variant
For example:
title--he-alalc97 => title, Hebrew, transliterated according to the Library of Congress (ALA-LC) rules.
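To make the naming pattern concrete, here is a minimal Python sketch (not Juris-M code) that splits keys of the form `field--language-variant` and picks a preferred variant, falling back to the untagged original. Only the `--` separator and the `title--he-alalc97` example come from the message; the helper names (`split_key`, `variants`, `pick`), the flat dictionary representation of an item and the sample title are hypothetical. In principle the same pattern could hold several alternative transliterations of a name, one per tag.

```python
from typing import Dict, List, Optional, Tuple

SEP = "--"  # separator between base field and language tag, as in "title--he-alalc97"


def split_key(key: str) -> Tuple[str, Optional[str]]:
    """Split 'title--he-alalc97' into ('title', 'he-alalc97'); a plain key gets None."""
    base, sep, lang = key.partition(SEP)
    return base, (lang if sep else None)


def variants(item: Dict[str, str], field: str) -> Dict[Optional[str], str]:
    """Collect all values of one field; the key None holds the untagged original."""
    found: Dict[Optional[str], str] = {}
    for key, value in item.items():
        base, lang = split_key(key)
        if base == field:
            found[lang] = value
    return found


def pick(item: Dict[str, str], field: str, preferred: List[str]) -> Optional[str]:
    """Return the first variant whose tag appears in `preferred`, else the original."""
    found = variants(item, field)
    for tag in preferred:
        if tag in found:
            return found[tag]
    return found.get(None)


# Hypothetical bibliography item: original Hebrew title plus one transliteration.
item = {"title": "שלום", "title--he-alalc97": "Shalom"}
print(pick(item, "title", ["he-alalc97"]))           # Shalom
print(pick(item, "title", ["ru"]) == item["title"])  # True (falls back to the original)
```

A citation style could then ask for, say, the transliterated title first and the original in parentheses, without the two ever being stored in one field.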

Best,
Denis

On Saturday, 7 September 2019 12:05:48 UTC+2, BP wrote:
>
> I just realized two things that make matters much worse:
>
> 1. Not all publications accept the same transliteration schemes. Just by surveying one author's references to his own works in a single bibliography, I find his surname, Яблонский, transliterated in five different ways (although two predominate)! So I'll need both a `transliterated title` field and a `transliterated authors` field, with (in each item) a mapping of alternative transliterations. Even Icelandic sometimes needs to be transliterated, e.g. Þórður becoming Thórdur (with data loss!).
>
> 2. Sorting. Latin letters like _č, š, ž_ need to sort as _c, s, z_, and _Þ_ probably must sometimes sort like _Th_ and sometimes after _z_! This sometimes requires tailored, locale-dependent sorting! Accented letters can ideally be handled by entering things in NFC and hoping that sort algorithms ignore combining marks, but then, e.g., in Scandinavian languages _ö_ sorts not as _o_ but at the end of the alphabet (ideally _þ, æ, ø, å, ä, ö_ go at the end of the alphabet in that order, but often _æ/ä, ø/ö_ are conflated either before or after _å_!). Anyway, CSL seems to have no customizable sort-key field. I know how to handle these things myself with [Unicode::Collate][], but that means at least some postprocessing of as-yet-unknown complexity.
>
> On Wed, 4 Sep 2019 09:33, BPJ <mel...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>> Does anyone know how to handle transliterated titles and names in
>> citations, when you want to include both the transliteration and the
>> original? Does CSL have any fields for that?
>>
>> TIA,
>>
>> /bpj
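Point 2 in the quoted message is about tailored collation. A real setup would use the Unicode Collation Algorithm with locale tailorings (Perl's Unicode::Collate, as mentioned, or ICU); the Python sketch below is only a much-simplified, standard-library stand-in that illustrates a tailored sort key. The `AFTER_Z` list, the `expand` option and the sample names are assumptions for illustration. Given the observation that CSL seems to lack a customizable sort-key field, a key like this would have to be applied in a pre- or post-processing step outside the citation processor.

```python
import unicodedata
from typing import Dict, List, Optional, Tuple

# Hypothetical tailoring: letters that sort after "z", in this order
# (the "ideal" order given in the message).
AFTER_Z = ["þ", "æ", "ø", "å", "ä", "ö"]

Key = Tuple[Tuple[Tuple[int, int], ...], str]


def sort_key(name: str,
             after_z: List[str] = AFTER_Z,
             expand: Optional[Dict[str, str]] = None) -> Key:
    """Simplified tailored sort key (NOT the full Unicode Collation Algorithm).

    Primary part: one (code point, rank) pair per letter. Tailored letters get
    (ord("z"), 1..n) so they land right after "z"; other letters are decomposed
    (NFD) and stripped of combining marks, so "č" keys like "c".
    Secondary part: the original string, used only as a tie-breaker.
    """
    expand = expand or {}
    rank = {ch: i + 1 for i, ch in enumerate(after_z)}
    primary: List[Tuple[int, int]] = []
    # NFC first, so a decomposed "o" + U+0308 is matched as the tailored "ö".
    for ch in unicodedata.normalize("NFC", name.lower()):
        for c in expand.get(ch, ch):  # e.g. {"þ": "th"} sorts Þ like "Th"
            if c in rank:
                primary.append((ord("z"), rank[c]))
            else:
                for d in unicodedata.normalize("NFD", c):
                    if not unicodedata.combining(d):
                        primary.append((ord(d), 0))
    return tuple(primary), name


names = ["Zorn", "Öberg", "Čapek", "Þórður", "Cole", "Åberg", "Olsen", "Aalto"]
default_order = sorted(names, key=sort_key)
# ['Aalto', 'Čapek', 'Cole', 'Olsen', 'Zorn', 'Þórður', 'Åberg', 'Öberg']
th_order = sorted(names, key=lambda n: sort_key(n, expand={"þ": "th"}))
# ['Aalto', 'Čapek', 'Cole', 'Olsen', 'Þórður', 'Zorn', 'Åberg', 'Öberg']
```

Other tailorings are just different lists: Swedish, for instance, ends its alphabet with å, ä, ö, while Danish and Norwegian end with æ, ø, å.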

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/bb1e7b0b-d02e-4059-9842-46e7d29004ec%40googlegroups.com.