From: Ivan Pešić via ntg-context
Newsgroups: gmane.comp.tex.context
Subject: Transliteration
Date: Thu, 3 Feb 2022 23:15:28 +0400
To: ntg-context@ntg.nl

Hello!
I've been working on a Serbian book and I had to transliterate it from Cyrillic to Latin.
Transliteration support has improved nicely, and I would like to propose a small change.
One peculiarity that the current transliteration mechanisms (both the internal one and the third-party module from Philipp Gesang)
do not handle is that Љ, Њ and Џ are transliterated to Lj, Nj and Dž in ordinary words that start a sentence and in names that normally start with a capital letter,
but in titles written in all capitals they should be transliterated to LJ, NJ and DŽ.
So, the quick solution was to update the current mapping vector and add another one (attached) that maps the Cyrillic capitals to LJ, NJ and DŽ,
and to restrict both to the 30 letters used in the Serbian language.
It takes a bit more manual work to select the correct mapping for all-capitals text, but it works.
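
The two mapping vectors can be pictured with a small Python sketch. It only illustrates the idea and is not the ConTeXt implementation: the names BASE, C2L_MIXED, C2L_CAPS and transliterate are hypothetical, and the tables are abbreviated instead of listing all 30 letters. Choosing the all-capitals vector automatically, whenever a word has no lowercase letters, is an assumption of this sketch; with the attached file the vector is selected by hand.

```python
# Hypothetical, abbreviated tables; the attached file lists all 30 letters.
BASE = {
    "А": "A", "а": "a", "Д": "D", "д": "d", "И": "I", "и": "i",
    "У": "U", "у": "u", "љ": "lj", "њ": "nj", "џ": "dž",
}
# Mixed-case vector: capital digraph letters become Lj, Nj, Dž.
C2L_MIXED = {**BASE, "Љ": "Lj", "Њ": "Nj", "Џ": "Dž"}
# All-capitals vector: the same letters become LJ, NJ, DŽ.
C2L_CAPS = {**BASE, "Љ": "LJ", "Њ": "NJ", "Џ": "DŽ"}


def transliterate(word: str) -> str:
    """Transliterate one word, choosing the vector from the word's casing."""
    # Assumption of this sketch: a word of more than one character that
    # contains no lowercase letters is treated as all-capitals text.
    all_caps = len(word) > 1 and word == word.upper()
    table = C2L_CAPS if all_caps else C2L_MIXED
    # Characters missing from the table pass through unchanged.
    return "".join(table.get(ch, ch) for ch in word)
```

With these tables, transliterate("Људи") uses the mixed-case vector and transliterate("ЉУДИ") uses the all-capitals one.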
I have also merged the Serbian hyphenation patterns, so there is no need to switch the language in order to get hyphenation in transliterated text.
That was possible because the Cyrillic and Latin scripts use different code points, so the two pattern sets cannot conflict.
So I suggest merging the patterns for Serbian Cyrillic and Latin.
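
A minimal Python sketch of why the merge is safe. It is not how ConTeXt loads patterns; the function names are hypothetical, and any patterns passed in are placeholders rather than real Serbian hyphenation patterns. The only point is that no pattern can contain letters from both scripts, so concatenating the two lists changes nothing for either script.

```python
# The 30 lowercase Serbian Cyrillic letters, and the single Latin letters
# used to write Serbian (lj, nj and dž are digraphs of these).
CYRILLIC = set("абвгдђежзијклљмнњопрстћуфхцчџш")
LATIN = set("abcčćdđefghijklmnoprsštuvzž")


def letters(pattern: str) -> set[str]:
    """Return the letters of a pattern, ignoring digits and '.' marks."""
    return {ch for ch in pattern if ch.isalpha()}


def merge_patterns(cyrillic_patterns, latin_patterns):
    """Concatenate two pattern lists after checking they stay in their script."""
    for pattern in cyrillic_patterns:
        if not letters(pattern) <= CYRILLIC:
            raise ValueError(f"not a pure Cyrillic pattern: {pattern}")
    for pattern in latin_patterns:
        if not letters(pattern) <= LATIN:
            raise ValueError(f"not a pure Latin pattern: {pattern}")
    # The two alphabets share no code points, so no pattern from one list
    # can ever match text written in the other script.
    return list(cyrillic_patterns) + list(latin_patterns)
```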

There is another issue when one wants a drop cap, with the rest of that first word and several following words typeset in small caps.
If the first letter is Љ (or either of the other two letters that transliterate as digraphs), the second letter of the digraph is not typeset in small caps, because
it is injected before the group that switches on small caps.
For example:
\placeinitial
Љ{\sc уди нису знали}

but this is quite a special case...
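
The effect can be reproduced with a short Python sketch. It assumes, purely for illustration, that transliteration behaves like a per-character replacement on the input; the real mechanism in ConTeXt may work differently, and MAPPING and naive_transliterate are hypothetical names.

```python
# Hypothetical, abbreviated mapping; just enough for the example above.
MAPPING = {
    "Љ": "Lj", "у": "u", "д": "d", "и": "i", "н": "n",
    "с": "s", "з": "z", "а": "a", "л": "l",
}


def naive_transliterate(source: str) -> str:
    """Replace each Cyrillic character in place, leaving TeX markup untouched."""
    return "".join(MAPPING.get(ch, ch) for ch in source)


source = r"Љ{\sc уди нису знали}"
result = naive_transliterate(source)
# Both letters of the digraph land in front of the opening brace,
# so the "j" stays outside the small-caps group.
print(result)
```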

Regards,
Ivan

Attachment: lang-imp-serbian.lua

return {
  transliterations = {
    ["c2l"] = {
      mapping = {
        ["А"] = "A",  ["а"] = "a",
        ["Б"] = "B",  ["б"] = "b",
        ["В"] = "V",  ["в"] = "v",
        ["Г"] = "G",  ["г"] = "g",
        ["Д"] = "D",  ["д"] = "d",
        ["Ђ"] = "Đ",  ["ђ"] = "đ",
        ["Е"] = "E",  ["е"] = "e",
        ["Ж"] = "Ž",  ["ж"] = "ž",
        ["З"] = "Z",  ["з"] = "z",
        ["И"] = "I",  ["и"] = "i",
        ["Ј"] = "J",  ["ј"] = "j",
        ["К"] = "K",  ["к"] = "k",
        ["Л"] = "L",  ["л"] = "l",
        ["Љ"] = "Lj", ["љ"] = "lj",
        ["М"] = "M",  ["м"] = "m",
        ["Н"] = "N",  ["н"] = "n",
        ["Њ"] = "Nj", ["њ"] = "nj",
        ["О"] = "O",  ["о"] = "o",
        ["П"] = "P",  ["п"] = "p",
        ["Р"] = "R",  ["р"] = "r",
        ["С"] = "S",  ["с"] = "s",
        ["Т"] = "T",  ["т"] = "t",
        ["Ћ"] = "Ć",  ["ћ"] = "ć",
        ["У"] = "U",  ["у"] = "u",
        ["Ф"] = "F",  ["ф"] = "f",
        ["Х"] = "H",  ["х"] = "h",
        ["Ц"] = "C",  ["ц"] = "c",
        ["Ч"] = "Č",  ["ч"] = "č",
        ["Џ"] = "Dž", ["џ"] = "dž",
        ["Ш"] = "Š",  ["ш"] = "š",
      }
    },
    ["C2L"] = {
      mapping = {
        ["А"] = "A",  ["а"] = "a",
        ["Б"] = "B",  ["б"] = "b",
        ["В"] = "V",  ["в"] = "v",
        ["Г"] = "G",  ["г"] = "g",
        ["Д"] = "D",  ["д"] = "d",
        ["Ђ"] = "Đ",  ["ђ"] = "đ",
        ["Е"] = "E",  ["е"] = "e",
        ["Ж"] = "Ž",  ["ж"] = "ž",
        ["З"] = "Z",  ["з"] = "z",
        ["И"] = "I",  ["и"] = "i",
        ["Ј"] = "J",  ["ј"] = "j",
        ["К"] = "K",  ["к"] = "k",
        ["Л"] = "L",  ["л"] = "l",
        ["Љ"] = "LJ", ["љ"] = "lj",
        ["М"] = "M",  ["м"] = "m",
        ["Н"] = "N",  ["н"] = "n",
        ["Њ"] = "NJ", ["њ"] = "nj",
        ["О"] = "O",  ["о"] = "o",
        ["П"] = "P",  ["п"] = "p",
        ["Р"] = "R",  ["р"] = "r",
        ["С"] = "S",  ["с"] = "s",
        ["Т"] = "T",  ["т"] = "t",
        ["Ћ"] = "Ć",  ["ћ"] = "ć",
        ["У"] = "U",  ["у"] = "u",
        ["Ф"] = "F",  ["ф"] = "f",
        ["Х"] = "H",  ["х"] = "h",
        ["Ц"] = "C",  ["ц"] = "c",
        ["Ч"] = "Č",  ["ч"] = "č",
        ["Џ"] = "DŽ", ["џ"] = "dž",
        ["Ш"] = "Š",  ["ш"] = "š",
      }
    }
  }
}