* Transliteration
From: Ivan Pešić via ntg-context @ 2022-02-03 19:15 UTC
To: ntg-context; +Cc: Ivan Pešić
Hello!
I've been working on a Serbian book that I had to transliterate from
Cyrillic to Latin.
There's been some nice improvement in transliteration, and I would like
to propose a small change.
One peculiarity that the current transliteration mechanisms (both the
internal one and the third-party module from Philipp Gesang) don't
handle is the casing of digraphs. Љ, Њ and Џ are transliterated to Lj,
Nj and Dž in ordinary words that start a sentence and in names that
normally start with a capital letter, but in titles written in all
capitals they should be transliterated to LJ, NJ and DŽ.
So the quick solution was to update the current mapping vector, setting
the correct 30 letters used in the Serbian language, and to add a second
vector (attached) that maps the Cyrillic capitals to LJ, NJ and DŽ.
Selecting the right mapping for all-capitals text takes a bit more
manual work, but it works.
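The manual step could in principle be replaced by a look at the
neighbouring letters. What follows is only a minimal sketch in plain Lua
(5.3 or later, for the utf8 library), not the ConTeXt transliteration
code: the names digraphs, simple, is_upper and transliterate are
hypothetical, the one-to-one table is cut down to a few letters, and the
neighbour test is a heuristic.

```lua
-- Hypothetical sketch in plain Lua 5.3+, not ConTeXt's mechanism:
-- choose the casing of a digraph from the letters around it.

-- Title-case and all-caps forms of the three capital digraph letters.
local digraphs = {
    ["Љ"] = { "Lj", "LJ" },
    ["Њ"] = { "Nj", "NJ" },
    ["Џ"] = { "Dž", "DŽ" },
}

-- Partial one-to-one mapping, only as far as the examples need it.
local simple = {
    ["У"] = "U", ["у"] = "u",
    ["Д"] = "D", ["д"] = "d",
    ["И"] = "I", ["и"] = "i",
    ["К"] = "K", ["к"] = "k",
    ["О"] = "O", ["о"] = "o",
}

-- Cyrillic capitals occupy U+0400..U+042F.
local function is_upper(char)
    if not char then
        return false
    end
    local cp = utf8.codepoint(char)
    return cp >= 0x0400 and cp <= 0x042F
end

local function transliterate(text)
    -- Split the input into UTF-8 characters.
    local chars = {}
    for _, cp in utf8.codes(text) do
        chars[#chars + 1] = utf8.char(cp)
    end
    local out = {}
    for i, char in ipairs(chars) do
        local pair = digraphs[char]
        if pair then
            -- Heuristic: all caps when a neighbouring letter is a capital.
            local allcaps = is_upper(chars[i + 1]) or is_upper(chars[i - 1])
            out[#out + 1] = allcaps and pair[2] or pair[1]
        else
            -- Unknown characters pass through unchanged.
            out[#out + 1] = simple[char] or char
        end
    end
    return table.concat(out)
end

print(transliterate("Људи"), transliterate("ЉУДИ"), transliterate("КОЊ"))
```

With such a test a single vector would be enough for both running text
and titles, at the cost of guessing wrong in unusual mixed-case input.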
I have also merged the Serbian hyphenation patterns, so there is no need
to switch the language in order to get hyphenation in transliterated
text. That was possible because the Cyrillic and Latin scripts use
different code points, so the patterns cannot conflict.
I therefore suggest merging the patterns for Serbian Cyrillic and Latin.
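The reasoning can be checked mechanically: if the two pattern lists
share no letters, no pattern from one list can match text written in the
other script. A minimal sketch in plain Lua 5.3+ follows; the patterns
are made-up toy examples, not taken from the real Serbian pattern files,
and the function names letters, disjoint and merge are hypothetical.

```lua
-- Hypothetical sketch: toy patterns, not from the real pattern files.
local cyrillic = { "а1б", "1ва", ".не1" }
local latin    = { "a1b", "1va", ".ne1" }

-- Collect the letters used in a pattern list, skipping the digits and
-- the "." word-boundary marker that both lists share.
local function letters(patterns)
    local set = {}
    for _, pattern in ipairs(patterns) do
        for _, cp in utf8.codes(pattern) do
            local char = utf8.char(cp)
            if not char:match("^[%d%.]$") then
                set[char] = true
            end
        end
    end
    return set
end

-- True when the two letter sets have nothing in common.
local function disjoint(a, b)
    for char in pairs(a) do
        if b[char] then
            return false
        end
    end
    return true
end

-- Concatenate two pattern lists after checking that they cannot clash.
local function merge(first, second)
    assert(disjoint(letters(first), letters(second)),
        "pattern sets share letters")
    local merged = {}
    for _, pattern in ipairs(first) do
        merged[#merged + 1] = pattern
    end
    for _, pattern in ipairs(second) do
        merged[#merged + 1] = pattern
    end
    return merged
end

print(#merge(cyrillic, latin))
```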
There is another issue when one wants a drop cap, with the rest of the
first word and several following words typeset in small caps.
If the first letter is Љ (or one of the other two letters that
transliterate as digraphs), then the second letter of the digraph is not
typeset in small caps, because it gets injected before the group that
turns on small caps.
For example:
\placeinitial
Љ{\sc уди нису знали}
but this is quite a special case...
Regards,
Ivan
[-- Attachment #2: lang-imp-serbian.lua --]
[-- Type: text/plain, Size: 2464 bytes --]
return {
    transliterations = {
        -- c2l: Cyrillic to Latin for running text; capital digraphs
        -- come out in title case (Lj, Nj, Dž).
        ["c2l"] = {
            mapping = {
                ["А"] = "A",  ["а"] = "a",
                ["Б"] = "B",  ["б"] = "b",
                ["В"] = "V",  ["в"] = "v",
                ["Г"] = "G",  ["г"] = "g",
                ["Д"] = "D",  ["д"] = "d",
                ["Ђ"] = "Đ",  ["ђ"] = "đ",
                ["Е"] = "E",  ["е"] = "e",
                ["Ж"] = "Ž",  ["ж"] = "ž",
                ["З"] = "Z",  ["з"] = "z",
                ["И"] = "I",  ["и"] = "i",
                ["Ј"] = "J",  ["ј"] = "j",
                ["К"] = "K",  ["к"] = "k",
                ["Л"] = "L",  ["л"] = "l",
                ["Љ"] = "Lj", ["љ"] = "lj",
                ["М"] = "M",  ["м"] = "m",
                ["Н"] = "N",  ["н"] = "n",
                ["Њ"] = "Nj", ["њ"] = "nj",
                ["О"] = "O",  ["о"] = "o",
                ["П"] = "P",  ["п"] = "p",
                ["Р"] = "R",  ["р"] = "r",
                ["С"] = "S",  ["с"] = "s",
                ["Т"] = "T",  ["т"] = "t",
                ["Ћ"] = "Ć",  ["ћ"] = "ć",
                ["У"] = "U",  ["у"] = "u",
                ["Ф"] = "F",  ["ф"] = "f",
                ["Х"] = "H",  ["х"] = "h",
                ["Ц"] = "C",  ["ц"] = "c",
                ["Ч"] = "Č",  ["ч"] = "č",
                ["Џ"] = "Dž", ["џ"] = "dž",
                ["Ш"] = "Š",  ["ш"] = "š",
            }
        },
        -- C2L: same as c2l, except that capital digraphs come out in
        -- all caps (LJ, NJ, DŽ), for titles set in capitals.
        ["C2L"] = {
            mapping = {
                ["А"] = "A",  ["а"] = "a",
                ["Б"] = "B",  ["б"] = "b",
                ["В"] = "V",  ["в"] = "v",
                ["Г"] = "G",  ["г"] = "g",
                ["Д"] = "D",  ["д"] = "d",
                ["Ђ"] = "Đ",  ["ђ"] = "đ",
                ["Е"] = "E",  ["е"] = "e",
                ["Ж"] = "Ž",  ["ж"] = "ž",
                ["З"] = "Z",  ["з"] = "z",
                ["И"] = "I",  ["и"] = "i",
                ["Ј"] = "J",  ["ј"] = "j",
                ["К"] = "K",  ["к"] = "k",
                ["Л"] = "L",  ["л"] = "l",
                ["Љ"] = "LJ", ["љ"] = "lj",
                ["М"] = "M",  ["м"] = "m",
                ["Н"] = "N",  ["н"] = "n",
                ["Њ"] = "NJ", ["њ"] = "nj",
                ["О"] = "O",  ["о"] = "o",
                ["П"] = "P",  ["п"] = "p",
                ["Р"] = "R",  ["р"] = "r",
                ["С"] = "S",  ["с"] = "s",
                ["Т"] = "T",  ["т"] = "t",
                ["Ћ"] = "Ć",  ["ћ"] = "ć",
                ["У"] = "U",  ["у"] = "u",
                ["Ф"] = "F",  ["ф"] = "f",
                ["Х"] = "H",  ["х"] = "h",
                ["Ц"] = "C",  ["ц"] = "c",
                ["Ч"] = "Č",  ["ч"] = "č",
                ["Џ"] = "DŽ", ["џ"] = "dž",
                ["Ш"] = "Š",  ["ш"] = "š",
            }
        }
    }
}
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://context.aanhet.net
archive : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___________________________________________________________________________________
* Re: Transliteration
From: Hans Hagen via ntg-context @ 2022-02-03 20:41 UTC
To: mailing list for ConTeXt users; +Cc: Hans Hagen, Mojca Miklavec
On 2/3/2022 8:15 PM, Ivan Pešić via ntg-context wrote:
> Hello!
> I've been working on a Serbian book and I had to transliterate it from
> cyrillic to latin.
> There's been some nice improvement in transliteration, and I would like
> to propose a small change.
> One of the peculiarities that current transliteration mechanisms (both
> internal one and the 3rd party module from Philipp Gesang)
> don't process is that Љ, Њ and Џ are transliterated to Lj, Nj and Dž in
> normal words that start the sentence, or in names that normally start
> with a capital letter,
> but in titles written in all capitals they should be transliterated to
> LJ, NJ and DŽ.
> So, the quick solution was to update the current mapping vector and add
> another one (that is attached) that maps cyrillic capitals to LJ, NJ and DŽ
> and set the correct 30 letters used in Serbian language.
> It requires a bit more manual work to set the correct mapping for all
> capitals text, but it works.
> I have also merged the Serbian hyphenation patterns, so there is no need
> to switch the language in order to have hyphenation in transliterated text.
> That was possible because cyrillic and latin scripts use different code
> points, and there are no conflicts in patterns.
> So I suggest merging the patterns for Serbian cyrillic and latin.
I'd like to hear Arthur / Mojca on that ... we can of course load them
both, but if that is an upstream merge I'll wait for that.
You can actually map multiple characters to multiple characters in the
transliteration tables:
["foo"] = "oof"
and such. The next version also has an exception mechanism that permits
cloning a transliteration and adding exceptions.
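As an illustration of what a multiple-to-multiple entry implies, here is
a minimal sketch in plain Lua 5.3+ that applies a table with
multi-character keys, trying the longest key first. How ConTeXt applies
such entries internally is not shown in this thread; the function apply
and the "ЉУ" entry are hypothetical.

```lua
-- Hypothetical sketch in plain Lua 5.3+: apply a mapping whose keys may
-- span several characters, preferring the longest matching key.
local mapping = {
    ["foo"] = "oof",  -- the example entry from the message above
    ["Љ"]   = "Lj",
    ["ЉУ"]  = "LJU",  -- made-up two-letter key that overrides the default
    ["У"]   = "U",
}

local function apply(text, map)
    -- Split the input into UTF-8 characters.
    local chars = {}
    for _, cp in utf8.codes(text) do
        chars[#chars + 1] = utf8.char(cp)
    end
    -- Length, in characters, of the longest key.
    local longest = 1
    for key in pairs(map) do
        longest = math.max(longest, utf8.len(key))
    end
    local out, i = {}, 1
    while i <= #chars do
        local matched = false
        -- Try the longest candidate at position i first, then shorter ones.
        for n = math.min(longest, #chars - i + 1), 1, -1 do
            local replacement = map[table.concat(chars, "", i, i + n - 1)]
            if replacement then
                out[#out + 1] = replacement
                i = i + n
                matched = true
                break
            end
        end
        if not matched then
            -- No key starts here: copy the character unchanged.
            out[#out + 1] = chars[i]
            i = i + 1
        end
    end
    return table.concat(out)
end

print(apply("foo", mapping), apply("ЉУ", mapping), apply("УЉ", mapping))
```

Trying the longest key first is what lets a more specific entry take
precedence over the single-letter default.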
> There is another issue if one wants to use a dropcap and the rest of
> that first word, and several following words are to be typeset in small
> caps.
> If that first letter is Љ (or other two letters that transliterate as
> digraphs), then the second letter of the digraph is not typeset in small
> caps because
> it gets injected before the group that turns on small caps.
> For example:
>
> \placeinitial
> Љ{\sc уди нису знали}
>
> but this is quite a special case...
You can use \settransliteration{name} locally, so it can be part of a
style specification (there is also \resettransliteration).
The next upload has some more, which Sreeram is currently documenting on
the wiki.
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
* Re: Transliteration
From: Mojca Miklavec via ntg-context @ 2022-02-03 21:01 UTC
To: Hans Hagen; +Cc: Mojca Miklavec, mailing list for ConTeXt users
On Thu, 3 Feb 2022 at 21:41, Hans Hagen wrote:
>
> > I have also merged the Serbian hyphenation patterns, so there is no need
> > to switch the language in order to have hyphenation in transliterated text.
> > That was possible because cyrillic and latin scripts use different code
> > points, and there are no conflicts in patterns.
> > So I suggest merging the patterns for Serbian cyrillic and latin.
>
> I'd like to hear Arthur / Mojca on that .... we can of course load them
> both but if that is an upstream merge i'll wait for that
Yes, loading both patterns at once is definitely the correct approach.
That's what the rest of the TeX world already does (at least LuaTeX
and XeTeX; not pdfTeX, of course), see
https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/loadhyph/loadhyph-sr-latn.tex
We have two sets of Cyrillic patterns (and several Latin ones as
well), so composing a single file was a bit of a (somewhat political)
challenge.
Now at least in theory the users are free to choose which of the two
sets of patterns they want.
I never checked what ConTeXt was doing with the Serbian patterns.
Personally I would suggest taking hyph-sh-cyrl.pat.txt and hyph-sh-latn.pat.txt.
Mojca
* Re: Transliteration
From: Hans Hagen via ntg-context @ 2022-02-03 21:11 UTC
To: Mojca Miklavec; +Cc: Hans Hagen, mailing list for ConTeXt users
On 2/3/2022 10:01 PM, Mojca Miklavec wrote:
> On Thu, 3 Feb 2022 at 21:41, Hans Hagen wrote:
>>
>>> I have also merged the Serbian hyphenation patterns, so there is no need
>>> to switch the language in order to have hyphenation in transliterated text.
>>> That was possible because cyrillic and latin scripts use different code
>>> points, and there are no conflicts in patterns.
>>> So I suggest merging the patterns for Serbian cyrillic and latin.
>>
>> I'd like to hear Arthur / Mojca on that .... we can of course load them
>> both but if that is an upstream merge i'll wait for that
>
> Yes, loading both patterns at once is definitely the correct approach.
> That's what the rest of the TeX world already does (at least LuaTeX
> and XeTeX; pdfTeX not of course), see
> https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/loadhyph/loadhyph-sr-latn.tex
>
> We have two sets of Cyrillic patterns (and several Latin ones as
> well), so composing a single file was a bit of a (somewhat political)
> challenge.
> Now at least in theory the users are free to choose which of the two
> sets of patterns they want.
>
> I never checked what ConTeXt was doing with the Serbian patterns.
> Personally I would suggest taking hyph-sh-cyrl.pat.txt and hyph-sh-latn.pat.txt.
We currently do this:
{ "sr", "hyph-sr", "serbian", false, { "hyph-sr-cyrl", "hyph-sr-latn" }, },
So you suggest replacing that with the "sh" variants?
Hans