ntg-context - mailing list for ConTeXt users
* transliteration russian
@ 2010-10-29 11:18 Steffen Wolfrum
  2010-10-29 11:58 ` Thomas A. Schmitz
  2010-10-29 21:25 ` Mojca Miklavec
  0 siblings, 2 replies; 16+ messages in thread
From: Steffen Wolfrum @ 2010-10-29 11:18 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Mojca Miklavec, Vyatcheslav Yatskovsky

Hi all,

I am about to typeset a book by a Russian author, written in English, but with a lot of Russian literature listed in the bibliography.
The titles of these sources are Russian but in Latin transliteration, like this:
O koordinacii mezhdunarodnyh i vneshnejekonomicheskih svjazej subjektov Rossijskoj Federacii

But even though I set "\language[ru]", the word "vneshnejekonomicheskih", for example, does not get hyphenated.
And there are some dozen titles more that show the same problem ...

Is this (to not hyphenate) because of the transliteration?
Do I have to choose another \language key?

Yours,
Steffen
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________



* Re: transliteration russian
  2010-10-29 11:18 transliteration russian Steffen Wolfrum
@ 2010-10-29 11:58 ` Thomas A. Schmitz
  2010-10-29 13:44   ` Jano Kula
  2010-10-29 21:25 ` Mojca Miklavec
  1 sibling, 1 reply; 16+ messages in thread
From: Thomas A. Schmitz @ 2010-10-29 11:58 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Oct 29, 2010, at 1:18 PM, Steffen Wolfrum wrote:

> Hi all,
> 
> I am about to typeset a book by a Russian author, written in English, but with a lot of Russian literature listed in the bibliography.
> The titles of these sources are Russian but in Latin transliteration, like this:
> O koordinacii mezhdunarodnyh i vneshnejekonomicheskih svjazej subjektov Rossijskoj Federacii
> 
> But even though I set "\language[ru]", the word "vneshnejekonomicheskih", for example, does not get hyphenated.
> And there are some dozen titles more that show the same problem ...
> 
> Is this (to not hyphenate) because of the transliteration?
> Do I have to choose another \language key?
> 
Of course. To the LuaTeX parser, the transliterated Russian is just gobbledygook; the hyphenation patterns expect proper Unicode input.

Thomas


* Re: transliteration russian
  2010-10-29 11:58 ` Thomas A. Schmitz
@ 2010-10-29 13:44   ` Jano Kula
  0 siblings, 0 replies; 16+ messages in thread
From: Jano Kula @ 2010-10-29 13:44 UTC (permalink / raw)
  To: ntg-context

On 10/29/2010 01:58 PM, Thomas A. Schmitz wrote:
>
> On Oct 29, 2010, at 1:18 PM, Steffen Wolfrum wrote:
>
>> Hi all,
>>
>> I am about to typeset a book by a Russian author, written in English, but with a lot of Russian literature listed in the bibliography.
>> The titles of these sources are Russian but in Latin transliteration, like this:
>> O koordinacii mezhdunarodnyh i vneshnejekonomicheskih svjazej subjektov Rossijskoj Federacii
>>
>> But even though I set "\language[ru]", the word "vneshnejekonomicheskih", for example, does not get hyphenated.
>> And there are some dozen titles more that show the same problem ...
>>
>> Is this (to not hyphenate) because of the transliteration?
>> Do I have to choose another \language key?

I would expect Slavic languages (cz, pl) to give better results in
hyphenation of this transliterated text, though they will not give
perfect results and exceptions will be needed. I'm assuming the reader
expects Russian hyphenation rules in these cases.

Jano


* Re: transliteration russian
  2010-10-29 11:18 transliteration russian Steffen Wolfrum
  2010-10-29 11:58 ` Thomas A. Schmitz
@ 2010-10-29 21:25 ` Mojca Miklavec
  2010-10-29 22:05   ` Khaled Hosny
                     ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Mojca Miklavec @ 2010-10-29 21:25 UTC (permalink / raw)
  To: Steffen Wolfrum; +Cc: mailing list for ConTeXt users

On Fri, Oct 29, 2010 at 13:18, Steffen Wolfrum wrote:
> Hi all,
>
> I am about to typeset a book by a Russian author, written in English, but with a lot of Russian literature listed in the bibliography.
> The titles of these sources are Russian but in Latin transliteration, like this:
> O koordinacii mezhdunarodnyh i vneshnejekonomicheskih svjazej subjektov Rossijskoj Federacii
>
> But even though I set "\language[ru]", the word "vneshnejekonomicheskih", for example, does not get hyphenated.
> And there are some dozen titles more that show the same problem ...
>
> Is this (to not hyphenate) because of the transliteration?
> Do I have to choose another \language key?

Dear Steffen,

The Russian patterns only cover the Cyrillic part. Serbian patterns
are the only ones that cover both scripts, but even then the patterns
themselves are seen as two different languages by TeX.

The best thing to do would be to transliterate Russian patterns into
Latin script (under one condition: transliteration needs to be
one-to-one; if one Cyrillic glyph transliterates into two Latin
characters, that doesn't help you). If you use LuaTeX you may then
load the patterns on the fly.

Another "easy" option would be to load any other Slavic patterns as
Jano suggested and then add exceptions where needed. I'm not sure if
transliterated patterns belong to hyph-utf8. (If nothing else, Russian
is transliterated differently into Slovenian for example, so one would
formally then need "transliteration from Russian to any other given
language written in Cyrillic script").

[still under the assumption that you use LuaTeX and that transliteration
is one-to-one]
By far the easiest and most portable solution would be if you could
convince Taco to implement something like "Latin a is equivalent to
Cyrillic a as far as hyphenation is concerned" (which could also solve
many other problems that we have). Actually, you can already do that
by redefining \lccode of Latin a to point to Cyrillic a (and do that
for the whole alphabet), but then you need to make sure that you don't
use any commands for lowercasing/uppercasing words. If you need
details, I can help you out, but first exact transliteration rules are
needed.

Mojca

* Re: transliteration russian
  2010-10-29 21:25 ` Mojca Miklavec
@ 2010-10-29 22:05   ` Khaled Hosny
  2010-10-30  8:17     ` Hans Hagen
  2010-10-29 22:15   ` Andrzej Orłowski-Skoczyk
  2010-10-29 22:47   ` Philipp Gesang
  2 siblings, 1 reply; 16+ messages in thread
From: Khaled Hosny @ 2010-10-29 22:05 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Steffen Wolfrum

On Fri, Oct 29, 2010 at 11:25:20PM +0200, Mojca Miklavec wrote:
> By far the easiest and most portable solution would be if you could
> convince Taco to implement something like "latin a is equivalent to
> cyrillic a as far as hyphenation is concerned" (which could also solve
> many other problems that we have). Actually, you can already do that
> by redefining \lccode of latin a to point to cyrillic a (and do that
> for the whole alphabet), but then you need to make sure that you don't
> use any commands for lowercasing/uppercasing words. If you need
> details, I can help you out, but first exact transliteration rules are
> needed.

I was thinking: since using \lccode for hyphenation is really a weird
choice (I'm sure Don had a good reason back then, but such things are
usually no longer relevant), and since it is used in a sort of
controlled environment (playing with \lccode's for hyphenation is not
everyone's toy), maybe LuaTeX can break backward compatibility in
the hyphenation area and have a dedicated new code, \hycode or
something, only for hyphenation purposes (backward compatibility could
perhaps be kept by using it in addition to \lccode).

What do you think?

Regards,
 Khaled

-- 
 Khaled Hosny
 Arabic localiser and member of Arabeyes.org team
 Free font developer

* Re: transliteration russian
  2010-10-29 21:25 ` Mojca Miklavec
  2010-10-29 22:05   ` Khaled Hosny
@ 2010-10-29 22:15   ` Andrzej Orłowski-Skoczyk
  2010-10-29 22:31     ` Mojca Miklavec
  2010-10-30 14:24     ` Steffen Wolfrum
  2010-10-29 22:47   ` Philipp Gesang
  2 siblings, 2 replies; 16+ messages in thread
From: Andrzej Orłowski-Skoczyk @ 2010-10-29 22:15 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 10/29/2010 11:25 PM, Mojca Miklavec wrote:
> The best thing to do would be to transliterate Russian patterns into
> Latin script (under one condition: transliteration needs to be
> one-to-one; if one cyrillic glyph transliterates into two latin
> characters, that doesn't help you). If you use LuaTeX you may then
> load the patterns on the fly.

Warning: the transliteration used in Steffen's document is (or at least
the example is) lossy and as such will likely produce wrong hyphenation
output no matter the applied method of making TeX hyphenate it.

The transliteration (in the example) is also inconsistent - if you tried
to reverse transliterate it to Cyrillic, you would not only miss some
characters, but you would also get some other characters wrong.

Examples:
- 'subjektov' is 'субъектов',
- 'vneshnejekonomicheskih' is 'внешнеэкономических',
thus 'je' stands for both 'ъе' and for 'э'.

This however could be just the author's typo. In that case 'subjektov'
should be corrected to 'sub"ektov'.
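
The clash between the two examples can be checked mechanically. What follows is only a rough sketch in Python: the table is a hypothetical fragment read off the sample title (it is not the author's full romanization scheme, which nobody in this thread has seen), and the function name is made up for illustration.

```python
# Sketch: find Latin renderings that stand for more than one Cyrillic source.
# OBSERVED is a hypothetical fragment taken only from the sample title.
from collections import defaultdict

OBSERVED = {
    "ъе": "je",  # 'subjektov' = 'субъектов'
    "э": "je",   # 'vneshnejekonomicheskih' = 'внешнеэкономических'
    "ш": "sh",   # 'vneshne...' = 'внешне...'
    "ч": "ch",   # '...icheskih' = '...ических'
}


def ambiguous_renderings(table):
    """Map each Latin rendering used more than once to its Cyrillic sources."""
    sources = defaultdict(list)
    for cyrillic, latin in table.items():
        sources[latin].append(cyrillic)
    return {latin: sorted(cyr) for latin, cyr in sources.items() if len(cyr) > 1}


if __name__ == "__main__":
    print(ambiguous_renderings(OBSERVED))
```

Any rendering reported here cannot be mapped back to Cyrillic without a dictionary or a human, which is why the reverse step has to be done by hand.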


The way to achieve a univocal (one-to-one) transliteration would be
first to reverse transliterate it to Cyrillic, and then transliterate
back to Latin using the ISO 9 transliteration standard:
http://en.wikipedia.org/wiki/ISO_9
The example 'О координации международных и внешнеэкономических связей
субъектов Российской Федерации' would then output 'O koordinacii
meždunarodnyh i vnešneèkonomičeskih svâzej sub"ektov Rossijskoj
Federacii'. This however I wouldn't consider a very human-readable output.
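
As a rough illustration of the second step, here is a small Python sketch. The table is only a partial ISO 9 mapping covering the letters that occur in the example title, and it writes the hard sign as a plain double quote to match the output quoted above; both the table name and the function names are made up for this sketch.

```python
# Sketch: one-to-one Cyrillic -> Latin transliteration for the example title.
# ISO9_FRAGMENT is partial: it only covers letters that occur in the title,
# and renders the hard sign as '"' as in the output shown above.
ISO9_FRAGMENT = {
    "а": "a", "б": "b", "в": "v", "д": "d", "е": "e", "ж": "ž",
    "з": "z", "и": "i", "й": "j", "к": "k", "м": "m", "н": "n",
    "о": "o", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "h", "ц": "c", "ч": "č", "ш": "š", "ъ": '"', "ы": "y",
    "э": "è", "я": "â",
}


def is_one_to_one(table):
    """True if no two Cyrillic letters share the same Latin rendering."""
    return len(set(table.values())) == len(table)


def to_latin(text, table=ISO9_FRAGMENT):
    """Transliterate letter by letter; unmapped characters pass through."""
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)  # spaces, punctuation, letters not in the table
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


if __name__ == "__main__":
    print(is_one_to_one(ISO9_FRAGMENT))
    print(to_latin("внешнеэкономических связей субъектов"))
```

Because every Cyrillic letter gets exactly one Latin character, the mapping can be inverted, which is the property the pattern-based approaches discussed in this thread depend on.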

A very handy tool for experiments can be found here:
http://translit.cc/

As an aside: wouldn't it be much better to just use Cyrillic for that?
-- 
Andrzej Orłowski-Skoczyk

* Re: transliteration russian
  2010-10-29 22:15   ` Andrzej Orłowski-Skoczyk
@ 2010-10-29 22:31     ` Mojca Miklavec
  2010-10-30 14:24     ` Steffen Wolfrum
  1 sibling, 0 replies; 16+ messages in thread
From: Mojca Miklavec @ 2010-10-29 22:31 UTC (permalink / raw)
  To: mailing list for ConTeXt users

2010/10/30 Andrzej Orłowski-Skoczyk wrote:
> On 10/29/2010 11:25 PM, Mojca Miklavec wrote:
>> The best thing to do would be to transliterate Russian patterns into
>> Latin script (under one condition: transliteration needs to be
>> one-to-one; if one cyrillic glyph transliterates into two latin
>> characters, that doesn't help you). If you use LuaTeX you may then
>> load the patterns on the fly.
>
> Warning: the transliteration used in Steffen's document is (or at least
> the example is) lossy and as such will likely produce wrong hyphenation
> output no matter the applied method of making TeX hyphenate it.

I didn't inspect the transliteration, but now that you point it out -
true, to achieve perfect results, one would need to completely
redesign the patterns.

... or simply use a random Slavic language and fix the wrong
hyphenations one-by-one (in particular, words with sh/ch could easily
break even though they represent a single letter).

> The example 'О координации международных и внешнеэкономических связей
> субъектов Российской Федерации' would then output 'O koordinacii
> meždunarodnyh i vnešneèkonomičeskih svâzej sub"ektov Rossijskoj
> Federacii'. This however I wouldn't consider a very human-readable output.

... it depends on who the human is. Slavic-speaking countries have no
problem pronouncing čšž ... :) :) :) Quotation marks are a bit weird
though ...

Maybe the most sensible solution (assuming LuaTeX), one that would work
perfectly but would not be easy to write, would be to input the title
in Cyrillic script, let TeX hyphenate it, and finally output the
automatically transliterated string.
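
To make the idea concrete outside of TeX, here is a hedged Python sketch. It assumes the break positions have already been found on the Cyrillic word (in a real setup that would be the job of TeX's Russian patterns); the break position used below, the romanization fragment, and the function name are all invented for illustration.

```python
# Sketch: carry break points found on the Cyrillic word over to its
# romanization. The table and the break position are hypothetical inputs.
SOFT_HYPHEN = "\u00ad"

ROMANIZATION = {"в": "v", "н": "n", "е": "e", "ш": "sh"}


def transliterate_with_breaks(word, breaks, table=ROMANIZATION):
    """Romanize `word`; `breaks` holds indices i where a break may occur
    before word[i]. Multi-letter renderings such as 'sh' stay intact."""
    out = []
    for i, ch in enumerate(word):
        if i in breaks:
            out.append(SOFT_HYPHEN)
        out.append(table.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    result = transliterate_with_breaks("внешне", {4})
    print(result.replace(SOFT_HYPHEN, "-"))
```

Since the break points are decided before romanization, a letter that becomes two Latin characters can never be split, and the transliteration no longer has to be one-to-one.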

Mojca

* Re: transliteration russian
  2010-10-29 21:25 ` Mojca Miklavec
  2010-10-29 22:05   ` Khaled Hosny
  2010-10-29 22:15   ` Andrzej Orłowski-Skoczyk
@ 2010-10-29 22:47   ` Philipp Gesang
  2010-10-29 23:06     ` Andrzej Orłowski-Skoczyk
  2 siblings, 1 reply; 16+ messages in thread
From: Philipp Gesang @ 2010-10-29 22:47 UTC (permalink / raw)
  To: mailing list for ConTeXt users



On 2010-10-29 <23:25:20>, Mojca Miklavec wrote:
> 
> The best thing to do would be to transliterate Russian patterns into
> Latin script (under one condition: transliteration needs to be
> one-to-one; if one cyrillic glyph transliterates into two latin

The one in question is rather a transcription (‘romanization’)
than a transliteration, thus unfortunately there is no
bijective mapping (e.g. ‘я’->‘ja’, ‘ш’->‘sh’ etc.). It seems to
be a hybrid between the standard Library of Congress-style
transcription and an older ISO or ГОСТ transliteration. Also, ‘j’
occurs in very odd positions. Whatever it is, we would need the
complete transcription mapping.

As others already pointed out, with a small number of strings
Steffen might get acceptable results by using the patterns of a
similar language. Although real transliterations work best with
Czech or Slovak, this peculiar transcription might be better off
with Polish or even (judging by the use of ‘sh’) standard
English.

@Steffen, if you could convince the author to supply the original
Russian text and if he would agree to use a more common style,
you could let the transliteration module do the job instead
(http://bitbucket.org/phg/transliterator).

> By far the easiest and most portable solution would be if you could
> convince Taco to implement something like "latin a is equivalent to
> cyrillic a as far as hyphenation is concerned" (which could also solve
> many other problems that we have).

+1. This would be a great feature.

Good night all, Philipp




-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


* Re: transliteration russian
  2010-10-29 22:47   ` Philipp Gesang
@ 2010-10-29 23:06     ` Andrzej Orłowski-Skoczyk
  2010-10-30  9:43       ` Philipp Gesang
  0 siblings, 1 reply; 16+ messages in thread
From: Andrzej Orłowski-Skoczyk @ 2010-10-29 23:06 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 10/30/2010 12:47 AM, Philipp Gesang wrote:
> As others already pointed out, with a small number of strings
> Steffen might get acceptable results by using the patterns of a
> similar language. Although real transliterations work best with
> Czech or Slovak, this peculiar transcription might be better off
> with Polish or even (judging by the use of ‘sh’) standard
> English.

I'm afraid Polish will not do (Polish always hyphenates sz-cz, though in
Russian shch is one character; and such).

I'm afraid no Slavic language will do unless there is one that uses
Latin script _and_ soft/hard sign (yer) - these are tricky, not similar
to anything you meet in Polish/Czech and so on.
-- 
Andrzej Orłowski-Skoczyk

* Re: transliteration russian
  2010-10-29 22:05   ` Khaled Hosny
@ 2010-10-30  8:17     ` Hans Hagen
  2010-10-30  8:34       ` Taco Hoekwater
  2010-10-30  9:34       ` Khaled Hosny
  0 siblings, 2 replies; 16+ messages in thread
From: Hans Hagen @ 2010-10-30  8:17 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Steffen Wolfrum

On 30-10-2010 12:05, Khaled Hosny wrote:
> On Fri, Oct 29, 2010 at 11:25:20PM +0200, Mojca Miklavec wrote:
>> By far the easiest and most portable solution would be if you could
>> convince Taco to implement something like "latin a is equivalent to
>> cyrillic a as far as hyphenation is concerned" (which could also solve
>> many other problems that we have). Actually, you can already do that
>> by redefining \lccode of latin a to point to cyrillic a (and do that
>> for the whole alphabet), but then you need to make sure that you don't
>> use any commands for lowercasing/uppercasing words. If you need
>> details, I can help you out, but first exact transliteration rules are
>> needed.
>
> I was thinking: since using \lccode for hyphenation is really a weird
> choice (I'm sure Don had a good reason back then, but such things are
> usually no longer relevant), and since it is used in a sort of
> controlled environment (playing with \lccode's for hyphenation is not
> everyone's toy), maybe LuaTeX can break backward compatibility in
> the hyphenation area and have a dedicated new code, \hycode or
> something, only for hyphenation purposes (backward compatibility could
> perhaps be kept by using it in addition to \lccode).
>
> What do you think?

just any letter (catcode letter) would do and the rest is to be 
controlled by the patterns

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: transliteration russian
  2010-10-30  8:17     ` Hans Hagen
@ 2010-10-30  8:34       ` Taco Hoekwater
  2010-10-30  9:34       ` Khaled Hosny
  1 sibling, 0 replies; 16+ messages in thread
From: Taco Hoekwater @ 2010-10-30  8:34 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Hans Hagen, Steffen Wolfrum

On 10/30/2010 10:17 AM, Hans Hagen wrote:
> On 30-10-2010 12:05, Khaled Hosny wrote:
>> On Fri, Oct 29, 2010 at 11:25:20PM +0200, Mojca Miklavec wrote:
>>> By far the easiest and most portable solution would be if you could
>>> convince Taco to implement something like "latin a is equivalent to
>>> cyrillic a as far as hyphenation is concerned"

You could try to convince me, but that would take considerable effort
because that is a form of cheating that I am not comfortable with.

Besides, in the non-trivial cases, a single cyrillic letter maps to
multiple latin ones, and setting that up as an internal remapping
is not trivial.

There is a simpler solution, I think: treat transliterations as a
separate language on the macro side. Generating the patterns for
that new language is simple if the transliteration rules are correct;
just do the replacements like so:

   ‘я’->‘j8a’
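
A replacement of that kind could be scripted roughly as below. This is only a sketch in Python, not part of any existing pattern tooling: the romanization table is a hypothetical fragment, the sample patterns are made up rather than taken from the real Russian pattern file, and the function name is invented.

```python
# Sketch: rewrite a TeX hyphenation pattern from Cyrillic to a romanization.
# A letter that becomes several Latin letters gets an '8' between them, so
# that no break is allowed inside it (as in the 'j8a' replacement above).
# ROMANIZATION and the sample patterns are hypothetical.
ROMANIZATION = {"и": "i", "я": "ja", "ш": "sh"}


def transliterate_pattern(pattern, table=ROMANIZATION):
    """Keep digits and word-boundary dots, romanize every letter."""
    out = []
    for ch in pattern:
        if ch.isdigit() or ch == ".":
            out.append(ch)
        else:
            # Raises KeyError for a letter missing from the table, so an
            # incomplete table is noticed instead of silently ignored.
            out.append("8".join(table[ch]))
    return "".join(out)


if __name__ == "__main__":
    for sample in ("и1я", ".ши2"):
        print(sample, "->", transliterate_pattern(sample))
```

The result is only as good as the transliteration rules: if two Cyrillic letters share one rendering, distinct patterns collapse into one and may contradict each other.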

Best wishes,
Taco

* Re: transliteration russian
  2010-10-30  8:17     ` Hans Hagen
  2010-10-30  8:34       ` Taco Hoekwater
@ 2010-10-30  9:34       ` Khaled Hosny
  2010-10-31 18:12         ` Jano Kula
  1 sibling, 1 reply; 16+ messages in thread
From: Khaled Hosny @ 2010-10-30  9:34 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users, Steffen Wolfrum

On Sat, Oct 30, 2010 at 10:17:11AM +0200, Hans Hagen wrote:
> On 30-10-2010 12:05, Khaled Hosny wrote:
> >On Fri, Oct 29, 2010 at 11:25:20PM +0200, Mojca Miklavec wrote:
> >>By far the easiest and most portable solution would be if you could
> >>convince Taco to implement something like "latin a is equivalent to
> >>cyrillic a as far as hyphenation is concerned" (which could also solve
> >>many other problems that we have). Actually, you can already do that
> >>by redefining \lccode of latin a to point to cyrillic a (and do that
> >>for the whole alphabet), but then you need to make sure that you don't
> >>use any commands for lowercasing/uppercasing words. If you need
> >>details, I can help you out, but first exact transliteration rules are
> >>needed.
> >
> >I was thinking: since using \lccode for hyphenation is really a weird
> >choice (I'm sure Don had a good reason back then, but such things are
> >usually no longer relevant), and since it is used in a sort of
> >controlled environment (playing with \lccode's for hyphenation is not
> >everyone's toy), maybe LuaTeX can break backward compatibility in
> >the hyphenation area and have a dedicated new code, \hycode or
> >something, only for hyphenation purposes (backward compatibility could
> >perhaps be kept by using it in addition to \lccode).
> >
> >What do you think?
> 
> just any letter (catcode letter) would do and the rest is to be
> controlled by the patterns

The issue here is that we want to make some characters equivalent to each
other, e.g. ' and ’, which are needed for some languages, without the
need to duplicate the patterns.

Regards,
 Khaled

-- 
 Khaled Hosny
 Arabic localiser and member of Arabeyes.org team
 Free font developer

* Re: transliteration russian
  2010-10-29 23:06     ` Andrzej Orłowski-Skoczyk
@ 2010-10-30  9:43       ` Philipp Gesang
  0 siblings, 0 replies; 16+ messages in thread
From: Philipp Gesang @ 2010-10-30  9:43 UTC (permalink / raw)
  To: mailing list for ConTeXt users



On 2010-10-30 <01:06:33>, Andrzej Orłowski-Skoczyk wrote:
> On 10/30/2010 12:47 AM, Philipp Gesang wrote:
> > As others already pointed out, with a small number of strings
> > Steffen might get acceptable results by using the patterns of a
> > similar language. Although real transliterations work best with
> > Czech or Slovak, this peculiar transcription might be better off
> > with Polish or even (judging by the use of ‘sh’) standard
> > English.
> 
> I'm afraid Polish will not do (Polish always hyphenates sz-cz, though in
> Russian shch is one character; and such).

Of course, your point is clear. Still I think Polish would be of
more use than Czech in this case because it shares more
similarities with the transcribed Russian. E.g. Russian and
Polish have ‘ks’ where Czech has ‘x’; both Ru&Pl allow ‘ki’ and
‘gi’, which are illegal in Cz; and Czech lacks a native ‘g’, while
the others have kept it. Thus you can hope for more valid hyphenation
points if you use the Polish patterns, can’t you?

> 
> I'm afraid no Slavic language will do unless there is one that uses
> Latin script _and_ soft/hard sign (yer) - these are tricky, not similar
> to anything you meet in Polish/Czech and so on.

None of them are perfect, but most cases don’t require
perfection. Trans[cription|literation] rarely occurs in masses,
so often I just insert the break points by hand and forget about
it.

Regards, Philipp



-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


* Re: transliteration russian
  2010-10-29 22:15   ` Andrzej Orłowski-Skoczyk
  2010-10-29 22:31     ` Mojca Miklavec
@ 2010-10-30 14:24     ` Steffen Wolfrum
  1 sibling, 0 replies; 16+ messages in thread
From: Steffen Wolfrum @ 2010-10-30 14:24 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On 30.10.2010 at 00:15, Andrzej Orłowski-Skoczyk wrote:

> 
> Warning: the transliteration used in Steffen's document is (or at least
> the example is) lossy and as such will likely produce wrong hyphenation
> output no matter the applied method of making TeX hyphenate it.
> 
> The transliteration (in the example) is also inconsistent - if you tried
> to reverse transliterate it to Cyrillic, you would not only miss some
> characters, but you would also get some other characters wrong.
> 




Andrzej,

thanks for your statement! 

So I will leave it to the author to mark the appropriate break points when reading the first proof. After all, it's her text.

I think the results would not be in proportion to the effort if we tried to work out a general solution on the ConTeXt/LuaTeX side. At least not for this specific project.


My question starting this thread was made under the assumption of a good transliteration ...


Thank you all for your very interesting hints and notes!

Steffen

* Re: transliteration russian
  2010-10-30  9:34       ` Khaled Hosny
@ 2010-10-31 18:12         ` Jano Kula
  2010-10-31 18:47           ` Khaled Hosny
  0 siblings, 1 reply; 16+ messages in thread
From: Jano Kula @ 2010-10-31 18:12 UTC (permalink / raw)
  To: ntg-context

Hi!

On 10/30/2010 11:34 AM, Khaled Hosny wrote:
> On Sat, Oct 30, 2010 at 10:17:11AM +0200, Hans Hagen wrote:
>> On 30-10-2010 12:05, Khaled Hosny wrote:
>>> On Fri, Oct 29, 2010 at 11:25:20PM +0200, Mojca Miklavec wrote:
>>>> By far the easiest and most portable solution would be if you could
>>>> convince Taco to implement something like "latin a is equivalent to
>>>> cyrillic a as far as hyphenation is concerned" (which could also solve
>>>> many other problems that we have). Actually, you can already do that
>>>> by redefining \lccode of latin a to point to cyrillic a (and do that
>>>> for the whole alphabet), but then you need to make sure that you don't
>>>> use any commands for lowercasing/uppercasing words. If you need
>>>> details, I can help you out, but first exact transliteration rules are
>>>> needed.
>>>
>>> I was thinking, since using \lccode for hyphenation is really a weird
>>> choice (I'm sure Don had a good reason back then, but such things are
>>> usually no longer relevant), and since it is used in a sort of
>>> controlled environment (playing with \lccode's for hyphenation is not
>>> everyone's toy), maybe LuaTeX can break backward compatibility in
>>> the hyphenation area and have a dedicated new code, \hycode or
>>> something, only for hyphenation purposes (backward compatibility
>>> could perhaps be kept by using it in addition to \lccode).
>>>
>>> What do you think?
>>
>> just any letter (catcode letter) would do and the rest is to be
>> controlled by the patterns
>
> The issue here is that we want to make some characters equivalent to each
> other, e.g. ' and ’ which are needed for some languages, without the
> need to duplicate the patterns.
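
A rough sketch of the \lccode remapping described in the quoted text, for a Unicode engine (XeTeX or LuaTeX) with Russian patterns active. The letter pairs below are illustrative assumptions, not an agreed transliteration table, and digraphs such as "zh" or "ja" cannot be expressed this way:

```tex
% Sketch only: let some Latin letters hyphenate as their Cyrillic counterparts.
% Assumes a Unicode engine and that Russian patterns are the active language.
% Side effect: \lowercase (and macros built on it) will now turn these Latin
% letters into Cyrillic ones, so avoid case-changing commands while this is set.
\lccode`\a="0430 % а
\lccode`\b="0431 % б
\lccode`\v="0432 % в
\lccode`\d="0434 % д
\lccode`\k="043A % к
\lccode`\o="043E % о
\lccode`\r="0440 % р
% ... and so on for every one-to-one pair of the chosen transliteration
```

Whether the resulting break points are acceptable still depends on how exact the transliteration is.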

Before jumping too deep into the subject, consider whether it is really
worth the effort. There is not much more than titles written in
transliterated text. No continuous reading.

In my experience, whatever the language of the original title, the reader
usually expects hyphenation similar to that of the main text's language.
Whenever I've used English patterns in English titles (even citations),
they were changed by the Czech proofreader -- though they were
perfectly correct in English -- to resemble Czech patterns. I'm not
saying it is the right approach, but from the reader's and proofreader's
point of view, if they read in Czech and don't know English patterns or
even English, patterns different from Czech are disturbing.

Jano


* Re: transliteration russian
  2010-10-31 18:12         ` Jano Kula
@ 2010-10-31 18:47           ` Khaled Hosny
  0 siblings, 0 replies; 16+ messages in thread
From: Khaled Hosny @ 2010-10-31 18:47 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On Sun, Oct 31, 2010 at 07:12:20PM +0100, Jano Kula wrote:
> Before jumping too deep into the subject, consider whether it is really
> worth the effort. There is not much more than titles written in
> transliterated text. No continuous reading.

It's not about the problem in this thread specifically, but rather another
issue that was brought up recently on the XeTeX mailing list; basically, if one
is using the curly apostrophe (’), all hyphenation patterns that depend on the
ASCII one (') will not be taken into account.
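
A minimal sketch of the \lccode workaround for this case, assuming XeTeX or LuaTeX and patterns that were built with the ASCII apostrophe only (the exact behaviour may differ between engines and versions):

```tex
% Sketch only, for XeTeX or LuaTeX (U+2019 must be a single character).
% Assumes the active patterns were built with the ASCII apostrophe (U+0027).
\lccode"27="27     % the ASCII apostrophe counts as a word letter
\lccode"2019="27   % the curly apostrophe is looked up as the ASCII one
% Side effect: \lowercase now converts ’ into ', which is the drawback
% a dedicated hyphenation code would avoid.
```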

Regards,
 Khaled

-- 
 Khaled Hosny
 Arabic localiser and member of Arabeyes.org team
 Free font developer

Thread overview: 16+ messages
2010-10-29 11:18 transliteration russian Steffen Wolfrum
2010-10-29 11:58 ` Thomas A. Schmitz
2010-10-29 13:44   ` Jano Kula
2010-10-29 21:25 ` Mojca Miklavec
2010-10-29 22:05   ` Khaled Hosny
2010-10-30  8:17     ` Hans Hagen
2010-10-30  8:34       ` Taco Hoekwater
2010-10-30  9:34       ` Khaled Hosny
2010-10-31 18:12         ` Jano Kula
2010-10-31 18:47           ` Khaled Hosny
2010-10-29 22:15   ` Andrzej Orłowski-Skoczyk
2010-10-29 22:31     ` Mojca Miklavec
2010-10-30 14:24     ` Steffen Wolfrum
2010-10-29 22:47   ` Philipp Gesang
2010-10-29 23:06     ` Andrzej Orłowski-Skoczyk
2010-10-30  9:43       ` Philipp Gesang
