ntg-context - mailing list for ConTeXt users
* Greek hyphenation patterns
From: Peter Heslin @ 2006-06-29 21:43 UTC



A few weeks ago, I looked at Context, because I wanted utf-8 hyphenation
patterns for ancient Greek, but then I saw that the patterns shipped
with Context have serious bugs.  I had hoped to patch ctxtools, but the
required changes went beyond my knowledge of Ruby.

I recently posted a Perl script to the xetex mailing list that should
perform the conversion to utf-8 correctly.  I would be happy to modify
the script to make the output more useful to Context users, but I don't
use Context myself.  Feedback is welcome.

The essential problem with the patterns shipped with Context is that
they are the result of a simple conversion, but the hyphenation rules in
Greek are based on the definition of vowels and consonants, and the way
an accented vowel is encoded changes in utf-8.  The original 8-bit
patterns of Dimitrios Filippou depend on the fact that in the Babel
encoding accents come before the vowel (except for the iota subscript),
whereas in Unicode they are either combined with the vowel or come after
it, depending on whether you use precomposed characters or not.
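A minimal Python sketch, assuming only the standard unicodedata module, makes the ordering difference concrete. The character chosen, U+1F85 (alpha with rough breathing, acute accent and iota subscript), is an arbitrary example, and the LGR spelling in the comment is an approximation of that encoding's convention of typing breathings and accents before the vowel and the iota subscript after it.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return the code points of a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Precomposed alpha with rough breathing, acute accent and iota subscript.
# In LGR this is typed roughly as <'a| : breathing and accent BEFORE the
# vowel, iota subscript AFTER it.
precomposed = "\u1f85"

# Canonical decomposition (NFD): the base vowel comes first, followed by
# the combining marks, i.e. the reverse of the LGR prefix convention.
decomposed = unicodedata.normalize("NFD", precomposed)

print(code_points(precomposed))
print(code_points(decomposed))
for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Because the precomposed and decomposed forms are different character sequences, a pattern written for one form does not match text in the other.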

-- 
Peter Heslin (http://www.dur.ac.uk/p.j.heslin)


* Re: Greek hyphenation patterns
From: Hans Hagen @ 2006-06-29 22:06 UTC


Hi Peter
> I recently posted a Perl script to the xetex mailing list that should
> perform the conversion to utf-8 correctly.  I would be happy to modify
> the script to make the output more useful to Context users, but I don't
> use Context myself.  Feedback is welcome.
>   
i leave that to the ones using greek ... we only need the conversion 
rules; adding them to the relevant section of ctxtools is then no big deal
> The essential problem with the patterns shipped with Context is that it
> is the result of a simple conversion, but the hyphenation rules in Greek
> are based on the definition of vowels and consonants, which changes in
> utf-8.  The original 8-bit patterns of Dimitrios Filippou depend on the
> fact that in the Babel encoding accents come before the vowel (except
> for iota subscript), whereas in Unicode they are either combined with
> the vowel or come after it, depending on whether you use precomposed
> characters or not.
>
>   
hm, so those original patterns were latex dependent ... even more reason 
to ship patterns with context; of course bugs need to be fixed (or, if i 
understand correctly, the patterns need to be extended with the additional
combinations)

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------


* Re: Greek hyphenation patterns
From: Thomas A. Schmitz @ 2006-06-30  7:14 UTC
  Cc: pj


On Jun 30, 2006, at 12:06 AM, Hans Hagen wrote:

> Hi Peter
>> I recently posted a Perl script to the xetex mailing list that should
>> perform the conversion to utf-8 correctly.  I would be happy to modify
>> the script to make the output more useful to Context users, but I don't
>> use Context myself.  Feedback is welcome.
>>
> i leave that to the ones using greek ... we only need the conversion
> rules; adding them to the relevant section of ctxtools is then no big deal
>> The essential problem with the patterns shipped with Context is that it
>> is the result of a simple conversion, but the hyphenation rules in Greek
>> are based on the definition of vowels and consonants, which changes in
>> utf-8.  The original 8-bit patterns of Dimitrios Filippou depend on the
>> fact that in the Babel encoding accents come before the vowel (except
>> for iota subscript), whereas in Unicode they are either combined with
>> the vowel or come after it, depending on whether you use precomposed
>> characters or not.
>>
> hm, so those original patterns were latex dependent ... even more reason
> to ship patterns with context; of course bugs need to be fixed, (or if i
> understand, extended with the additional combinations)
>
> Hans
>

Peter, Hans,

thanks for looking into this. I had realized something was fishy with  
the ConTeXt converted patterns, so I'd be extremely grateful if we  
could have a corrected version. Hans: do we need the actual  
conversion rules, or would it be enough if Peter or I included the  
actual patterns into lang-agr.hyp? That may be faster and less work  
for you.

Best

Thomas


* Re: Greek hyphenation patterns
From: Hans Hagen @ 2006-06-30  7:50 UTC
  Cc: pj

Thomas A. Schmitz wrote:
>
> thanks for looking into this. I had realized something was fishy with  
> the ConTeXt converted patterns, so I'd be extremely grateful if we  
> could have a corrected version. Hans: do we need the actual  
> conversion rules, or would it be enough if Peter or I included the  
>   
i prefer the rules, so if you can sort that out with peter
> actual patterns into lang-agr.hyp? That may be faster and less work  
> for you.
>   

since there is no infrastructure for patterns, and since i want to be 
independent of anything happening in that area (keep in mind that we've 
been bitten by that too often: renaming, disappearing, funny internals, 
latex specific, limited encodings, etc), it's easier for me to
occasionally run ctxtools on the originals and maintain that than to
keep track of files

Hans


* Re: Greek hyphenation patterns
From: Peter Heslin @ 2006-06-30 10:51 UTC


Hans Hagen <pragma@wxs.nl> writes:

> i prefer the rules, so if you can sort that out with peter

In that case, you can examine the internals of my Perl script
elhyph-utf8 and translate its logic to Ruby in ctxtools.  But that is a
non-trivial effort, and I cannot do it.  A better alternative may be to
have ctxtools simply call elhyph-utf8 as an external script.  Does
Context still have a dependency on Perl?  If so, it would be much easier
just to call the Perl script.  I would be happy to ensure that
elhyph-utf8 remains format-neutral.
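The kind of reordering rule under discussion can be sketched in a few lines. The Python function below is a hypothetical illustration, not a translation of elhyph-utf8 (whose internals are not shown in this thread); the name lgr_pattern_to_utf8 and its letter table are assumptions covering only a tiny subset of the LGR transliteration. It holds prefix diacritics until the next letter arrives, emits them after that letter as Unicode combining marks, appends the postfix iota subscript directly, passes pattern digits and dots through, and finally normalizes to NFC.

```python
import unicodedata

# Hypothetical, deliberately tiny subset of the LGR transliteration.
LETTERS = {
    "a": "\u03b1", "e": "\u03b5", "h": "\u03b7", "i": "\u03b9",
    "o": "\u03bf", "u": "\u03c5", "w": "\u03c9", "r": "\u03c1",
    "s": "\u03c3", "c": "\u03c2", "t": "\u03c4", "n": "\u03bd",
}
# LGR types these diacritics BEFORE the letter they belong to.
PREFIX_MARKS = {
    "'": "\u0301",  # acute (oxia/tonos)
    "`": "\u0300",  # grave (varia)
    "~": "\u0342",  # circumflex (perispomeni)
    "<": "\u0314",  # rough breathing (dasia)
    ">": "\u0313",  # smooth breathing (psili)
    '"': "\u0308",  # diaeresis
}
# The iota subscript is the exception: LGR types it AFTER the vowel.
POSTFIX_MARKS = {"|": "\u0345"}
# Characters that belong to the pattern syntax, not to the text.
PATTERN_SYNTAX = set("0123456789.")


def lgr_pattern_to_utf8(pattern: str) -> str:
    """Convert one LGR-style hyphenation pattern to NFC Unicode (sketch)."""
    out = []
    pending = []  # prefix marks waiting for the next letter
    for ch in pattern:
        if ch in PREFIX_MARKS:
            pending.append(PREFIX_MARKS[ch])
        elif ch in LETTERS:
            # Unicode order: base letter first, combining marks after it.
            out.append(LETTERS[ch])
            out.extend(pending)
            pending.clear()
        elif ch in POSTFIX_MARKS:
            out.append(POSTFIX_MARKS[ch])
        elif ch in PATTERN_SYNTAX:
            out.append(ch)
        else:
            raise ValueError(f"unsupported character {ch!r} in {pattern!r}")
    if pending:
        raise ValueError(f"diacritic without a following letter in {pattern!r}")
    # Compose base letter + marks into precomposed characters where they exist.
    return unicodedata.normalize("NFC", "".join(out))


print(ascii(lgr_pattern_to_utf8("<'a|")))  # '\u1f85'
print(ascii(lgr_pattern_to_utf8("1'a1")))  # '1\u03ac1'
```

A real converter would also need the full letter table, uppercase letters, and probably decomposed variants of each pattern for text that is not in NFC. The sketch assumes breathings are typed before accents, since normalization does not reorder marks of the same combining class.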

[A footnote: contrary to what you said, the original patterns are not
Latex-specific; they are specific to the LGR encoding, which Latex Babel
happens to use.  That Greek encoding is older than Babel, I think, and is
also used elsewhere in the TeX world.]

> since there is no infrastructure for patterns, and since i want to 
> independent of anything happening in that area (keep in mind that we've 
> been bitten by that too often: renaming, disappearing, funny internals, 
> latex specific, limited encodings, etc)

I can appreciate your pain, but I'm sure that you are aware that there
is also a danger in having Context fork its own patterns: that you may
introduce bugs (as happened in this case), or that you may not pick up
on upstream bug-fixes.  Jonathan Kew has suggested that it might be
desirable to have a set of general-purpose utf-8 hyphenation patterns in
the texmf tree, which could be used by various TeX applications.  From
your comments it is clear that, in order for the Context community to
buy into such a scheme, it would be necessary for this collection of
patterns to be managed carefully, by consensus, and in a format-neutral
manner, with good advance communication of any changes.  If this were to
happen, the advantage for Context is that the dangers I mentioned above
could be minimized.  But it is up to you to balance the potential risks
and benefits for Context.

-- 
Peter Heslin (http://www.dur.ac.uk/p.j.heslin)


* Re: Greek hyphenation patterns
From: Hans Hagen @ 2006-06-30 17:23 UTC


Peter Heslin wrote:
> Hans Hagen <pragma@wxs.nl> writes:
>
>   
>> i prefer the rules, so if you can sort that out with peter
>>     
>
> In that case, you can examine the internals of my Perl script
> elhyph-utf8 and translate its logic to Ruby in ctxtools.  But that is a
> non-trivial effort, and I cannot do it.  A better alternative may be to
> have ctxtools simply call elhyph-utf8 as an external script.  Does
> Context still have a dependency on Perl?  If so, it would be much easier
> just to call the Perl script.  I would be happy to ensure that
> elhyph-utf8 remains format-neutral.
>   
let's first look at the logic, i'm sure that Thomas can extend the 
conversion then (after all, it's logic :-)

the dependency on perl is mostly gone and will be completely gone in the 
future
> I can appreciate your pain, but I'm sure that you are aware that there
> is also a danger in having Context fork its own patterns: that you may
> introduce bugs (as happened in this case), or that you may not pick up
> on upstream bug-fixes.  Jonathan Kew has suggested that it might be
>   
sure, but i've been bitten too often; context nowadays comes with a 
truckload of tools and methods, and if we had to adapt to something else 
every time that latex is ready for it, we would quickly become unproductive; 
keep in mind that in that case we would not only have to deal with you, but also 
with another 20 pattern people; now we can just pick up and rearrange 
the bits and pieces (a similar thing happens with fonts: context had 
built-in map file support before things like updmap (useless for context 
anyway) came around, so adapting to yet another method was 
counterproductive; so, context has its own encoding naming scheme, if 
only because the number of metrics that really ship is not that large)

[another nice example: context supported lm fonts right from the start, 
and in the end the names of map files were changed because of other 
packages' needs; so, again we are forced to ship our own stuff]
> desirable to have a set of general-purpose utf-8 hyphenation patterns in
> the texmf tree, which could be used by various TeX applications.  From
>   
take the names alone ... every package has different preferences; for 
years i *did* use the (hardly) generic patterns, and each year 
something else was broken; context is used in production environments 
and we need stability in those areas


> your comments it is clear that, in order for the Context community to
> buy into such a scheme, it would be necessary for this collection of
> patterns to be managed carefully, by consensus, and in a format-neutral
>   
sure, that's the ideal world, but 25 years have taught me that this is 
close to impossible; actually i tried to start such an effort, starting 
with the names, but i gave up on it simply because i foresaw a waste of time

btw, quite some years ago i published a method for encoding-neutral 
patterns, but i never got any response to it: 
http://www.pragma-ade.com/general/manuals/mpattern.pdf, also published 
in tugboat (and i did some presentations about it)
> manner, with good advance communication of any changes.  If this were to
> happen, the advantage for Context is that the dangers I mentioned above
> could be minimized.  But it is up to you to balance the potential risks
> and benefits for Context.
>   
we will gladly use your stuff but will quite probably package it the context 
way (maybe ctxtools will simply copy the existing utf ones, repackaged 
in a context way); btw, context uses utf patterns also in non-utf 
mode, i.e. in pdftex etc

Hans
