* Greek hyphenation patterns
From: Peter Heslin @ 2006-06-29 21:43 UTC
A few weeks ago, I looked at Context, because I wanted utf-8 hyphenation
patterns for ancient Greek, but then I saw that the patterns shipped
with Context have serious bugs. I had hoped to patch ctxtools, but the
required changes went beyond my knowledge of Ruby.
I recently posted a Perl script to the xetex mailing list that should
perform the conversion to utf-8 correctly. I would be happy to modify
the script to make the output more useful to Context users, but I don't
use Context myself. Feedback is welcome.
The essential problem with the patterns shipped with Context is that they
are the result of a simple conversion, but the hyphenation rules in Greek
are based on the definition of vowels and consonants, which changes in
utf-8. The original 8-bit patterns of Dimitrios Filippou depend on the
fact that in the Babel encoding accents come before the vowel (except
for iota subscript), whereas in Unicode they are either combined with
the vowel or come after it, depending on whether you use precomposed
characters or not.
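To make the ordering difference concrete, here is a minimal Python sketch. The language, the example letter and the helper name `names` are assumptions chosen for illustration; nothing here is taken from the patterns themselves. It lists the code points of one accented alpha in precomposed and in decomposed Unicode form, next to what is assumed to be the usual LGR spelling of the same letter.

```python
import unicodedata

def names(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]

# One accented vowel: alpha with smooth breathing and acute accent.
precomposed = "\u1f04"                                  # a single code point
decomposed = unicodedata.normalize("NFD", precomposed)  # vowel first, then marks

print(names(precomposed))
print(names(decomposed))

# Assumed LGR spelling of the same letter: both marks precede the vowel.
lgr = ">'a"
print(list(lgr))
```

In the precomposed form the diacritics are fused into the vowel, and in the decomposed form they follow it, so a pattern that expects a mark in front of a vowel has no direct counterpart in either Unicode form.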
--
Peter Heslin (http://www.dur.ac.uk/p.j.heslin)
* Re: Greek hyphenation patterns
From: Hans Hagen @ 2006-06-29 22:06 UTC
Hi Peter
> I recently posted a Perl script to the xetex mailing list that should
> perform the conversion to utf-8 correctly. I would be happy to modify
> the script to make the output more useful to Context users, but I don't
> use Context myself. Feedback is welcome.
>
i leave that to the ones using greek ... we only need the conversion
rules; adding them to the relevant section of ctxtools is then no big deal
> The essential problem with the patterns shipped with Context is that they
> are the result of a simple conversion, but the hyphenation rules in Greek
> are based on the definition of vowels and consonants, which changes in
> utf-8. The original 8-bit patterns of Dimitrios Filippou depend on the
> fact that in the Babel encoding accents come before the vowel (except
> for iota subscript), whereas in Unicode they are either combined with
> the vowel or come after it, depending on whether you use precomposed
> characters or not.
>
>
hm, so those original patterns were latex dependent ... even more reason
to ship patterns with context; of course bugs need to be fixed (or, if i
understand correctly, extended with the additional combinations)
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
* Re: Greek hyphenation patterns
From: Thomas A. Schmitz @ 2006-06-30 7:14 UTC
Cc: pj
On Jun 30, 2006, at 12:06 AM, Hans Hagen wrote:
> Hi Peter
>> I recently posted a Perl script to the xetex mailing list that should
>> perform the conversion to utf-8 correctly. I would be happy to modify
>> the script to make the output more useful to Context users, but I don't
>> use Context myself. Feedback is welcome.
>>
> i leave that to the ones using greek ... we only need the conversion
> rules; adding them to the relevant section of ctxtools is then no big deal
>> The essential problem with the patterns shipped with Context is that they
>> are the result of a simple conversion, but the hyphenation rules in Greek
>> are based on the definition of vowels and consonants, which changes in
>> utf-8. The original 8-bit patterns of Dimitrios Filippou depend on the
>> fact that in the Babel encoding accents come before the vowel (except
>> for iota subscript), whereas in Unicode they are either combined with
>> the vowel or come after it, depending on whether you use precomposed
>> characters or not.
>>
> hm, so those original patterns were latex dependent ... even more reason
> to ship patterns with context; of course bugs need to be fixed (or, if i
> understand correctly, extended with the additional combinations)
>
> Hans
>
Peter, Hans,
thanks for looking into this. I had realized something was fishy with
the ConTeXt converted patterns, so I'd be extremely grateful if we
could have a corrected version. Hans: do we need the actual
conversion rules, or would it be enough if Peter or I included the
actual patterns into lang-agr.hyp? That may be faster and less work
for you.
Best
Thomas
* Re: Greek hyphenation patterns
From: Hans Hagen @ 2006-06-30 7:50 UTC
Cc: pj
Thomas A. Schmitz wrote:
>
> thanks for looking into this. I had realized something was fishy with
> the ConTeXt converted patterns, so I'd be extremely grateful if we
> could have a corrected version. Hans: do we need the actual
> conversion rules, or would it be enough if Peter or I included the
>
i prefer the rules, so if you can sort that out with peter
> actual patterns into lang-agr.hyp? That may be faster and less work
> for you.
>
since there is no infrastructure for patterns, and since i want to be
independent of anything happening in that area (keep in mind that we've
been bitten by that too often: renaming, disappearing, funny internals,
latex specific, limited encodings, etc),
it's easier for me to occasionally run ctxtools on the originals and
maintain that than to keep track of files
Hans
* Re: Greek hyphenation patterns
From: Peter Heslin @ 2006-06-30 10:51 UTC
Hans Hagen <pragma@wxs.nl> writes:
> i prefer the rules, so if you can sort that out with peter
In that case, you can examine the internals of my Perl script
elhyph-utf8 and translate its logic to Ruby in ctxtools. But that is a
non-trivial effort, and I cannot do it. A better alternative may be to
have ctxtools simply call elhyph-utf8 as an external script. Does
Context still have a dependency on Perl? If so, it would be much easier
just to call the Perl script. I would be happy to ensure that
elhyph-utf8 remains format-neutral.
[A footnote: the original patterns are not Latex-specific, as you said,
but are specific to the LGR encoding, which Latex Babel happens to use;
but that Greek encoding is older than Babel, I think, and is also used
elsewhere in the TeX world.]
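As a rough illustration of the kind of rule involved, here is a hedged Python sketch of reordering LGR diacritics when converting a pattern. It is not the logic of elhyph-utf8 and not what ctxtools does. The function name `lgr_pattern_to_unicode` and the three small tables are hypothetical, and the tables cover only a partial, assumed subset of the LGR transliteration. Marks typed before a letter are held back, emitted after it as combining characters, and the result is composed with NFC.

```python
import unicodedata

# Hypothetical, partial LGR-to-Greek letter table (an assumption, not exhaustive).
LETTERS = {"a": "α", "e": "ε", "h": "η", "i": "ι", "o": "ο", "u": "υ",
           "w": "ω", "n": "ν", "r": "ρ", "t": "τ"}
# Diacritics that LGR places BEFORE the letter, mapped to combining marks.
PREFIX_MARKS = {">": "\u0313",   # smooth breathing
                "<": "\u0314",   # rough breathing
                "'": "\u0301",   # acute
                "`": "\u0300",   # grave
                "~": "\u0342"}   # circumflex (perispomeni)
# Iota subscript is the exception: LGR places it AFTER the vowel.
SUFFIX_MARKS = {"|": "\u0345"}

def lgr_pattern_to_unicode(pattern):
    """Convert one LGR hyphenation pattern to precomposed Unicode (sketch)."""
    out = []
    pending = []  # prefix marks waiting for the letter they belong to
    for ch in pattern:
        if ch in PREFIX_MARKS:
            pending.append(PREFIX_MARKS[ch])
        elif ch in LETTERS:
            # Emit the letter first, then the marks that preceded it in LGR.
            out.append(LETTERS[ch] + "".join(pending))
            pending.clear()
        elif ch in SUFFIX_MARKS:
            out.append(SUFFIX_MARKS[ch])
        else:
            # Pattern digits, dots and anything unknown pass through unchanged.
            out.append(ch)
    if pending:
        raise ValueError("diacritic without a following letter: %r" % pattern)
    return unicodedata.normalize("NFC", "".join(out))

print(lgr_pattern_to_unicode(">'a"))
print(lgr_pattern_to_unicode("a1n"))
```

A real conversion also has to decide which patterns about accents and vowels remain meaningful after the reordering, which this sketch does not attempt.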
> since there is no infrastructure for patterns, and since i want to be
> independent of anything happening in that area (keep in mind that we've
> been bitten by that too often: renaming, disappearing, funny internals,
> latex specific, limited encodings, etc)
I can appreciate your pain, but I'm sure that you are aware that there
is also a danger in having Context fork its own patterns: that you may
introduce bugs (as happened in this case), or that you may not pick up
on upstream bug-fixes. Jonathan Kew has suggested that it might be
desirable to have a set of general-purpose utf-8 hyphenation patterns in
the texmf tree, which could be used by various TeX applications. From
your comments it is clear that, in order for the Context community to
buy into such a scheme, it would be necessary for this collection of
patterns to be managed carefully, by consensus, and in a format-neutral
manner, with good advance communication of any changes. If this were to
happen, the advantage for Context is that the dangers I mentioned above
could be minimized. But it is up to you to balance the potential risks
and benefits for Context.
--
Peter Heslin (http://www.dur.ac.uk/p.j.heslin)
* Re: Greek hyphenation patterns
From: Hans Hagen @ 2006-06-30 17:23 UTC
Peter Heslin wrote:
> Hans Hagen <pragma@wxs.nl> writes:
>
>
>> i prefer the rules, so if you can sort that out with peter
>>
>
> In that case, you can examine the internals of my Perl script
> elhyph-utf8 and translate its logic to Ruby in ctxtools. But that is a
> non-trivial effort, and I cannot do it. A better alternative may be to
> have ctxtools simply call elhyph-utf8 as an external script. Does
> Context still have a dependency on Perl? If so, it would be much easier
> just to call the Perl script. I would be happy to ensure that
> elhyph-utf8 remains format-neutral.
>
let's first look at the logic, i'm sure that Thomas can extend the
conversion then (after all, it's logic :-)
the dependency on perl is mostly gone and will be completely gone in the
future
> I can appreciate your pain, but I'm sure that you are aware that there
> is also a danger in having Context fork its own patterns: that you may
> introduce bugs (as happened in this case), or that you may not pick up
> on upstream bug-fixes. Jonathan Kew has suggested that it might be
>
sure, but i've been bitten too often; context nowadays comes with a
truckload of tools and methods, and if we had to adapt to something else
every time that latex is ready for it we would quickly become unproductive;
keep in mind that in that case we would not only have to deal with you, but
also with another 20 pattern people; now we can just pick up and rearrange
the bits and pieces; (a similar thing happens with fonts: context had
built-in map file support before things like updmap (useless for context
anyway) came around, so adapting to yet another method was
counterproductive; so, context has its own encoding naming scheme -if
only because the number of metrics that really ship is not that large-)
[another nice example: context supported lm fonts right from the start,
and in the end changes in names of map files took place because of other
packages' needs; so, again we are forced to ship our own stuff]
> desirable to have a set of general-purpose utf-8 hyphenation patterns in
> the texmf tree, which could be used by various TeX applications. From
>
take the names alone ... every package has different preferences; for
years i *did* use the (hardly) generic patterns, and each year
something else was broken; context is used in production environments
and we need stability in those areas
> your comments it is clear that, in order for the Context community to
> buy into such a scheme, it would be necessary for this collection of
> patterns to be managed carefully, by consensus, and in a format-neutral
>
sure, that's the ideal world, but 25 years have taught us that this is
near to impossible; actually i tried to start such an effort, starting
with the names, but i gave up on it simply because i foresaw a waste of time
btw, already quite some years ago i published a method for encoding-neutral
patterns, but i never got any response to that,
http://www.pragma-ade.com/general/manuals/mpattern.pdf, also published
in tugboat (and i did some presentations about it)
> manner, with good advance communication of any changes. If this were to
> happen, the advantage for Context is that the dangers I mentioned above
> could be minimized. But it is up to you to balance the potential risks
> and benefits for Context.
>
we will gladly use your stuff but quite probably package it in the context
way (maybe ctxtools will simply copy the existing utf ones, repackaged
in a context way); btw, context uses utf patterns also in non utf
mode, i.e. in pdftex etc
Hans