ntg-context - mailing list for ConTeXt users
* Hyphenation patterns
@ 2020-10-08 15:41 Denis Maier
  2020-10-08 16:20 ` Tomas Hala
  2020-10-08 17:05 ` Henning Hraban Ramm
  0 siblings, 2 replies; 18+ messages in thread
From: Denis Maier @ 2020-10-08 15:41 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi,

where can I find the hyphenation patterns used by ConTeXt? I have two 
wrongly hyphenated words, and I want to check whether this is due to 
incorrect patterns. (I tried the source browser... not much luck so 
far.) The words are:
1. applicable => hyphenated as applic-able
2. obligated => hyphenated as oblig-ated

I know I can use \hyphenation to correct that, but I wanted to check the 
patterns nevertheless.

Best,
Denis
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: Hyphenation patterns
  2020-10-08 15:41 Hyphenation patterns Denis Maier
@ 2020-10-08 16:20 ` Tomas Hala
  2020-10-08 17:05 ` Henning Hraban Ramm
  1 sibling, 0 replies; 18+ messages in thread
From: Tomas Hala @ 2020-10-08 16:20 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi,

you can find the patterns in this directory:

texlive/2020/texmf-dist/tex/context/patterns/mkiv/

Best wishes,

Tomáš 

On Thu, Oct 08, 2020 at 05:41:09PM +0200, Denis Maier wrote:
# Hi,
# 
# where can I find the hyphenation patterns used by ConTeXt? I have
# two wrongly hyphenated words, and I want to check whether this is
# due to incorrect patterns. (I tried the source browser... not much
# luck so far.) The words are:
# 1. applicable => hyphenated as applic-able
# 2. obligated => hyphenated as oblig-ated
# 
# I know I can use \hyphenation to correct that, but I wanted to check
# the patterns nevertheless.
# 
# Best,
# Denis

                                         Tomáš Hála
--------------------------------------------------------------------
Mendelova univerzita, Provozně ekonomická fakulta, ústav informatiky
Zemědělská 1, CZ-613 00 Brno,  tel. +420 545 13 22 28
--------------------------------------------------------------------
http://akela.mendelu.cz/~thala

* Re: Hyphenation patterns
  2020-10-08 15:41 Hyphenation patterns Denis Maier
  2020-10-08 16:20 ` Tomas Hala
@ 2020-10-08 17:05 ` Henning Hraban Ramm
  2020-10-09  6:52   ` Denis Maier
  2020-10-09  8:54   ` Hans Hagen
  1 sibling, 2 replies; 18+ messages in thread
From: Henning Hraban Ramm @ 2020-10-08 17:05 UTC (permalink / raw)
  To: mailing list for ConTeXt users


> On 08.10.2020 at 17:41, Denis Maier <denismaier@mailbox.org> wrote:
> 
> where can I find the hyphenation patterns used by ConTeXt? I have two wrongly hyphenated words, and I want to check whether this is due to incorrect patterns. (I tried the source browser... not much luck so far.) The words are:
> 1. applicable => hyphenated as applic-able
> 2. obligated => hyphenated as oblig-ated
> 
> I know I can use \hyphenation to correct that, but I wanted to check the patterns nevertheless.

I guess it’s simply a valid way to hyphenate the word.
You can check possible hyphenations like this:

\starttext

{EN: \en\hyphenatedcoloredword{applicable}}

{DE: \de\hyphenatedcoloredword{applicable}}

\stoptext


Hraban

* Re: Hyphenation patterns
  2020-10-08 17:05 ` Henning Hraban Ramm
@ 2020-10-09  6:52   ` Denis Maier
  2020-10-09  6:57     ` Taco Hoekwater
  2020-10-09  8:15     ` Henning Hraban Ramm
  2020-10-09  8:54   ` Hans Hagen
  1 sibling, 2 replies; 18+ messages in thread
From: Denis Maier @ 2020-10-09  6:52 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Henning Hraban Ramm


On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
> \starttext
>
> {EN: \en\hyphenatedcoloredword{applicable}}
>
> {DE: \de\hyphenatedcoloredword{applicable}}
>
> \stoptext
Wow, that's super helpful. The English patterns seem to give "ap-plic-a-ble".
According to Merriam-Webster it should just be "ap·pli·ca·ble".

{EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate".
According to Merriam-Webster it should be "ob·li·gate".

I've had a look at the files mentioned by Tomáš, but as these are not 
just word lists I cannot really tell what is happening.

So, is that a bug?

Best,
Denis


* Re: Hyphenation patterns
  2020-10-09  6:52   ` Denis Maier
@ 2020-10-09  6:57     ` Taco Hoekwater
  2020-10-09  7:01       ` Denis Maier
  2020-10-09  8:15     ` Henning Hraban Ramm
  1 sibling, 1 reply; 18+ messages in thread
From: Taco Hoekwater @ 2020-10-09  6:57 UTC (permalink / raw)
  To: mailing list for ConTeXt users



> On 9 Oct 2020, at 08:52, Denis Maier <denismaier@mailbox.org> wrote:
> 
> On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
>> \starttext
>> 
>> {EN: \en\hyphenatedcoloredword{applicable}}
>> 
>> {DE: \de\hyphenatedcoloredword{applicable}}
>> 
>> \stoptext
>> 
> Wow, that's super helpful. The English pattern seems to be "ap-plic-a-ble"
> According to Merriam-Webster it should just be "ap·pli·ca·ble".
> 
> {EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate"
> According to Merriam-Webster it should be "ob·li·gate".
> 
> I've had a look at the files mentioned by Tomáš, but as these are not just wordlists I can not really tell what is happening.
> 
> So, is that a bug? 

Not really. Hyphenation patterns are a bit like applying JPEG compression to 
a dictionary: they make the data smaller by recognising patterns while
ignoring outliers. 

Occasional errors are to be expected, which is why \hyphenation exists.

Best wishes,
Taco


* Re: Hyphenation patterns
  2020-10-09  6:57     ` Taco Hoekwater
@ 2020-10-09  7:01       ` Denis Maier
  2020-10-09 12:48         ` Hans Hagen
  0 siblings, 1 reply; 18+ messages in thread
From: Denis Maier @ 2020-10-09  7:01 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Taco Hoekwater

On 09.10.2020 at 08:57, Taco Hoekwater wrote:
>
>> On 9 Oct 2020, at 08:52, Denis Maier <denismaier@mailbox.org> wrote:
>>
>> On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
>>> \starttext
>>>
>>> {EN: \en\hyphenatedcoloredword{applicable}}
>>>
>>> {DE: \de\hyphenatedcoloredword{applicable}}
>>>
>>> \stoptext
>>>
>> Wow, that's super helpful. The English pattern seems to be "ap-plic-a-ble"
>> According to Merriam-Webster it should just be "ap·pli·ca·ble".
>>
>> {EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate"
>> According to Merriam-Webster it should be "ob·li·gate".
>>
>> I've had a look at the files mentioned by Tomáš, but as these are not just wordlists I can not really tell what is happening.
>>
>> So, is that a bug?
> Not really. hyphenation patterns are a bit like applying JPEG compression to
> a dictionary. It makes the data size smaller by recognising patterns while
> ignoring outliers.
>
> Occasional errors are to be expected, which is why \hyphenation exists.
>
>
I see. I've noticed lang-us.lua has a list of exceptions in it:
  ["exceptions"]={
   ["characters"]="abcdefghijlmnoprstuyz",
   ["data"]="as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory phil-an-thropic present presents project projects reci-procity re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble",
   ["length"]=168,
   ["n"]=14,
  },

Would it be possible to add more exceptions to that list as they come 
up? Or is that inappropriate?
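
For readers who want to see what such a table amounts to, here is a small Python sketch. It assumes, as is the case in TeX generally, that a listed exception overrides whatever the patterns would produce. The names EXCEPTIONS and lookup are made up for this illustration and this is not ConTeXt's actual code; only the data string is copied from the lang-us.lua excerpt quoted above.

```python
# Data string copied from the "exceptions" entry of lang-us.lua quoted above.
DATA = ("as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory "
        "phil-an-thropic present presents project projects reci-procity "
        "re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble")

# Key every entry by its unhyphenated spelling (hypothetical helper table).
EXCEPTIONS = {entry.replace("-", ""): entry for entry in DATA.split()}

def lookup(word, fallback):
    """Return the listed exception if there is one, else call `fallback`.

    `fallback` stands in for pattern-based hyphenation; it is an assumption
    of this sketch, not a ConTeXt function.
    """
    entry = EXCEPTIONS.get(word.lower())
    return entry if entry is not None else fallback(word)

# The bookkeeping fields in the table appear to be derived from the data:
print(len(EXCEPTIONS))                              # 14, matching ["n"]
print(len(DATA))                                    # 168, matching ["length"]
print("".join(sorted(set(DATA) - {"-", " "})))      # abcdefghijlmnoprstuyz

print(lookup("obligatory", lambda w: w))            # oblig-a-tory
print(lookup("present", lambda w: w))               # present (no break allowed)
```

Entries without any hyphen, such as "present", simply forbid breaking the word at all.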

Denis

* Re: Hyphenation patterns
  2020-10-09  6:52   ` Denis Maier
  2020-10-09  6:57     ` Taco Hoekwater
@ 2020-10-09  8:15     ` Henning Hraban Ramm
  2020-10-09  8:59       ` Hans Hagen
  2021-04-09 21:57       ` Arthur Rosendahl
  1 sibling, 2 replies; 18+ messages in thread
From: Henning Hraban Ramm @ 2020-10-09  8:15 UTC (permalink / raw)
  To: mailing list for ConTeXt users



> On 09.10.2020 at 08:52, Denis Maier <denismaier@mailbox.org> wrote:
> 
> On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
>> \starttext
>> 
>> {EN: \en\hyphenatedcoloredword{applicable}}
>> 
>> {DE: \de\hyphenatedcoloredword{applicable}}
>> 
>> \stoptext
>> 
> Wow, that's super helpful.

BTW \hyphenatedword works the same. I didn’t see anything colored.
There are some more commands like this, even \hyphenatedfile, see
https://source.contextgarden.net/tex/context/base/mkiv/supp-box.mkiv?search=hyphenated

Usually Arthur’s (hail the emperor of hyphenation and protector of the patterns) patterns are flawless, so I guess it’s not a bug but an exception to the rules.

Hraban

* Re: Hyphenation patterns
  2020-10-08 17:05 ` Henning Hraban Ramm
  2020-10-09  6:52   ` Denis Maier
@ 2020-10-09  8:54   ` Hans Hagen
  1 sibling, 0 replies; 18+ messages in thread
From: Hans Hagen @ 2020-10-09  8:54 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Henning Hraban Ramm

On 10/8/2020 7:05 PM, Henning Hraban Ramm wrote:
> 
>> On 08.10.2020 at 17:41, Denis Maier <denismaier@mailbox.org> wrote:
>>
>> where can I find the hyphenation patterns used by ConTeXt? I have two wrongly hyphenated words, and I want to check whether this is due to incorrect patterns. (I tried the source browser... not much luck so far.) The words are:
>> 1. applicable => hyphenated as applic-able
>> 2. obligated => hyphenated as oblig-ated
>>
>> I know I can use \hyphenation to correct that, but I wanted to check the patterns nevertheless.
> 
> I guess it’s just a valid option.
> You can check possible hyphenations like this:
> 
> \starttext
> 
> {EN: \en\hyphenatedcoloredword{applicable}}
> 
> {DE: \de\hyphenatedcoloredword{applicable}}
> 
> \stoptext
Americans and Brits hyphenate differently:

\starttext
     {\language[usenglish]  {\tt US \number\normallanguage}: \hyphenatedcoloredword{applicable}}\par
     {\language[ukenglish]  {\tt UK \number\normallanguage}: \hyphenatedcoloredword{applicable}}\par
\stoptext

syllable vs stem (but I bet Arthur can explain better)

hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Hyphenation patterns
  2020-10-09  8:15     ` Henning Hraban Ramm
@ 2020-10-09  8:59       ` Hans Hagen
  2021-04-09 21:57       ` Arthur Rosendahl
  1 sibling, 0 replies; 18+ messages in thread
From: Hans Hagen @ 2020-10-09  8:59 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Henning Hraban Ramm

On 10/9/2020 10:15 AM, Henning Hraban Ramm wrote:
> 
> 
>> On 09.10.2020 at 08:52, Denis Maier <denismaier@mailbox.org> wrote:
>>
>> On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
>>> \starttext
>>>
>>> {EN: \en\hyphenatedcoloredword{applicable}}
>>>
>>> {DE: \de\hyphenatedcoloredword{applicable}}
>>>
>>> \stoptext
>>>
>> Wow, that's super helpful.
> 
> BTW \hyphenatedword works the same. I didn’t see anything colored.
> There are some more commands like this, even \hyphenatedfile, see
> https://source.contextgarden.net/tex/context/base/mkiv/supp-box.mkiv?search=hyphenated
> 
> Usually Arthur’s (hail the emperor of hyphenation and protector of the patterns) patterns are flawless, so I guess it’s not a bug but an exception of the rules.
ancient secret features:

 >mtxrun --script patterns --hyphenate applicable --language=gb
hyphenator      |
hyphenator      | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator      |  2a0p0                     2 0 0 0 0 0 0 0 0 0 0
hyphenator      |    4p1p2                   2 4 1 2 0 0 0 0 0 0 0
hyphenator      |      0p2l2                 2 4 1 2 2 0 0 0 0 0 0
hyphenator      |              1a0b0         2 4 1 2 2 0 1 0 0 0 0
hyphenator      |                2b0l2       2 4 1 2 2 0 1 2 0 2 0
hyphenator      |                  4l0e0.0   2 4 1 2 2 0 1 2 4 2 0
hyphenator      | .2a4p1p2l2i0c1a2b4l2e0.   . a p-p l i c-a b l e .
hyphenator      |
mtx-patterns    | gb 3 3 : applicable : applic-able

 >mtxrun --script patterns --hyphenate applicable --language=us
hyphenator      |
hyphenator      | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator      |    4p1p0                   0 4 1 0 0 0 0 0 0 0 0
hyphenator      |      1p2l2                 0 4 1 2 2 0 0 0 0 0 0
hyphenator      |      0p0l0i2c1a0b0         0 4 1 2 2 2 1 0 0 0 0
hyphenator      |            1c0a0           0 4 1 2 2 2 1 0 0 0 0
hyphenator      |            0c0a1b0l0       0 4 1 2 2 2 1 1 0 0 0
hyphenator      |                0b2l2       0 4 1 2 2 2 1 1 2 2 0
hyphenator      |                0b4l0e0.0   0 4 1 2 2 2 1 1 4 2 0
hyphenator      | .0a4p1p2l2i2c1a1b4l2e0.   . a p-p l i c-a-b l e .
hyphenator      |
mtx-patterns    | us 3 3 : applicable : applic-a-ble

not the kind of stuff one wants to expose a new user to

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Hyphenation patterns
  2020-10-09  7:01       ` Denis Maier
@ 2020-10-09 12:48         ` Hans Hagen
  2020-10-09 12:59           ` Denis Maier
  0 siblings, 1 reply; 18+ messages in thread
From: Hans Hagen @ 2020-10-09 12:48 UTC (permalink / raw)
  To: ntg-context

On 10/9/2020 9:01 AM, Denis Maier wrote:
> On 09.10.2020 at 08:57, Taco Hoekwater wrote:
>>
>>> On 9 Oct 2020, at 08:52, Denis Maier <denismaier@mailbox.org> wrote:
>>>
>>> On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
>>>> \starttext
>>>>
>>>> {EN: \en\hyphenatedcoloredword{applicable}}
>>>>
>>>> {DE: \de\hyphenatedcoloredword{applicable}}
>>>>
>>>> \stoptext
>>>>
>>> Wow, that's super helpful. The English pattern seems to be 
>>> "ap-plic-a-ble"
>>> According to Merriam-Webster it should just be "ap·pli·ca·ble".
>>>
>>> {EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate"
>>> According to Merriam-Webster it should be "ob·li·gate".
>>>
>>> I've had a look at the files mentioned by Tomáš, but as these are not 
>>> just wordlists I can not really tell what is happening.
>>>
>>> So, is that a bug?
>> Not really. hyphenation patterns are a bit like applying JPEG 
>> compression to
>> a dictionary. It makes the data size smaller by recognising patterns 
>> while
>> ignoring outliers.
>>
>> Occasional errors are to be expected, which is why \hyphenation exists.
>>
>>
> I see. I've noticed lang-us.lua has a list of exceptions in it:
>   ["exceptions"]={
>    ["characters"]="abcdefghijlmnoprstuyz",
>    ["data"]="as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory phil-an-thropic present presents project projects reci-procity re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble",
>    ["length"]=168,
>    ["n"]=14,
>   },
> 
> Would it be possible to add more exceptions to that list as they come 
> up? Or is that inappropriate?
you can add your own at runtime in a style:

\hyphenation {fo-ob-ar} \hsize 1mm foobar

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Hyphenation patterns
  2020-10-09 12:48         ` Hans Hagen
@ 2020-10-09 12:59           ` Denis Maier
  0 siblings, 0 replies; 18+ messages in thread
From: Denis Maier @ 2020-10-09 12:59 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Hans Hagen

On 09.10.2020 at 14:48, Hans Hagen wrote:
> On 10/9/2020 9:01 AM, Denis Maier wrote:
>> [...]
>> I see. I've noticed lang-us.lua has a list of exceptions in it:
>>   ["exceptions"]={
>>    ["characters"]="abcdefghijlmnoprstuyz",
>>    ["data"]="as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory phil-an-thropic present presents project projects reci-procity re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble",
>>    ["length"]=168,
>>    ["n"]=14,
>>   },
>>
>> Would it be possible to add more exceptions to that list as they come 
>> up? Or is that inappropriate?
> you can add your own runtime in a style:
>
> \hyphenation {fo-ob-ar} \hsize 1mm foobar

Sure. I use \startexceptions[en] for that. I just thought everyone might 
benefit...

Denis

* Re: Hyphenation patterns
  2020-10-09  8:15     ` Henning Hraban Ramm
  2020-10-09  8:59       ` Hans Hagen
@ 2021-04-09 21:57       ` Arthur Rosendahl
  1 sibling, 0 replies; 18+ messages in thread
From: Arthur Rosendahl @ 2021-04-09 21:57 UTC (permalink / raw)
  To: Mailing list for ConTeXt users

  Denis’ latest question reminded me of an earlier query he had about
hyphenation, asking why “applicable” and “obligated” were hyphenated by
ConTeXt as ap-plic-a-ble and ob-lig-at-ed, and not ap-pli-ca-ble and
ob-li-ga-te(d) like in Merriam-Webster (the discussion started at
https://mailman.ntg.nl/pipermail/ntg-context/2020/099695.html).

  First of all, I note that while Webster’s dictionary is a useful
guide, and indeed a major reference for any American typographer,
there’s no absolute rule that we have to follow it either.  The break
applic-able, for example, does look acceptable to me; oblig-ated, less
so.

  Taco reminded us that when producing a set of hyphenation patterns from a
list of hyphenated words, we’re essentially compressing information, and
that some minor deviations are to be expected.  However, in my
experience, unexpected breakpoints are almost never due to chance, but
to a deliberate decision.

  Then Hraban said that:

On Fri, Oct 09, 2020 at 10:15:17AM +0200, Henning Hraban Ramm wrote:
> Usually Arthur’s (hail the emperor of hyphenation and protector of the patterns) patterns are flawless, so I guess it’s not a bug but an exception of the rules.

  I see that my self-appointed title is catching on, nice :-)
Unfortunately the patterns are just as likely to contain errors as
anything else, and in this particular case we’ll probably never know for
sure, because the original hyphenated word list was never published (all
the word lists from which patterns were produced in the 80s and 90s have
been lost, for all languages).  We’re thus reduced to guessing the
intent of those who compiled the lists.

  We can get hints from looking at the patterns involved in the
debatable breaks.  Hans has a useful script:

	$ mtxrun --script patterns --language=us --left=2 --right=2 --hyphenate applicable
	hyphenator      |
	hyphenator      | . a p p l i c a b l e .   . a p p l i c a b l e .  
	hyphenator      |    4p1p0                   0 4 1 0 0 0 0 0 0 0 0  
	hyphenator      |      1p2l2                 0 4 1 2 2 0 0 0 0 0 0  
	hyphenator      |      0p0l0i2c1a0b0         0 4 1 2 2 2 1 0 0 0 0  
	hyphenator      |            1c0a0           0 4 1 2 2 2 1 0 0 0 0  
	hyphenator      |            0c0a1b0l0       0 4 1 2 2 2 1 1 0 0 0  
	hyphenator      |                0b2l2       0 4 1 2 2 2 1 1 2 2 0  
	hyphenator      |                0b4l0e0.0   0 4 1 2 2 2 1 1 4 2 0  
	hyphenator      | .0a4p1p2l2i2c1a1b4l2e0.   . a p-p l i c-a-b l e .  
	hyphenator      |
	mtx-patterns    | us 2 2 : applicable : ap-plic-a-ble

  That tells us that there are seven patterns involved in hyphenating
the word applicable: 4p1p, 1p2l2, pli2c1ab, 1ca, ca1bl, b2l2, and b4le.
(the final dot is part of that last pattern).  The pattern responsible
for the break applic-able is pli2c1ab.  If we now refer to the source
repository for hyphenation patterns (since comments are stripped in the
ConTeXt sources): https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-en-us.tex
-- we can see line 4508

	hyphen.tex patterns end here, and additional patterns begin:

which means that the pattern pli2c1ab, line 4817, is an “additional
pattern”.  The background story is that hyphen.tex, the original
hyphenation pattern file for American English, produced in 1982-1983
from a list of hyphenated words (following mostly Webster’s), was later
augmented with more patterns that were supposed to improve hyphenation
for many words.  The person who added these new patterns apparently had
a list of words hyphenated incorrectly (according to him) by hyphen.tex,
but both that list and the one used to produce hyphen.tex are as
mentioned above now lost, probably forever.
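
  As a rough illustration of how the trace for applicable comes about, here is a minimal Python sketch of the usual pattern-matching scheme (Liang's algorithm): every matching pattern writes its digits into the slots between letters, the highest digit per slot wins, and an odd value allows a break. The pattern list is read off the seven rows of the trace (the row 4p1p0 is taken as the pattern 4p1p). The names parse_pattern and hyphenate are invented for this example, and this is not how mtxrun or ConTeXt implement it.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'pli2c1ab' into its letters and slot values."""
    letters = ""
    values = [0]                  # values[k] is the slot before letters[k]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)  # digit belongs to the currently open slot
        else:
            letters += ch
            values.append(0)      # open the slot after this letter
    return letters, values

def hyphenate(word, patterns, left=2, right=2):
    """Hyphenate `word` using only `patterns`; left/right are minimum fragment lengths."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."
    slots = [0] * (len(padded) + 1)           # slots[j] sits before padded[j]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is not None:
                for k, v in enumerate(values):
                    slots[start + k] = max(slots[start + k], v)
    parts, last = [], 0
    for i in range(left, len(word) - right + 1):
        if slots[i + 1] % 2 == 1:             # odd value: a break is allowed
            parts.append(word[last:i])
            last = i
    parts.append(word[last:])
    return "-".join(parts)

# Patterns read off the us trace for "applicable".
US_APPLICABLE = ["4p1p", "1p2l2", "pli2c1ab", "1ca", "ca1bl", "b2l2", "b4le."]

print(hyphenate("applicable", US_APPLICABLE, left=2, right=2))  # ap-plic-a-ble
print(hyphenate("applicable", US_APPLICABLE, left=3, right=3))  # applic-a-ble

# Dropping the additional pattern shows that it alone moves the break.
without = [p for p in US_APPLICABLE if p != "pli2c1ab"]
print(hyphenate("applicable", without))                         # ap-pli-ca-ble
```

  With minimum fragment lengths 3 and 3 the sketch gives applic-a-ble, the result of Hans' earlier run; with 2 and 2 it gives ap-plic-a-ble as in the trace. Without pli2c1ab the remaining six patterns yield ap-pli-ca-ble.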

  In any case, the pattern that causes the break applic-able was clearly
added intentionally; and as I said that break seems quite reasonable to
me.  Not so for the one in oblig-ated, so let’s have a look at that:

	$ mtxrun --script patterns --language=us --left=2 --right=2 --hyphenate obligated
	hyphenator      |
	hyphenator      | . o b l i g a t e d .   . o b l i g a t e d .  
	hyphenator      |  0o0b0l0i2g1             0 0 0 0 2 1 0 0 0 0  
	hyphenator      |    0b2l2                 0 0 2 2 2 1 0 0 0 0  
	hyphenator      |      5l0i0g0a0t0e0       0 0 5 2 2 1 0 0 0 0  
	hyphenator      |        2i0g0             0 0 5 2 2 1 0 0 0 0  
	hyphenator      |          1g0a0           0 0 5 2 2 1 0 0 0 0  
	hyphenator      |              2t1e0d0     0 0 5 2 2 1 2 1 0 0  
	hyphenator      | .0o0b5l2i2g1a2t1e0d0.   . o b-l i g-a t-e d .  
	hyphenator      |
	mtx-patterns    | us 2 2 : obligated : ob-lig-at-ed

  Here we see that the dubious break is caused by the pattern obli2g1,
also an “additional pattern” (line 4783), and here it’s not hard to
guess where it comes from: it has to be for the word obligatory,
hyphenated regularly as o-blig-a-to-ry according to M-W -- and myself ;-)
The incorrect breakpoint in oblig-ated is an undesired side effect of
that.
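
  The side effect can be checked with a small Python sketch that applies only the six patterns visible in the obligated trace, once with and once without obli2g1. The function name hyphenate and the list name US_OBLIGATED are invented for this example; it is a simplified stand-in for the real hyphenator, not its implementation.

```python
def hyphenate(word, patterns, left=2, right=2):
    """Liang-style hyphenation restricted to the given patterns (illustrative only)."""
    padded = "." + word + "."
    slots = [0] * (len(padded) + 1)        # slots[j] sits before padded[j]
    for pattern in patterns:
        letters = "".join(ch for ch in pattern if not ch.isdigit())
        values = [0]                       # one slot before each letter, one after the last
        for ch in pattern:
            if ch.isdigit():
                values[-1] = int(ch)
            else:
                values.append(0)
        start = padded.find(letters)
        while start != -1:                 # apply the pattern at every match
            for k, v in enumerate(values):
                slots[start + k] = max(slots[start + k], v)
            start = padded.find(letters, start + 1)
    parts, last = [], 0
    for i in range(left, len(word) - right + 1):
        if slots[i + 1] % 2 == 1:          # odd value: a break is allowed
            parts.append(word[last:i])
            last = i
    parts.append(word[last:])
    return "-".join(parts)

# Patterns read off the us trace for "obligated".
US_OBLIGATED = ["obli2g1", "b2l2", "5ligate", "2ig", "1ga", "2t1ed"]

print(hyphenate("obligated", US_OBLIGATED))                                  # ob-lig-at-ed
print(hyphenate("obligated", [p for p in US_OBLIGATED if p != "obli2g1"]))  # ob-li-gat-ed
```

  Without obli2g1 the pattern 1ga decides the slot before the g, and the break moves from oblig-ated to obli-gated. Removing the pattern from the real file would of course also affect obligatory, which is presumably the word it was added for.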

	Best,

		ArthuR

* Re: hyphenation patterns
  2010-05-24  0:16     ` Mojca Miklavec
  2010-05-24  8:17       ` Hans Hagen
@ 2010-05-24 18:52       ` rogutes
  1 sibling, 0 replies; 18+ messages in thread
From: rogutes @ 2010-05-24 18:52 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: claudio.beccari, Mojca Miklavec

Mojca Miklavec (2010-05-24 02:16):
> Dear Claudio,
> 
> Thanks a lot for your prompt reply.
> 
> On Mon, May 24, 2010 at 00:39, Claudio Beccari wrote:
> > Dear Mojca,
> > no proper Italian word ends in ch (this digraph in normal Italian words is
> > pronounced as k, not as č or ć).
> > Nevertheless there are a number of surnames dating back to the old times
> > (150 years ago) when North East Italy was under Austro-Hungarian rule,
> > when Istrian names, mainly Croatian and Slovenian, were transliterated in
> > such a way that the typical patronymic ending -ič or -ić (I don't know the
> > exact spelling in Latin letters of the Croatian/Slovenian names) was
> > transliterated for the Empire bureaucracy with -ich.
> 
> Thanks a lot for some more insight. I admit that I didn't know the
> details (I should be ashamed) and in my area they were more radical
> with surname changes (mine was Michelazzi and I think that most
> surnames here were "properly Romanized", for example Filipčič ->
> Filippi, so again no problems with hyphenation :) :) :).
> 
> > This spelling remained
> > when North East Italy and Istria were annexed to the Kingdom of Italy at the
> > end of WW1. After WW2 most of Istria returned mainly to Croatia and a small
> > part to Slovenia, but the Slovenians and Croatians that had moved to NE
> > Italy and had become Italian citizens maintained their surnames with the
> > Austro-Hungarian spelling.
> >
> > When I prepared the hyphen patterns for Italian and Latin I did think of
> > this particular spelling, but I concluded that it was not so important; I
> > was wrong, and I apologize.
> 
> There's no need to apologize. First, there's an "infinite" number of
> foreign names, so that one simply cannot get all of them right. I
> guess that Lju-bl-ja-na is not properly hyphenated either (Lu-bia-na
> is ok), but in my opinion it's a valid argument that one should change
> the language when writing foreign names if they are to be hyphenated
> properly. I can also easily imagine Slovenian patterns that would
> hyphenate:
>     Fis-cher, Aac-hen, Go-ethe
> when not knowing that those letters represent a single "letter"/sound
> in foreign words.
> 
> Second, I have no idea, but I think it was a pure coincidence that the
> "problem" reported by Rogutės Sparnuotos is the same as that for
> surnames of a group of people in the North-East (I think that the name in
> question comes from Russia with transliteration done via English). On
> the other hand if it's just a tiny pattern that solves them all ...

Thank you Mojca and Claudio for your replies.

Mojca has guessed correctly: I merely noticed that the surname Manovich is
hyphenated wrongly in the three languages I've tested. And I don't mind
using \hyphenation{} or switching language for foreign names.

I don't know how hyphenation patterns are made, so I was surprised to see
the main rule of at least Latin/Italian/Lithuanian hyphenation broken (a
syllable must contain a vowel). From your explanations it seems that
hyphenation patterns are kind of case-by-case rules, so this problem is
not surprising, since no common words end with '-ch' in these languages.

Wonder if I'll find a maintainer of the Lithuanian patterns...

-- 
--  Rogutės Sparnuotos
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: hyphenation patterns
  2010-05-23 21:38 ` Mojca Miklavec
       [not found]   ` <4BF9AE8A.6040405@gmail.com>
@ 2010-05-24 14:50   ` luigi scarso
  1 sibling, 0 replies; 18+ messages in thread
From: luigi scarso @ 2010-05-24 14:50 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On Sun, May 23, 2010 at 11:38 PM, Mojca Miklavec
<mojca.miklavec.lists@gmail.com> wrote:
> hyphenate properly in Italian. Italian is a
> what-you-see-is-what-you-pronounce language (in contrast to English)
Apart from some traps like

glicine vs tagliare
where the syllable 'gli' is pronounced in completely different ways

or
anno (year) vs hanno (have in "they have")
where the sound is the same

or àncora (anchor) vs ancóra (again)
and we usually write ancora vs ancora (yes, no difference: only the
sound is different)


or péro (pear tree) vs però (but)

and so on.


-- 
luigi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: hyphenation patterns
  2010-05-24  0:16     ` Mojca Miklavec
@ 2010-05-24  8:17       ` Hans Hagen
  2010-05-24 18:52       ` rogutes
  1 sibling, 0 replies; 18+ messages in thread
From: Hans Hagen @ 2010-05-24  8:17 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: claudio.beccari, Mojca Miklavec

On 24-5-2010 2:16, Mojca Miklavec wrote:

> There's no need to apologize. First, there's an "infinite" number of
> foreign names, so that one simply cannot get all of them right. I
> guess that Lju-bl-ja-na is not properly hyphenated either (Lu-bia-na

why not just use hyphenmin values of 3 to prevent such cases
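
A sketch of what this might look like, assuming the lefthyphenmin and
righthyphenmin keys of \setuplanguage and reusing the narrow text width
from the original example (untested here). With a right minimum of 3, a
two-letter fragment such as "ch" should no longer be split off:

```tex
% Require at least 3 letters before the first and after the last break point
\setuplanguage[la][lefthyphenmin=3,righthyphenmin=3]

\setuplayout[textwidth=0.2cm] % narrow column to force line breaks, as in the original example
\mainlanguage[la]

\starttext
Manovich.
\stoptext
```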

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: hyphenation patterns
       [not found]   ` <4BF9AE8A.6040405@gmail.com>
@ 2010-05-24  0:16     ` Mojca Miklavec
  2010-05-24  8:17       ` Hans Hagen
  2010-05-24 18:52       ` rogutes
  0 siblings, 2 replies; 18+ messages in thread
From: Mojca Miklavec @ 2010-05-24  0:16 UTC (permalink / raw)
  To: claudio.beccari; +Cc: mailing list for ConTeXt users

Dear Claudio,

Thanks a lot for your prompt reply.

On Mon, May 24, 2010 at 00:39, Claudio Beccari wrote:
> Dear Mojca,
> no proper Italian word ends in ch (this digraph in normal Italian words is
> pronounced as k, not as č or ć).
> Nevertheless there are a number of surnames dating back to the old times
> (150 years ago) when North East Italy was under Austro-Hungarian rule,
> when Istrian names, mainly Croatian and Slovenian, were transliterated in
> such a way that the typical patronymic ending -ič or -ić (I don't know the
> exact spelling in Latin letters of the Croatian/Slovenian names) was
> transliterated for the Empire bureaucracy with -ich.

Thanks a lot for some more insight. I admit that I didn't know the
details (I should be ashamed) and in my area they were more radical
with surname changes (mine was Michelazzi and I think that most
surnames here were "properly Romanized", for example Filipčič ->
Filippi, so again no problems with hyphenation :) :) :).

> This spelling remained
> when North East Italy and Istria were annexed to the Kingdom of Italy at the
> end of WW1. After WW2 most of Istria returned mainly to Croatia and a small
> part to Slovenia, but the Slovenians and Croatians that had moved to NE
> Italy and had become Italian citizens maintained their surnames with the
> Austro-Hungarian spelling.
>
> When I prepared the hyphen patterns for Italian and Latin I did think of
> this particular spelling, but I concluded that it was not so important; I
> was wrong, and I apologize.

There's no need to apologize. First, there's an "infinite" number of
foreign names, so that one simply cannot get all of them right. I
guess that Lju-bl-ja-na is not properly hyphenated either (Lu-bia-na
is ok), but in my opinion it's a valid argument that one should change
the language when writing foreign names if they are to be hyphenated
properly. I can also easily imagine Slovenian patterns that would
hyphenate:
    Fis-cher, Aac-hen, Go-ethe
when not knowing that those letters represent a single "letter"/sound
in foreign words.

Second, I have no idea, but I think it was a pure coincidence that the
"problem" reported by Rogutės Sparnuotos is the same as that for
surnames of a group of people on North-East (I think that the name in
question comes from Russia with transliteration done by English). On
the other hand if it's just a tiny pattern that solves them all ...

> I will submit, at least for Italian, a revised
> pattern file. I doubt I should do it also for Latin, although it does not
> cost anything...

In case you do submit any updates, I would be extremely grateful for
submitting an update to
   http://www.ctan.org/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/hyph-it.tex
instead of (or at least in addition to) the original file (you may
remove the initial comments).

Also, if you happen to have the original of
   http://www.tug.org/TUGboat/Articles/tb13-1/tb34becc.pdf
it would be nice to include it into repository as documentation about
Italian hyphenation (but that's all too off-topic for the ConTeXt
mailing list).

Thanks again,
    Mojca

^ permalink raw reply	[flat|nested] 18+ messages in thread

* hyphenation patterns
@ 2010-05-23 23:22 Rogutės Sparnuotos
  2010-05-23 21:38 ` Mojca Miklavec
  0 siblings, 1 reply; 18+ messages in thread
From: Rogutės Sparnuotos @ 2010-05-23 23:22 UTC (permalink / raw)
  To: ntg-context

Is there anyone here who understands hyphenation patterns?

Such a document:

\setuplayout[textwidth=0.2cm]
\starttext
\language[la] Manovich.
\stoptext

hyphenates 'Manovich' into Ma-no-vi-ch, while it should be Ma-no-vich. The
same applies for Italian and Lithuanian languages (in LaTeX as well).

Could there be such an omission in the hyphenation patterns? Or am I
missing something?

Thanks,
--  Rogutės Sparnuotos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: hyphenation patterns
  2010-05-23 23:22 hyphenation patterns Rogutės Sparnuotos
@ 2010-05-23 21:38 ` Mojca Miklavec
       [not found]   ` <4BF9AE8A.6040405@gmail.com>
  2010-05-24 14:50   ` luigi scarso
  0 siblings, 2 replies; 18+ messages in thread
From: Mojca Miklavec @ 2010-05-23 21:38 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On Mon, May 24, 2010 at 01:22, Rogutės Sparnuotos wrote:
>
> \setuplayout[textwidth=0.2cm]
> \starttext
> \language[la] Manovich.
> \stoptext
>
> hyphenates 'Manovich' into Ma-no-vi-ch, while it should be Ma-no-vich. The
> same applies for Italian and Lithuanian languages (in LaTeX as well).
>
> Could there be such an omission in the hyphenation patterns? Or am I
> missing something?

Both Italian and Latin have the pattern "1c" meaning "break in front
of any letter c unless another pattern prohibits that". Lithuanian
patterns contain "i1c" which means "break between i and c".

Nothing in ConTeXt can or will be fixed, but here's a short answer
with four options of what you can do:
1. Use \hyphenation{Ma-no-vich} on top of your document
2. Use "Manovič" instead of Manovich (it then hyphenates properly in
Latin at least, I didn't try the others); or "Манович" :)
3. Use \mainlanguage[la] bla bla bla {\language[en] Manovich}
4. Complain to the authors of Italian/Latin/Lithuanian patterns and
ask them for a fix.
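
Options 1 and 3 might look roughly like this in a test file. This is a
minimal, untested sketch; it reuses the narrow text width from the
original example and assumes that, as in plain TeX, an exception is
stored for the language that is active when \hyphenation is given:

```tex
\setuplayout[textwidth=0.2cm] % narrow column to force line breaks, as in the original example
\mainlanguage[la]

% Option 1: an explicit exception, declared after the language is selected
\hyphenation{Ma-no-vich}

\starttext
Manovich.

% Option 3: switch the language locally for the foreign name
{\language[en] Manovich.}
\stoptext
```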

Some explanation:
I assume that this is not a native Latin, Italian or Lithuanian word.
If you are talking about the artist name (Lev Manovich) then you are
using an English transliteration of a Russian word and expect it to
hyphenate properly in Italian. Italian is a
what-you-see-is-what-you-pronounce language (in contrast to English)
and you cannot expect that it will hyphenate properly all the foreign
names that are not even transliterated "properly". An Italian word
would most probably never end with "ch", so there's currently no
pattern present that would prohibit that behaviour. I don't know
Russian enough, but I would blindly guess that the right
transliteration would be Manovič anyway (of course everyone would have
a problem with getting the right accent and with proper pronunciation
then) and German wikipedia somehow confirms that:
Lev Manowitsch (russ. Лев Манович, wiss. Transliteration Lev Manovič;
* 1960 in Moskau)
Note that Germans transliterate the name differently and Italians
could transliterate it in a different way as well. Since Lithuanian
contains the letter "č", I would assume that they would transliterate
the name with č anyway (disclaimer: my knowledge about Lithuanian is
zero, so I'm not even sure how they pronounce that letter). For
example, Serbian in particular will never have a problem with
hyphenation of foreign names:
    http://sr.wikipedia.org/sr-el/Алберт_Ајнштајн
    Albert Ajnštajn (nem. Albert Einstein) je bio teorijski fizičar ...

The question is always: how many different foreign names do you want
to hyphenate properly in any given language?

On the other hand, even with Italian pronunciation, I guess that ch is
considered to be a "single consonant" (I may be wrong in that, but
it's not too relevant either), so adding an additional pattern "2ch."
(or "4ch.", not sure which one is needed) cannot hurt.
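
As a rough illustration of how such a pattern would interact with "1c"
(looking only at these two patterns and ignoring everything else in the
real pattern files): the digits are priorities for the gap between two
letters, the highest digit at each gap wins, odd values allow a break
and even values forbid it.

```tex
% word with boundary dots:   . m a n o v i c h .
% "1c"   -> gap between i and c gets 1  (break allowed)
% "2ch." -> gap between i and c gets 2  (matches only at the end of a word)
% highest value at that gap is 2, an even number: no break, so "vi-ch" is suppressed
```

Whether 2 is high enough depends on which other patterns touch that
position, hence the uncertainty about "4ch." above.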

Mojca

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-04-09 21:57 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-08 15:41 Hyphenation patterns Denis Maier
2020-10-08 16:20 ` Tomas Hala
2020-10-08 17:05 ` Henning Hraban Ramm
2020-10-09  6:52   ` Denis Maier
2020-10-09  6:57     ` Taco Hoekwater
2020-10-09  7:01       ` Denis Maier
2020-10-09 12:48         ` Hans Hagen
2020-10-09 12:59           ` Denis Maier
2020-10-09  8:15     ` Henning Hraban Ramm
2020-10-09  8:59       ` Hans Hagen
2021-04-09 21:57       ` Arthur Rosendahl
2020-10-09  8:54   ` Hans Hagen
  -- strict thread matches above, loose matches on Subject: below --
2010-05-23 23:22 hyphenation patterns Rogutės Sparnuotos
2010-05-23 21:38 ` Mojca Miklavec
     [not found]   ` <4BF9AE8A.6040405@gmail.com>
2010-05-24  0:16     ` Mojca Miklavec
2010-05-24  8:17       ` Hans Hagen
2010-05-24 18:52       ` rogutes
2010-05-24 14:50   ` luigi scarso
