* hyphenation patterns
From: Rogutės Sparnuotos @ 2010-05-23 23:22 UTC
To: ntg-context

Is there anyone here who understands hyphenation patterns? A document like this:

\setuplayout[textwidth=0.2cm]
\starttext
\language[la] Manovich.
\stoptext

hyphenates 'Manovich' into Ma-no-vi-ch, while it should be Ma-no-vich. The
same applies for Italian and Lithuanian languages (in LaTeX as well).

Could there be such an omission in the hyphenation patterns? Or am I
missing something?

Thanks,
--
Rogutės Sparnuotos
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
* Re: hyphenation patterns
From: Mojca Miklavec @ 2010-05-23 21:38 UTC
To: mailing list for ConTeXt users

On Mon, May 24, 2010 at 01:22, Rogutės Sparnuotos wrote:
>
> \setuplayout[textwidth=0.2cm]
> \starttext
> \language[la] Manovich.
> \stoptext
>
> hyphenates 'Manovich' into Ma-no-vi-ch, while it should be Ma-no-vich. The
> same applies for Italian and Lithuanian languages (in LaTeX as well).
>
> Could there be such an omission in the hyphenation patterns? Or am I
> missing something?

Both Italian and Latin have the pattern "1c", meaning "break in front of
any letter c unless another pattern prohibits that". Lithuanian patterns
contain "i1c", which means "break between i and c".

Nothing in ConTeXt can or will be fixed, but here's a short answer with
four options of what you can do:

1. Use \hyphenation{Ma-no-vich} at the top of your document.
2. Use "Manovič" instead of Manovich (it then hyphenates properly in Latin
   at least; I didn't try the others), or "Манович" :)
3. Use \mainlanguage[la] bla bla bla {\language[en] Manovich}
4. Complain to the authors of the Italian/Latin/Lithuanian patterns and ask
   them for a fix.

Some explanation: I assume that this is not a native Latin, Italian or
Lithuanian word. If you are talking about the artist's name (Lev Manovich),
then you are using the English transliteration of a Russian word and expect
it to hyphenate properly in Italian. Italian is a
what-you-see-is-what-you-pronounce language (in contrast to English), and
you cannot expect it to properly hyphenate all the foreign names that are
not even transliterated "properly". An Italian word would most probably
never end with "ch", so there's currently no pattern present that would
prohibit that behaviour.
I don't know Russian well enough, but I would blindly guess that the right
transliteration would be Manovič anyway (of course, everyone would then
have a problem with getting the right accent and with proper
pronunciation), and the German Wikipedia somehow confirms that:

    Lev Manowitsch (Russian Лев Манович, scientific transliteration
    Lev Manovič; born 1960 in Moscow)

Note that Germans transliterate the name differently, and Italians could
transliterate it in a different way as well. Since Lithuanian contains the
letter "č", I would assume that they would transliterate the name with č
anyway (disclaimer: my knowledge of Lithuanian is zero, so I'm not even
sure how they pronounce that letter).

For example, Serbian will never have a problem with hyphenation of
foreign names:

    http://sr.wikipedia.org/sr-el/Алберт_Ајнштајн
    Albert Ajnštajn (German: Albert Einstein) was a theoretical physicist ...

The question is always: how many different foreign names do you want to
hyphenate properly in any given language?

On the other hand, even with Italian pronunciation, I guess that ch is
considered to be a "single consonant" (I may be wrong about that, but it's
not too relevant either), so adding an additional pattern "2ch." (or
"4ch.", not sure which one is needed) cannot hurt.

Mojca
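A minimal Python sketch of the pattern mechanism described in this message (Liang's algorithm, which TeX uses), assuming a tiny hypothetical pattern set: "1n" and "1v" are invented stand-ins for the ordinary break-before-consonant patterns, while "1c", "i1c" and "2ch." are the patterns named above. Every matching pattern contributes its digits, the highest digit at each position wins, and an odd value allows a break. This is not ConTeXt's implementation, and the real Italian, Latin and Lithuanian files contain many more patterns.

```python
def parse_pattern(pattern):
    """Split a TeX-style pattern such as '2ch.' into its letters and digits.

    The digit list is one longer than the letter list; a missing digit means 0.
    """
    letters, digits = [], [0]
    for ch in pattern:
        if ch.isdigit():
            digits[-1] = int(ch)
        else:
            letters.append(ch)
            digits.append(0)
    return "".join(letters), digits


def hyphenate(word, patterns, left=2, right=2):
    """Liang-style hyphenation: at every position keep the highest digit of all
    matching patterns; an odd value allows a break, an even value forbids it."""
    table = dict(parse_pattern(p) for p in patterns)
    dotted = "." + word.lower() + "."          # '.' marks the word boundaries
    values = [0] * (len(dotted) + 1)           # values[i] sits just before dotted[i]
    for start in range(len(dotted)):
        for end in range(start + 1, len(dotted) + 1):
            digits = table.get(dotted[start:end])
            if digits is not None:
                for offset, digit in enumerate(digits):
                    values[start + offset] = max(values[start + offset], digit)
    pieces, last = [], 0
    for k in range(left, len(word) - right + 1):  # candidate break after k letters
        if values[k + 1] % 2 == 1:                # gap between word[k-1] and word[k]
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


# Hypothetical pattern sets: "1n" and "1v" are invented for this illustration;
# "1c", "i1c" and "2ch." are the patterns discussed in the thread.
ITALIAN_LIKE = ["1n", "1v", "1c"]
LITHUANIAN_LIKE = ["1n", "1v", "i1c"]

print(hyphenate("Manovich", ITALIAN_LIKE))                # Ma-no-vi-ch
print(hyphenate("Manovich", LITHUANIAN_LIKE))             # Ma-no-vi-ch
print(hyphenate("Manovich", ITALIAN_LIKE + ["2ch."]))     # Ma-no-vich
print(hyphenate("Manovich", LITHUANIAN_LIKE + ["2ch."]))  # Ma-no-vich
```

In this toy set, an even 2 is enough to veto the break before a word-final "ch", because the competing "1c" or "i1c" only puts a 1 there; "4ch." would be needed only if some other pattern already placed a 3 at that position.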
* Re: hyphenation patterns
From: Mojca Miklavec @ 2010-05-24 0:16 UTC
To: claudio.beccari
Cc: mailing list for ConTeXt users

Dear Claudio,

Thanks a lot for your prompt reply.

On Mon, May 24, 2010 at 00:39, Claudio Beccari wrote:
> Dear Mojca,
> no proper Italian word ends in ch (this digraph in normal Italian words is
> pronounced as k, not as č or ć).
> Nevertheless, there are a number of surnames dating back to the old times
> (150 years ago) when North East Italy was under Austro-Hungarian rule,
> when Istrian names, mainly Croatian and Slovenian, were transliterated in
> such a way that the typical patronymic ending -ič or -ić (I don't know the
> exact spelling in Latin letters of the Croatian/Slovenian names) was
> transliterated for the Empire bureaucracy as -ich.

Thanks a lot for some more insight. I admit that I didn't know the
details (I should be ashamed), and in my area they were more radical
with surname changes (mine was Michelazzi, and I think that most
surnames here were "properly Romanized", for example Filipčič ->
Filippi, so again no problems with hyphenation :) :) :).

> This spelling remained
> when North East Italy and Istria were annexed to the Kingdom of Italy at the
> end of WW1. After WW2 most of Istria returned mainly to Croatia and a small
> part to Slovenia, but the Slovenians and Croatians that had moved to NE
> Italy and had become Italian citizens maintained their surnames with the
> Austro-Hungarian spelling.
>
> When I prepared the hyphen patterns for Italian and Latin I did think of
> this particular spelling, but I concluded that it was not so important; I
> was wrong, and I apologize.

There's no need to apologize. First, there's an "infinite" number of
foreign names, so that one simply cannot get all of them right. I
guess that Lju-bl-ja-na is not properly hyphenated either (Lu-bia-na
is OK), but in my opinion it's a valid argument that one should change
the language when writing foreign names if they are to be hyphenated
properly. I can also easily imagine Slovenian patterns that would
hyphenate

    Fis-cher, Aac-hen, Go-ethe

when not knowing that those letters represent a single "letter"/sound
in foreign words.

Second, I have no idea, but I think it was a pure coincidence that the
"problem" reported by Rogutės Sparnuotos is the same as that for
surnames of a group of people in the North-East (I think that the name in
question comes from Russia, with the transliteration done via English). On
the other hand, if it's just a tiny pattern that solves them all ...

> I will submit, at least for Italian, a revised
> pattern file. I doubt I should do it also for Latin, although it does not
> cost anything...

In case you do submit any updates, I would be extremely grateful if you
submitted them to
http://www.ctan.org/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/hyph-it.tex
instead of (or at least in addition to) the original file (you may
remove the initial comments).

Also, if you happen to have the original of
http://www.tug.org/TUGboat/Articles/tb13-1/tb34becc.pdf
it would be nice to include it in the repository as documentation about
Italian hyphenation (but that's all too off-topic for the ConTeXt
mailing list).

Thanks again,
    Mojca
* Re: hyphenation patterns
From: Hans Hagen @ 2010-05-24 8:17 UTC
To: mailing list for ConTeXt users
Cc: claudio.beccari, Mojca Miklavec

On 24-5-2010 2:16, Mojca Miklavec wrote:

> There's no need to apologize. First, there's an "infinite" number of
> foreign names, so that one simply cannot get all of them right. I
> guess that Lju-bl-ja-na is not properly hyphenated either (Lu-bia-na

Why not just use hyphenmin values of 3 to prevent such cases?

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------
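A small Python sketch of what the hyphenmin values do, assuming the break positions have already been produced by the patterns (the positions below are simply read off Ma-no-vi-ch and Lju-bl-ja-na as quoted in this thread; `apply_hyphenmins` is a hypothetical helper, not a ConTeXt command). TeX's \lefthyphenmin and \righthyphenmin only discard breaks that would leave too few letters at the start or end of the word.

```python
def apply_hyphenmins(word, breaks, left, right):
    """Drop candidate breaks (given as 'after k letters', in ascending order)
    that leave fewer than `left` letters before or `right` letters after them."""
    kept = [k for k in breaks if k >= left and len(word) - k >= right]
    pieces, last = [], 0
    for k in kept:
        pieces.append(word[last:k])
        last = k
    pieces.append(word[last:])
    return "-".join(pieces)


# Break positions taken from the hyphenations quoted in this thread.
manovich = [2, 4, 6]      # Ma-no-vi-ch
ljubljana = [3, 5, 7]     # Lju-bl-ja-na

print(apply_hyphenmins("Manovich", manovich, 2, 2))    # Ma-no-vi-ch
print(apply_hyphenmins("Manovich", manovich, 2, 3))    # Ma-no-vich
print(apply_hyphenmins("Manovich", manovich, 3, 3))    # Mano-vich
print(apply_hyphenmins("Ljubljana", ljubljana, 3, 3))  # Lju-bl-jana
```

Raising the right minimum to 3 removes the vi-ch break, but the filter only looks at the word edges, so an interior break such as bl-ja in Lju-bl-ja-na survives.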
* Re: hyphenation patterns
From: rogutes @ 2010-05-24 18:52 UTC
To: mailing list for ConTeXt users
Cc: claudio.beccari, Mojca Miklavec

Mojca Miklavec (2010-05-24 02:16):
[...]
> Second, I have no idea, but I think it was a pure coincidence that the
> "problem" reported by Rogutės Sparnuotos is the same as that for
> surnames of a group of people in the North-East (I think that the name in
> question comes from Russia, with the transliteration done via English). On
> the other hand, if it's just a tiny pattern that solves them all ...

Thank you, Mojca and Claudio, for your replies.

Mojca has guessed correctly: I merely noticed that the surname Manovich
is hyphenated wrongly in the three languages I've tested. And I don't
mind using \hyphenation{} or switching the language for foreign names.

I don't know how hyphenation patterns are made, so I was surprised to see
the main rule of at least Latin/Italian/Lithuanian hyphenation broken (a
syllable must contain a vowel). From your explanations it seems that
hyphenation patterns are kind of case-by-case rules, so this problem is
not surprising, since no common words end with '-ch' in these languages.

I wonder if I'll find a maintainer of the Lithuanian patterns...

--
Rogutės Sparnuotos
* Re: hyphenation patterns
From: luigi scarso @ 2010-05-24 14:50 UTC
To: mailing list for ConTeXt users

On Sun, May 23, 2010 at 11:38 PM, Mojca Miklavec
<mojca.miklavec.lists@gmail.com> wrote:
> hyphenate properly in Italian. Italian is a
> what-you-see-is-what-you-pronounce language (in contrast to English)

Apart from some traps, like
  glicine vs tagliare, where the syllable 'gli' is pronounced in
  completely different ways,
or
  anno (year) vs hanno (have, as in "they have"), where the sound is the same,
or
  àncora (anchor) vs ancóra (again), where we usually write ancora vs ancora
  (yes, no difference: only the sound is different),
or
  péro (pear tree) vs però (but),
and so on.

--
luigi
* Hyphenation patterns
From: Denis Maier @ 2020-10-08 15:41 UTC
To: mailing list for ConTeXt users

Hi,

where can I find the hyphenation patterns used by ConTeXt? I have two
wrongly hyphenated words, and I want to check whether this is due to
incorrect patterns. (I tried the source browser... not much luck so far.)
The words are:
1. applicable => hyphenated as applic-able
2. obligated => hyphenated as oblig-ated

I know I can use \hyphenation to correct that, but I wanted to check the
patterns nevertheless.

Best,
Denis
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
* Re: Hyphenation patterns
From: Tomas Hala @ 2020-10-08 16:20 UTC
To: mailing list for ConTeXt users

Hi,

you can find the patterns in this directory:

texlive/2020/texmf-dist/tex/context/patterns/mkiv/

Best wishes,

Tomáš

Tomáš Hála
--------------------------------------------------------------------
Mendelova univerzita, Provozně ekonomická fakulta, ústav informatiky
Zemědělská 1, CZ-613 00 Brno, tel. +420 545 13 22 28
--------------------------------------------------------------------
http://akela.mendelu.cz/~thala
* Re: Hyphenation patterns
From: Henning Hraban Ramm @ 2020-10-08 17:05 UTC
To: mailing list for ConTeXt users

> On 08.10.2020 at 17:41, Denis Maier <denismaier@mailbox.org> wrote:
>
> where can I find the hyphenation patterns used by ConTeXt? I have two
> wrongly hyphenated words, and I want to check whether this is due to
> incorrect patterns. (I tried the source browser... not much luck so far.)
> The words are:
> 1. applicable => hyphenated as applic-able
> 2. obligated => hyphenated as oblig-ated
>
> I know I can use \hyphenation to correct that, but I wanted to check the
> patterns nevertheless.

I guess it’s just a valid option.

You can check possible hyphenations like this:

\starttext

{EN: \en\hyphenatedcoloredword{applicable}}

{DE: \de\hyphenatedcoloredword{applicable}}

\stoptext

Hraban
* Re: Hyphenation patterns
From: Denis Maier @ 2020-10-09 6:52 UTC
To: mailing list for ConTeXt users, Henning Hraban Ramm

On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
> \starttext
>
> {EN: \en\hyphenatedcoloredword{applicable}}
>
> {DE: \de\hyphenatedcoloredword{applicable}}
>
> \stoptext

Wow, that's super helpful. The English patterns seem to give
"ap-plic-a-ble". According to Merriam-Webster it should just be
"ap·pli·ca·ble".

{EN: \en\hyphenatedcoloredword{obligate}} gives me "ob-lig-ate".
According to Merriam-Webster it should be "ob·li·gate".

I've had a look at the files mentioned by Tomáš, but as these are not
just word lists, I cannot really tell what is happening.

So, is that a bug?

Best,
Denis
* Re: Hyphenation patterns
From: Taco Hoekwater @ 2020-10-09 6:57 UTC
To: mailing list for ConTeXt users

> On 9 Oct 2020, at 08:52, Denis Maier <denismaier@mailbox.org> wrote:
>
> [...]
> So, is that a bug?

Not really. Hyphenation patterns are a bit like applying JPEG compression
to a dictionary. It makes the data size smaller by recognising patterns
while ignoring outliers.

Occasional errors are to be expected, which is why \hyphenation exists.

Best wishes,
Taco
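A minimal Python sketch of why \hyphenation exists: an exception entry is looked up first and, when found, replaces whatever the patterns would produce. The pattern side is stubbed with the two results reported in this thread, and `build_exceptions`, `hyphenate_with_exceptions` and `pattern_stub` are hypothetical names, not ConTeXt functions. As in TeX, an entry without any hyphen (here "present") means the word is never broken.

```python
def build_exceptions(spec):
    """Map each unhyphenated, lowercased word in a space-separated exception
    list (hyphens mark the allowed breaks) to its hyphenated form."""
    return {entry.replace("-", "").lower(): entry.lower() for entry in spec.split()}


def hyphenate_with_exceptions(word, exceptions, fallback):
    """Use the exception entry if there is one, otherwise the pattern result."""
    entry = exceptions.get(word.lower())
    if entry is None:
        return fallback(word)
    # Copy the entry's hyphen positions onto the word as written (keeps case).
    out, i = [], 0
    for ch in entry:
        if ch == "-":
            out.append("-")
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


# Hypothetical stand-in for the patterns: it just replays the reported results.
REPORTED = {"obligate": "ob-lig-ate", "applicable": "ap-plic-a-ble"}


def pattern_stub(word):
    return REPORTED.get(word.lower(), word)


EXCEPTIONS = build_exceptions("ob-li-gate ap-pli-ca-ble present")

print(pattern_stub("obligate"))                                          # ob-lig-ate
print(hyphenate_with_exceptions("obligate", EXCEPTIONS, pattern_stub))    # ob-li-gate
print(hyphenate_with_exceptions("Applicable", EXCEPTIONS, pattern_stub))  # Ap-pli-ca-ble
print(hyphenate_with_exceptions("present", EXCEPTIONS, pattern_stub))     # present
```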
* Re: Hyphenation patterns
From: Denis Maier @ 2020-10-09 7:01 UTC
To: mailing list for ConTeXt users, Taco Hoekwater

On 09.10.2020 at 08:57, Taco Hoekwater wrote:
> Not really. Hyphenation patterns are a bit like applying JPEG compression
> to a dictionary. It makes the data size smaller by recognising patterns
> while ignoring outliers.
>
> Occasional errors are to be expected, which is why \hyphenation exists.

I see. I've noticed that lang-us.lua has a list of exceptions in it:

 ["exceptions"]={
  ["characters"]="abcdefghijlmnoprstuyz",
  ["data"]="as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory phil-an-thropic present presents project projects reci-procity re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble",
  ["length"]=168,
  ["n"]=14,
 },

Would it be possible to add more exceptions to that list as they come up?
Or is that inappropriate?

Denis
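The three metadata fields in this table appear to be derivable from the data string itself: "n" matches the number of words, "length" the character count including the separating spaces, and "characters" the sorted set of letters used. That is our reading of the file, not documented behaviour. The Python sketch below just recomputes them (`summarize` and `never_hyphenated` are hypothetical helpers) and lists the entries that carry no hyphens, which TeX treats as words that must never be broken.

```python
# The "data" string from the exceptions table in lang-us.lua quoted above.
DATA = ("as-so-ciate as-so-ciates dec-li-na-tion oblig-a-tory "
        "phil-an-thropic present presents project projects reci-procity "
        "re-cog-ni-zance ref-or-ma-tion ret-ri-bu-tion ta-ble")


def summarize(data):
    """Recompute the fields that (on our reading) accompany the data string."""
    words = data.split()
    letters = sorted({ch for ch in data if ch not in "- "})
    return {"characters": "".join(letters), "length": len(data), "n": len(words)}


def never_hyphenated(data):
    """Entries without any hyphen: words that must not be broken at all."""
    return [w for w in data.split() if "-" not in w]


print(summarize(DATA))
# {'characters': 'abcdefghijlmnoprstuyz', 'length': 168, 'n': 14}
print(never_hyphenated(DATA))
# ['present', 'presents', 'project', 'projects']
```

If the fields really are derived this way, extending the list would mean updating all three alongside "data".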
* Re: Hyphenation patterns
From: Hans Hagen @ 2020-10-09 12:48 UTC
To: ntg-context

On 10/9/2020 9:01 AM, Denis Maier wrote:
> I see. I've noticed that lang-us.lua has a list of exceptions in it:
> [...]
> Would it be possible to add more exceptions to that list as they come
> up? Or is that inappropriate?

You can add your own at runtime in a style:

\hyphenation {fo-ob-ar}
\hsize 1mm foobar

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
* Re: Hyphenation patterns
From: Denis Maier @ 2020-10-09 12:59 UTC
To: mailing list for ConTeXt users, Hans Hagen

On 09.10.2020 at 14:48, Hans Hagen wrote:
> On 10/9/2020 9:01 AM, Denis Maier wrote:
>> [...]
>> Would it be possible to add more exceptions to that list as they come
>> up? Or is that inappropriate?
> You can add your own at runtime in a style:
>
> \hyphenation {fo-ob-ar}
> \hsize 1mm foobar

Sure. I use \startexceptions[en] for that. I just thought everyone might
benefit...

Denis
* Re: Hyphenation patterns
From: Henning Hraban Ramm @ 2020-10-09 8:15 UTC
To: mailing list for ConTeXt users

> On 09.10.2020 at 08:52, Denis Maier <denismaier@mailbox.org> wrote:
>
> On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
>> \starttext
>>
>> {EN: \en\hyphenatedcoloredword{applicable}}
>>
>> {DE: \de\hyphenatedcoloredword{applicable}}
>>
>> \stoptext
>>
> Wow, that's super helpful.

BTW, \hyphenatedword works the same. I didn’t see anything colored.
There are some more commands like this, even \hyphenatedfile; see
https://source.contextgarden.net/tex/context/base/mkiv/supp-box.mkiv?search=hyphenated

Usually Arthur’s (hail the emperor of hyphenation and protector of the
patterns) patterns are flawless, so I guess it’s not a bug but an
exception to the rules.

Hraban
* Re: Hyphenation patterns
From: Hans Hagen @ 2020-10-09 8:59 UTC
To: mailing list for ConTeXt users, Henning Hraban Ramm

On 10/9/2020 10:15 AM, Henning Hraban Ramm wrote:
>
>
>> On 09.10.2020 at 08:52, Denis Maier <denismaier@mailbox.org> wrote:
>>
>> On 08.10.2020 at 19:05, Henning Hraban Ramm wrote:
>>> \starttext
>>>
>>> {EN: \en\hyphenatedcoloredword{applicable}}
>>>
>>> {DE: \de\hyphenatedcoloredword{applicable}}
>>>
>>> \stoptext
>>>
>> Wow, that's super helpful.
>
> BTW \hyphenatedword works the same. I didn’t see anything colored.
> There are some more commands like this, even \hyphenatedfile, see
> https://source.contextgarden.net/tex/context/base/mkiv/supp-box.mkiv?search=hyphenated
>
> Usually Arthur’s (hail the emperor of hyphenation and protector of the patterns) patterns are flawless, so I guess it’s not a bug but an exception of the rules.

ancient secret features:

>mtxrun --script patterns --hyphenate applicable --language=gb
hyphenator |
hyphenator | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator | 2a0p0                      2 0 0 0 0 0 0 0 0 0 0
hyphenator | 4p1p2                      2 4 1 2 0 0 0 0 0 0 0
hyphenator | 0p2l2                      2 4 1 2 2 0 0 0 0 0 0
hyphenator | 1a0b0                      2 4 1 2 2 0 1 0 0 0 0
hyphenator | 2b0l2                      2 4 1 2 2 0 1 2 0 2 0
hyphenator | 4l0e0.0                    2 4 1 2 2 0 1 2 4 2 0
hyphenator | .2a4p1p2l2i0c1a2b4l2e0.   . a p-p l i c-a b l e .
hyphenator |
mtx-patterns | gb 3 3 : applicable : applic-able

>mtxrun --script patterns --hyphenate applicable --language=us
hyphenator |
hyphenator | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator | 4p1p0                      0 4 1 0 0 0 0 0 0 0 0
hyphenator | 1p2l2                      0 4 1 2 2 0 0 0 0 0 0
hyphenator | 0p0l0i2c1a0b0              0 4 1 2 2 2 1 0 0 0 0
hyphenator | 1c0a0                      0 4 1 2 2 2 1 0 0 0 0
hyphenator | 0c0a1b0l0                  0 4 1 2 2 2 1 1 0 0 0
hyphenator | 0b2l2                      0 4 1 2 2 2 1 1 2 2 0
hyphenator | 0b4l0e0.0                  0 4 1 2 2 2 1 1 4 2 0
hyphenator | .0a4p1p2l2i2c1a1b4l2e0.   . a p-p l i c-a-b l e .
hyphenator |
mtx-patterns | us 3 3 : applicable : applic-a-ble

not the kind of stuff one wants to expose a new user to

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
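In both traces the last `hyphenator` row combines all pattern rows (each gap keeps the largest value it received), and the marked breaks are exactly the gaps with odd values. The `3 3` on the `mtx-patterns` line appears to be the minimum number of letters kept before and after a break, since the `--left=2 --right=2` runs elsewhere in the thread report `2 2` and keep the `ap-` break. Assuming that reading, this Python sketch (the helper `apply_minima` is hypothetical) turns the final value rows, copied from the two traces, into the results mtxrun prints:

```python
def apply_minima(word, gaps, left=3, right=3):
    """Hyphenate `word` from a trace's final row of gap values.

    gaps[i] is the value of the gap just before word[i] (gaps[0] sits
    after the leading '.'). Odd values allow a break, but only if at
    least `left` letters precede it and `right` letters follow it.
    """
    pieces, current = [], ""
    for i, ch in enumerate(word):
        if gaps[i] % 2 == 1 and i >= left and len(word) - i >= right:
            pieces.append(current)
            current = ""
        current += ch
    pieces.append(current)
    return "-".join(pieces)


# Final value rows for ".applicable." copied from the gb and us traces.
GB = [2, 4, 1, 2, 2, 0, 1, 2, 4, 2, 0]
US = [0, 4, 1, 2, 2, 2, 1, 1, 4, 2, 0]

for name, gaps in (("gb", GB), ("us", US)):
    print(name, apply_minima("applicable", gaps),
          apply_minima("applicable", gaps, left=2, right=2))
# gb applic-able ap-plic-able
# us applic-a-ble ap-plic-a-ble
```

The only difference between the two rows that changes the result is the a|b gap (2 in gb, 1 in us), which is why the British patterns keep -able together.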
* Re: Hyphenation patterns
From: Arthur Rosendahl @ 2021-04-09 21:57 UTC
To: Mailing list for ConTeXt users

Denis’ latest question reminded me of an earlier query he had about hyphenation, asking why “applicable” and “obligated” were hyphenated by ConTeXt as ap-plic-a-ble and ob-lig-at-ed, and not ap-pli-ca-ble and ob-li-ga-te(d) like in Merriam-Webster (the discussion started at https://mailman.ntg.nl/pipermail/ntg-context/2020/099695.html).

First of all, I note that while Webster’s dictionary is a useful guide, and indeed a major reference for any American typographer, there’s no absolute rule that we have to follow it either. The break applic-able, for example, does look acceptable to me; oblig-ated, less so.

Taco reminded us that when producing a set of hyphenation patterns from a list of hyphenated words, we’re essentially compressing information, and that some minor deviations are to be expected. However, in my experience, unexpected breakpoints are almost never due to chance, but to a deliberate decision.

Then Hraban said that:

On Fri, Oct 09, 2020 at 10:15:17AM +0200, Henning Hraban Ramm wrote:
> Usually Arthur’s (hail the emperor of hyphenation and protector of the patterns) patterns are flawless, so I guess it’s not a bug but an exception of the rules.

I see that my self-appointed title is catching on, nice :-)

Unfortunately the patterns are just as likely to contain errors as anything else, and in this particular case we’ll probably never know for sure, because the original hyphenated word list was never published (all the word lists from which patterns were produced in the 80s and 90s have been lost, for all languages). We’re thus reduced to guessing the intent of those who compiled the lists.

We can get hints from looking at the patterns involved in the debatable breaks.
Hans has a useful script:

$ mtxrun --script patterns --language=us --left=2 --right=2 --hyphenate applicable
hyphenator |
hyphenator | . a p p l i c a b l e .   . a p p l i c a b l e .
hyphenator | 4p1p0                      0 4 1 0 0 0 0 0 0 0 0
hyphenator | 1p2l2                      0 4 1 2 2 0 0 0 0 0 0
hyphenator | 0p0l0i2c1a0b0              0 4 1 2 2 2 1 0 0 0 0
hyphenator | 1c0a0                      0 4 1 2 2 2 1 0 0 0 0
hyphenator | 0c0a1b0l0                  0 4 1 2 2 2 1 1 0 0 0
hyphenator | 0b2l2                      0 4 1 2 2 2 1 1 2 2 0
hyphenator | 0b4l0e0.0                  0 4 1 2 2 2 1 1 4 2 0
hyphenator | .0a4p1p2l2i2c1a1b4l2e0.   . a p-p l i c-a-b l e .
hyphenator |
mtx-patterns | us 2 2 : applicable : ap-plic-a-ble

That tells us that there are seven patterns involved in hyphenating the word applicable: 4p1p, 1p2l2, pli2c1ab, 1ca, ca1bl, b2l2, and b4le. (The final dot is part of that last pattern.) The pattern responsible for the break applic-able is pli2c1ab.

If we now refer to the source repository for hyphenation patterns (since comments are stripped in the ConTeXt sources):

https://github.com/hyphenation/tex-hyphen/blob/master/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-en-us.tex

we can see that line 4508 reads

    hyphen.tex patterns end here, and additional patterns begin:

which means that the pattern pli2c1ab, at line 4817, is an “additional pattern”.

The background story is that hyphen.tex, the original hyphenation pattern file for American English, produced in 1982-1983 from a list of hyphenated words (following mostly Webster’s), was later augmented with more patterns that were supposed to improve hyphenation for many words. The person who added these new patterns apparently had a list of words hyphenated incorrectly (according to him) by hyphen.tex, but both that list and the one used to produce hyphen.tex are, as mentioned above, now lost, probably forever.

In any case, the pattern that causes the break applic-able was clearly added intentionally; and, as I said, that break seems quite reasonable to me.
Not so for the one in oblig-ated, so let’s have a look at that:

$ mtxrun --script patterns --language=us --left=2 --right=2 --hyphenate obligated
hyphenator |
hyphenator | . o b l i g a t e d .   . o b l i g a t e d .
hyphenator | 0o0b0l0i2g1              0 0 0 0 2 1 0 0 0 0
hyphenator | 0b2l2                    0 0 2 2 2 1 0 0 0 0
hyphenator | 5l0i0g0a0t0e0            0 0 5 2 2 1 0 0 0 0
hyphenator | 2i0g0                    0 0 5 2 2 1 0 0 0 0
hyphenator | 1g0a0                    0 0 5 2 2 1 0 0 0 0
hyphenator | 2t1e0d0                  0 0 5 2 2 1 2 1 0 0
hyphenator | .0o0b5l2i2g1a2t1e0d0.   . o b-l i g-a t-e d .
hyphenator |
mtx-patterns | us 2 2 : obligated : ob-lig-at-ed

Here we see that the dubious break is caused by the pattern obli2g1, also an “additional pattern” (line 4783), and here it’s not hard to guess where it comes from: it has to be for the word obligatory, hyphenated regularly as o-blig-a-to-ry according to M-W -- and myself ;-) The incorrect breakpoint in oblig-ated is an undesired side effect of that.

Best,

ArthuR
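The traces above follow Frank Liang's pattern rule, as used by TeX: every pattern that occurs in the word (with `.` marking the word boundaries) contributes its digits, each gap between letters keeps the highest digit it receives, and odd values mark allowed breaks. Below is a from-scratch Python sketch of that rule, not the code behind `mtxrun --script patterns`; the pattern lists are the trace rows with their padding zeros removed, assumed to be every pattern that matches these two words. Dropping the additional pattern `obli2g1` shows what it changes for obligated.

```python
import re


def parse_pattern(pattern):
    """Split 'pli2c1ab' into its letters ('plicab') and gap values.

    values[k] is the digit in front of letter k (0 if none);
    values[-1] is the digit after the last letter.
    """
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left=2, right=2):
    """Liang's rule: per gap, keep the maximum digit; odd means break."""
    text = "." + word.lower() + "."
    scores = [0] * (len(text) + 1)  # scores[j]: gap just before text[j]
    parsed = [parse_pattern(p) for p in patterns]
    for start in range(len(text)):
        for letters, values in parsed:
            if text.startswith(letters, start):
                for offset, value in enumerate(values):
                    j = start + offset
                    scores[j] = max(scores[j], value)
    pieces, current = [], ""
    for i, ch in enumerate(word):
        # the gap before word[i] is the gap before text[i + 1]
        if scores[i + 1] % 2 == 1 and i >= left and len(word) - i >= right:
            pieces.append(current)
            current = ""
        current += ch
    pieces.append(current)
    return "-".join(pieces)


APPLICABLE = ["4p1p", "1p2l2", "pli2c1ab", "1ca", "ca1bl", "b2l2", "b4le."]
OBLIGATED = ["obli2g1", "b2l2", "5ligate", "2ig", "1ga", "2t1ed"]

print(hyphenate("applicable", APPLICABLE))    # ap-plic-a-ble
print(hyphenate("obligated", OBLIGATED))      # ob-lig-at-ed
print(hyphenate("obligated", OBLIGATED[1:]))  # ob-li-gat-ed (no obli2g1)
```

Without `obli2g1` the remaining patterns leave an odd value before g and an even one before a, giving ob-li-gat-ed, so in this sketch that single pattern accounts for the oblig-ated break.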
* Re: Hyphenation patterns
From: Hans Hagen @ 2020-10-09 8:54 UTC
To: mailing list for ConTeXt users, Henning Hraban Ramm

On 10/8/2020 7:05 PM, Henning Hraban Ramm wrote:
>
>> On 08.10.2020 at 17:41, Denis Maier <denismaier@mailbox.org> wrote:
>>
>> where can I find the hyphenation patterns used by ConTeXt? I have two wrongly hyphenated words, and I want to check whether this is due to incorrect patterns. (I tried the source browser... not much luck so far.) The words are:
>> 1. applicable => hyphenated as applic-able
>> 2. obligated => hyphenated as oblig-ated
>>
>> I know I can use \hyphenation to correct that, but I wanted to check the patterns nevertheless.
>
> I guess it’s just a valid option.
> You can check possible hyphenations like this:
>
> \starttext
>
> {EN: \en\hyphenatedcoloredword{applicable}}
>
> {DE: \de\hyphenatedcoloredword{applicable}}
>
> \stoptext

americans and brits hyphenate differently

\starttext
{\language[usenglish] {\tt US \number\normallanguage}: \hyphenatedcoloredword{applicable}}\par
{\language[ukenglish] {\tt UK \number\normallanguage}: \hyphenatedcoloredword{applicable}}\par
\stoptext

syllable vs stem (but I bet Arthur can explain better)

hans
Thread overview: 18+ messages
2010-05-23 23:22 hyphenation patterns Rogutės Sparnuotos
2010-05-23 21:38 ` Mojca Miklavec
[not found] ` <4BF9AE8A.6040405@gmail.com>
2010-05-24 0:16 ` Mojca Miklavec
2010-05-24 8:17 ` Hans Hagen
2010-05-24 18:52 ` rogutes
2010-05-24 14:50 ` luigi scarso
2020-10-08 15:41 Hyphenation patterns Denis Maier
2020-10-08 16:20 ` Tomas Hala
2020-10-08 17:05 ` Henning Hraban Ramm
2020-10-09 6:52 ` Denis Maier
2020-10-09 6:57 ` Taco Hoekwater
2020-10-09 7:01 ` Denis Maier
2020-10-09 12:48 ` Hans Hagen
2020-10-09 12:59 ` Denis Maier
2020-10-09 8:15 ` Henning Hraban Ramm
2020-10-09 8:59 ` Hans Hagen
2021-04-09 21:57 ` Arthur Rosendahl
2020-10-09 8:54 ` Hans Hagen