* Unicode normalization and Hebrew in ConTeXt @ 2020-04-28 11:59 Joey McCollum 2020-04-28 13:17 ` Hans Hagen 2020-04-30 9:26 ` Hans Hagen 0 siblings, 2 replies; 16+ messages in thread From: Joey McCollum @ 2020-04-28 11:59 UTC (permalink / raw) To: ntg-context [-- Attachment #1.1: Type: text/plain, Size: 2670 bytes --] I am typesetting a document in Hebrew that includes pointing (e.g., vowels, shin and sin dots, dagesh, etc.) using ConTeXt. The Hebrew text that I want to typeset has been normalized into Unicode's NFC canonical form. It is well-known that the Unicode canonical ordering of Hebrew points conflicts with the recommended mark ordering of specific points based on their functions (see https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf for more on this topic). Thankfully, many typesetting engines automatically reorder the points to ensure that they are combined according to the specifications of many fonts. I'm pretty sure that XeLaTeX is one of these, as it typesets Hebrew letters with multiple points correctly even when the Hebrew text is in NFC form. My question is, can ConTeXt with LuaTeX handle the same situation correctly? In the following minimal example, ConTeXt typesets pointed Hebrew correctly when the characters are in the typographically recommended order, but not when they are in Unicode canonical order: ``` %Setup Hebrew text font: \definefontfeature[f:pointedhebrew][default][ ccmp=yes, mark=yes, script=hebr ] \definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew] %Set the body font: \setupbodyfont[hebrew] %Set up right-to-left alignment: \setupalign[r2l] \starttext %Characters after normalization, in Unicode canonical order (bet + segol + dagesh + final nun): בֶּן %A word with characters in typographically recommended order (bet + dagesh + segol + final nun): בֶּן \stoptext ``` I typeset this using ConTeXt version 2020.03.10, as released with TeXLive 2020. 
I got the SBL Hebrew font from https://www.sbl-site.org/educational/BiblicalFonts_SBLHebrew.aspx. According to the font's user manual (see the link above the MWE), the font should be able to combine the marks to form the correct glyph regardless of their order after the consonant, but that doesn't seem to be the case here. I also tried using the predefined "hebrew" featureset, but that did not change anything.

Is there some other OpenType feature or featureset I need to enable to fix this, or is there some module or option I can include to get ConTeXt to typeset Unicode-normalized Hebrew as if it were ordered in the recommended way, like XeLaTeX does? I see that the uninormalize module is mentioned in the thread "XeLaTeX, LuaLaTeX, fontspec, unicode and normalization" on TeX Stack Exchange (https://tex.stackexchange.com/questions/229044/xelatex-lualatex-fontspec-unicode-and-normalization); can that be used with ConTeXt?

Thank you,

Joey

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://context.aanhet.net
archive : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___________________________________________________________________________________
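The reordering described in this message can be reproduced outside ConTeXt. The following is a minimal Python sketch, assuming only the standard `unicodedata` module (the variable and function names are illustrative, not part of any ConTeXt API). It suggests why NFC text arrives with the segol before the dagesh: canonical ordering sorts adjacent combining marks by ascending combining class, and the dagesh (U+05BC) has a higher class than the segol (U+05B6).

```python
import unicodedata

BET, SEGOL, DAGESH, FINAL_NUN = "\u05D1", "\u05B6", "\u05BC", "\u05DF"

# Typographically recommended order: bet + dagesh + segol + final nun.
recommended = BET + DAGESH + SEGOL + FINAL_NUN

# NFC applies canonical ordering, which sorts adjacent combining marks
# by ascending canonical combining class.
normalized = unicodedata.normalize("NFC", recommended)


def describe(text):
    """Return (code point, combining class) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


print(describe(recommended))
print(describe(normalized))
```

Comparing the two printed lists shows whether the marks changed places; the base letters have combining class 0 and stay where they are.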
* Re: Unicode normalization and Hebrew in ConTeXt 2020-04-28 11:59 Unicode normalization and Hebrew in ConTeXt Joey McCollum @ 2020-04-28 13:17 ` Hans Hagen [not found] ` <CAGxRUG_cyT1XfYTvTh23+LaVg4an8exL-LZYfbZUOf+s2XvRRQ@mail.gmail.com> 2020-04-30 9:26 ` Hans Hagen 1 sibling, 1 reply; 16+ messages in thread From: Hans Hagen @ 2020-04-28 13:17 UTC (permalink / raw) To: mailing list for ConTeXt users, Joey McCollum On 4/28/2020 1:59 PM, Joey McCollum wrote: > \definefontfeature[f:pointedhebrew][default][ > ccmp=yes, > mark=yes, > script=hebr > ] > \definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew] > %Set the body font: > \setupbodyfont[hebrew] > %Set up right-to-left alignment: > \setupalign[r2l] > \starttext > %Characters after normalization, in Unicode canonical order (bet + > segol + dagesh + final nun): > בֶּן > > %A word with characters in typographically recommended order (bet + > dagesh + segol + final nun): > בֶּן > \stoptext \startluacode fonts.handlers.otf.addfeature { name = "normalizehebrew", type = "chainsubstitution", prepend = 1, lookups = { { type = "multiple", data = { [0x5B6] = { 0x5BC, 0x5B6 }, }, }, }, data = { rules = { { current = { { 0x5B6 }, { 0x5BC } }, lookups = { 1, 0 }, }, }, }, } \stopluacode \definefontfeature [f:pointedhebrew] [hebrew] [normalizehebrew=yes] \definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew] \setupbodyfont[hebrew] \setupalign[r2l] \starttext בֶּן \quad בֶּן \par \stoptext How many such reorderings are there? (I saw some document about that font and it sounds like a bit messy wrt all these input variants.) (there are several mechanisms in context to deal with such issues, it's all about getting specs from users i.e. 
tex is all about control so in principle it should be doable)

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
* Fwd: Unicode normalization and Hebrew in ConTeXt
From: Joey McCollum @ 2020-04-28 16:16 UTC
To: ntg-context

Thank you for the prompt and thorough response!

If the reorderings have to be done for each pair of characters in different combining classes that are not in the expected typographical order, then there will be a lot (probably hundreds) of substitution rules. I am not very familiar with coding in Lua, but if there is a way to add substitution features for specific classes of points, then that would require far fewer cases.

Unicode's canonical ordering of Hebrew marks is based on their combining classes: in canonical order, characters in higher combining classes are sorted after those in lower combining classes. The typographically recommended ordering of certain characters is found in Table 1 (p. 12) of https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf. The following list of character classes, with information about their Unicode combining classes (which I retrieved from the Lua script https://raw.githubusercontent.com/michal-h21/uninormalize/master/char-def-with-ccc.lua), is numbered according to the character classes described in that table:

1. The consonants (Unicode points 05D0-05EA) have combining class 0 (they are not combining marks) and are never reordered; this is typographically correct.
2. Shin dot and sin dot (05C1-05C2) should be next, but Unicode places them in combining classes 24 and 25, after the characters in recommended classes 3-5 and many of the characters in recommended class 6.
3. Dagesh / mapiq (05BC) should be next, but Unicode assigns it a combining class of 21. This means that after Unicode normalization it will be incorrectly ordered before characters in recommended class 2 and after characters in recommended class 5 and most characters in recommended class 6.
4. Rafe (05BF) should be next, but Unicode assigns it a combining class of 23. Thus, after Unicode normalization it will be correctly placed after characters in recommended class 3, but incorrectly placed before characters in recommended class 2.
5. The holam and holam haser vowel points (05B9-05BA) should be next, but Unicode places them in combining class 19. This means that after Unicode normalization they will be placed incorrectly before characters in recommended classes 2-4 and after the characters in recommended class 6 that have combining classes 10-18.
6. The characters in 0591, 0596, 059B, 05A2-05A7, 05AA, 05B0-05B8, 05BB, 05BD, 05C5, 05C7 should be treated as being in the same class, but Unicode places them in combining classes 10-18, 20, 22, and 220.
7. The prepositive marks yetiv and dehi (059A, 05AD) should be next; Unicode places them in combining class 222, so they should correctly come after all characters in recommended classes 1-6.
8. The characters 0307, 0593-0595, 0597-0598, 059C-05A1, 05A8, 05AB-05AC, 05AF, 05C4 should be treated as being in the same class; Unicode places them in combining class 230, so they should correctly come after all characters in recommended classes 1-7.
9. The postpositive marks segolta, pashta, telisha qetana, and zinor (0592, 0599, 05A9, 05AE) should be next; Unicode places them in combining class 230 (228 for zinor), so they will need to be reordered after the characters in recommended class 8.

This is a lot of information, and I've probably not presented it as clearly as I could, so if there is any confusion, please let me know, and I can try to explain better. If there is any other information you need, please let me know.

Thanks again!
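The class list above can be turned into a small reordering routine. What follows is only a sketch in Python, not the mechanism ConTeXt uses: the table `RECOMMENDED_RANGES` and the function `reorder_marks` are hypothetical names, the code point ranges are transcribed from the list in this message, and any character outside those ranges is assumed to act as a boundary between runs of marks.

```python
import unicodedata

# Recommended mark classes 2-9, transcribed from the list in this message.
# Class 1 (the consonants) are base characters and never move.
RECOMMENDED_RANGES = {
    2: [(0x05C1, 0x05C2)],
    3: [(0x05BC, 0x05BC)],
    4: [(0x05BF, 0x05BF)],
    5: [(0x05B9, 0x05BA)],
    6: [(0x0591, 0x0591), (0x0596, 0x0596), (0x059B, 0x059B),
        (0x05A2, 0x05A7), (0x05AA, 0x05AA), (0x05B0, 0x05B8),
        (0x05BB, 0x05BB), (0x05BD, 0x05BD), (0x05C5, 0x05C5),
        (0x05C7, 0x05C7)],
    7: [(0x059A, 0x059A), (0x05AD, 0x05AD)],
    8: [(0x0307, 0x0307), (0x0593, 0x0595), (0x0597, 0x0598),
        (0x059C, 0x05A1), (0x05A8, 0x05A8), (0x05AB, 0x05AC),
        (0x05AF, 0x05AF), (0x05C4, 0x05C4)],
    9: [(0x0592, 0x0592), (0x0599, 0x0599), (0x05A9, 0x05A9),
        (0x05AE, 0x05AE)],
}

# Flatten the ranges into a code point -> recommended class lookup.
MARK_CLASS = {
    cp: cls
    for cls, ranges in RECOMMENDED_RANGES.items()
    for lo, hi in ranges
    for cp in range(lo, hi + 1)
}


def reorder_marks(text):
    """Sort each run of listed marks by recommended class (stable sort)."""
    out, run = [], []

    def flush():
        # list.sort is stable, so marks in the same class keep their order.
        run.sort(key=lambda ch: MARK_CLASS[ord(ch)])
        out.extend(run)
        run.clear()

    for ch in text:
        if ord(ch) in MARK_CLASS:
            run.append(ch)
        else:
            flush()          # a base (or unlisted) character ends the run
            out.append(ch)
    flush()
    return "".join(out)


# bet + dagesh + segol + final nun, normalized to canonical order first.
nfc = unicodedata.normalize("NFC", "\u05D1\u05BC\u05B6\u05DF")
print([f"U+{ord(ch):04X}" for ch in reorder_marks(nfc)])
```

Sorting per run with one class table needs nine classes rather than a rule for every pair of marks, which is the reduction in cases asked about above. Whether the result matches what a given font expects would still have to be tested against that font.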
* Re: Fwd: Unicode normalization and Hebrew in ConTeXt
From: Hans Hagen @ 2020-04-28 18:03 UTC
To: mailing list for ConTeXt users, Joey McCollum

On 4/28/2020 6:16 PM, Joey McCollum wrote:
> Thank you for the prompt and thorough response!
>
> If the reorderings have to be done for each pair of characters in
> different combining classes that are not in the expected typographical
> order, then there will be a lot (probably hundreds) of substitution
> rules. I am not very familiar with coding in Lua, but if there is a way
> to add substitution features for specific classes of points, then that
> would require a lot fewer cases.

don't worry about that now, the lua part is normally the easy part (a where-it-makes-most-sense-hooking-into could take more thinking)

(i have been thinking of some additional feature mechanisms but have to find back some code i played with long ago)

> Unicode's canonical ordering of Hebrew marks is based on their combining
> classes, with characters in higher combining classes being sorted after
> those with lower combining classes in canonical order. The
> typographically recommended ordering of certain characters is found in
> Table 1 (p. 12) of
> https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf. The

So how official is that? Or is this something specific for this font?

Hans
* Re: Fwd: Unicode normalization and Hebrew in ConTeXt
From: Hans Hagen @ 2020-04-28 18:21 UTC
To: mailing list for ConTeXt users, Joey McCollum

On 4/28/2020 6:16 PM, Joey McCollum wrote:
> https://raw.githubusercontent.com/michal-h21/uninormalize/master/char-def-with-ccc.lua),

looks like an ancient copy of char-def.lua

(we actually do have a combining entry)

Hans
* Re: Fwd: Unicode normalization and Hebrew in ConTeXt
From: Joey McCollum @ 2020-04-28 19:06 UTC
To: Hans Hagen; +Cc: mailing list for ConTeXt users

> > Unicode's canonical ordering of Hebrew marks is based on their combining
> > classes, with characters in higher combining classes being sorted after
> > those with lower combining classes in canonical order. The
> > typographically recommended ordering of certain characters is found in
> > Table 1 (p. 12) of
> > https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf. The

> So how official is that? Or is this something specific for this font?

Hebrew Layout Intelligence, which was developed by John Hudson of Tiro Typeworks (who wrote this manual) and Ralph Hancock, is used for mark positioning by a number of Hebrew fonts. Its guidelines govern their glyph classes and chaining substitution rules. The fonts I know of that explicitly implement it are SBL Hebrew, Ezra SIL, Keter YG, and Keter Aram Tsova.
* Re: Fwd: Unicode normalization and Hebrew in ConTeXt
From: Arthur Reutenauer @ 2020-04-30 19:40 UTC
To: Mailing list for ConTeXt users

On Tue, Apr 28, 2020 at 08:21:01PM +0200, Hans Hagen wrote:
> On 4/28/2020 6:16 PM, Joey McCollum wrote:
>> https://raw.githubusercontent.com/michal-h21/uninormalize/master/char-def-with-ccc.lua),
> looks like an ancient copy of char-def.lua

I recognise this file name :-) That was from my Google Summer of Code project in 2008. The combining classes were not in char-def.lua at the time, so it was simplest to work with a copy. It’s interesting that it stayed around so long.

Best,

Arthur
* Re: Unicode normalization and Hebrew in ConTeXt
From: Hans Hagen @ 2020-04-30 9:26 UTC
To: mailing list for ConTeXt users, Joey McCollum

On 4/28/2020 1:59 PM, Joey McCollum wrote:
> ...
> My question is, can ConTeXt with LuaTeX handle the same situation
> correctly? In the following minimal example, ConTeXt typesets pointed
> Hebrew correctly when the characters are in the typographically
> recommended order, but not when they are in Unicode canonical order:

We (Joey and I) figured out how to best deal with this. As a result the predefined hebrew feature now will do the right thing for fonts that assume some specific ordering. So, this should work okay:

\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=hebrew]

in the most recent upload.

Maybe there should be a wiki page that summarizes tests with hebrew fonts (but I leave that up to Joey).

Hans
* Re: Unicode normalization and Hebrew in ConTeXt
From: Joey McCollum @ 2020-04-30 14:28 UTC
To: Hans Hagen; +Cc: mailing list for ConTeXt users

Thanks so much, Hans! I should be able to add a wiki page summarizing the tests before the end of the week.

For reference purposes, do you know which version of ConTeXt has (or will have) this update included?

Joey
* Re: Unicode normalization and Hebrew in ConTeXt
From: Hans Hagen @ 2020-04-30 15:14 UTC
To: Joey McCollum; +Cc: mailing list for ConTeXt users

On 4/30/2020 4:28 PM, Joey McCollum wrote:
> Thanks so much, Hans! I should be able to add a wiki page summarizing
> the tests before the end of the week.
>
> For reference purposes, do you know which version of ConTeXt has (or
> will have) this update included?

today's upload
* Re: Unicode normalization and Hebrew in ConTeXt
From: Joey McCollum @ 2020-04-30 20:17 UTC
To: Hans Hagen; +Cc: mailing list for ConTeXt users

Okay! I have not figured out how to add a new page to the wiki, but I was able to add a section to the end of the "Arabic and Hebrew" page (https://www.contextgarden.net/Arabic_and_Hebrew) discussing the issue, providing a test, and briefly describing the fix.

Joey
* Re: Unicode normalization and Hebrew in ConTeXt 2020-04-30 20:17 ` Joey McCollum @ 2021-08-17 0:07 ` Joey McCollum via ntg-context 2021-08-17 9:19 ` Hans Hagen via ntg-context 0 siblings, 1 reply; 16+ messages in thread From: Joey McCollum via ntg-context @ 2021-08-17 0:07 UTC (permalink / raw) To: Hans Hagen; +Cc: Joey McCollum, mailing list for ConTeXt users [-- Attachment #1.1: Type: text/plain, Size: 4060 bytes --] Hans, Sorry to bring this up after over a year, but I just noticed something that doesn't seem right. I implemented some contextual substitutions in my own fork of the Keter YG Hebrew font (.ttf file attached) under the "dlig" feature that should do the following two things: 1. If a *shin *with a *sin *dot (שׂ) is pointed with a *holam *(the vowel point placed high and on the left), then the *shin*, *sin *dot, and *holam *are combined into a single ligature that depicts the *sin *dot and *holam *merged into a single point. 2. If a *shin *with a *shin *dot (שׁ) follows another letter pointed with a *holam *(except for *vav*, which must be pointed with a *holam haser*), then the shin and shin dot are replaced with a ligature that moves the *shin* dot a bit to the right (so that it appears to be merged with the preceding *holam*), and the combination of the preceding letter and the actual holam is changed to just the preceding letter (thus effectively stripping the old *holam*). I've tested both of these features in FontForge, and they work as expected there. Likewise, if I test them in the following XeLaTeX script, XeLaTeX handles both rules correctly: ``` \documentclass{article} %Set fonts and font features: \usepackage{fontspec} \setmainfont[Path=../fonts/KeterYG/, UprightFont = *-Medium, Script=Hebrew, Ligatures=Discretionary]{KeterYG} % I'm using a local copy of the attached font \begin{document} שֹׂבַע עָשׂוֹר קֹשֶׁט שֹׁשַׁנִּים עָשׂוֹר מֹשֶׁה שַׁלֹשׁ \end{document} ``` But in ConTeXt, only rule (1) above works as expected. 
Here is a minimal (non-)working example: ``` \starttypescriptcollection[keteryg] \starttypescript[serif][keteryg] \definefontsynonym[Serif][file:../fonts/KeterYG/KeterYG-Medium.ttf][features=hebrew] % use a local copy of the attached font, with all the necessary Hebrew features (this includes dlig by default) \stoptypescript \starttypescript[keteryg] \definetypeface[keteryg][rm][serif][keteryg][default] \stoptypescript \stoptypescriptcollection %Set up the main font: \setupbodyfont[keteryg] %Set up right-to-left alignment: \setupalign[r2l] \starttext שֹׂבַע עָשׂוֹר קֹשֶׁט שֹׁשַׁנִּים עָשׂוֹר מֹשֶׁה שַׁלֹשׁ \stoptext ``` In examples 3, 4, 6, and 7, the *holam *dot still appears before the *shin* -with-merged-*shin*-dot-and-*holam *ligature, when it should be absent. (I realize that it may be difficult to tell; in the last two examples, the presence of two dots is easier to make out.) Do you have any idea why this might be happening in ConTeXt? Does the glyph reordering in font-imp-combining.lua take place before any OpenType features in the font are applied? Thanks again! Joey On Thu, Apr 30, 2020 at 4:17 PM Joey McCollum <jmccollum20140511@gmail.com> wrote: > Okay! I have not figured out how to add a new page to the wiki, but I was > able to add a section to the end of the "Arabic and Hebrew" page ( > https://www.contextgarden.net/Arabic_and_Hebrew) discussing the issue, > providing a test, and briefly describing the fix. > > Joey > > On Thu, Apr 30, 2020 at 11:14 AM Hans Hagen <j.hagen@xs4all.nl> wrote: > >> On 4/30/2020 4:28 PM, Joey McCollum wrote: >> > Thanks so much, Hans! I should be able to add a wiki page summarizing >> > the tests before the end of the week. >> > >> > For reference purposes, do you know which version of ConTeXt has (or >> > will have) this update included? 
* Re: Unicode normalization and Hebrew in ConTeXt
  2021-08-17  0:07 ` Joey McCollum via ntg-context
@ 2021-08-17  9:19 ` Hans Hagen via ntg-context
  2021-08-17 12:56 ` Joey McCollum via ntg-context
  0 siblings, 1 reply; 16+ messages in thread
From: Hans Hagen via ntg-context @ 2021-08-17 9:19 UTC (permalink / raw)
  To: Joey McCollum; +Cc: Hans Hagen, mailing list for ConTeXt users

On 8/17/2021 2:07 AM, Joey McCollum wrote:
> Sorry to bring this up after over a year, but I just noticed something
> that doesn't seem right. I implemented some contextual substitutions in
> my own fork of the Keter YG Hebrew font (.ttf file attached) under the
> "dlig" feature that should do the following two things:

but you don't enable dlig

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unicode normalization and Hebrew in ConTeXt
  2021-08-17  9:19 ` Hans Hagen via ntg-context
@ 2021-08-17 12:56 ` Joey McCollum via ntg-context
  2021-08-17 19:46 ` Joey McCollum via ntg-context
  0 siblings, 1 reply; 16+ messages in thread
From: Joey McCollum via ntg-context @ 2021-08-17 12:56 UTC (permalink / raw)
  To: Hans Hagen; +Cc: Joey McCollum, mailing list for ConTeXt users

Shouldn't dlig automatically be enabled under the "hebrew" feature set? In font-pre.mkiv, hebrew inherits from semitic-complete, which sets dlig=yes.

Still, if I explicitly add dlig, as in the following example, things change, but they still aren't right:

```
\starttypescriptcollection[keteryg]
  \starttypescript[serif][keteryg]
    \definefontsynonym[Serif][file:../fonts/KeterYG/KeterYG-Medium.ttf][features=hebrew] % all the necessary Hebrew features, including dlig
  \stoptypescript
  \starttypescript[keteryg]
    \definetypeface[keteryg][rm][serif][keteryg][default]
  \stoptypescript
\stoptypescriptcollection

%Set up the main font:
\setupbodyfont[keteryg]
%Set up right-to-left alignment:
\setupalign[r2l]
%Explicitly add dlig (in case it wasn't there already):
\definefontfeature[plus-dlig][dlig=yes]

\starttext
\addff{plus-dlig}
שֹׂבַע

עָשׂוֹר

קֹשֶׁט

שֹׁשַׁנִּים

עָשׂוֹר

מֹשֶׁה

שַׁלֹשׁ
\stoptext
```

In examples 1, 3, 4, and 6, the *holam* of the preceding letter (which should have been stripped in the contextual substitution) just seems to have been moved farther up. In fact, the output looks like it would look if I turned off the reordercombining feature. (And indeed, if I manually reorder the glyphs to the Hebrew Layout Intelligence order, then the results look like they did when I just used the "hebrew" feature.)

I may have forgotten to attach the font file I was using for this test. If that is the case, it is available at https://github.com/jjmccollum/Keter-YG.
Joey

^ permalink raw reply [flat|nested] 16+ messages in thread
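The comparison described in this message (dlig made explicit, with and without mark reordering) could be set up side by side with derived feature sets. This is only a sketch under assumptions: the names hebrew-dlig and hebrew-noreorder are hypothetical, the font is assumed to be found by file name, and it is assumed that reordercombining=no switches off the reordering that the "hebrew" set otherwise enables.

```
% Sketch: derive two feature sets from the predefined "hebrew" set.
% The set names hebrew-dlig and hebrew-noreorder are hypothetical.
\definefontfeature[hebrew-dlig][hebrew][dlig=yes]
% Assumption: reordercombining=no disables the mark reordering.
\definefontfeature[hebrew-noreorder][hebrew][dlig=yes,reordercombining=no]

\setupalign[r2l]

\starttext
% dlig requested explicitly, reordering left as in "hebrew":
\definedfont[file:KeterYG-Medium.ttf*hebrew-dlig at 24pt]
מֹשֶׁה
\par
% same word with the reordering switched off:
\definedfont[file:KeterYG-Medium.ttf*hebrew-noreorder at 24pt]
מֹשֶׁה
\stoptext
```

Putting dlig into the feature set that the font is defined with, rather than adding it later with \addff, keeps the two test lines identical except for the one setting being compared.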
* Re: Unicode normalization and Hebrew in ConTeXt
  2021-08-17 12:56 ` Joey McCollum via ntg-context
@ 2021-08-17 19:46 ` Joey McCollum via ntg-context
  2021-08-18 13:19 ` Hans Hagen via ntg-context
  0 siblings, 1 reply; 16+ messages in thread
From: Joey McCollum via ntg-context @ 2021-08-17 19:46 UTC (permalink / raw)
  To: Hans Hagen; +Cc: Joey McCollum, mailing list for ConTeXt users

Thankfully, it looks like this was just a problem with my implementation of the OpenType feature and not with ConTeXt's handling of it! (I worried that it might be ConTeXt when I saw that XeLaTeX was handling the feature correctly.) Hans graciously helped me identify the problem, and everything looks good now!

Joey

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unicode normalization and Hebrew in ConTeXt
  2021-08-17 19:46 ` Joey McCollum via ntg-context
@ 2021-08-18 13:19 ` Hans Hagen via ntg-context
  0 siblings, 0 replies; 16+ messages in thread
From: Hans Hagen via ntg-context @ 2021-08-18 13:19 UTC (permalink / raw)
  To: Joey McCollum; +Cc: Hans Hagen, mailing list for ConTeXt users

On 8/17/2021 9:46 PM, Joey McCollum wrote:
> Thankfully, it looks like this was just a problem with my implementation
> of the OpenType feature and not with ConTeXt's handling of it! (I
> worried that it might be ConTeXt when I saw that XeLaTeX was handling the
> feature correctly.) Hans graciously helped me identify the problem, and
> everything looks good now!

Just for the record: one can best try to make a font as robust as possible and not rely on side effects (ambiguous cases). When Idris and I tested some shapers we found that there can be inconsistent results (fwiw, with a rather complex font, ConTeXt agreed more often with uniscribe than xetex did, but in the end one has to make the font okay for all, I guess).

When we started with opentype (luatex showed up in 2005) we took uniscribe as reference, so that is our benchmark. And lack of specs made us figure out things stepwise. Now, if something works in one shaper and not in another it can of course be due to bugs, but it can also be that the spec is simply fuzzy and choices have been made. There is then the danger that eventually bugs become features (I assume the amount of leverage matters here, and tex has zero), which then settles it (kind of), but that doesn't mean that one should gamble on it.

The same is true for font names: don't rely too much on the heuristics hard coded in programs (e.g. fontforge has some for font names, properties, glyph names, and although that is nice for recovery, it also makes other usage hard because fighting fuzzy heuristics is hard once information is lost).
Btw, a side effect of your 'issue' is that I found a way to save some memory for some fonts (for now only in lmtx) at the cost of hopefully little extra runtime.

Hans

^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2021-08-18 13:19 UTC | newest]

Thread overview: 16+ messages
2020-04-28 11:59 Unicode normalization and Hebrew in ConTeXt Joey McCollum
2020-04-28 13:17 ` Hans Hagen
[not found] ` <CAGxRUG_cyT1XfYTvTh23+LaVg4an8exL-LZYfbZUOf+s2XvRRQ@mail.gmail.com>
2020-04-28 16:16 ` Fwd: " Joey McCollum
2020-04-28 18:03 ` Hans Hagen
2020-04-28 18:21 ` Hans Hagen
2020-04-28 19:06 ` Joey McCollum
2020-04-30 19:40 ` Arthur Reutenauer
2020-04-30 9:26 ` Hans Hagen
2020-04-30 14:28 ` Joey McCollum
2020-04-30 15:14 ` Hans Hagen
2020-04-30 20:17 ` Joey McCollum
2021-08-17 0:07 ` Joey McCollum via ntg-context
2021-08-17 9:19 ` Hans Hagen via ntg-context
2021-08-17 12:56 ` Joey McCollum via ntg-context
2021-08-17 19:46 ` Joey McCollum via ntg-context
2021-08-18 13:19 ` Hans Hagen via ntg-context