ntg-context - mailing list for ConTeXt users
* Unicode normalization and Hebrew in ConTeXt
@ 2020-04-28 11:59 Joey McCollum
  2020-04-28 13:17 ` Hans Hagen
  2020-04-30  9:26 ` Hans Hagen
  0 siblings, 2 replies; 16+ messages in thread
From: Joey McCollum @ 2020-04-28 11:59 UTC (permalink / raw)
  To: ntg-context



I am typesetting a document in Hebrew that includes pointing (e.g., vowels,
shin and sin dots, dagesh, etc.) using ConTeXt. The Hebrew text that I want
to typeset has been normalized into Unicode's NFC canonical form. It is
well known that the Unicode canonical ordering of Hebrew points conflicts
with the recommended mark ordering of specific points based on their
functions (see https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf
for more on this topic). Thankfully, many typesetting engines automatically
reorder the points to ensure that they are combined according to the
specifications of many fonts. I'm pretty sure that XeLaTeX is one of these,
as it typesets Hebrew letters with multiple points correctly even when the
Hebrew text is in NFC form.
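
To make the conflict concrete, here is a minimal Python sketch. It is an illustration only and not part of ConTeXt or XeLaTeX; it assumes nothing beyond Python's standard `unicodedata` module. It shows that NFC normalization moves the dagesh after the segol, because canonical ordering sorts adjacent marks by combining class:

```python
import unicodedata

BET, SEGOL, DAGESH, FINAL_NUN = "\u05D1", "\u05B6", "\u05BC", "\u05DF"

# Typographically recommended order: bet + dagesh + segol + final nun.
recommended = BET + DAGESH + SEGOL + FINAL_NUN

# NFC applies canonical ordering, which sorts adjacent marks by combining class.
normalized = unicodedata.normalize("NFC", recommended)

def describe(text):
    """Return (code point, combining class) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]

print(describe(recommended))
print(describe(normalized))
```

The second line of output lists the segol (combining class 16) before the dagesh (combining class 21), which is the order that the font does not expect.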

My question is, can ConTeXt with LuaTeX handle the same situation
correctly? In the following minimal example, ConTeXt typesets pointed
Hebrew correctly when the characters are in the typographically recommended
order, but not when they are in Unicode canonical order:

```
%Setup Hebrew text font:
\definefontfeature[f:pointedhebrew][default][
    ccmp=yes,
    mark=yes,
    script=hebr
]
\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]
%Set the body font:
\setupbodyfont[hebrew]
%Set up right-to-left alignment:
\setupalign[r2l]
\starttext
    %Characters after normalization, in Unicode canonical order (bet + segol + dagesh + final nun):
    בֶּן

    %A word with characters in typographically recommended order (bet + dagesh + segol + final nun):
    בֶּן
\stoptext
```

I typeset this using ConTeXt version 2020.03.10, as released with TeXLive
2020. I got the SBL Hebrew font from
https://www.sbl-site.org/educational/BiblicalFonts_SBLHebrew.aspx.
According to the font's user manual (see the link above the MWE), the font
should be able to combine the marks to form the correct glyph regardless of
their order after the consonant, but that doesn't seem to be the case here.
I also tried using the predefined "hebrew" featureset, but that did not
change anything.

Is there some other OpenType feature or featureset I need to enable to fix
this, or is there some module or option I can include to get ConTeXt to
typeset Unicode-normalized Hebrew as if it were ordered in the recommended
way, like XeLaTeX does? I see that the uninormalize module is mentioned in
the thread "XeLaTeX, LuaLaTeX, fontspec, unicode and normalization" on TeX
Stack Exchange (
https://tex.stackexchange.com/questions/229044/xelatex-lualatex-fontspec-unicode-and-normalization);
can that be used with ConTeXt?

Thank you,

Joey


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: Unicode normalization and Hebrew in ConTeXt
  2020-04-28 11:59 Unicode normalization and Hebrew in ConTeXt Joey McCollum
@ 2020-04-28 13:17 ` Hans Hagen
       [not found]   ` <CAGxRUG_cyT1XfYTvTh23+LaVg4an8exL-LZYfbZUOf+s2XvRRQ@mail.gmail.com>
  2020-04-30  9:26 ` Hans Hagen
  1 sibling, 1 reply; 16+ messages in thread
From: Hans Hagen @ 2020-04-28 13:17 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Joey McCollum

On 4/28/2020 1:59 PM, Joey McCollum wrote:
> \definefontfeature[f:pointedhebrew][default][
>      ccmp=yes,
>      mark=yes,
>      script=hebr
> ]
> \definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]
> %Set the body font:
> \setupbodyfont[hebrew]
> %Set up right-to-left alignment:
> \setupalign[r2l]
> \starttext
>      %Characters after normalization, in Unicode canonical order (bet + 
> segol + dagesh + final nun):
>      בֶּן
> 
>      %A word with characters in typographically recommended order (bet + 
> dagesh + segol + final nun):
>      בֶּן
> \stoptext

\startluacode
     fonts.handlers.otf.addfeature {
         name    = "normalizehebrew",
         type    = "chainsubstitution",
         prepend = 1,
         lookups = {
             {
                 type = "multiple",
                 data = {
                     [0x5B6] = { 0x5BC, 0x5B6 }, -- segol (U+05B6) maps to dagesh (U+05BC) + segol
                 },
             },
         },
         data = {
             rules = {
                 {
                     current = { { 0x5B6 }, { 0x5BC } }, -- match segol followed by dagesh
                     lookups = { 1, 0 },
                 },
             },
         },
     }
\stopluacode

\definefontfeature
   [f:pointedhebrew]
   [hebrew]
   [normalizehebrew=yes]

\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]

\setupbodyfont[hebrew]

\setupalign[r2l]

\starttext
     בֶּן \quad בֶּן \par
\stoptext

How many such reorderings are there? (I saw some document about that 
font and it sounds a bit messy wrt all these input variants.)

(there are several mechanisms in context to deal with such issues, it's 
all about getting specs from users i.e. tex is all about control so in 
principle it should be doable)

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Fwd:  Unicode normalization and Hebrew in ConTeXt
       [not found]   ` <CAGxRUG_cyT1XfYTvTh23+LaVg4an8exL-LZYfbZUOf+s2XvRRQ@mail.gmail.com>
@ 2020-04-28 16:16     ` Joey McCollum
  2020-04-28 18:03       ` Hans Hagen
  2020-04-28 18:21       ` Hans Hagen
  0 siblings, 2 replies; 16+ messages in thread
From: Joey McCollum @ 2020-04-28 16:16 UTC (permalink / raw)
  To: ntg-context



Thank you for the prompt and thorough response!

If the reorderings have to be done for each pair of characters in different
combining classes that are not in the expected typographical order, then
there will be a lot (probably hundreds) of substitution rules. I am not
very familiar with coding in Lua, but if there is a way to add substitution
features for specific classes of points, then that would require far
fewer cases.

Unicode's canonical ordering of Hebrew marks is based on their combining
classes, with characters in higher combining classes being sorted after
those with lower combining classes in canonical order. The typographically
recommended ordering of certain characters is found in Table 1 (p. 12) of
https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf. The following
list of character classes, with information about their Unicode combining
classes (which I retrieved from the Lua script
https://raw.githubusercontent.com/michal-h21/uninormalize/master/char-def-with-ccc.lua),
is indexed after the character classes described in that table:
1. The consonants (Unicode points 05D0-05EA) have no combining class and
are never reordered; this is typographically correct.
2. Shin dot and sin dot (05C1-05C2) should be next, but Unicode places them
in combining classes 24 and 25, after the characters in recommended classes
3-5 and many of the characters in recommended class 6.
3. Dagesh / mapiq (05BC) should be next, but Unicode assigns it a combining
class of 21. This means that after Unicode normalization it will be
incorrectly ordered before characters in recommended class 2 and after
characters in recommended class 5 and many of those in recommended class 6.
4. Rafe (05BF) should be next, but Unicode assigns it a combining class of
23. Thus, it will be correctly placed after characters in recommended class
3, but incorrectly placed before characters in recommended class 2 after
Unicode normalization.
5. The holam and holam haser vowel points (05B9-05BA) should be next, but
Unicode places them in combining class 19. This means that after Unicode
normalization they will be placed incorrectly before characters in
recommended classes 2-4 and after the characters in recommended class 6
that have combining classes 10-18.
6. The characters in 0591, 0596, 059B, 05A2-05A7, 05AA, 05B0-05B8, 05BB,
05BD, 05C5, 05C7 should be treated as being in the same class, but Unicode
places them in combining classes 10-18, 20, 22, and 220.
7. The prepositive marks yetiv and dehi (059A, 05AD) should be next;
Unicode places them in combining class 222, so they should correctly come
after all characters in recommended classes 1-6.
8. The characters 0307, 0593-0595, 0597-0598, 059C-05A1, 05A8, 05AB-05AC,
05AF, 05C4 should be treated as being in the same class; Unicode places
them in combining class 230, so they should correctly come after all
characters in recommended classes 1-7.
9. The postpositive marks segolta, pashta, telisha qetana, and zinor (0592,
0599, 05A9, 05AE) should be next; Unicode places them in combining class
230, so they will need to be reordered after the characters in recommended
class 8.
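
The class list above can be turned into a small reordering routine. The following Python sketch only illustrates that table and is not the code ConTeXt uses. The names `CLASS_MEMBERS`, `RECOMMENDED_CLASS`, and `reorder_marks` are made up for this example, and it assumes that each run of marks following a base character can simply be stable-sorted by recommended class:

```python
def _expand(*ranges):
    """Expand (first, last) code point ranges into a flat list."""
    return [cp for first, last in ranges for cp in range(first, last + 1)]

# Recommended classes 2-9, transcribed from the list above (hypothetical names).
CLASS_MEMBERS = {
    2: _expand((0x05C1, 0x05C2)),
    3: [0x05BC],
    4: [0x05BF],
    5: _expand((0x05B9, 0x05BA)),
    6: [0x0591, 0x0596, 0x059B, 0x05AA, 0x05BB, 0x05BD, 0x05C5, 0x05C7]
       + _expand((0x05A2, 0x05A7), (0x05B0, 0x05B8)),
    7: [0x059A, 0x05AD],
    8: [0x0307, 0x05A8, 0x05AF, 0x05C4]
       + _expand((0x0593, 0x0595), (0x0597, 0x0598),
                 (0x059C, 0x05A1), (0x05AB, 0x05AC)),
    9: [0x0592, 0x0599, 0x05A9, 0x05AE],
}
RECOMMENDED_CLASS = {cp: cls for cls, cps in CLASS_MEMBERS.items() for cp in cps}

def reorder_marks(text):
    """Stable-sort each run of listed marks by recommended class."""
    out, run = [], []
    for ch in text:
        if ord(ch) in RECOMMENDED_CLASS:
            run.append(ch)          # collect marks until the next base character
        else:
            out.extend(sorted(run, key=lambda m: RECOMMENDED_CLASS[ord(m)]))
            run = []
            out.append(ch)          # base characters (class 1) never move
    out.extend(sorted(run, key=lambda m: RECOMMENDED_CLASS[ord(m)]))
    return "".join(out)

# Canonical order in: bet + segol + dagesh + final nun.
canonical = "\u05D1\u05B6\u05BC\u05DF"
print([f"U+{ord(c):04X}" for c in reorder_marks(canonical)])
```

Because Python's `sorted` is stable, marks that share a recommended class keep their input order, which matches the statement that the characters in classes 6 and 8 should each be treated as a single class.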

This is a lot of information, and I've probably not presented it as clearly as
I could, so if there is any confusion, please let me know, and I can try to
explain better. If there is any other information you need, please let me
know.

Thanks again!


* Re: Fwd: Unicode normalization and Hebrew in ConTeXt
  2020-04-28 16:16     ` Fwd: " Joey McCollum
@ 2020-04-28 18:03       ` Hans Hagen
  2020-04-28 18:21       ` Hans Hagen
  1 sibling, 0 replies; 16+ messages in thread
From: Hans Hagen @ 2020-04-28 18:03 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Joey McCollum

On 4/28/2020 6:16 PM, Joey McCollum wrote:
> Thank you for the prompt and thorough response!
> 
> If the reorderings have to be done for each pair of characters in 
> different combining classes that are not in the expected typographical 
> order, then there will be a lot (probably hundreds) of substitution 
> rules. I am not very familiar with coding in Lua, but if there is a way 
> to add substitution features for specific classes of points, then that 
> would require a lot fewer cases.

don't worry about that now, the lua part is normally the easy part
(deciding where it makes most sense to hook it in could take more thinking)

(i have been thinking of some additional feature mechanisms but have to 
dig up some code i played with long ago)

> Unicode's canonical ordering of Hebrew marks is based on their combining 
> classes, with characters in higher combining classes being sorted after 
> those with lower combining classes in canonical order. The 
> typographically recommended ordering of certain characters is found in 
> Table 1 (p. 12) of 
> https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf. The 

So how official is that? Or is this something specific for this font?

Hans


* Re: Fwd: Unicode normalization and Hebrew in ConTeXt
  2020-04-28 16:16     ` Fwd: " Joey McCollum
  2020-04-28 18:03       ` Hans Hagen
@ 2020-04-28 18:21       ` Hans Hagen
  2020-04-28 19:06         ` Joey McCollum
  2020-04-30 19:40         ` Arthur Reutenauer
  1 sibling, 2 replies; 16+ messages in thread
From: Hans Hagen @ 2020-04-28 18:21 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Joey McCollum

On 4/28/2020 6:16 PM, Joey McCollum wrote:
> https://raw.githubusercontent.com/michal-h21/uninormalize/master/char-def-with-ccc.lua), 
looks like an ancient copy of char-def.lua

(we actually do have a combining entry)

Hans


* Re: Fwd: Unicode normalization and Hebrew in ConTeXt
  2020-04-28 18:21       ` Hans Hagen
@ 2020-04-28 19:06         ` Joey McCollum
  2020-04-30 19:40         ` Arthur Reutenauer
  1 sibling, 0 replies; 16+ messages in thread
From: Joey McCollum @ 2020-04-28 19:06 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users



> > Unicode's canonical ordering of Hebrew marks is based on their combining
> > classes, with characters in higher combining classes being sorted after
> > those with lower combining classes in canonical order. The
> > typographically recommended ordering of certain characters is found in
> > Table 1 (p. 12) of
> > https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf. The

> So how official is that? Or is this something specific for this font?

Hebrew Layout Intelligence, which was developed by John Hudson of Tiro
Typeworks (who wrote this manual) and Ralph Hancock, is used for mark
positioning by a number of Hebrew fonts. Its guidelines govern their glyph
classes and chaining substitution rules. The fonts I know of that
explicitly implement it are SBL Hebrew, Ezra SIL, Keter YG, and Keter Aram
Tsova.


* Re: Unicode normalization and Hebrew in ConTeXt
  2020-04-28 11:59 Unicode normalization and Hebrew in ConTeXt Joey McCollum
  2020-04-28 13:17 ` Hans Hagen
@ 2020-04-30  9:26 ` Hans Hagen
  2020-04-30 14:28   ` Joey McCollum
  1 sibling, 1 reply; 16+ messages in thread
From: Hans Hagen @ 2020-04-30  9:26 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Joey McCollum

On 4/28/2020 1:59 PM, Joey McCollum wrote:

 > ...

> My question is, can ConTeXt with LuaTeX handle the same situation 
> correctly? In the following minimal example, ConTeXt typesets pointed 
> Hebrew correctly when the characters are in the typographically 
> recommended order, but not when they are in Unicode canonical order:
We (Joey and I) figured out how to best deal with this. As a result the 
predefined hebrew feature now will do the right thing for fonts that 
assume some specific ordering. So, this should work okay:

\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=hebrew]

in the most recent upload.

Maybe there should be a wiki page that summarizes tests with hebrew 
fonts (but I leave that up to Joey).

Hans




* Re: Unicode normalization and Hebrew in ConTeXt
  2020-04-30  9:26 ` Hans Hagen
@ 2020-04-30 14:28   ` Joey McCollum
  2020-04-30 15:14     ` Hans Hagen
  0 siblings, 1 reply; 16+ messages in thread
From: Joey McCollum @ 2020-04-30 14:28 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users



Thanks so much, Hans! I should be able to add a wiki page summarizing the
tests before the end of the week.

For reference purposes, do you know which version of ConTeXt has (or will
have) this update included?

Joey


* Re: Unicode normalization and Hebrew in ConTeXt
  2020-04-30 14:28   ` Joey McCollum
@ 2020-04-30 15:14     ` Hans Hagen
  2020-04-30 20:17       ` Joey McCollum
  0 siblings, 1 reply; 16+ messages in thread
From: Hans Hagen @ 2020-04-30 15:14 UTC (permalink / raw)
  To: Joey McCollum; +Cc: mailing list for ConTeXt users

On 4/30/2020 4:28 PM, Joey McCollum wrote:
> Thanks so much, Hans! I should be able to add a wiki page summarizing 
> the tests before the end of the week.
> 
> For reference purposes, do you know which version of ConTeXt has (or 
> will have) this update included?
today's upload



* Re: Fwd: Unicode normalization and Hebrew in ConTeXt
  2020-04-28 18:21       ` Hans Hagen
  2020-04-28 19:06         ` Joey McCollum
@ 2020-04-30 19:40         ` Arthur Reutenauer
  1 sibling, 0 replies; 16+ messages in thread
From: Arthur Reutenauer @ 2020-04-30 19:40 UTC (permalink / raw)
  To: Mailing list for ConTeXt users

On Tue, Apr 28, 2020 at 08:21:01PM +0200, Hans Hagen wrote:
> On 4/28/2020 6:16 PM, Joey McCollum wrote:
>> https://raw.githubusercontent.com/michal-h21/uninormalize/master/char-def-with-ccc.lua),
> looks like an ancient copy of char-def.lua

  I recognise this file name :-)  That was from my Google Summer of Code
project in 2008.  The combining classes were not in char-def.lua at the
time, so it was simplest to work with a copy.  It’s interesting that it
stayed around so long.

	Best,

		Arthur

* Re: Unicode normalization and Hebrew in ConTeXt
  2020-04-30 15:14     ` Hans Hagen
@ 2020-04-30 20:17       ` Joey McCollum
  2021-08-17  0:07         ` Joey McCollum via ntg-context
  0 siblings, 1 reply; 16+ messages in thread
From: Joey McCollum @ 2020-04-30 20:17 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users



Okay! I have not figured out how to add a new page to the wiki, but I was
able to add a section to the end of the "Arabic and Hebrew" page (
https://www.contextgarden.net/Arabic_and_Hebrew) discussing the issue,
providing a test, and briefly describing the fix.

Joey


* Re: Unicode normalization and Hebrew in ConTeXt
  2020-04-30 20:17       ` Joey McCollum
@ 2021-08-17  0:07         ` Joey McCollum via ntg-context
  2021-08-17  9:19           ` Hans Hagen via ntg-context
  0 siblings, 1 reply; 16+ messages in thread
From: Joey McCollum via ntg-context @ 2021-08-17  0:07 UTC (permalink / raw)
  To: Hans Hagen; +Cc: Joey McCollum, mailing list for ConTeXt users



Hans,

Sorry to bring this up after over a year, but I just noticed something that
doesn't seem right. I implemented some contextual substitutions in my own
fork of the Keter YG Hebrew font (.ttf file attached) under the "dlig"
feature that should do the following two things:

   1. If a *shin* with a *sin* dot (שׂ) is pointed with a *holam* (the
   vowel point placed high and on the left), then the *shin*, *sin* dot,
   and *holam* are combined into a single ligature that depicts the *sin* dot
   and *holam* merged into a single point.
   2. If a *shin* with a *shin* dot (שׁ) follows another letter pointed
   with a *holam* (except for *vav*, which must be pointed with a *holam
   haser*), then the *shin* and *shin* dot are replaced with a ligature that
   moves the *shin* dot a bit to the right (so that it appears to be merged
   with the preceding *holam*), and the combination of the preceding letter
   and the actual *holam* is changed to just the preceding letter (thus
   effectively stripping the old *holam*).

I've tested both of these features in FontForge, and they work as expected
there. Likewise, if I test them in the following XeLaTeX script, XeLaTeX
handles both rules correctly:

```
\documentclass{article}
%Set fonts and font features:
\usepackage{fontspec}
% I'm using a local copy of the attached font
\setmainfont[Path=../fonts/KeterYG/, UprightFont = *-Medium, Script=Hebrew,
Ligatures=Discretionary]{KeterYG}
\begin{document}
שֹׂבַע

עָשׂוֹר

קֹשֶׁט

שֹׁשַׁנִּים

עָשׂוֹר

מֹשֶׁה

שַׁלֹשׁ
\end{document}
```

But in ConTeXt, only rule (1) above works as expected. Here is a minimal
(non-)working example:

```

\starttypescriptcollection[keteryg]

\starttypescript[serif][keteryg]

% use a local copy of the attached font, with all the necessary Hebrew
% features (this includes dlig by default)
\definefontsynonym[Serif][file:../fonts/KeterYG/KeterYG-Medium.ttf][features=hebrew]

\stoptypescript


\starttypescript[keteryg]

\definetypeface[keteryg][rm][serif][keteryg][default]

\stoptypescript

\stoptypescriptcollection


%Set up the main font:

\setupbodyfont[keteryg]

%Set up right-to-left alignment:

\setupalign[r2l]

\starttext

שֹׂבַע

עָשׂוֹר

קֹשֶׁט

שֹׁשַׁנִּים

עָשׂוֹר

מֹשֶׁה

שַׁלֹשׁ

\stoptext
```

In examples 3, 4, 6, and 7, the *holam* dot still appears before the
*shin*-with-merged-*shin*-dot-and-*holam* ligature, when it should be absent. (I
realize that it may be difficult to tell; in the last two examples, the
presence of two dots is easier to make out.)

Do you have any idea why this might be happening in ConTeXt? Does the glyph
reordering in font-imp-combining.lua take place before any OpenType
features in the font are applied?
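
For reference, the canonical order that the reordering step has to undo can be inspected outside of TeX. The following is a minimal Python sketch using only the standard `unicodedata` module; it is not part of ConTeXt or of the font, and the helper name `describe` is made up for illustration. It shows that NFC places the *holam* (combining class 19) before the *shin* dot (combining class 24), whereas the typographic order puts the *shin* dot directly after the *shin*.

```python
import unicodedata

# Code points for the three characters discussed above.
SHIN, HOLAM, SHIN_DOT = "\u05E9", "\u05B9", "\u05C1"

def describe(text):
    """Return (code point, Unicode name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]

# Typographic order: base letter, shin dot, then the vowel point.
typographic = SHIN + SHIN_DOT + HOLAM

# NFC sorts combining marks by combining class, so the holam moves
# in front of the shin dot.
canonical = unicodedata.normalize("NFC", typographic)

for row in describe(canonical):
    print(row)
```

A lookup in the font that expects *shin* followed immediately by the *shin* dot will therefore only match NFC input if the marks have been reordered before the lookup is applied.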

Thanks again!

Joey

On Thu, Apr 30, 2020 at 4:17 PM Joey McCollum <jmccollum20140511@gmail.com>
wrote:

> Okay! I have not figured out how to add a new page to the wiki, but I was
> able to add a section to the end of the "Arabic and Hebrew" page (
> https://www.contextgarden.net/Arabic_and_Hebrew) discussing the issue,
> providing a test, and briefly describing the fix.
>
> Joey
>
> On Thu, Apr 30, 2020 at 11:14 AM Hans Hagen <j.hagen@xs4all.nl> wrote:
>
>> On 4/30/2020 4:28 PM, Joey McCollum wrote:
>> > Thanks so much, Hans! I should be able to add a wiki page summarizing
>> > the tests before the end of the week.
>> >
>> > For reference purposes, do you know which version of ConTeXt has (or
>> > will have) this update included?
>> todays upload
>>
>>
>> -----------------------------------------------------------------
>>                                            Hans Hagen | PRAGMA ADE
>>                Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
>>         tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
>> -----------------------------------------------------------------
>>
>

[-- Attachment #1.2: Type: text/html, Size: 6651 bytes --]

[-- Attachment #2: Type: text/plain, Size: 493 bytes --]

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Unicode normalization and Hebrew in ConTeXt
  2021-08-17  0:07         ` Joey McCollum via ntg-context
@ 2021-08-17  9:19           ` Hans Hagen via ntg-context
  2021-08-17 12:56             ` Joey McCollum via ntg-context
  0 siblings, 1 reply; 16+ messages in thread
From: Hans Hagen via ntg-context @ 2021-08-17  9:19 UTC (permalink / raw)
  To: Joey McCollum; +Cc: Hans Hagen, mailing list for ConTeXt users

On 8/17/2021 2:07 AM, Joey McCollum wrote:

> Sorry to bring this up after over a year, but I just noticed something 
> that doesn't seem right. I implemented some contextual substitutions in 
> my own fork of the Keter YG Hebrew font (.ttf file attached) under the 
> "dlig" feature that should do the following two things:
but you don't enable dlig

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Unicode normalization and Hebrew in ConTeXt
  2021-08-17  9:19           ` Hans Hagen via ntg-context
@ 2021-08-17 12:56             ` Joey McCollum via ntg-context
  2021-08-17 19:46               ` Joey McCollum via ntg-context
  0 siblings, 1 reply; 16+ messages in thread
From: Joey McCollum via ntg-context @ 2021-08-17 12:56 UTC (permalink / raw)
  To: Hans Hagen; +Cc: Joey McCollum, mailing list for ConTeXt users


[-- Attachment #1.1: Type: text/plain, Size: 2430 bytes --]

Shouldn't dlig automatically be enabled under the "hebrew" feature set? In
font-pre.mkiv, hebrew inherits from semitic-complete, which sets dlig=yes.

Still, if I explicitly add dlig, as in the following example, things
change, but they still aren't right:

```

\starttypescriptcollection[keteryg]

\starttypescript[serif][keteryg]

\definefontsynonym[Serif][file:../fonts/KeterYG/KeterYG-Medium.ttf][features=hebrew]
% all the necessary Hebrew features, including dlig

\stoptypescript


\starttypescript[keteryg]

\definetypeface[keteryg][rm][serif][keteryg][default]

\stoptypescript

\stoptypescriptcollection


%Set up the main font:

\setupbodyfont[keteryg]

%Set up right-to-left alignment:

\setupalign[r2l]

%Explicitly add dlig (in case it wasn't there already):

\definefontfeature[plus-dlig][dlig=yes]


\starttext

\addff{plus-dlig}

שֹׂבַע

עָשׂוֹר

קֹשֶׁט

שֹׁשַׁנִּים

עָשׂוֹר

מֹשֶׁה

שַׁלֹשׁ

\stoptext
```

In examples 1, 3, 4, and 6, the *holam* of the preceding
letter (which should have been stripped in the contextual substitution)
just seems to have been moved farther up. In fact, the output looks like it
would look if I turned off the reordercombining feature. (And indeed, if I
manually reorder the glyphs to the Hebrew Layout Intelligence order, then
the results look like they did when I just used the "hebrew" feature.)
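
The manual reordering mentioned above can be mimicked with a small Python sketch. This is only an illustration under stated assumptions: the priority table below (shin/sin dot first, then dagesh/rafe, then all other marks in their existing order) is a simplification chosen for this example, the function name `reorder_marks` is hypothetical, and none of this reflects how ConTeXt's reordercombining feature is actually implemented.

```python
import unicodedata

# Assumed priorities for illustration: shin/sin dot first, then dagesh/rafe.
# Any other combining mark gets priority 2 and keeps its relative order.
MARK_PRIORITY = {"\u05C1": 0, "\u05C2": 0, "\u05BC": 1, "\u05BF": 1}

def reorder_marks(text):
    """Reorder each run of combining marks after a base character."""
    out, marks = [], []

    def flush():
        # sorted() is stable, so marks with equal priority stay in input order.
        out.extend(sorted(marks, key=lambda m: MARK_PRIORITY.get(m, 2)))
        marks.clear()

    for ch in text:
        if unicodedata.combining(ch):
            marks.append(ch)      # collect marks belonging to the current base
        else:
            flush()               # emit the marks of the previous base
            out.append(ch)
    flush()
    return "".join(out)

# mem + holam, shin + segol + shin dot, he (canonical order of the marks)
canonical = "\u05DE\u05B9\u05E9\u05B6\u05C1\u05D4"
print([f"U+{ord(c):04X}" for c in reorder_marks(canonical)])
```

Feeding the reordered string to the engine is one way to check whether a rendering difference comes from the mark order or from the font's lookups.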


I may have forgotten to attach the font file I was using for this test. If
that is the case, it is available at https://github.com/jjmccollum/Keter-YG.


Joey



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Unicode normalization and Hebrew in ConTeXt
  2021-08-17 12:56             ` Joey McCollum via ntg-context
@ 2021-08-17 19:46               ` Joey McCollum via ntg-context
  2021-08-18 13:19                 ` Hans Hagen via ntg-context
  0 siblings, 1 reply; 16+ messages in thread
From: Joey McCollum via ntg-context @ 2021-08-17 19:46 UTC (permalink / raw)
  To: Hans Hagen; +Cc: Joey McCollum, mailing list for ConTeXt users


[-- Attachment #1.1: Type: text/plain, Size: 2976 bytes --]

Thankfully, it looks like this was just a problem with my implementation of
the OpenType feature and not with ConTeXt's handling of it! (I worried that
it might be ConTeXt when I saw that XeLaTeX was handling the feature
correctly.) Hans graciously helped me identify the problem, and everything
looks good now!

Joey



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Unicode normalization and Hebrew in ConTeXt
  2021-08-17 19:46               ` Joey McCollum via ntg-context
@ 2021-08-18 13:19                 ` Hans Hagen via ntg-context
  0 siblings, 0 replies; 16+ messages in thread
From: Hans Hagen via ntg-context @ 2021-08-18 13:19 UTC (permalink / raw)
  To: Joey McCollum; +Cc: Hans Hagen, mailing list for ConTeXt users

On 8/17/2021 9:46 PM, Joey McCollum wrote:
> Thankfully, it looks like this was just a problem with my implementation 
> of the OpenType feature and not with ConTeXt's handling of it! (I 
> worried that it might be ConTeXt when I saw that XeLaTeX was handling the 
> feature correctly.) Hans graciously helped me identify the problem, and 
> everything looks good now!
Just for the record: it is best to make a font as robust as
possible and not to rely on side effects (ambiguous cases). When Idris and
I tested some shapers we found that there can be inconsistent results
(fwiw, in a rather complex font ConTeXt agreed more often with Uniscribe
than XeTeX, but in the end one has to make the font okay for all, I guess).

When we started with OpenType (LuaTeX showed up in 2005) we took
Uniscribe as reference, so that is our benchmark. And lack of specs made
us figure out things stepwise. Now, if something works in one shaper and
not in another it can of course be due to bugs, but it can also be that
the spec is simply fuzzy and choices have been made. There is then the
danger that eventually bugs become features (I assume the amount of
leverage matters here, and TeX has zero), which then settles it (kind of),
but that doesn't mean that one should gamble on it.

The same is true for font names: don't rely too much on the heuristics
hard coded in programs (e.g. FontForge has some for font names,
properties, glyph names, and although that is nice for recovery, it also
makes other usage hard because fighting fuzzy heuristics is hard once
information is lost).

Btw, a side effect of your 'issue' is that I found a way to save some 
memory for some fonts (for now only in lmtx) at the cost of hopefully 
little extra runtime.

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-08-18 13:19 UTC | newest]

Thread overview: 16+ messages
2020-04-28 11:59 Unicode normalization and Hebrew in ConTeXt Joey McCollum
2020-04-28 13:17 ` Hans Hagen
     [not found]   ` <CAGxRUG_cyT1XfYTvTh23+LaVg4an8exL-LZYfbZUOf+s2XvRRQ@mail.gmail.com>
2020-04-28 16:16     ` Fwd: " Joey McCollum
2020-04-28 18:03       ` Hans Hagen
2020-04-28 18:21       ` Hans Hagen
2020-04-28 19:06         ` Joey McCollum
2020-04-30 19:40         ` Arthur Reutenauer
2020-04-30  9:26 ` Hans Hagen
2020-04-30 14:28   ` Joey McCollum
2020-04-30 15:14     ` Hans Hagen
2020-04-30 20:17       ` Joey McCollum
2021-08-17  0:07         ` Joey McCollum via ntg-context
2021-08-17  9:19           ` Hans Hagen via ntg-context
2021-08-17 12:56             ` Joey McCollum via ntg-context
2021-08-17 19:46               ` Joey McCollum via ntg-context
2021-08-18 13:19                 ` Hans Hagen via ntg-context
