Unicode normalization and Hebrew in ConTeXt

* Unicode normalization and Hebrew in ConTeXt
@ 2020-04-28 11:59 Joey McCollum
  2020-04-28 13:17 ` Hans Hagen
  2020-04-30  9:26 ` Hans Hagen
From: Joey McCollum @ 2020-04-28 11:59 UTC
  To: ntg-context

I am using ConTeXt to typeset a Hebrew document that includes pointing
(vowels, shin and sin dots, dagesh, etc.). The Hebrew text has been
normalized into Unicode's NFC canonical form. It is well known that the
Unicode canonical ordering of Hebrew points conflicts with the mark
ordering recommended for specific points based on their functions (see
https://www.sbl-site.org/Fonts/SBLHebrewUserManual1.5x.pdf for more on this
topic). Thankfully, many typesetting engines automatically reorder the
points so that they combine in the way fonts expect. I'm pretty sure that
XeLaTeX is one of these, as it typesets Hebrew letters with multiple points
correctly even when the Hebrew text is in NFC form.
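
To make the ordering conflict concrete, here is a minimal Python sketch. It
is for illustration only and is not part of the ConTeXt setup; the variable
and function names are my own. It uses the standard unicodedata module to
show that NFC places segol before dagesh, because segol has the lower
canonical combining class:

```python
import unicodedata

# Code points for the example word (bet, segol, dagesh, final nun).
BET, SEGOL, DAGESH, FINAL_NUN = "\u05D1", "\u05B6", "\u05BC", "\u05DF"

# Typographically recommended order: consonant, dagesh, vowel.
recommended = BET + DAGESH + SEGOL + FINAL_NUN

# NFC sorts combining marks by canonical combining class.
canonical = unicodedata.normalize("NFC", recommended)


def describe(text):
    """Return (code point, name, combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


for label, text in (("recommended", recommended), ("NFC", canonical)):
    print(label)
    for row in describe(text):
        print("   ", *row)
```

The second listing shows segol ahead of dagesh, which is the order that
ConTeXt receives when the input is NFC-normalized.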

My question is, can ConTeXt with LuaTeX handle the same situation
correctly? In the following minimal example, ConTeXt typesets pointed
Hebrew correctly when the characters are in the typographically recommended
order, but not when they are in Unicode canonical order:

```
%Setup Hebrew text font:
\definefontfeature[f:pointedhebrew][default][
    ccmp=yes,
    mark=yes,
    script=hebr
]
\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=f:pointedhebrew]
%Set the body font:
\setupbodyfont[hebrew]
%Set up right-to-left alignment:
\setupalign[r2l]
\starttext
    %Characters after normalization, in Unicode canonical order (bet + segol + dagesh + final nun):
    בֶּן

    %A word with characters in typographically recommended order (bet + dagesh + segol + final nun):
    בֶּן
\stoptext
```

I typeset this using ConTeXt version 2020.03.10, as released with TeX Live
2020. I got the SBL Hebrew font from
https://www.sbl-site.org/educational/BiblicalFonts_SBLHebrew.aspx.
According to the font's user manual (linked above the MWE), the font
should be able to combine the marks into the correct glyph regardless of
their order after the consonant, but that doesn't seem to be the case here.
I also tried the predefined "hebrew" feature set, but that did not
change anything.

Is there some other OpenType feature or feature set I need to enable to fix
this, or is there a module or option that makes ConTeXt typeset
Unicode-normalized Hebrew as if it were in the recommended order, as
XeLaTeX does? The uninormalize module is mentioned in the TeX Stack
Exchange thread "XeLaTeX, LuaLaTeX, fontspec, unicode and normalization"
(https://tex.stackexchange.com/questions/229044/xelatex-lualatex-fontspec-unicode-and-normalization);
can that be used with ConTeXt?

Thank you,

Joey
