Re: accessing glyphs in the private area

From: Hans Hagen <j.hagen@xs4all.nl>
To: news3@nililand.de, mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: accessing glyphs in the private area
Date: Tue, 2 Oct 2018 11:29:46 +0200
Message-ID: <c91b7ed4-2ab2-a5ba-1481-9cfcc7c47cc8@xs4all.nl>
In-Reply-To: <1tv75weq27x7z.dlg@nililand.de>

On 10/2/2018 9:29 AM, Ulrike Fischer wrote:
> Am Tue, 2 Oct 2018 06:55:02 +0200 schrieb luigi scarso:
> 
>>> For what do you reserve the space in the PUA?
> 
>>   http://www.pragma-ade.nl/general/manuals/fonts-mkiv.pdf
>> page 32 of the document :
>   
>> As we already mentioned in a previous chapter, in ConTeXt we use
>> Unicode internally. This also means that fonts are organized this
>> way. By default the glyph representation of a Unicode character
>> sits in the same slot in the glyph table. All additional glyphs,
>> like ligatures or alternates are pushed in the private unicode
>> space. This is why in the lists shown in the figures the
>> ligatures have a private Unicode number.
> 
> Hm. To clarify: in xetex there is a clear distinction between the slot
> and unicode. \XeTeXglyph (slot) and \char (unicode) give different
> output, and \char actively uses the tounicode mapping of the font.
> 
> \font\test="[lmroman10-regular.otf]"
> \test
> \XeTeXglyph"7A
> \char"7A
> \bye
>  
> In luatex \char and \Uchar don't really care about unicode, even if
> the font has tounicode=1 and tounicode entries, they access the char
> by the hashed integer number.

they access the char in the characters table (where each character has 
an index field, so one can write a simple function that accesses it by 
index; also, i assume that in xetex \char gives the character as known 
to tex, so if one inputs non-unicode one gets that)
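
A minimal sketch of such a lookup, written in Python rather than Lua so it can be run on its own. The table layout, the glyph names, and the index values are all made up for illustration; they are not taken from an actual font or from the ConTeXt font loader.

```python
# Hypothetical miniature of a loaded font's characters table:
# the keys are the numbers that \char would use, and each entry
# carries the glyph's index in the font file (values invented here).
characters = {
    0x007A: {"name": "z", "index": 93},
    0xF0001: {"name": "f_i", "index": 412},
}


def slot_by_index(characters, index):
    """Return the table key whose entry has this glyph index, or None."""
    for slot, entry in characters.items():
        if entry["index"] == index:
            return slot
    return None
```

With a helper like this, a glyph can be requested by its index in the font, while typesetting itself still goes through the key in the characters table.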

> So to get "unicode" the font loader has to sort the glyphs, index
> unicode glyphs by their unicode code point, and assign "non-unicode"
> glyphs numbers that don't interfere.
> 
> Did I get that right?

indeed, and we use the private space for those with no unicode (which 
can be a lot, also think for instance of the snippets that make up math 
extensibles)
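
The assignment described above can be sketched roughly as follows, again in Python for the sake of a runnable example. The start of the private range (`PRIVATE_START`) and the input format are assumptions. The sketch also leaves every code point that the font itself assigns in place and only skips over occupied slots, which is the "check for a free range" variant and not necessarily what the ConTeXt loader does.

```python
PRIVATE_START = 0xF0000  # assumed start of the private range used here


def assign_slots(glyphs, private_start=PRIVATE_START):
    """Build a characters table from glyphs given in glyph-index order.

    glyphs: list of (name, unicode) pairs, where unicode is an int or
    None for glyphs without a code point (ligatures, alternates, parts
    of math extensibles, ...).
    """
    characters = {}
    pending = []
    for index, (name, unicode) in enumerate(glyphs):
        if unicode is not None and unicode not in characters:
            # glyph with a code point: it sits in its own unicode slot
            characters[unicode] = {"name": name, "index": index}
        else:
            # no code point (or a duplicate one): gets a private slot later
            pending.append((name, index))
    next_private = private_start
    for name, index in pending:
        # skip private slots that the font already occupies
        while next_private in characters:
            next_private += 1
        characters[next_private] = {"name": name, "index": index}
        next_private += 1
    return characters
```

For example, `assign_slots([(".notdef", None), ("z", 0x7A), ("f_i", None)])` keeps `z` at 0x7A and places the two glyphs without a code point at 0xF0000 and 0xF0001.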

> Then I do understand that you need some free numbers to push
> glyphs. But I do not understand why to achieve this you remove
> glyphs from their unicode points. The PUA is not some non-unicode
> wilderness. The code points there are as valid as in the other code
> blocks. You wouldn't move away the greek block to get the place, so
> why do you think it is okay to throw out of the PUA block what SIL
> and other font designers encoded there?  Can't you check for a free
> range instead?

sure, but then i also lose some functionality in context (unless i go 
for ugly solutions) ... as all glyphs are supposed to have a name, access 
by name is a pretty good alternative

the main issue is that there are fonts that use private > 0xFFFF space, 
which then would mean a lot of extra mem for names ... so the question 
is: are there fonts that use that range?
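
One way to answer that question for a given font is to count how many of its code points fall into each of the three Unicode private use ranges. A minimal Python sketch follows. The range boundaries are the ones Unicode defines; the function name and the input format (any iterable of integer code points, for instance the keys of a font's cmap as read by a font library) are assumptions made for this example.

```python
# The three private use ranges defined by Unicode.
PRIVATE_RANGES = {
    "BMP PUA": (0xE000, 0xF8FF),
    "Plane 15 PUA-A": (0xF0000, 0xFFFFD),
    "Plane 16 PUA-B": (0x100000, 0x10FFFD),
}


def private_usage(codepoints):
    """Count how many of the given code points lie in each private range."""
    counts = {label: 0 for label in PRIVATE_RANGES}
    for cp in codepoints:
        for label, (first, last) in PRIVATE_RANGES.items():
            if first <= cp <= last:
                counts[label] += 1
    return counts
```

A font for which both supplementary counts are zero uses no private code points above 0xFFFF, so for that font the range is free.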

Hans

-- 

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------