ntg-context - mailing list for ConTeXt users
* unic-xxx.tex glyph lists: minor bugs, questions
@ 2006-11-05  1:24 Philipp Reichmuth
  2006-11-05 14:27 ` Hans Hagen
  0 siblings, 1 reply; 12+ messages in thread
From: Philipp Reichmuth @ 2006-11-05  1:24 UTC (permalink / raw)


Hi,

I've been writing a script that sifts through the unic-xxx.tex files to 
get a readable mapping of which Unicode characters are supported using 
\Amacron-style names.

In the process I found one bug and something that might be another bug:

- the Cyrillic block (unic-004.tex) is missing an \unknownchar line for 
U+04CF, so that the remaining (few) glyphs are off by one

- the Hebrew block (unic-005.tex) starts with a \numexpr line indicating 
an offset of 224 = E0; however, the first character in the list is 
U+05D0.  So either the whole block is off by 16, starting at 0x04F0 
instead of 0x0500, or the 224 should be a 208 (=D0) instead.  BTW 
unic-005.tex is the only file with Macintosh line endings. Are the 
unic-xxx files automatically generated or maintained by hand?
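
The offset arithmetic for the Hebrew block can be checked with a few
lines of Python. This is only a sketch: it assumes that each unic-xxx.tex
file covers one block of 256 code points (unic-005.tex = U+0500..U+05FF)
and that the \numexpr offset is added to the start of that block. The
names BLOCK_BASE, first_slot and required_offset are made up for the
illustration and do not come from ConTeXt.

```python
# Assumption: unic-005.tex covers the 256 code points U+0500..U+05FF and
# the offset on the \numexpr line is counted from the start of that block.
BLOCK_BASE = 0x0500           # first code point of the block
FIRST_HEBREW_LETTER = 0x05D0  # U+05D0, the first character in the list


def first_slot(block_base: int, offset: int) -> int:
    """Code point that the first list entry lands on for a given offset."""
    return block_base + offset


def required_offset(block_base: int, first_code_point: int) -> int:
    """Offset needed so that the first list entry lands on first_code_point."""
    return first_code_point - block_base


if __name__ == "__main__":
    # With the offset currently in the file (224 = 0xE0):
    print(hex(first_slot(BLOCK_BASE, 224)))
    # Offset that would put the first entry on U+05D0:
    print(required_offset(BLOCK_BASE, FIRST_HEBREW_LETTER))
```

Under these assumptions an offset of 224 puts the first entry on U+05E0,
16 positions too late, while 208 puts it on U+05D0.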

Incidentally, it would be trivial now to put the list of ConTeXt glyphs 
on the Wiki, if anyone's interested.

I wanted to use this to work towards better support for the whole range 
of ConTeXt glyphs with OpenType fonts under XeTeX, by reading what 
ConTeXt glyphs are available in a font and building a 
"\catcode`ā=\active \def ā {\amacron}"-style list for the rest. 
(Unfortunately this kind of list would be font-specific, but the generic 
alternative would be a huge list of active characters with an 
\ifnum\XeTeXcharglyph"....>0 macro behind it, and that would probably be 
quite slow.)  I wonder if there is a more intelligent way to achieve 
this goal; since part of the logic for mapping code points into glyph 
macros exists already, it would be easier if there was a way to reuse that.
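
A rough sketch of the font-specific variant, in Python: given a mapping
from code points to ConTeXt glyph names and the set of code points a font
covers, it emits one definition line for every character the font lacks.
The function name and the sample data are hypothetical. How the font
coverage would actually be obtained (for example from the font's cmap
table) is left open, and the generated lines omit the space between the
active character and the opening brace.

```python
def active_char_definitions(glyph_names, font_code_points):
    """Return TeX lines that make each character missing from the font active.

    glyph_names      -- dict: Unicode code point -> ConTeXt glyph macro name
    font_code_points -- set of code points the font has a glyph for
    """
    lines = []
    for code_point in sorted(glyph_names):
        if code_point in font_code_points:
            continue  # the font can render this character directly
        char = chr(code_point)
        macro = glyph_names[code_point]
        lines.append(f"\\catcode`{char}=\\active \\def {char}{{\\{macro}}}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, not read from a real font or unic-xxx.tex file.
    names = {0x0100: "Amacron", 0x0101: "amacron", 0x00F4: "ocircumflex"}
    in_font = {0x00F4}
    print("\n".join(active_char_definitions(names, in_font)))
```

Characters that the font already covers are skipped, so the list stays as
short as the font allows, at the price of being regenerated per font.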

The best way out would be if I could enable ConTeXt's UTF-8 regime while 
running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten 
that to work yet.

Philipp

_______________________________________________
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context


* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-05  1:24 unic-xxx.tex glyph lists: minor bugs, questions Philipp Reichmuth
@ 2006-11-05 14:27 ` Hans Hagen
  2006-11-06 22:56   ` Philipp Reichmuth
  2006-11-09 16:51   ` Mojca Miklavec
  0 siblings, 2 replies; 12+ messages in thread
From: Hans Hagen @ 2006-11-05 14:27 UTC (permalink / raw)


Philipp Reichmuth wrote:
> I've been writing a script that sifts through the unic-xxx.tex files to 
> get a readable mapping of which Unicode characters are supported using 
> \Amacron-style names.
>   
mtxtools can create such lists using the unicode consortium glyph table, 
mojca's mapping list and enco/regi files

we use mtxtools to create the tables needed for xetex (used for case 
mapping) and luatex (more extensive manipulations)
> In the process I found one bug and something that might be another bug:
>
> - the Cyrillic block (unic-004.tex) is missing an \unknownchar line for 
> U+04CF, so that the remaining (few) glyphs are off by one
>   
just mail me the patched file
> - the Hebrew block (unic-005.tex) starts with a \numexpr line indicating 
> an offset of 224 = E0; however, the first character in the list is 
> U+05D0.  So either the whole block is off by 16, starting at 0x04F0 
> instead of 0x0500, or the 224 should be a 208 (=D0) instead.  BTW 
> unic-005.tex is the only file with Macintosh line endings. Are the 
> unic-xxx files automatically generated or maintained by hand?
>   
maintained by hand, again, just send me the fixed file, but we need to 
make sure that the fix is ok (i.e. works as expected)
> Incidentally, it would be trivial now to put the list of ConTeXt glyphs 
> on the Wiki, if anyone's interested.
>   
there is a file  contextnames.txt in the distributions (maintained by 
mojca), while the not yet distributed char-def.lua has the info for luatex
> I wanted to use this to work towards better support for the whole range 
> of ConTeXt glyphs with OpenType fonts under XeTeX, by reading what 
> ConTeXt glyphs are available in a font and building a 
> "\catcode`ā=\active \def ā {\amacron}"-style list for the rest. 
> (Unfortunately this kind of list would be font-specific, but the generic 
> alternative would be a huge list of active characters with an 
> \ifnum\XeTeXcharglyph"....>0 macro behind it, and that would probably be 
> quite slow.)  I wonder if there is a more intelligent way to achieve 
> this goal; since part of the logic for mapping code points into glyph 
> macros exists already, it would be easier if there was a way to reuse that.
>   
best take a look at mtxtools; if needed we can generate the definitions 
; concerning speed, it will not be that slow, because tex is quite fast 
on such tests (unless XeTeXcharglyph is slow due to lib access); the 
biggest thing is to make sure that things don't expand in unwanted ways.

(i must find time to update my xetex bin; i must admit that i never 
tried to use open type fonts in xetex, since the mac is broken)
> The best way out would be if I could enable ConTeXt's UTF-8 regime while 
> running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten 
> that to work yet.
>   
maybe mojca has

Hans

-- 

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                             | www.pragma-pod.nl
-----------------------------------------------------------------


* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-05 14:27 ` Hans Hagen
@ 2006-11-06 22:56   ` Philipp Reichmuth
  2006-11-09 16:51   ` Mojca Miklavec
  1 sibling, 0 replies; 12+ messages in thread
From: Philipp Reichmuth @ 2006-11-06 22:56 UTC (permalink / raw)


Hans Hagen wrote:
> mtxtools can create such lists using the unicode consortium glyph table, 
> mojca's mapping list and enco/regi files
> 
> we use mtxtools to create the tables needed for xetex (used for case 
> mapping) and luatex (more extensive manipulations)

Sounds interesting.  Where do I get that?  It's not in the distribution.

> maintained by hand, again, just send me the fixed file, but we need to 
> make sure that the fix is ok (i.e. works as expected)

OK, I just sent you the files.  Can anyone test this who can read Hebrew?

> best take a look at mtxtools; if needed we can generate the definitions 
> ; concerning speed, it will not be that slow, because tex is quite fast 
> on such tests (unless XeTeXcharglyph is slow due to lib access); the 
> biggest thing is to make sure that things don't expand in unwanted ways.

OK, I'll experiment a little bit and if anything comes of it I'll post 
again.

Philipp


* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-05 14:27 ` Hans Hagen
  2006-11-06 22:56   ` Philipp Reichmuth
@ 2006-11-09 16:51   ` Mojca Miklavec
  2006-11-09 16:56     ` Mojca Miklavec
  2006-11-14 11:42     ` Hans Hagen
  1 sibling, 2 replies; 12+ messages in thread
From: Mojca Miklavec @ 2006-11-09 16:51 UTC (permalink / raw)


On 11/5/06, Hans Hagen wrote:
> Philipp Reichmuth wrote:
> > I've been writing a script that sifts through the unic-xxx.tex files to
> > get a readable mapping of which Unicode characters are supported using
> > \Amacron-style names.
> >
> mtxtools can create such lists using the unicode consortium glyph table,
> mojca's mapping list and enco/regi files
>
> we use mtxtools to create the tables needed for xetex (used for case
> mapping) and luatex (more extensive manipulations)

I have mtxtools.bat, but no mtxtools.rb here.

> > Are the
> > unic-xxx files automatically generated or maintained by hand?
> >
> maintained by hand, again, just send me the fixed file, but we need to
> make sure that the fix is ok (i.e. works as expected)

Although there should be no reason for not generating them
automatically. I did that for regime files (I only wrote a script,
executed it and Hans included the files, so it's only semi-automatic;
it would be polite of me if I managed to incorporate that into
existing [whateverthename]tools.rb).

> > Incidentally, it would be trivial now to put the list of ConTeXt glyphs
> > on the Wiki, if anyone's interested.
> >
> there is a file  contextnames.txt in the distributions (maintained by
> mojca), while the not yet distributed char-def.lua has the info for luatex

If you find errors there, please let me know. (Missing letter in
Cyrillic was due to missing position in Unicode).

> > I wanted to use this to work towards better support for the whole range
> > of ConTeXt glyphs with OpenType fonts under XeTeX, by reading what
> > ConTeXt glyphs are available in a font and building a
> > "\catcode`ā=\active \def ā {\amacron}"-style list for the rest.
> > (Unfortunately this kind of list would be font-specific, but the generic
> > alternative would be a huge list of active characters with an
> > \ifnum\XeTeXcharglyph"....>0 macro behind it, and that would probably be
> > quite slow.)  I wonder if there is a more intelligent way to achieve
> > this goal; since part of the logic for mapping code points into glyph
> > macros exists already, it would be easier if there was a way to reuse that.
> >
> best take a look at mtxtools; if needed we can generate the definitions
> ; concerning speed, it will not be that slow, because tex is quite fast
> on such tests (unless XeTeXcharglyph is slow due to lib access); the
> biggest thing is to make sure that things don't expand in unwanted ways.
>
> (i must find time to update my xetex bin; i must admit that i never
> tried to use open type fonts in xetex, since the mac is broken)

But OpenType fonts also work on Linux & Windows.

> > The best way out would be if I could enable ConTeXt's UTF-8 regime while
> > running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten
> > that to work yet.
> >
> maybe mojca has

You could theoretically comment out \beginXETEX \expandafter \endinput
\endXETEX in regi-utf.tex, but that's not the best idea.

Mojca


* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-09 16:51   ` Mojca Miklavec
@ 2006-11-09 16:56     ` Mojca Miklavec
  2006-11-14 11:35       ` Hans Hagen
  2006-11-14 11:42     ` Hans Hagen
  1 sibling, 1 reply; 12+ messages in thread
From: Mojca Miklavec @ 2006-11-09 16:56 UTC (permalink / raw)


> > > The best way out would be if I could enable ConTeXt's UTF-8 regime while
> > > running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten
> > > that to work yet.

That would mean that you lose the whole range of glyphs & scripts
outside of the scope which ConTeXt supports (you would land almost at
the level of pdfTeX again). For most European users that might still
be something reasonable, but I wouldn't go that way.

> > maybe mojca has

(little correction to what I wrote in my previous mail)

If you were really looking for that part of code - simply replace
\expandafter \endinput inside XETEX block in regi-utf.tex with
\XeTeXinputencoding=bytes. Then \enableregime[utf-8] will mean that
ConTeXt takes control over utf instead of XeTeX. From what I understood
on the wiki, it probably used to be that way at the beginning, but
then Hans changed his mind and decided to ignore \enableregime[utf]
completely when processing with XeTeX.
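
To illustrate what "bytes" mode implies: the engine hands over the raw
UTF-8 bytes, and the macro package has to reassemble them into a code
point before it can look up a glyph macro. The Python sketch below does
that by hand and then splits the result into a 256-wide block number and
an offset, which is how the unic-xxx.tex files appear to be organised
according to this thread. It only illustrates the arithmetic; the
function names are hypothetical and nothing here is taken from
regi-utf.tex.

```python
def decode_utf8_sequence(data: bytes) -> int:
    """Decode one UTF-8 sequence (1 to 4 bytes) into a code point by hand."""
    first = data[0]
    if first < 0x80:
        code_point, extra = first, 0
    elif first >> 5 == 0b110:
        code_point, extra = first & 0x1F, 1
    elif first >> 4 == 0b1110:
        code_point, extra = first & 0x0F, 2
    elif first >> 3 == 0b11110:
        code_point, extra = first & 0x07, 3
    else:
        raise ValueError("not a valid UTF-8 lead byte")
    if len(data) != extra + 1:
        raise ValueError("wrong number of continuation bytes")
    for byte in data[1:]:
        if byte >> 6 != 0b10:
            raise ValueError("not a valid UTF-8 continuation byte")
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point


def block_and_offset(code_point: int):
    """Split a code point into a 256-wide block number and an offset."""
    return code_point >> 8, code_point & 0xFF


if __name__ == "__main__":
    raw = "ā".encode("utf-8")  # the two bytes seen in bytes mode
    cp = decode_utf8_sequence(raw)
    print(raw, hex(cp), block_and_offset(cp))
```

Anything that has no entry in the corresponding block file cannot be
resolved this way, which is the loss of coverage described above.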

Mojca


* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-09 16:56     ` Mojca Miklavec
@ 2006-11-14 11:35       ` Hans Hagen
  0 siblings, 0 replies; 12+ messages in thread
From: Hans Hagen @ 2006-11-14 11:35 UTC (permalink / raw)


Mojca Miklavec wrote:
>>>> The best way out would be if I could enable ConTeXt's UTF-8 regime while
>>>> running XeTeX in \XeTeXinputencoding=bytes mode, but I haven't gotten
>>>> that to work yet.
>>>>         
>
> That would mean that you lose the whole range of glyphs & scripts
> outside of the scope which ConTeXt supports (you would land almost at
> the level of pdfTeX again). For most European users that might still
> be something reasonable, but I wouldn't go that way.
>
>   
>>> maybe mojca has
>>>       
>
> (little correction to what I wrote in my previous mail)
>
> If you were really looking for that part of code - simply replace
> \expandafter \endinput inside XETEX block in regi-utf.tex with
> \XeTeXinputencoding=bytes. Then \enableregime[utf-8] will mean that
> ConTeXt takes control over utf instead of XeTeX. From what I understood
> on the wiki, it probably used to be that way at the beginning, but
> then Hans changed his mind and decided to ignore \enableregime[utf]
> completely when processing with XeTeX.
>   
indeed; when this is uncommented (i.e. traditional utf is used) ... do the patterns still work as expected? 


* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-09 16:51   ` Mojca Miklavec
  2006-11-09 16:56     ` Mojca Miklavec
@ 2006-11-14 11:42     ` Hans Hagen
  2006-11-16 10:49       ` Philipp Reichmuth
  2006-11-16 13:12       ` Sanjoy Mahajan
  1 sibling, 2 replies; 12+ messages in thread
From: Hans Hagen @ 2006-11-14 11:42 UTC (permalink / raw)


Mojca Miklavec wrote:
> On 11/5/06, Hans Hagen wrote:
>   
>> Philipp Reichmuth wrote:
>>     
>>> I've been writing a script that sifts through the unic-xxx.tex files to
>>> get a readable mapping of which Unicode characters are supported using
>>> \Amacron-style names.
>>>
>>>       
>> mtxtools can create such lists using the unicode consortium glyph table,
>> mojca's mapping list and enco/regi files
>>
>> we use mtxtools to create the tables needed for xetex (used for case
>> mapping) and luatex (more extensive manipulations)
>>     
>
> I have mtxtools.bat, but no mtxtools.rb here.
>   
hm, will be in the mkiv zip soon (or maybe all mkiv code will be in the 
main zip; depends on how many context users want to experiment with the 
declared-stable parts of luatex)
>   
>>> Are the
>>> unic-xxx files automatically generated or maintained by hand?
>>>
>>>       
>> maintained by hand, again, just send me the fixed file, but we need to
>> make sure that the fix is ok (i.e. works as expected)
>>     
>
> Although there should be no reason for not generating them
> automatically. I did that for regime files (I only wrote a script,
> executed it and Hans included the files, so it's only semi-automatic;
> it would be polite of me if I managed to incorporate that into
> existing [whateverthename]tools.rb).
>   
we should indeed discuss a way to keep these things up to date, esp since 
in context mkiv we will use

    [0x00F4] = {
        unicodeslot=0x00F4, category='ll',
        adobename='ocircumflex', contextname='ocircumflex',
        description='LATIN SMALL LETTER O WITH CIRCUMFLEX',
        shcode=0x006F, uccode=0x00D4
    },
    [0x00F5] = {
        unicodeslot=0x00F5, category='ll',
        adobename='otilde', contextname='otilde',
        description='LATIN SMALL LETTER O WITH TILDE',
        shcode=0x006F, uccode=0x00D5
    },

like table entries for manipulating encodings, fonts, and whatever
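
As a sketch of how such entries could drive case mapping and glyph-name
lookup, here are the same two entries mirrored as a Python dict with two
small lookup helpers. The field names are copied from the entries above.
The helper names are hypothetical, and this makes no claim about how
mkiv itself uses the table.

```python
# Python mirror of the two table entries; field names are kept as-is.
CHAR_DATA = {
    0x00F4: {
        "unicodeslot": 0x00F4, "category": "ll",
        "adobename": "ocircumflex", "contextname": "ocircumflex",
        "description": "LATIN SMALL LETTER O WITH CIRCUMFLEX",
        "shcode": 0x006F, "uccode": 0x00D4,
    },
    0x00F5: {
        "unicodeslot": 0x00F5, "category": "ll",
        "adobename": "otilde", "contextname": "otilde",
        "description": "LATIN SMALL LETTER O WITH TILDE",
        "shcode": 0x006F, "uccode": 0x00D5,
    },
}


def uppercase(code_point: int) -> int:
    """Return the uccode of a character, or the character itself if unknown."""
    entry = CHAR_DATA.get(code_point, {})
    return entry.get("uccode", code_point)


def context_macro(code_point: int):
    """Return the ConTeXt glyph macro (e.g. \\otilde), or None if unknown."""
    entry = CHAR_DATA.get(code_point)
    return None if entry is None else "\\" + entry["contextname"]


if __name__ == "__main__":
    print(chr(uppercase(ord("ô"))), context_macro(ord("õ")))
```

Code points without an entry fall through unchanged in uppercase() and
give None in context_macro(), so a partial table stays usable.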

>
> But OpenType fonts also work on Linux & Windows.
>   
sure, but one needs this fontconfig thing; in my opinion xetex makes 
sense when it integrates automatically into the os-specific font stuff, 
since xetex has the 'use libraries when possible' approach; so, i'll 
happily wait till the announced integration is there

(i prefer to invest my time only once in this area: cook up a generic 
and flexible way for luatex and then derive xetex stuff from that)

Hans



* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-14 11:42     ` Hans Hagen
@ 2006-11-16 10:49       ` Philipp Reichmuth
  2006-11-16 13:12       ` Sanjoy Mahajan
  1 sibling, 0 replies; 12+ messages in thread
From: Philipp Reichmuth @ 2006-11-16 10:49 UTC (permalink / raw)


Hans Hagen wrote:
> hm, will be in the mkiv zip soon (or maybe all mkiv code will be in the 
> main zip; depends on how many context users want to experiment with the 
> declared-stable parts of luatex)

I would.

>> But OpenType fonts also work on Linux & Windows.
>   
> sure, but one needs this fontconfig thing ;

But at least on Windows that's relatively transparent.  You tell 
fontconfig once where your fonts are (e.g. c:\windows\fonts), and that's 
it, basically.

Philipp


* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-14 11:42     ` Hans Hagen
  2006-11-16 10:49       ` Philipp Reichmuth
@ 2006-11-16 13:12       ` Sanjoy Mahajan
  2006-11-16 16:20         ` Hans Hagen
  1 sibling, 1 reply; 12+ messages in thread
From: Sanjoy Mahajan @ 2006-11-16 13:12 UTC (permalink / raw)


Hans Hagen wrote:
> depends on how many context users want to experiment with the 
> declared-stable parts of luatex

I'll experiment, especially if I can figure out a set of magic
kpathsea paths to keep mkii and mkiv in parallel.

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
         --Bertrand Russell, _War Crimes in Vietnam_, chapter 1.


* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-16 13:12       ` Sanjoy Mahajan
@ 2006-11-16 16:20         ` Hans Hagen
  2006-11-22 23:54           ` Sanjoy Mahajan
  0 siblings, 1 reply; 12+ messages in thread
From: Hans Hagen @ 2006-11-16 16:20 UTC (permalink / raw)


Sanjoy Mahajan wrote:
> Hans Hagen wrote:
>   
>> depends on how many context users want to experiment with the 
>> declared-stable parts of luatex
>>     
>
> I'll experiment, especially if I can figure out a set of magic
> kpathsea paths to keep mkii and mkiv in parallel.
>
>   
no need for that ; it is made to run in parallel, just an extra zip with 
mkiv and lua files ending up in base, and luatools.lua ending up in the script path; also, mkiv does not use kpse -)  

Hans 


* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-16 16:20         ` Hans Hagen
@ 2006-11-22 23:54           ` Sanjoy Mahajan
  2006-11-23  9:56             ` Hans Hagen
  0 siblings, 1 reply; 12+ messages in thread
From: Sanjoy Mahajan @ 2006-11-22 23:54 UTC (permalink / raw)


>> I'll experiment, especially if I can figure out a set of magic
>> kpathsea paths to keep mkii and mkiv in parallel.

> no need for that ; it is made to run in parallel, just an extra zip
> with mkiv and lua files ending up in base, and luatools.lua ending
> up in the script path; also, mkiv does not use kpse -)

Great.  When the mkiv zip is available, I'll try it (tried poking
around the pragma site but didn't find it).  No kpse is indeed good
news!

About backward compatibility, maybe the mkiv transition is the time to
sacrifice backward compatibility in a few instances where it makes the
code or user interface simpler?  One example off the top of my head is
\setuppapersize[ABC] becoming equivalent to \setuppapersize[ABC][ABC]
(rather than to \setuppapersize[ABC][A4]), and there are no doubt
others.  Or is the (understandable) policy of ConTeXt development that
backward compatibility is paramount?
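
The difference between the two defaulting rules can be modelled in a few
lines of Python. This models only the rule as described in this message:
both function names are hypothetical and the code says nothing about how
\setuppapersize is implemented.

```python
def setup_papersize_current(paper, print_paper="A4"):
    """Behaviour described above: the second argument defaults to A4."""
    return (paper, print_paper)


def setup_papersize_proposed(paper, print_paper=None):
    """Proposed behaviour: the second argument defaults to the first."""
    return (paper, paper if print_paper is None else print_paper)


if __name__ == "__main__":
    print(setup_papersize_current("A5"))
    print(setup_papersize_proposed("A5"))
```

With an explicit second argument both variants agree, so only documents
that rely on the implicit A4 would be affected.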

-Sanjoy



* Re: unic-xxx.tex glyph lists: minor bugs, questions
  2006-11-22 23:54           ` Sanjoy Mahajan
@ 2006-11-23  9:56             ` Hans Hagen
  0 siblings, 0 replies; 12+ messages in thread
From: Hans Hagen @ 2006-11-23  9:56 UTC (permalink / raw)


Sanjoy Mahajan wrote:
>>> I'll experiment, especially if I can figure out a set of magic
>>> kpathsea paths to keep mkii and mkiv in parallel.
>>>       
>
>   
>> no need for that ; it is made to run in parallel, just an extra zip
>> with mkiv and lua files ending up in base, and luatools.lua ending
>> up in the script path; also, mkiv does not use kpse -)
>>     
>
> Great.  When the mkiv zip is available, I'll try it (tried poking
> around the pragma site but didn't find it).  No kpse is indeed good
> news!
>   
i will put it there as soon as tex live is really frozen since we don't 
want a mess up now
> About backward compatibility, maybe the mkiv transition is the time to
> sacrifice backward compatibility in a few instances where it makes the
> code or user interface simpler?  One example off the top of my head is
> \setuppapersize[ABC] becoming equivalent to \setuppapersize[ABC][ABC]
> (rather than to \setuppapersize[ABC][A4]), and there are no doubt
> others.  Or is the (understandable) policy of ConTeXt development that
> backward compatibility is paramount?
>   
i've always tried to be downward compatible, but some changes are less 
dangerous (like the setuppapersize proposal)

however, such changes then would also affect mkii (typesetting part 
mostly the same);

another issue is that i want to move towards a 'macro package building 
block' approach so that one can combine components to make specialized 
versions

anyhow, you can collect UI issues and organize a poll on the wiki

Hans

