From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.comp.tex.context/14411
Path: main.gmane.org!not-for-mail
From: Hans Hagen <pragma@wxs.nl>
Newsgroups: gmane.comp.tex.context
Subject: Re: unicode and out-of-box usability
Date: Sat, 03 Jan 2004 23:38:02 +0100
Sender: ntg-context-admin@ntg.nl
Message-ID: <6.0.1.1.2.20040103232206.01e51ec0@localhost>
References: <20040102175913.3787@smtp.btinternet.com>
Reply-To: ntg-context@ntg.nl
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
In-Reply-To: <20040102175913.3787@smtp.btinternet.com>
List-Id: mailing list for ConTeXt users <ntg-context.ntg.nl>

At 18:59 02/01/2004, you wrote:

>I've been struggling through, trying to learn Unicode in ConTeXt. It's
>been instructive, at least. (Hope to make a MyWay about it...)

good

>There are a few weird things that made it difficult to learn, and I was
>wondering if someone could help explain why things are the way they are.
>
>In unic-ini:
>\chardef\utfunihashmode=0 % 1 = enabled
>
>Actually, if I understand things correctly, '1' means "disabled", which
>is what I preferred, having not yet created any unicode vectors. So the
>internal documentation there seems wrong, and I would argue the default
>case (0) makes it harder for beginners.

hm, did you look at the unic-001 etc files? the trick is in fast and efficient
expansion without the need to define lots of named glyphs

>More confusingly, in font-uni:

forget about that one; although it's called unicode, it's actually a
mechanism for the many vectors that are derived from or related to unicode
but are not entirely unicode, i.e. cjk fonts

>\def\enableunicodefont#1%
>   {\definefontsynonym[\s!Unicode][\getvalue{\??uc#1\c!file}]%
>    \def\unicodescale             {\getvalue{\??uc#1\c!schaal}}%
>    \def\unicodeheight            {\getvalue{\??uc#1\c!hoogte}}%
>    \def\unicodedepth             {\getvalue{\??uc#1\c!diepte}}%
>    \def\unicodedigits            {\getvalue{\??uc#1\c!conversie}}%
>    \def\handleunicodeglyph       {\getvalue{\??uc#1\c!commando}}%
>%%%%%%%%%%% NEXT LINE
>    \enableregime[unicode]% the following \relax's are realy needed
>    \doifvalue{\??uc#1\c!interlinie}\v!ja\setupinterlinespace\relax
>    \getvalue{\??uc#1\c!commandos}\relax}
>
>The \enableregime[unicode] runs in direct opposition with the
>\enableregime[utf] that normally goes at the start of (some of my)
>documents. As it stands, with the regime hard-coded, users have to put an
>\enableregime[utf] *after* the font declaration. That's awkward.

so, don't use that mechanism, stick to the utf mechanism
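
a minimal sketch of what i mean, assuming a utf-8 encoded source file and
nothing beyond the \enableregime[utf] call already mentioned above:

```tex
% sketch only: select the utf regime once, at the top of the document
% no \enableunicodefont here, so no hard-coded \enableregime[unicode]
\enableregime[utf]

\starttext
  utf-8 encoded text goes here
\stoptext
```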

>The last proposed change/complaint is back in unic-ini, and came from my
>attempts to match the main body font with the unicode font.
>
>\def\utfunifontglyph#1%
>   {\xdef\unidiv{\number\utfdiv{#1}}%
>    \xdef\unimod{\number\utfmod{#1}}%
>    \ifnum#1<\utf@i
>%%%% \unicodeasciicharacter\unimod
>      \char\unimod % \unicodeascii\unimod
>    \else\ifcsname\@@univector\unidiv\endcsname
>      \csname\doutfunihash{\unidiv}{#1}\endcsname
>    \else % so, these can be different fonts !
>      \unicodeglyph\unidiv\unimod % no \uchar (yet)
>    \fi\fi}
>
>Basically, I'd like to use the \unicodeasciicharacter hook with this
>definition:
>
>\def\unicodeasciicharacter{\uchar{0}}
>
>(I'm not certain the above is release-quality code, but I've been testing
>it with a stripped down \utfunifontglyph that should be functionally
>equivalent.)

play with it and we'll see
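
for experiments, here is a rough sketch of the split that \utfunifontglyph
relies on. it assumes that \utfdiv and \utfmod amount to integer division by
256 and its remainder (vector number and slot within the vector); the names
\splitunicode, \unidivcount and \unimodcount are made up for this example and
are not part of unic-ini:

```tex
% plain TeX sketch; all names below are hypothetical
\newcount\unidivcount % vector number: code point div 256
\newcount\unimodcount % slot in the vector: code point mod 256

\def\splitunicode#1%
  {\unidivcount=#1\relax
   \divide\unidivcount 256
   \unimodcount=\unidivcount
   \multiply\unimodcount -256
   \advance\unimodcount #1\relax}

\splitunicode{"20AC}% euro sign, U+20AC
\message{div=\the\unidivcount, mod=\the\unimodcount}% shows div=32, mod=172

\bye
```

with this split, anything below 256 ends up in vector 0, which is the case
the \unicodeasciicharacter hook would have to handle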

>Working with the unicode code makes me appreciate that it's a really
>powerful part of ConTeXt. Thanks, Hans!

how about the following:

there are many font encodings around, but none is really complete enough to
deal with basic unicode (the 0/1/2 range)

why not define a new font encoding with characters only, so that we can have
as many chars as needed in a 0-255 vector? all those special characters
(registered and so on) are (1) seldom used and (2) not related to hyphenation
and kerning. it is also a way to get rid of some 'ligatures' like ---
becoming an emdash (in context and xml we can comfortably call symbols
directly, and these may come from a different instance of the font)

Hans