From: Hans Hagen
Newsgroups: gmane.comp.tex.context
Subject: Re: unicode and out-of-box usability
Date: Sat, 03 Jan 2004 23:38:02 +0100
Message-ID: <6.0.1.1.2.20040103232206.01e51ec0@localhost>
References: <20040102175913.3787@smtp.btinternet.com>
Reply-To: ntg-context@ntg.nl

At 18:59 02/01/2004, you wrote:

>I've been struggling through, trying to learn Unicode in ConTeXt. It's
>been instructive, at least. (Hope to make a MyWay about it...)

good

>There are a few weird things that made it difficult to learn, and I was
>wondering if someone could help explain why things are the way they are.
>
>In unic-ini:
>\chardef\utfunihashmode=0 % 1 = enabled
>
>Actually, if I understand things correctly, '1' means "disabled", which
>is what I preferred, having not yet created any unicode vectors. So the
>internal documentation there seems wrong, and I would argue the default
>case (0) makes it harder for beginners.

hm, did you look at the unic-001 etc files? the trick is in fast and efficient expansion without the need to define lots of named glyphs

>More confusingly, in font-uni:

forget about that one, although it's called unicode, it's actually a mechanism for the many vectors derived from unicode / related to unicode but not entirely i.e.
cjk fonts

>\def\enableunicodefont#1%
>  {\definefontsynonym[\s!Unicode][\getvalue{\??uc#1\c!file}]%
>   \def\unicodescale       {\getvalue{\??uc#1\c!schaal}}%
>   \def\unicodeheight      {\getvalue{\??uc#1\c!hoogte}}%
>   \def\unicodedepth       {\getvalue{\??uc#1\c!diepte}}%
>   \def\unicodedigits      {\getvalue{\??uc#1\c!conversie}}%
>   \def\handleunicodeglyph {\getvalue{\??uc#1\c!commando}}%
>%%%%%%%%%%% NEXT LINE
>   \enableregime[unicode]% the following \relax's are realy needed
>   \doifvalue{\??uc#1\c!interlinie}\v!ja\setupinterlinespace\relax
>   \getvalue{\??uc#1\c!commandos}\relax}
>
>The \enableregime[unicode] runs in direct opposition with the
>\enableregime[utf] that normally goes at the start of (some of my)
>documents. As it stands, with the regime hard-coded, users have to put an
>\enableregime[utf] *after* the font declaration. That's awkward.

so, don't use that mechanism, stick to the utf mechanism

>The last proposed change/complaint is back in unic-ini, and came from my
>attempts to match the main body font with the unicode font.
>
>\def\utfunifontglyph#1%
>  {\xdef\unidiv{\number\utfdiv{#1}}%
>   \xdef\unimod{\number\utfmod{#1}}%
>   \ifnum#1<\utf@i
>%%%% \unicodeasciicharacter\unimod
>     \char\unimod % \unicodeascii\unimod
>   \else\ifcsname\@@univector\unidiv\endcsname
>     \csname\doutfunihash{\unidiv}{#1}\endcsname
>   \else % so, these can be different fonts !
>     \unicodeglyph\unidiv\unimod % no \uchar (yet)
>   \fi\fi}
>
>Basically, I'd like to use the \unicodeasciicharacter hook with this
>definition:
>
>\def\unicodeasciicharacter{\uchar{0}}
>
>(I'm not certain the above is release-quality code, but I've been testing
>it with a stripped down \utfunifontglyph that should be functionally
>equivalent.)

play with it and we'll see

>Working with the unicode code makes me appreciate that it's really
>powerful part of ConTeXt. Thanks, Hans!
how about the following: there are many font encodings around but none is really complete enough to deal with basic unicode (0/1/2 range)

why not define a new font encoding with characters only so that we can have as many chars as needed in a 0-255 vector; all those special characters (registered, and so on) are (1) used seldom, (2) not related to hyphenation and kerning; it is also a way to get rid of some 'ligatures' like --- becoming an emdash (in context and xml we can comfortably directly call symbols, and these may come from a different instance of the font)

Hans
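
For readers following the quoted \utfunifontglyph: the \unidiv / \unimod pair appears to split a code point into a vector number and a slot inside a 0-255 vector. The plain TeX sketch below illustrates that arithmetic only. It assumes \utfdiv and \utfmod behave like integer division and remainder by 256, which the thread does not state; the names \codepoint, \vectornumber, \slotnumber and \splitcodepoint are hypothetical and are not part of unic-ini.

```tex
% plain TeX sketch (not ConTeXt's own code): split a code point
% into a vector number and a slot within a 0-255 vector
\newcount\codepoint    % hypothetical: the input code point
\newcount\vectornumber % hypothetical: plays the role of \unidiv
\newcount\slotnumber   % hypothetical: plays the role of \unimod

\def\splitcodepoint#1% #1 = code point as a TeX number
  {\codepoint=#1\relax
   \vectornumber=\codepoint
   \divide\vectornumber by 256 % \divide truncates toward zero
   % remainder = codepoint - 256 * vectornumber
   \slotnumber=\vectornumber
   \multiply\slotnumber by 256
   \slotnumber=-\slotnumber
   \advance\slotnumber by \codepoint}

\splitcodepoint{"20AC}% the euro sign, U+20AC
\message{vector \the\vectornumber, slot \the\slotnumber}
% writes "vector 32, slot 172" to the terminal and log
\bye
```

Under that assumption, any code point below 256 lands in vector 0, which would be the range the proposed \unicodeasciicharacter hook is meant to cover if \utf@i marks that boundary.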