From: Richard Gabriel
Newsgroups: gmane.comp.tex.context
Subject: Re: Chinese
Date: Tue, 13 Dec 2005 10:14:46 +0100
Message-ID: <20051213091446.283c7ce3@mx1.kerio.com>
References: <439D9D0F.6080406@wxs.nl>
Reply-To: mailing list for ConTeXt users

Hello Hans,

to be honest: I don't speak Chinese and don't know much about it.
A few days ago, I was told that we'll have some of our documents (XML) translated into Chinese and Japanese, and I'll have to typeset them.
So I started playing with Chinese in ConTeXt. I've reported the results, which other users (e.g. Tobias) have also noticed.
In fact, all the sample Simplified Chinese documents I've tested it on were easily convertible to CP936 (GBK) and could be typeset. This doesn't mean that you shouldn't extend the Unicode support; I only think I will hardly need it... :-)
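
For anyone without iconv at hand, the same UTF-8 to CP936 conversion can be sketched in Python. This is only an illustration and not part of ConTeXt: the function names are made up for this note, and it relies on Python's built-in `cp936` codec (an alias for `gbk`). The second helper shows one way to check in advance whether a text is convertible at all.

```python
def utf8_to_cp936(data):
    """Re-encode UTF-8 bytes as CP936 (GBK).

    Raises UnicodeEncodeError if a character has no GBK code,
    which is the case where the GBK route stops working.
    """
    return data.decode("utf-8").encode("cp936")


def is_convertible(text):
    """Return True if every character of text exists in CP936 (GBK)."""
    try:
        text.encode("cp936")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    sample = "中文".encode("utf-8")
    print(utf8_to_cp936(sample))
```

A document that fails `is_convertible` would be one that really needs the Unicode route instead.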

But yet another question: what about Japanese? I've only done a little research so far, but unlike Chinese, there's almost no information about Japanese in TeX. How much work would it be to adapt the current "chinese" ConTeXt module for Japanese? What would you need for it?
[Of course, in the meantime I'll investigate some other ways of typesetting Japanese...]

Thanks,
Richard

_____


From: Hans Hagen [mailto:pragma@wxs.nl]
To: mailing list for ConTeXt users [mailto:ntg-context@ntg.nl]
Sent: Mon, 12 Dec 2005 16:53:51 +0100
Subject: Re: [NTG-context] Chinese

Richard Gabriel wrote:

> Hi guys,
>
> I can confirm that the UTF-8 input doesn't work for me either.
> If I convert the file into GBK (CP936), it works fine [I suggest
> using the 'iconv' utility for the conversion :-)].
>
> I tested the UTF-8 output the following ways:
>
> 1)
> \enableregime[utf]
> \usemodule[chinese]
>

chinese is not yet defined in utf so if you want that, we need to do it

now, since the chinese remapping stuff is rather complex, the best
method is to consider a dedicated mechanism

question: do the unicode tables cover gbk and big 5 well?
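
One rough way to probe this question outside TeX is to count how many two-byte codes a Unicode mapping table can resolve. The sketch below uses Python's built-in `gbk` and `big5` codecs as stand-ins for "the unicode tables"; that is an assumption, and the tables shipped with ConTeXt may differ. The function name is made up for this note.

```python
def count_decodable(codec):
    """Count two-byte sequences that the codec maps to one Unicode character."""
    count = 0
    for lead in range(0x81, 0xFF):        # candidate lead bytes 0x81..0xFE
        for trail in range(0x40, 0xFF):   # candidate trail bytes 0x40..0xFE
            try:
                text = bytes([lead, trail]).decode(codec)
            except UnicodeDecodeError:
                continue                  # no mapping for this code
            if len(text) == 1:
                count += 1
    return count


if __name__ == "__main__":
    for name in ("gbk", "big5"):
        print(name, count_decodable(name))
```

Every code counted here has a Unicode position, so fonts arranged by Unicode position could in principle serve both encodings.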

assuming this, how about making a set of tfm, enc, map files that match
the unicode positions (volunteers ...)

we can extend the utf handler with a kind of plugin mechanism:

\unprotect

\def\utfunihashglyph#1%
  {\@EA\doutfunihashglyph\@EA{\number\utfdiv{#1}}{#1}} % only div once

\def\doutfunihashglyph#1#2% div raw
  {\csname
     \ifnum#2<\utf@i
       \strippedcsname\unicodeasciicharacter
     \else\ifcsname\@@unicommand#1\endcsname
       \@@unicommand#1%
     \else\ifcsname\@@univector#1\endcsname
       \@@univector#1%
     \else
       \strippedcsname\unicodeunknowncharacter
     \fi\fi\fi
   \@EA\endcsname\@EA{\number\utfmod{#2}}} % only mod once

\def\unicodeunknowncharacter#1%
  {\unknownchar}

\let\utfunihash\utfunihashglyph

\def\@@unicommand{@@unicommand}

\def\defineutfcommand #1 #2%
  {\setvalue{\@@unicommand#1}##1{#2{#1}{##1}}}

so we can define plugin handlers for e.g. chinese

\defineutfcommand 81 {\uchar}

(bombs due to missing fonts, so for testing)

\def\NotYet#1#2{[#1 #2]}

\defineutfcommand 81 {\NotYet}
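
For readers who don't read TeX macros fluently, the lookup order of the handler above can be modelled in a few lines of Python. This is a sketch under assumptions, not the ConTeXt implementation: the page size of 256 (for \utfdiv and \utfmod) and the ASCII limit of 128 (for \utf@i) are guesses, and all Python names are made up for illustration.

```python
PAGE_SIZE = 256    # assumption: \utfdiv and \utfmod split on 256-entry pages
ASCII_LIMIT = 128  # assumption: stands in for \utf@i

plugins = {}  # page -> handler(page, offset), filled like \defineutfcommand
vectors = {}  # page -> handler(offset), stands in for the \@@univector entries


def define_utf_command(page, handler):
    """Register a plugin handler for one page of code points."""
    plugins[page] = handler


def unknown_character(offset):
    # fallback, like \unicodeunknowncharacter
    return "?"


def dispatch(codepoint):
    """Pick a handler in the same order as the TeX macro."""
    if codepoint < ASCII_LIMIT:
        return chr(codepoint)
    page, offset = divmod(codepoint, PAGE_SIZE)  # div once, mod once
    if page in plugins:                          # plugin handlers win
        return plugins[page](page, offset)
    if page in vectors:                          # then the regular vectors
        return vectors[page](offset)
    return unknown_character(offset)


def not_yet(page, offset):
    # test handler, like \NotYet: just show the two numbers
    return "[{} {}]".format(page, offset)


define_utf_command(81, not_yet)

if __name__ == "__main__":
    print(dispatch(0x5149))  # a character on page 81
```

Under these assumptions page 81 covers U+5100 to U+51FF, so a full Chinese plugin would need one registration per page of the CJK range rather than the single test page used here.
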
(next comes adapting the chinese files; i can imagine that we redo the
big5 and gbk definitions so that they remap to utf8 as common encoding)

so .. the question is ... who is going to make the tfm/enc/map files

Hans

_______________________________________________
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context