From: "Cho, Jin-Hwan"
Newsgroups: gmane.comp.tex.context
Subject: UTF8 problems with Hangul Syllables
Date: Wed, 18 Dec 2002 13:49:20 +0900
Organization: Korean TeX Users Group
Message-ID: <3DFFFE50.BBC4099A@ktug.or.kr>

Here is my comment and question on the new feature of ConTeXt supporting the UTF8 encoding.
I tested the following short ConTeXt document containing two Korean characters. In the second line I used the Bitstream Cyberbit font; the corresponding TFM files were generated by ttf2tfm with Unicode.sfd (the same way as for the UTF8 support in CJK-LaTeX).

\enableregime [utf]
\definefontsynonym [UnicodeRegular] [cyberb]
\chardef\utfunihashmode=1
\starttext
^^eb^^bf^^a1 ^^ec^^80^^80
\stoptext

Here, ^^eb^^bf^^a1 = U+BFE1 and ^^ec^^80^^80 = U+C000.

1. Without the third line (\chardef\utfunihashmode=1), I could not see any characters. Why?

2. After enabling \utfunihashmode, I could see the first character, but not the second. The difference was that the value of \unidiv was 191 for the first character and 192 for the second. In fact, no character with \unidiv >= 192 and \unidiv <= 223 (from U+C000 to U+DFFF; half of Hangul Syllables) was shown correctly. Why?

Anyway, it is now possible to get a PDF file containing several different languages with ConTeXt + dvipdfmx. Furthermore, the text in the PDF file can be searched and extracted. Bookmarks and text annotations work too!

I used the following map entry (usually in cid-x.map) for dvipdfmx.

cyberb@Unicode@ Identity-H :0:cyberbit.ttf

Best,
ChoF.

-- 
Cho, Jin-Hwan == ChoF
Research Fellow, School of Mathematics
Korea Institute for Advanced Study
chofchof@ktug.or.kr
http://free.kaist.ac.kr/ChoF/
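
Note on the byte arithmetic in the message above: the mapping from the two UTF-8 sequences to U+BFE1 and U+C000, and the \unidiv values 191 and 192, can be checked with a short Python sketch. It assumes that \unidiv is simply the code point divided by 256 (the high byte), which matches the values quoted in the message; the function name split_code_point and the label "unimod" for the low byte are made up for this illustration and are not taken from ConTeXt.

```python
# Sketch only: decode the two UTF-8 sequences from the test document and
# split each code point into a high byte and a low byte.
# Assumption: \unidiv corresponds to code_point // 256. "unimod" is a
# hypothetical label for the remainder, not a confirmed ConTeXt name.

def split_code_point(utf8_bytes):
    """Return (code point, code point // 256, code point % 256)."""
    text = utf8_bytes.decode("utf-8")
    if len(text) != 1:
        raise ValueError("expected exactly one encoded character")
    code_point = ord(text)
    return code_point, code_point // 256, code_point % 256


if __name__ == "__main__":
    for raw in (b"\xeb\xbf\xa1", b"\xec\x80\x80"):
        cp, unidiv, unimod = split_code_point(raw)
        print(f"U+{cp:04X}: unidiv={unidiv} unimod={unimod}")
```

Under that assumption the failing range, \unidiv from 192 to 223, is 0xC0 to 0xDF. These are also the byte values that introduce two-byte UTF-8 sequences; whether that coincidence is related to the missing characters is only a guess.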