From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.comp.tex.context/10162
Path: main.gmane.org!not-for-mail
From: "Cho, Jin-Hwan" <chofchof@ktug.or.kr>
Newsgroups: gmane.comp.tex.context
Subject: UTF8 problems with Hangul Syllables
Date: Wed, 18 Dec 2002 13:49:20 +0900
Organization: Korean TeX Users Group
Sender: ntg-context-admin@ntg.nl
Message-ID: <3DFFFE50.BBC4099A@ktug.or.kr>
Reply-To: ntg-context@ntg.nl
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Here are a comment and a question about the new UTF-8 support in ConTeXt.

I tested the following short ConTeXt document containing two Korean
characters. The second line selects the Bitstream Cyberbit font; the
corresponding TFM files were generated by ttf2tfm with Unicode.sfd
(the same way as for the UTF-8 support in CJK-LaTeX).

\enableregime [utf]
\definefontsynonym [UnicodeRegular] [cyberb]
\chardef\utfunihashmode=1
\starttext
^^eb^^bf^^a1
^^ec^^80^^80
\stoptext

Here, ^^eb^^bf^^a1 is the UTF-8 encoding of U+BFE1 and ^^ec^^80^^80 is
the UTF-8 encoding of U+C000.
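
For anyone who wants to double-check the byte sequences, here is a small
Python sketch that decodes the ^^xx notation used above. The helper name
utf8_bytes_from_tex is made up for this illustration and is not part of
ConTeXt; it only handles the plain two-hex-digit form shown in the test
document.

```python
def utf8_bytes_from_tex(notation):
    """Turn TeX ^^xx notation (two hex digits per byte) into raw bytes.

    Hypothetical helper for illustration only.
    """
    parts = [p for p in notation.split("^^") if p]
    return bytes(int(p, 16) for p in parts)


def code_point_of(notation):
    """Decode the bytes as UTF-8 and return the single code point."""
    text = utf8_bytes_from_tex(notation).decode("utf-8")
    if len(text) != 1:
        raise ValueError("expected exactly one character")
    return ord(text)


if __name__ == "__main__":
    for notation in ("^^eb^^bf^^a1", "^^ec^^80^^80"):
        # prints U+BFE1 for the first sequence and U+C000 for the second
        print(notation, "-> U+%04X" % code_point_of(notation))
```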

1. Without the third line (\chardef\utfunihashmode=1), neither character
   appeared in the output. Why?

2. After enabling \utfunihashmode, the first character appeared, but
   not the second. The difference is that the value of \unidiv was 191
   for the first character and 192 for the second. In fact, no character
   with 192 <= \unidiv <= 223 (U+C000 to U+DFFF, which covers about half
   of the Hangul Syllables block) was shown correctly. Why?
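
The arithmetic behind that range can be sketched in Python. This assumes
that \unidiv is the code point divided by 256 (which matches the values
191 and 192 reported above) and that the remainder is the position inside
the 256-character block; the function name uni_div_mod is hypothetical.

```python
HANGUL_FIRST = 0xAC00  # first code point of the Hangul Syllables block
HANGUL_LAST = 0xD7A3   # last code point of the Hangul Syllables block


def uni_div_mod(code_point):
    """Split a code point into (high byte, low byte).

    Assumption: the high byte corresponds to \\unidiv in the message.
    """
    return divmod(code_point, 256)


def affected_range(div_low=192, div_high=223):
    """Code point range covered by div_low <= high byte <= div_high."""
    return div_low * 256, div_high * 256 + 255


if __name__ == "__main__":
    print(uni_div_mod(0xBFE1))  # (191, 225)
    print(uni_div_mod(0xC000))  # (192, 0)

    first, last = affected_range()
    print("U+%04X .. U+%04X" % (first, last))  # U+C000 .. U+DFFF

    # How many Hangul syllables fall inside the affected range?
    hit = min(last, HANGUL_LAST) - max(first, HANGUL_FIRST) + 1
    total = HANGUL_LAST - HANGUL_FIRST + 1
    print(hit, "of", total)  # 6052 of 11172
```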
   
Anyway, it is now possible to produce a PDF file containing several
different languages with ConTeXt + dvipdfmx. Furthermore, the text in the
PDF file can be searched and extracted, and bookmarks and text annotations
work too!

I used the following map entry (usually in cid-x.map) for dvipdfmx.

cyberb@Unicode@ Identity-H :0:cyberbit.ttf
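
As a rough illustration of how such a map entry relates to the subfonts
made by ttf2tfm, the Python sketch below expands the cyberb@Unicode@
template for a given code point. It assumes that Unicode.sfd splits the
BMP into 256 subfonts of 256 characters each, named with a two-digit
lowercase hex suffix (cyberb00 ... cyberbff); please check this against
your own Unicode.sfd. The function subfont_for is made up for this example
and is not part of dvipdfmx.

```python
def subfont_for(template, code_point):
    """Return (subfont TFM name, character slot) for a BMP code point.

    template is of the form 'base@SFDname@'. Assumption: the subfont
    suffix is the high byte of the code point as two lowercase hex digits.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only BMP code points are handled in this sketch")
    base, sfd_name, tail = template.split("@")
    if tail != "":
        raise ValueError("expected a template of the form base@SFD@")
    high, low = divmod(code_point, 256)
    return "%s%02x" % (base, high), low


if __name__ == "__main__":
    print(subfont_for("cyberb@Unicode@", 0xBFE1))  # ('cyberbbf', 225)
    print(subfont_for("cyberb@Unicode@", 0xC000))  # ('cyberbc0', 0)
```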

Best, ChoF.
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~     ***
| Cho, Jin-Hwan == ChoF |     ^ ^
~~~~~~~~~~~~~~~~~~~~~~~~~      o
| Research Fellow       |     ~~~
| School of Mathematics ~~~~~~~~~~~~~~
| Korea Institute for Advanced Study |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| chofchof@ktug.or.kr                |
| http://free.kaist.ac.kr/ChoF/      |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~