From: Matt Gushee
Newsgroups: gmane.comp.tex.context
Subject: Re: Writing Japanese using ConTeXt
Date: Sun, 15 Jun 2003 16:22:52 -0600
Message-ID: <20030615222252.GC23999@swordfish>
Reply-To: ntg-context@ntg.nl

On Sun, Jun 15, 2003 at
11:03:06PM +0200, Hans Hagen wrote:
> A few questions;
>
> - How are the rules for breaking?

For a detailed explanation, you should refer to the big book. But
actually the rules are not all that difficult--probably a good deal
simpler than for European languages, I'd say. The most important thing
to know is that there is a certain set of characters that may not occur
at the end of a line, and another set that may not occur at the
beginning. I also believe (it's been a while since I seriously looked at
any of this) that there are certain unbreakable pairs, but not a huge
number of them.

> - how many glyphs are there (well, i could look it up in the big cjk book)

That's rather a tricky question, and the answer depends partly on
whether you want a complete solution or an 80/20 one. You probably know
that there are two main character sets in Japanese: jis-x-0208 and
jis-x-0212 (of course, the full names are suffixed with years, but I
forget what the current versions are). The vast majority of all Japanese
text (notice I said text, *not* documents) can be written with hiragana
and katakana (50+ characters each), the roman alphabet (256, I guess?),
and the kanji in jis-x-0208, of which there are about 6000.

However, it's hard to get away without using jis-x-0212. Literary terms
and probably some specialized scientific vocabulary often require it,
and most critically, geographic and personal names very often use
jis-x-0212 characters. It's common to find names whose characters have a
form in jis-x-0208, but where the particular name must be written with a
different glyph that is only in jis-x-0212. In Japanese culture it is
unacceptable to substitute glyphs in names. An analogy in Western
languages might be: suppose you had a typesetting system that was
incapable of rendering the string "sen" at the end of a word. Thus,
whenever you encountered the names Andersen or Olsen, you would print
them as "Anderson" and "Olson." I don't think anyone would consider that
acceptable.
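To make the point about names concrete, here is a minimal Python sketch
that lists the characters in a string which fall outside jis-x-0208 but
inside jis-x-0212. It leans on two assumptions that are mine, not part of
the discussion above: that Python's "iso2022_jp" codec covers ASCII plus
jis-x-0208 only, and that "iso2022_jp_1" adds jis-x-0212 on top. The
function name and the sample name (the novelist Mori Ogai, whose name is
a well-known case) are chosen just for illustration.

```python
def chars_needing_jis_x_0212(text):
    """Return the characters of `text` that jis-x-0208 lacks but jis-x-0212 has.

    Assumption: Python's "iso2022_jp" codec encodes only ASCII and
    jis-x-0208, while "iso2022_jp_1" additionally encodes jis-x-0212.
    """
    found = []
    for ch in text:
        try:
            ch.encode("iso2022_jp")        # succeeds for jis-x-0208 characters
            continue
        except UnicodeEncodeError:
            pass
        try:
            ch.encode("iso2022_jp_1")      # succeeds if jis-x-0212 has it
        except UnicodeEncodeError:
            continue                       # in neither set; not our concern here
        found.append(ch)
    return found


if __name__ == "__main__":
    # The traditional form of the middle character is only in jis-x-0212;
    # the simplified form that jis-x-0208 offers is a different glyph.
    print(chars_needing_jis_x_0212("森鷗外"))
    print(chars_needing_jis_x_0212("森鴎外"))
```

Running a check like this over a document's names would show quickly
whether an 80/20 font setup is enough for that document.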
So the upshot of this is that, though jis-x-0212 glyphs make up a very
small proportion of the Japanese text that is printed (I'd guess 1-2
percent), a large proportion of documents (40-50 percent, maybe) require
one or more glyphs from that set. So that's roughly another 6000 glyphs,
if you want to do it right.

One other point that may or may not matter is that ... I'm not sure if
this is the correct terminology, but the code points of the Japanese
character sets are arrayed in a sparse matrix (?). Each plane is 94x94,
rather than 256x256. I used to know why.

-- 
Matt Gushee                 When a nation follows the Way,
Englewood, Colorado, USA    Horses bear manure through
mgushee@havenrock.com           its fields;
http://www.havenrock.com/   When a nation ignores the Way,
                            Horses bear soldiers through
                                its streets.

                            --Lao Tzu (Peter Merel, trans.)
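On the layout of the code points: jis-x-0208 and jis-x-0212 each address
their characters by a row and a cell number, both running from 1 to 94.
The likely reason for 94 is that each byte is restricted to the 7-bit
printable range 0x21 to 0x7E, which has exactly 94 values. A minimal
Python sketch of that arithmetic follows; the function names are
hypothetical, and the EUC-JP step assumes the usual convention of setting
the high bit on both bytes for jis-x-0208 characters.

```python
def kuten_to_jis(row, cell):
    """Map a (row, cell) position in the 94x94 grid to two 7-bit JIS bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    # Offsetting by 0x20 puts each byte in the printable range 0x21..0x7E.
    return bytes([0x20 + row, 0x20 + cell])


def kuten_to_euc_jp(row, cell):
    """Map a jis-x-0208 (row, cell) position to its two-byte EUC-JP form."""
    # EUC-JP sets the high bit of both JIS bytes (range 0xA1..0xFE).
    return bytes(b | 0x80 for b in kuten_to_jis(row, cell))


if __name__ == "__main__":
    # Row 16, cell 1 is the first kanji position in jis-x-0208.
    print(kuten_to_jis(16, 1).hex())
    print(kuten_to_euc_jp(16, 1).decode("euc_jp"))
```

A font or encoding table for these sets therefore needs at most
94 * 94 = 8836 slots per plane, not 65536.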