ntg-context - mailing list for ConTeXt users
From: Matt Gushee <mgushee@havenrock.com>
Subject: Re: Writing Japanese using ConTeXt
Date: Sun, 15 Jun 2003 16:22:52 -0600
Message-ID: <20030615222252.GC23999@swordfish>
In-Reply-To: <5.2.0.9.1.20030615225437.089e2500@localhost>

On Sun, Jun 15, 2003 at 11:03:06PM +0200, Hans Hagen wrote:

> A few questions;
> 
> - How are the rules for breaking?

For a detailed explanation, you should refer to the big book. But
actually the rules are not all that difficult--probably a good deal
simpler than those for European languages, I'd say. The most important
thing to know is that there is one set of characters that may not occur
at the end of a line and another set that may not occur at the
beginning. I also believe (it's been a while since I seriously looked at
any of this) that there are certain unbreakable pairs, but not a huge
number of them.
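
To make that concrete, here is a rough Python sketch of a greedy line
breaker that honors the two forbidden sets. The character lists are only
a small illustrative subset that I picked for the example (the real
lists are longer), the names wrap and break_ok are made up, and the
unbreakable pairs are not handled at all.

```python
# Illustrative subsets only (an assumption); the real lists are longer.
NO_LINE_START = set("、。，．）」』】？！ーぁぃぅぇぉっゃゅょ")
NO_LINE_END = set("（「『【")

def break_ok(text, i):
    """True if a line break may be placed just before text[i]."""
    if i <= 0 or i >= len(text):
        return True
    return text[i] not in NO_LINE_START and text[i - 1] not in NO_LINE_END

def wrap(text, width):
    """Greedy wrap at `width` characters, backing up to a legal break."""
    lines = []
    start = 0
    while len(text) - start > width:
        cut = start + width
        # Move the break left until neither rule is violated.
        while cut > start + 1 and not break_ok(text, cut):
            cut -= 1
        if not break_ok(text, cut):
            cut = start + width  # no legal break found; break at width anyway
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines
```

With a width of 4, wrap("これは「例」です。", 4) ends the first line after
"これは" so that the opening bracket is not stranded at the end of a line.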

> - how many glyphs are there (well, i could look it up in the big cjk book)

That's rather a tricky question, and the answer depends partly on
whether you want a complete solution or an 80/20 one. You probably know
that there are two main character sets in Japanese: jis-x-0208 and
jis-x-0212 (of course, the full names are suffixed with years, but I
forget what the current versions are). The vast majority of all Japanese
text (notice I said text, *not* documents) can be written with hiragana
and katakana (50+ characters each), roman alphabet (256, I guess?), and
the kanji in jis-x-0208, of which there are about 6000.

However, it's hard to get away without using jis-x-0212. Literary terms
and probably some specialized scientific vocabulary often require it,
and most critically, geographic and personal names very often use
jis-x-0212 characters. It's common to find names whose characters are
represented in jis-x-0208, but where a particular name must be written
with a different glyph that exists only in jis-x-0212. In Japanese
culture it is unacceptable to substitute glyphs in names. An analogy in
Western languages might be: suppose you had a typesetting system that
was incapable of rendering the string "sen" at the end of a word. Thus,
whenever you encountered the names Andersen or Olsen, you would print
them as "Anderson" and "Olson." I don't think anyone would consider that
acceptable.

So the upshot of this is that, though jis-x-0212 glyphs make up a very
small proportion of the Japanese text that is printed (I'd guess 1-2
percent), a large proportion of documents (40-50 percent, maybe) require
one or more glyphs from that set. So that's roughly another 6000 glyphs
(jis-x-0212 defines 5801 kanji plus some non-kanji characters), if you
want to do it right.
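
If you want to see whether a given text falls into that group, one rough
way is to look at how each character encodes in EUC-JP, where jis-x-0212
characters carry a 0x8F prefix byte. The sketch below relies on Python's
built-in euc_jp codec; the function names jis_set and needs_0212 are
made up for the example.

```python
def jis_set(ch):
    """Classify one character by how Python's euc_jp codec encodes it."""
    try:
        encoded = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return "neither"
    if len(encoded) == 1:
        return "ascii"
    if encoded[0] == 0x8E:
        return "half-width katakana"  # single-shift 2 prefix
    if encoded[0] == 0x8F:
        return "jis-x-0212"           # single-shift 3 prefix
    return "jis-x-0208"

def needs_0212(text):
    """Return the distinct characters in `text` that require jis-x-0212."""
    return sorted({ch for ch in text if jis_set(ch) == "jis-x-0212"})
```

A document is in the "needs jis-x-0212" group as soon as needs_0212
returns a non-empty list for its text, even if only one name triggers it.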

One other point that may or may not matter is that ... I'm not sure if
this is the correct terminology, but the code points of the Japanese
character sets are arrayed in a sparse matrix (?). Each plane is
94x94, rather than 256x256. I used to know why.
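
As far as I can reconstruct it, the grid is 94 rows by 94 cells, because
each of the two bytes has to stay within the printable 7-bit range 0x21
to 0x7E, and that range has 94 values. The Python sketch below turns a
row/cell pair into the 7-bit JIS bytes and the EUC-JP bytes; the
function names are made up for the example.

```python
def kuten_to_jis(row, cell):
    """Map a 1-based (row, cell) pair to the two 7-bit JIS bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    # Offset by 0x20 so both bytes land in 0x21..0x7E.
    return bytes([0x20 + row, 0x20 + cell])

def kuten_to_euc(row, cell):
    """Same position in EUC-JP: the JIS bytes with the high bit set."""
    return bytes(b | 0x80 for b in kuten_to_jis(row, cell))

# Row 4, cell 2 of jis-x-0208 is hiragana "a".
print(kuten_to_euc(4, 2).decode("euc_jp"))
```

So one plane holds at most 94 * 94 = 8836 positions, far fewer than the
65536 a full 256x256 layout would give.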

-- 
Matt Gushee                 When a nation follows the Way,
Englewood, Colorado, USA    Horses bear manure through
mgushee@havenrock.com           its fields;
http://www.havenrock.com/   When a nation ignores the Way,
                            Horses bear soldiers through
                                its streets.
                                
                            --Lao Tzu (Peter Merel, trans.)
