From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 11 Sep 2009 19:36:33 +0100
From: Eris Discordia <eris.discordia@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <3DA76AC1A01346A9B22FA838@[192.168.1.2]>
In-Reply-To: <509071940909110954i7f3e6a31ic1a93cb9b741f60@mail.gmail.com>
References: <2ccd406da7f34cd3fb8be6c3c29e7765@quanstro.net>
	<op.uz3bjkbqhipq0d@santucco.avp.ru>
	<509071940909110954i7f3e6a31ic1a93cb9b741f60@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Subject: Re: [9fans] Simplified Chinese plan 9
Topicbox-Message-UUID: 6d50a9ec-ead5-11e9-9d60-3106f5b1d025

> anyway, the general idea is that it can compose kanji from strings of
> hiragana. it's also been used for other languages (although my memory of
> that says it was mostly for the transliteration function, rather than the
> compositing function). is it possible to do something similar for the
> hanzi, composing them up from roots/stems? i've seen reference to the
> idea in chinese dictionaries, but have no idea if its use is widespread.

Kana to kanji conversion is peculiar to Japanese and that's basically how
all Japanese IMEs work. You input a series of kana (typed in Roman/Latin
letters and converted on-the-fly), then either commit them as they are or
accept a corresponding kanji the IME offers. This is called inline
conversion. Conversion may also be explicitly requested from the software
when, for some reason, inline conversion results are unsatisfactory. It
takes really good UI design to make the process practical.
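To make the two stages concrete, here is a rough sketch in Python. It is only an illustration of the idea, not how any particular IME (or ktrans) actually does it. It assumes a tiny hypothetical romaji table and a three-entry conversion dictionary, turns typed letters into hiragana by greedy longest match, and then offers kanji candidates with the plain kana as the last choice, mirroring the "commit as-is or accept a kanji" step.

```python
# Toy model of inline kana-to-kanji conversion (illustration only).
# ROMAJI_TO_KANA and KANA_TO_KANJI are tiny hypothetical tables; a real IME
# has a complete romaji table and a large, frequency-ranked dictionary.
from __future__ import annotations

ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "ji": "じ", "n": "ん",
}

KANA_TO_KANJI = {
    "にほん": ["日本", "二本"],
    "かんじ": ["漢字", "感じ", "幹事"],
    "き": ["木", "気", "機"],
}


def romaji_to_kana(romaji: str) -> str:
    """Convert typed romaji to hiragana by greedy longest match (up to 3 letters)."""
    out = []
    i = 0
    while i < len(romaji):
        for length in (3, 2, 1):
            chunk = romaji[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry: pass the letter through unchanged.
            out.append(romaji[i])
            i += 1
    return "".join(out)


def candidates(kana: str) -> list[str]:
    """Kanji offered for a kana string, with the kana itself as the last option."""
    return KANA_TO_KANJI.get(kana, []) + [kana]


if __name__ == "__main__":
    kana = romaji_to_kana("kanji")
    print(kana, candidates(kana))  # かんじ ['漢字', '感じ', '幹事', 'かんじ']
```

Real IMEs also deal with doubled consonants (small っ), "nn" for ん, and segmenting whole phrases by word frequency; none of that is modeled here.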

For Chinese, the usual route is input through a standardized romanization,
Pinyin being the most widely used (cellphones, computers, people who learn
Chinese as a second language and would have an immensely hard time if they
were to write in ideographs, even many Chinese people). Kana to kanji
conversion is not viable there simply because kana is not the syllabary
used to express Chinese. Chinese syllables do not correspond to kana, plus
Chinese is tonal while Japanese is not. Phonetically, and therefore
input-wise, since practical CJK input is based on sounds rather than
meanings, the two languages are universes apart even though they share Han
characters in the semantic sphere. Actually, any practical input system
should rely on sound representation rather than meaning: there are only so
many sounds while there are infinitely many meanings.
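A similar toy sketch for the Chinese side, again in Python with a made-up miniature dictionary, keys candidates on the Pinyin syllable plus an optional tone digit. The "ma3" notation is an assumption of this sketch, though it is a common way to type Pinyin tones in plain ASCII. It shows why tone matters: the bare syllable "ma" pulls in candidates from all four tones, while "ma3" narrows it to one. Many real Pinyin IMEs accept toneless input and rank candidates by frequency and context instead.

```python
# Toy Pinyin lookup (illustration only). PINYIN_DICT is a tiny hypothetical
# table keyed on syllable + tone digit (1-4); real IMEs use large dictionaries.
from __future__ import annotations

PINYIN_DICT = {
    "ma1": ["妈"], "ma2": ["麻"], "ma3": ["马"], "ma4": ["骂"],
    "shi1": ["诗", "师", "失"], "shi2": ["十", "时", "石"],
    "shi3": ["史", "使", "始"], "shi4": ["是", "事", "市"],
}


def lookup(syllable: str) -> list[str]:
    """Return hanzi candidates for one Pinyin syllable.

    With a tone digit ("ma3") only that tone is searched; without one ("ma")
    the candidates of all four tones are merged into one longer list.
    """
    if syllable[-1:].isdigit():
        return list(PINYIN_DICT.get(syllable, []))
    merged = []
    for tone in "1234":
        merged.extend(PINYIN_DICT.get(syllable + tone, []))
    return merged


if __name__ == "__main__":
    print(lookup("ma3"))  # ['马']
    print(lookup("ma"))   # ['妈', '麻', '马', '骂']
```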

The roots/stems you refer to are the elements of ideographs used to
classify Han characters. They are more properly called radicals and are
ordered by stroke count, i.e. the number of times you put the pen down to
compose one from the basic strokes. Most IMEs, _besides_ automatic
conversion, offer the option to choose a kanji/hanzi/hanja by one of
various lookup methods. Radical lookup is one such method. There are other
classifications of Han characters, such as Hadamitzky-Spahn (applicable to
kanji), which many IMEs don't offer.
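Radical lookup can be sketched the same way: group radicals by stroke count, pick one, then list the characters filed under it. The Python below uses a small hand-picked table (the characters and stroke counts are standard, but the selection is arbitrary and nowhere near a real index). It files variant forms such as 亻 and 氵 under their full radicals 人 and 水, as Kangxi-style dictionaries do, and orders the characters under a radical by their remaining stroke count, which is how printed dictionaries typically arrange them.

```python
# Toy radical index (illustration only). The characters and stroke counts are
# standard, but the selection is arbitrary and tiny. Variant forms are filed
# under their full radical, Kangxi-style: 亻 under 人, 氵 under 水.
from __future__ import annotations

# radical -> (radical stroke count, [(character, remaining strokes), ...])
RADICALS = {
    "人": (2, [("休", 4), ("他", 3)]),
    "口": (3, [("唱", 8), ("吃", 3)]),
    "女": (3, [("好", 3), ("妈", 3)]),
    "木": (4, [("林", 4), ("村", 3)]),
    "水": (4, [("海", 7), ("河", 5)]),
    "日": (4, [("明", 4)]),
}


def radicals_by_strokes() -> dict[int, list[str]]:
    """Group radicals by stroke count, smallest count first."""
    groups: dict[int, list[str]] = {}
    for radical, (strokes, _) in RADICALS.items():
        groups.setdefault(strokes, []).append(radical)
    return dict(sorted(groups.items()))


def characters_under(radical: str) -> list[str]:
    """Characters filed under a radical, ordered by remaining stroke count."""
    _, entries = RADICALS[radical]
    # sorted() is stable, so characters with equal counts keep table order.
    return [char for char, _ in sorted(entries, key=lambda e: e[1])]


if __name__ == "__main__":
    print(radicals_by_strokes())   # {2: ['人'], 3: ['口', '女'], 4: ['木', '水', '日']}
    print(characters_under("水"))  # ['河', '海']
```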

This is a great example of a full-blown Japanese word processor (it's
Windows freeware):

<http://www.physics.ucla.edu/~grosenth/jwpce.html>

It features nearly everything expected of a CJK input system and works
independently of MS IME, although it can also be used in conjunction with
it.

At present, Windows and MS Office do an unrivalled job of enabling
multi-lingual input and display. I can't help but feel this is sort of a
lock-in situation for people who need/fancy that sort of capability. This
isn't really something I would revel in but it's at least reassuring that
there is _some_ convenient, stable, uniform way to get these things done.



--On Friday, September 11, 2009 12:54 -0400 Anthony Sorace
<anothy@gmail.com> wrote:

> i know very little about existing chinese input methods, so this is more a
> question for my own understanding than a suggestion, but:
>
> there is ktrans for Plan 9; the latest version i'm aware of is described
> here: 	http://basalt.cias.osakafu-u.ac.jp/plan9/s39.html
> although that page is a bit hard to read since line breaks are not
> preserved. the contents are just the README from the tar file; maybe
> easier to just download that and read there.
>
> anyway, the general idea is that it can compose kanji from strings of
> hiragana. it's also been used for other languages (although my memory of
> that says it was mostly for the transliteration function, rather than the
> compositing function). is it possible to do something similar for the
> hanzi, composing them up from roots/stems? i've seen reference to the
> idea in chinese dictionaries, but have no idea if its use is widespread.
>
> i've had ktrans working on 4th edition in the past, although i just tried
> again (after a long gap), and it blows an assert, which i've not looked
> into yet.
>