From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 11 Sep 2009 19:36:33 +0100
From: Eris Discordia
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <3DA76AC1A01346A9B22FA838@[192.168.1.2]>
In-Reply-To: <509071940909110954i7f3e6a31ic1a93cb9b741f60@mail.gmail.com>
References: <2ccd406da7f34cd3fb8be6c3c29e7765@quanstro.net> <509071940909110954i7f3e6a31ic1a93cb9b741f60@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Subject: Re: [9fans] Simplified Chinese plan 9
Topicbox-Message-UUID: 6d50a9ec-ead5-11e9-9d60-3106f5b1d025

> anyway, the general idea is that it can compose kanji from strings of
> hiragana. it's also been used for other languages (although my memory of
> that says it was mostly for the transliteration function, rather than the
> compositing function). is it possible to do something similar for the
> hanzi, composing them up from roots/stems? i've seen reference to the
> idea in chinese dictionaries, but have no idea if it's use is widespread.

Kana to kanji conversion is peculiar to Japanese and that's basically how all Japanese IMEs work. You input a series of kana (in Roman/Latin letters converted on-the-fly), then either assert them as they are or accept a corresponding kanji the IME offers. It's called inline conversion. Conversion may also be explicitly requested from the software when for some reason inline conversion results are unsatisfactory. It takes really good UI design to make the process practical.

For Chinese, input from a standardized romanization is required, Pinyin being the most widely used (cellphones, computers, people who learn Chinese as a second language and would have an immensely hard time if they were to write in ideographs, even many Chinese people). Kana to kanji conversion is not viable there simply because kana is not the syllabary system used to express Chinese.
Chinese syllables do not correspond to kana, plus Chinese is tonal while Japanese is not. Phonetically, and therefore input-wise since practical CJK input is based on sounds rather than meanings, the two languages are universes apart even though they share Han characters in the semantic sphere. Actually, any practical input system should rely on sound representation rather than meaning--there are only so many sounds while there are infinitely many meanings.

Roots/stems you refer to are elements in the ideographs used to classify Han characters. They are more properly called radicals and are ordered by stroke count, i.e. the number of times you put down the pen to compose one from the basic strokes. Most IMEs, _besides_ automatic conversion, offer the option to choose a kanji/hanzi/hanja by any one of various lookup methods. Radical lookup is one such method. There are other classifications of Han characters such as Hadamitzky-Spahn (applicable to kanji) which aren't present in many IMEs.

This is a great example of a full-blown Japanese word processor (it's Windows freeware): It features nearly everything expected from a CJK input system and works independently of MS IME, although it can also be used in conjunction with it.

At present, Windows and MS Office do an unrivalled job of enabling multi-lingual input and display. I can't help but feel this is sort of a lock-in situation for people who need/fancy that sort of capability. This isn't really something I would revel in but it's at least reassuring that there is _some_ convenient, stable, uniform way to get these things done.

--On Friday, September 11, 2009 12:54 -0400 Anthony Sorace wrote:

> i know very little about existing chinese input methods, so this is more a
> question for my own understanding than a suggestion, but:
>
> there is ktrans for Plan 9; the latest version i'm aware of is described
> here: http://basalt.cias.osakafu-u.ac.jp/plan9/s39.html
> although that page is a bit hard to read since line breaks are not
> preserved.
> the contents are just the README from the tar file; maybe
> easier to just download that and read there.
>
> anyway, the general idea is that it can compose kanji from strings of
> hiragana. it's also been used for other languages (although my memory of
> that says it was mostly for the transliteration function, rather than the
> compositing function). is it possible to do something similar for the
> hanzi, composing them up from roots/stems? i've seen reference to the
> idea in chinese dictionaries, but have no idea if it's use is widespread.
>
> i've had ktrans working on 4th edition in the past, although i just tried
> again (after a long gap), and it blows an assert, which i've not looked
> into yet.
>
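
For anyone who wants to see the inline conversion idea in concrete terms (Latin letters turned into kana on the fly, then kanji candidates offered for the resulting kana), here is a rough Python sketch. It is only an illustration under loud assumptions: the two tables are tiny and hand-picked, the function names are made up for this example, and it is not how ktrans or any real IME is implemented. Real IMEs use large dictionaries and rank candidates by context.

```python
# Toy illustration of inline conversion: romaji -> hiragana -> kanji candidates.
# ASSUMPTION: both tables are tiny hand-made samples, not a real IME dictionary.

ROMAJI_TO_KANA = {
    "ka": "か", "n": "ん", "ji": "じ",
    "ni": "に", "ho": "ほ", "go": "ご",
}

# Several kanji spellings can share one reading, hence a list per entry.
KANA_TO_KANJI = {
    "かんじ": ["漢字", "感じ", "幹事"],
    "にほんご": ["日本語"],
}


def romaji_to_kana(text):
    """Convert Latin letters to hiragana by greedy longest match."""
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            # Unknown letter: pass it through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def candidates(romaji):
    """Return the plain kana first (asserting it as-is), then kanji candidates."""
    kana = romaji_to_kana(romaji)
    return [kana] + KANA_TO_KANJI.get(kana, [])


if __name__ == "__main__":
    for word in ("kanji", "nihongo"):
        print(word, "->", candidates(word))
```

The first entry in each candidate list is the kana itself, which corresponds to asserting the input as it is; the rest are the kanji a user could accept instead. A Pinyin-based Chinese IME would need different tables keyed on Pinyin syllables, since kana does not map onto Chinese syllables.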