From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 11 Sep 2009 17:13:03 +0100
From: Eris Discordia <eris.discordia@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <3F111F9E2158FA97B4F102D5@[192.168.1.2]>
In-Reply-To: <op.uz3bjkbqhipq0d@santucco.avp.ru>
References: <2ccd406da7f34cd3fb8be6c3c29e7765@quanstro.net>
	<op.uz3bjkbqhipq0d@santucco.avp.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Subject: Re: [9fans] Simplified Chinese  plan 9
Topicbox-Message-UUID: 6d2404fa-ead5-11e9-9d60-3106f5b1d025

> Maybe it makes sense to make something like this in Plan9 (an analog
> kbmap) for typing complex symbols like an hieroglyph ?

Your method is in essence what Microsoft's IME on Windows and various IMEs
on UNIX-likes (such as SCIM) use. However, an IME for inputting from a list
of over twenty thousand characters takes quite an effort to devise before
it can be practical and useful. Right now even display of CJK is not quite
fully supported on any existing FOSS platform (Ruby character display was
added to Firefox only somewhere after version 3). Non-integrated pieces of
FOSS with great capabilities do exist.
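
To make the sequence idea concrete, here is a minimal sketch in Python of
the collapsing step. The KEYMAP and SEQUENCES tables are my own illustrative
assumptions, loosely modeled on the quoted kbmap excerpt, and compose() is a
hypothetical helper, not code from Inferno or Plan 9. It ignores the rule
that a non-symbol key breaks a sequence.

```python
# Hypothetical tables, loosely modeled on the quoted kbmap excerpt.
# Single Latin keys map straight to Cyrillic letters...
KEYMAP = {"c": "ц", "h": "х", "s": "с", "C": "Ц", "S": "С"}
# ...and sequences of already-mapped letters collapse into one letter.
SEQUENCES = {
    "цх": "ч", "Цх": "Ч",
    "сх": "ш", "Сх": "Ш",
    "сцх": "щ", "Сцх": "Щ",
}
MAXLEN = max(len(s) for s in SEQUENCES)

def compose(keys):
    """Translate keystrokes, collapsing the longest matching sequence."""
    out = []
    for key in keys:
        out.append(KEYMAP.get(key, key))
        # Try the longest suffix first so that "сцх" wins over "цх".
        for n in range(min(MAXLEN, len(out)), 1, -1):
            tail = "".join(out[-n:])
            if tail in SEQUENCES:
                out[-n:] = [SEQUENCES[tail]]
                break
    return "".join(out)

print(compose("ch"), compose("sh"), compose("sch"))
```

A table like this stays manageable for an alphabet; the trouble with CJK is
that the right-hand side grows to thousands of entries, many of them sharing
the same key sequence.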

In the case of (Simplified and Traditional) Chinese there apparently exist
only two successful IMEs out there: one is Microsoft's, the other belongs to
a Chinese company that has put lots of money and effort into developing the
software. I believe both support input by Pinyin romanization, although I
may be wrong. There's also Google's Pinyin IME, which was involved in a
lawsuit with said Chinese company.

In the case of Japanese an IME needs to support three writing systems at
once: first the two kana, and then the transformation from kana to kanji.
The abundance of homonyms in Japanese, as well as a certain writing strategy
called ateji (using kanji for phonetic value rather than semantic value),
makes embedding a dictionary into the IME unavoidable. Good dictionaries
for this purpose don't come free--they must either be bought from
professional companies or compiled by people who intimately know the
language, preferably native speakers. The latter, I believe, is how IMEs on
UNIX-likes came to be. Anyhow, Japanese IMEs, too, rely on input based on a
romanization of the language. The actual number of distinct kanji required
for input of text at a high-school level of literacy is around two
thousand--JLPT Level One roughly corresponds to that--but people, of
course, expect a much larger dictionary. Microsoft IME also provides
semantic aid by offering short descriptions of kanji so that people can
decide which corresponds to the meaning they want to convey. Although
unnecessary, it is a most welcome addition.
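
A rough sketch in Python of the two-stage lookup described above
(romanization to kana, then kana to a list of kanji candidates). Both tables
and both function names are hypothetical and hold only a handful of entries
chosen to show the homonym problem; this is not how any particular IME is
implemented.

```python
# Tiny illustrative tables; a real IME ships a far larger dictionary.
ROMAJI_TO_KANA = {
    "ka": "か", "mi": "み", "ha": "は", "shi": "し", "a": "あ", "me": "め",
}
KANA_TO_KANJI = {
    "かみ": ["紙", "神", "髪"],  # paper, god, hair
    "はし": ["橋", "箸", "端"],  # bridge, chopsticks, edge
    "あめ": ["雨", "飴"],        # rain, candy
}

def romaji_to_kana(text):
    """Greedy longest-match conversion; raises ValueError on unknown input."""
    kana, i = [], 0
    longest = max(len(k) for k in ROMAJI_TO_KANA)
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in ROMAJI_TO_KANA:
                kana.append(ROMAJI_TO_KANA[chunk])
                i += n
                break
        else:
            raise ValueError(f"cannot convert {text[i:]!r}")
    return "".join(kana)

def candidates(romaji):
    """Return the kana reading followed by the kanji the user may pick."""
    kana = romaji_to_kana(romaji)
    return [kana] + KANA_TO_KANJI.get(kana, [])

print(candidates("kami"))
```

Even in this toy, "kami" yields three kanji with unrelated meanings, so the
program cannot choose for the user. The code is the easy part; the
dictionary and its ranking are where the effort goes.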

I don't know anything about the Korean writing system or its IMEs, but
since CJK ideographs (most importantly Han characters) are involved,
similar statements may apply.

Overall, there's no easy way that is light on financial and/or human
resources--the two types of resources are interchangeable, i.e. if you have
an active user base you may be able to avoid expenditure--to put CJK input
support into a UI, which is probably why Plan 9 doesn't have that at the
moment. It isn't a computer thing--it's a human thing. I might add that
porting IMEs from some UNIX-like system is probably the best option (for
those with the technical prowess).

**********

DISTRACTION

While googling around for the existence of IMEs on Plan 9 I came across
this document from 1996 titled "Unicode: Writing in the Global Village":

> Despite these hurdles, Unicode may soon become the most common
> multilingual character-coding system. Support for multiple-language use
> is quickly growing. New operating systems—AT&T's Plan 9, Windows NT,
> Novell's Netware 4.01 Directory Services, Sybase's Gain Momentum, and
> Apple's Newton already support Unicode.

--
<http://www.nyu.edu/its/pubs/connect/archives/96fall/hargitaivillage.html>

It's funny how the author assumes display and input are the same thing
while they differ so greatly, input being many times harder to implement.



--On Friday, September 11, 2009 15:29 +0400 Alexander Sychev
<santucco@gmail.com> wrote:

> Hello!
>
> Some time ago I wrote for inferno an analog of kbmap with an extension -
> a possibility to print complex symbols via sequences of more basic
> symbols.
> I use it for typing by the russian translit.
> Here is a piece of file for my kbmap:
> <------------cut --------------->
> 1       45      0
> 1       46      'Ц
> 1       47      'В
> 1       48      'Б
> 1       49      'Н
> 1       50      'М
> C       цх      'ч
> C       Цх      'Ч
> C       сх      'ш
> C       Сх      'Ш
> C       сцх     'щ
> C       Сцх     'Щ
> <------------cut--------------->
>
> The latin symbols are mapped to russian when it is possible. Other
> russian symbols are presented via sequences of mapped symbols, e.g.
> russian symbol 'Ч' [ch] is presented as a sequence of 'ц' [c] and
> 'х' [h].
> A sequence can be broken by pressing any non-symbol key.
> There is at least one big disadvantage of this method - the input focus
> can be changed, e.g. by mouse. In inferno I didn't resolve this problem,
> because /dev/pointer can be opened only once.
>
> Maybe it makes sense to make something like this in Plan9 (an analog
> kbmap) for typing complex symbols like an hieroglyph ?
>
> On Fri, 11 Sep 2009 14:23:02 +0400, erik quanstrom
> <quanstro@quanstro.net> wrote:
>
>>> HI..everyone:
>>>        Is there some ways to input Simplified Chinese in plan 9 ? I
>>> know plan 9 supports Unicode, so it is no questions for plan 9 to
>>> display Simplified Chinese....... and i have seen some pictures on
>>> Internet to prove it...so i have a question like that above...
>>>    I'm looking forward for the answer........... Thanks first......!!!!
>>
>> the only way to input simplified chinese currently
>> is to use the general codepoint input method.
>> <compose> 'x' + four hex digits.  on a pc compose =
>> <alt>.  that's probably not what you're looking for.
>> i am not aware that anyone has written an input
>> method specifically for simplified chinese.
>>
>> - erik
>
>
> --
> Best regards,
>    santucco
>
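
As an aside on the general codepoint input method erik mentions: the
mapping itself is trivial, which is part of why it is no substitute for a
real IME. A sketch in Python of what <compose> 'x' + four hex digits amounts
to; compose_hex is a hypothetical name, and this says nothing about how
Plan 9's keyboard code is actually written.

```python
import string

def compose_hex(digits):
    """Turn four hex digits, as typed after <compose> 'x', into a character."""
    if len(digits) != 4 or any(c not in string.hexdigits for c in digits):
        raise ValueError("expected exactly four hex digits")
    return chr(int(digits, 16))

# U+4E2D and U+6587 together spell "Chinese (language)".
print(compose_hex("4e2d") + compose_hex("6587"))
```

The user has to know the codepoint of every character in advance, which is
the whole problem an IME exists to solve.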