Message-ID: <60dd568491e43f8b733ddd04f15cd129@plan9.bell-labs.com>
From: David Presotto <presotto@closedmind.org>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] input methods for non-ascii languages
In-Reply-To: <010c01c355c4$af2cb6c0$b9844051@insultant.net>
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="upas-tznywhkeomszxbjfcoayfwgwnv"
Date: Tue, 29 Jul 2003 08:20:05 -0400

This is a multi-part message in MIME format.
--upas-tznywhkeomszxbjfcoayfwgwnv
Content-Disposition: inline
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Some years back I saw a grammar-based editor at Sony.
It allowed you to type a sentence in romaji, and it would display
a romaji/kanji/kana representation of the sentence at the bottom
of the screen.  Since it 'understood' the sentence, by the time
you reached the end of it, there was a pretty high probability
that it had it right.  You could then tab over to any word and cycle
through hiragana, katakana, romaji and the possible kanji equivalents.
Of course, as you fixed each word, it could be changing all of the
unfixed part to match.  I'd hate to think how much code was
behind it.
--upas-tznywhkeomszxbjfcoayfwgwnv
Content-Type: message/rfc822
Content-Disposition: inline

Message-ID: <010c01c355c4$af2cb6c0$b9844051@insultant.net>
From: "boyd, rounin" <boyd@insultant.net>
To: <9fans@cse.psu.edu>
References: <0488901b9c79a39ff7a6284c92c17653@centurytel.net>
Subject: Re: [9fans] input methods for non-ascii languages
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: 9fans-admin@cse.psu.edu
Errors-To: 9fans-admin@cse.psu.edu
X-BeenThere: 9fans@cse.psu.edu
X-Mailman-Version: 2.0.11
Precedence: bulk
Reply-To: 9fans@cse.psu.edu
List-Id: Fans of the OS Plan 9 from Bell Labs <9fans.cse.psu.edu>
List-Archive: <https://lists.cse.psu.edu/archives/9fans/>
Date: Tue, 29 Jul 2003 13:29:32 +0200
Content-Transfer-Encoding: quoted-printable

> I don't know if this would also be relevant, but you can check
> the 'nemo' directory on sources for the devkbmap stuff. If you are set up
> to get to sources, you'll find it here:

that stuff assumes you have a keyboard that lets you type
the characters directly.  well, let me make myself clear:

    all pc keyboards generate the same scan codes for the
    same key (modulo weirdness), but the keytops have
    different symbols on them.  e.g. where a us keyboard
    has qwerty, mine has azerty.  typing either sequence
    on the same physical keys generates the same set of
    scan codes.
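a minimal sketch of that point, assuming PC scan code set 1 (make codes only): the driver only ever sees scan codes, and a layout is nothing more than a table from scan code to character.  the tables cover just a few keys, and `translate`, `US_QWERTY` and `FR_AZERTY` are illustrative names, not any real driver's API.

```python
# Partial keymaps for PC scan code set 1 (make codes only); only a few keys are filled in.
US_QWERTY = {0x10: "q", 0x11: "w", 0x12: "e", 0x13: "r", 0x14: "t", 0x15: "y",
             0x1E: "a", 0x2C: "z"}
FR_AZERTY = {0x10: "a", 0x11: "z", 0x12: "e", 0x13: "r", 0x14: "t", 0x15: "y",
             0x1E: "q", 0x2C: "w"}

def translate(scancodes, keymap):
    """Map each scan code to a character via keymap; unknown codes become '?'."""
    return "".join(keymap.get(code, "?") for code in scancodes)

# The same six physical keys on the left of the top letter row.
top_row = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15]
print(translate(top_row, US_QWERTY))  # qwerty
print(translate(top_row, FR_AZERTY))  # azerty
```

swapping layouts means swapping the table; the hardware and the scan codes stay the same.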

to make it more difficult, not all of the characters can be typed
directly; to get ê i have to type ^ then e.
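the ^-then-e trick is a dead key, which can be sketched as a tiny state machine: the dead key remembers a combining accent, and the next key is merged with it.  this sketch leans on unicode NFC normalization (python's `unicodedata.normalize`) to fold letter plus combining mark into one precomposed character; the `DEAD_KEYS` table and `compose` are hypothetical names, and a real layout handles more cases (e.g. dead key followed by space giving the bare accent).

```python
import unicodedata

# Assumed subset of dead keys, each mapped to the combining mark it applies.
DEAD_KEYS = {"^": "\u0302",       # combining circumflex
             "`": "\u0300",       # combining grave
             "\u00a8": "\u0308"}  # diaeresis key -> combining diaeresis

def compose(keys):
    """Apply dead-key composition to a sequence of typed keys."""
    out = []
    mark = None  # combining mark waiting for the next key
    for k in keys:
        if mark is None and k in DEAD_KEYS:
            mark = DEAD_KEYS[k]
        elif mark is not None:
            # NFC merges base + combining mark into a single character when one exists.
            out.append(unicodedata.normalize("NFC", k + mark))
            mark = None
        else:
            out.append(k)
    return "".join(out)  # a trailing, unused dead key is simply dropped

print(compose("t^ete"))  # tête
```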

japanese is a special case 'cos it has 4 character sets:

    - hiragana [phonetic set for japanese words]
    - katakana [phonetic set for foreign words]
    - kanji [the ideographs]
    - romaji [romanised representation]
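hiragana and katakana map one-to-one, and in unicode the two blocks sit exactly 0x60 apart, so going from one to the other is arithmetic.  going from romaji to kana needs a table; the one below is a deliberately tiny, hepburn-style assumption with greedy longest-match lookup.  it shows the idea but is not a complete input method (no small kana, no doubled consonants, no 'n' disambiguation).

```python
# Tiny Hepburn-style romaji -> hiragana table (assumed subset; a real IME needs far more).
ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "ya": "や", "yu": "ゆ", "yo": "よ", "wa": "わ", "n": "ん",
}

def to_hiragana(romaji):
    """Greedy longest-match conversion; unknown letters pass through unchanged."""
    out, i = [], 0
    while i < len(romaji):
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += len(chunk)
                break
        else:
            out.append(romaji[i])
            i += 1
    return "".join(out)

def to_katakana(hiragana):
    """Shift each hiragana (U+3041..U+3096) up by 0x60 into the katakana block."""
    return "".join(chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c
                   for c in hiragana)

word = to_hiragana("nihon")
print(word, to_katakana(word))  # にほん ニホン
```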

i've seen numerous systems and keyboards for doing this
and other things (the various japanese on 9fans know better
than i do, obviously), and it's all pretty nasty.

some keyboards have the kana imposed on a qwerty keyboard
and you use a 'shift' key to get at them.

for typing the kanji, the system i like is that you type the
stem of the pronunciation and then cycle through a set of
ideographs until the one you want turns up.  i'm not sure,
but i see no reason why such a system couldn't sort them
by frequency, on a personalised basis.
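the scheme above can be sketched like this: look up the typed reading in a dictionary, let a 'next' key step through the candidates, and keep a per-user count of final choices so that frequently picked kanji move to the front.  the dictionary contents, the `KanjiChooser` class and its method names are all hypothetical, and a plain count is only the simplest possible notion of 'frequency'.

```python
from collections import Counter

# Hypothetical reading -> kanji candidates dictionary; a real one holds many thousands.
DICTIONARY = {"かみ": ["紙", "神", "髪", "上"]}

class KanjiChooser:
    def __init__(self, dictionary):
        self.dictionary = dictionary
        self.usage = Counter()  # per-user count of (reading, kanji) selections

    def candidates(self, reading):
        """Most-used first; sorted() is stable, so ties keep dictionary order."""
        return sorted(self.dictionary.get(reading, []),
                      key=lambda word: -self.usage[(reading, word)])

    def cycle(self, reading, presses):
        """Candidate shown after pressing the 'next' key `presses` times."""
        cands = self.candidates(reading)
        if not cands:
            return reading  # no kanji known: leave the kana as typed
        return cands[presses % len(cands)]

    def commit(self, reading, word):
        """Record the user's final choice so it ranks higher next time."""
        self.usage[(reading, word)] += 1

chooser = KanjiChooser(DICTIONARY)
print(chooser.cycle("かみ", 0))  # 紙 (dictionary order, nothing learned yet)
chooser.commit("かみ", "髪")
print(chooser.cycle("かみ", 0))  # 髪 (now this user's most frequent choice)
```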

iirc the basic set of kanji is around 800, then there's a jump
to 2000 and most newspapers use around 6000.

reading them is hard enough, but in writing them you have
to remember the 'stroke order', not some random set of
strokes that will get you the character (this goes for the
kana as well, but they are simple).

--upas-tznywhkeomszxbjfcoayfwgwnv--