From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <60dd568491e43f8b733ddd04f15cd129@plan9.bell-labs.com>
From: David Presotto
To: 9fans@cse.psu.edu
Subject: Re: [9fans] input methods for non-ascii languages
In-Reply-To: <010c01c355c4$af2cb6c0$b9844051@insultant.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="upas-tznywhkeomszxbjfcoayfwgwnv"
Date: Tue, 29 Jul 2003 08:20:05 -0400
Topicbox-Message-UUID: 082d3ade-eacc-11e9-9e20-41e7f4b1d025

This is a multi-part message in MIME format.
--upas-tznywhkeomszxbjfcoayfwgwnv
Content-Disposition: inline
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Some years back I saw a grammar-based editor at Sony. It allowed
you to type a sentence in romaji and it would display a
romaji/kanji/kana representation of the sentence at the bottom of
the screen. Since it 'understood' the sentence, by the time you got
to the end of the sentence, it had a pretty high probability of
having it right. You could then tab over to any word and cycle
through hiragana, katakana, romaji and the possible kanji equivs.
Of course, as you fixed each word, it could be changing all of the
unfixed part to match.

I'd hate to think how much code was behind it.
--upas-tznywhkeomszxbjfcoayfwgwnv
Content-Type: message/rfc822
Content-Disposition: inline

Received: from plan9.cs.bell-labs.com ([135.104.9.2]) by plan9; Tue Jul 29 07:30:31 EDT 2003
Received: from mail.cse.psu.edu ([130.203.4.6]) by plan9; Tue Jul 29 07:30:29 EDT 2003
Received: by mail.cse.psu.edu (CSE Mail Server, from userid 60001) id 1C98E19A0B; Tue, 29 Jul 2003 07:30:21 -0400 (EDT)
Received: from psuvax1.cse.psu.edu (psuvax1.cse.psu.edu [130.203.6.6]) by mail.cse.psu.edu (CSE Mail Server) with ESMTP id 0224119A0B; Tue, 29 Jul 2003 07:30:16 -0400 (EDT)
X-Original-To: 9fans@cse.psu.edu
Delivered-To: 9fans@cse.psu.edu
Received: by mail.cse.psu.edu (CSE Mail Server, from userid 60001) id DD5A219B46; Tue, 29 Jul 2003 07:29:56 -0400 (EDT)
Received: from ams004.ftl.affinity.com (lvs00-fl.valueweb.net [216.219.253.199]) by mail.cse.psu.edu (CSE Mail Server) with ESMTP id 0886819A0B for <9fans@cse.psu.edu>; Tue, 29 Jul 2003 07:29:56 -0400 (EDT)
Received: from coma ([81.64.132.185]) by ams.ftl.affinity.com with SMTP id <223226-20167>; Tue, 29 Jul 2003 07:29:46 -0400
Message-ID: <010c01c355c4$af2cb6c0$b9844051@insultant.net>
From: "boyd, rounin"
To: <9fans@cse.psu.edu>
References: <0488901b9c79a39ff7a6284c92c17653@centurytel.net>
Subject: Re: [9fans] input methods for non-ascii languages
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1158
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
Sender: 9fans-admin@cse.psu.edu
Errors-To: 9fans-admin@cse.psu.edu
X-BeenThere: 9fans@cse.psu.edu
X-Mailman-Version: 2.0.11
Precedence: bulk
Reply-To: 9fans@cse.psu.edu
List-Id: Fans of the OS Plan 9 from Bell Labs <9fans.cse.psu.edu>
List-Archive:
Date: Tue, 29 Jul 2003 13:29:32 +0200
X-Spam-Status: No, hits=-1.0 required=5.0 tests=QUOTED_EMAIL_TEXT,REFERENCES version=2.55
X-Spam-Level:
X-Spam-Checker-Version: SpamAssassin 2.55 (1.174.2.19-2003-05-19-exp)
Content-Transfer-Encoding: quoted-printable

> I don't know if this would also be relevant, but you can check
> 'nemo' directory on sources for devkbmap stuff. If you are setup
> to get to sources, you'll find it here:

that stuff is based on the fact that you have a keyboard that
allows you to type the characters directly.

well, i should make myself clear: all pc keyboards generate the
same scan codes for the same key (modulo weirdness) but the keytops
have different symbols on them. eg: where a us keyboard would have
qwerty i have azerty. when typing either sequence the same set of
scan codes is generated.

to make it more difficult not all of the characters can be typed
directly; to get ê i have to type ^ then e.

japanese is a special case 'cos it has 4 character sets:

 - hiragana [phonetic set for japanese words]
 - katakana [phonetic set for foreign words]
 - kanji [the ideographs]
 - romaji [romanised representation]

i've seen numerous systems and keyboards for doing this and other
things (the various japanese on 9fans know better than i,
obviously) and it's pretty nasty.

some keyboards have the kana imposed on a qwerty keyboard and you
use a 'shift' key to get at them.

for typing the kanji, well the system i like is that you type the
stem of the pronunciation and you then cycle through a set of
ideographs until the one you want turns up. i'm not sure, but there
should be no reason why such a system couldn't sort them by
frequency, on a personalised basis.

iirc the basic set of kanji is around 800, then there's a jump to
2000 and most newspapers use around 6000. reading them is hard
enough, but in writing them you have to remember the 'stroke
order', not some random set of strokes that will get you the
character (this goes for the kana as well, but they are simple).
--upas-tznywhkeomszxbjfcoayfwgwnv--