From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id LAA25401; Tue, 11 Sep 2001 11:13:47 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id LAA25861 for ; Tue, 11 Sep 2001 11:13:47 +0200 (MET DST) Received: from cahors.inria.fr (cahors.inria.fr [128.93.8.84]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id f8B9DWL22144; Tue, 11 Sep 2001 11:13:32 +0200 (MET DST) Received: from localhost (IDENT:furuse@cahors.inria.fr [128.93.8.84]) by cahors.inria.fr (8.11.0/8.8.7) with ESMTP id f8B9DVS23017; Tue, 11 Sep 2001 11:13:31 +0200 Date: Tue, 11 Sep 2001 11:13:29 +0200 (CEST) Message-Id: <20010911.111329.35007088.Jun.Furuse@inria.fr> To: shoh@duonix.com Cc: caml-list@inria.fr Subject: Re: [Caml-list] Q: multibyte encoding for CJK From: "Jun P. FURUSE" In-Reply-To: <002801c13a77$cc751930$1e01a8c0@hama> References: <002801c13a77$cc751930$1e01a8c0@hama> X-Mailer: Mew version 2.0 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://pauillac.inria.fr/~furuse/ Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Hi, > When I tested mutibyte variables in caml-light, > it showed "Illegal character". > > Do you have any Idea > how to use multibyte variable for Chinese, Japan, Korean > in caml-light or ocaml? Camllight (and O'Caml) is not designed for multibyte Asian languages. In Camllight, the identifiers (variables) must begin with an "alphabet" followed by alphabets, numbers, _, or '. The "alphabets" are A-Z, a-z and the accented characters like á ç (in the HTML encoding). However, if you have enough luck, you can still use your Asian keywords. The condition is: you must use EUC (= extended unix code) encoding, and your identifier cannot contain any character code except 0xc0-0xd6 0xd8-0xf6 0xf8-0xff in Unix... (The legal upper-byte characters for identifiers are restricted to the European accented alphabets.) Well, as far as I know, this means that the use of Japanese identifiers is practically impossible. I am not an expert of Asian encodings, but I am afraid that so do Chinese and Korean. BTW, the use of your language inside strings "..." has no problem, if you use EUC encoding. But of course you will have trouble with string_length, sub_string, etc... Hope this helps, -- JPF ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr