caml-list - the Caml user's mailing list
* [Caml-list] Q: multibyte encoding for CJK
@ 2001-09-11  4:11 SooHyoung Oh
  2001-09-11  9:13 ` Jun P. FURUSE
From: SooHyoung Oh @ 2001-09-11  4:11 UTC (permalink / raw)
  To: caml-list


Hi!
When I tested multibyte variable names in caml-light,
it reported "Illegal character".

Do you have any idea
how to use multibyte variable names for Chinese, Japanese, or Korean
in caml-light or ocaml?

===
SooHyoung Oh
Email: shoh@duonix.com      Web: www.duonix.com
Tel.: 02-3413-3730                C.P.: 011-453-4303


-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr



* Re: [Caml-list] Q: multibyte encoding for CJK
  2001-09-11  4:11 [Caml-list] Q: multibyte encoding for CJK SooHyoung Oh
@ 2001-09-11  9:13 ` Jun P. FURUSE
From: Jun P. FURUSE @ 2001-09-11  9:13 UTC (permalink / raw)
  To: shoh; +Cc: caml-list

Hi,

> When I tested multibyte variable names in caml-light,
> it reported "Illegal character".
> 
> Do you have any idea
> how to use multibyte variable names for Chinese, Japanese, or Korean
> in caml-light or ocaml?

Caml Light (and O'Caml) are not designed for multibyte Asian languages.
In Caml Light, an identifier (variable name) must begin with
a letter, followed by letters, digits, _, or '.

The "letters" are A-Z, a-z, and accented characters such as
á and ç.

However, with enough luck, you can still use Asian characters in
identifiers. The conditions are: you must use EUC (Extended Unix Code)
encoding, and every non-ASCII byte of the identifier must fall in the
ranges 0xc0-0xd6, 0xd8-0xf6, or 0xf8-0xff on Unix. (The legal high-bit
bytes for identifiers are restricted to the European accented
letters.)
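
As a rough illustration of the rule above, here is a minimal OCaml sketch
that checks a byte string against it. The function names are hypothetical,
and the check only mirrors the rule as described in this message, not the
actual lexer source. The EUC-JP byte values used in the comments are
assumptions chosen for illustration.

```ocaml
(* High-bit bytes accepted as "letters": 0xc0-0xd6, 0xd8-0xf6, 0xf8-0xff. *)
let is_high_letter c =
  let n = Char.code c in
  (n >= 0xc0 && n <= 0xd6) || (n >= 0xd8 && n <= 0xf6) || (n >= 0xf8 && n <= 0xff)

(* A byte that may start an identifier: ASCII letter or accepted high-bit byte. *)
let is_ident_start c =
  (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || is_high_letter c

(* A byte that may follow: a start byte, a digit, '_' or a single quote. *)
let is_ident_char c =
  is_ident_start c || (c >= '0' && c <= '9') || c = '_' || c = '\''

(* True if every byte of [s] satisfies the rule (hypothetical helper). *)
let is_legal_ident s =
  String.length s > 0
  && is_ident_start s.[0]
  && (let ok = ref true in
      String.iter (fun c -> if not (is_ident_char c) then ok := false) s;
      !ok)

(* Assumed EUC-JP bytes: 0xC6 0xFC happens to pass, 0xA4 0xA2 does not. *)
let lucky = is_legal_ident "\xC6\xFC"
let unlucky = is_legal_ident "\xA4\xA2"
```

Whether a given word passes depends entirely on which bytes its characters
happen to encode to, which is why this is a matter of luck.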

Well, as far as I know, this means that using Japanese
identifiers is practically impossible. I am not an expert on Asian
encodings, but I am afraid the same is true for Chinese and Korean.

BTW, using your language inside strings "..." poses no problem
if you use EUC encoding. But of course you will have trouble with
string_length, sub_string, etc., since they count bytes, not characters.
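
To make the string trouble concrete, here is a small sketch in OCaml, where
String.length plays the role of Caml Light's string_length. The counter
euc_jp_length is a hypothetical helper, it assumes well-formed EUC-JP input,
and the sample bytes are an assumed EUC-JP encoding of a three-character
word.

```ocaml
(* Count characters, not bytes, in an EUC-JP string.
   Assumes well-formed input; no validation is attempted. *)
let euc_jp_length s =
  let n = String.length s in
  let rec go i count =
    if i >= n then count
    else
      let b = Char.code s.[i] in
      let width =
        if b < 0x80 then 1        (* ASCII: one byte *)
        else if b = 0x8f then 3   (* 0x8f prefix: three-byte sequence *)
        else 2                    (* 0x8e prefix or ordinary two-byte character *)
      in
      go (i + width) (count + 1)
  in
  go 0 0

(* Six bytes that are assumed to encode three two-byte characters. *)
let sample = "\xC6\xFC\xCB\xDC\xB8\xEC"
let byte_count = String.length sample   (* counts bytes *)
let char_count = euc_jp_length sample   (* counts characters *)
```

The same mismatch affects substring extraction: a byte offset can land in the
middle of a two-byte character and split it.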

Hope this helps,
--
JPF

