From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rich@annexia.org>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78])
	by yquem.inria.fr (Postfix) with ESMTP id 749C7BC4E
	for <caml-list@yquem.inria.fr>; Sat, 19 Aug 2006 11:05:18 +0200 (CEST)
Received: from furbychan.cocan.org (furbychan.cocan.org [80.68.91.176])
	by nez-perce.inria.fr (8.13.6/8.13.6) with ESMTP id k7J95HHJ018897
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <caml-list@yquem.inria.fr>; Sat, 19 Aug 2006 11:05:18 +0200
Received: from rich by furbychan.cocan.org with local (Exim 3.35 #1 (Debian))
	id 1GEMlI-0000aa-00; Sat, 19 Aug 2006 10:05:00 +0100
Date: Sat, 19 Aug 2006 10:05:00 +0100
To: Jonathan Roewen <jonathan.roewen@gmail.com>
Cc: OCaml <caml-list@yquem.inria.fr>
Subject: Re: [Caml-list] Supporting unicode in ocaml...
Message-ID: <20060819090500.GB2213@furbychan.cocan.org>
References: <ad8cfe7e0608181637s15bded9cg2e1c2a4ca2927703@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ad8cfe7e0608181637s15bded9cg2e1c2a4ca2927703@mail.gmail.com>
User-Agent: Mutt/1.5.9i
From: Richard Jones <rich@annexia.org>
X-Spam: no; 0.00; ocaml:01 ocaml:01 grammar:01 camomile:01 higher-level:01 camomile:01 lowercase:01 notepad:01 2006:98 sourceforge:01 constants:01 constants:01 wrote:01 caml-list:01 functions:01 
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.0.3

On Sat, Aug 19, 2006 at 11:37:47AM +1200, Jonathan Roewen wrote:
> Does the ocaml team ever plan on supporting unicode to some degree?
> 
> What about being able to parse utf-8 encoded files, but keeping the
> ascii only grammar? Then with the only change that if it's a utf-8
> file, that the utf-8 encoding of string constants are maintained. With
> this scheme, you could theoretically bail on non-ascii characters
> everywhere else. And a 3rd-party library like camomile could be used
> for higher-level processing of the utf8-encoded string constants (from
> the camomile docs, utf8 strings use the ocaml string type too).

Have a look at Camomile:

http://camomile.sourceforge.net/

Generally speaking, though, I just always use string == UTF-8 string
and avoid using some of the unsafe functions from the standard
library, such as String.lowercase.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com