From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 749C7BC4E for ; Sat, 19 Aug 2006 11:05:18 +0200 (CEST) Received: from furbychan.cocan.org (furbychan.cocan.org [80.68.91.176]) by nez-perce.inria.fr (8.13.6/8.13.6) with ESMTP id k7J95HHJ018897 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Sat, 19 Aug 2006 11:05:18 +0200 Received: from rich by furbychan.cocan.org with local (Exim 3.35 #1 (Debian)) id 1GEMlI-0000aa-00; Sat, 19 Aug 2006 10:05:00 +0100 Date: Sat, 19 Aug 2006 10:05:00 +0100 To: Jonathan Roewen Cc: OCaml Subject: Re: [Caml-list] Supporting unicode in ocaml... Message-ID: <20060819090500.GB2213@furbychan.cocan.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.9i From: Richard Jones X-Spam: no; 0.00; ocaml:01 ocaml:01 grammar:01 camomile:01 higher-level:01 camomile:01 lowercase:01 notepad:01 2006:98 sourceforge:01 constants:01 constants:01 wrote:01 caml-list:01 functions:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 On Sat, Aug 19, 2006 at 11:37:47AM +1200, Jonathan Roewen wrote: > Does the ocaml team ever plan on supporting unicode to some degree? > > What about being able to parse utf-8 encoded files, but keeping the > ascii only grammar? Then with the only change that if it's a utf-8 > file, that the utf-8 encoding of string constants are maintained. With > this scheme, you could theoretically bail on non-ascii characters > everywhere else. And a 3rd-party library like camomile could be used > for higher-level processing of the utf8-encoded string constants (from > the camomile docs, utf8 strings use the ocaml string type too). Have a look at Camomile: http://camomile.sourceforge.net/ Generally speaking, though, I just always use string == UTF-8 string and avoid using some of the unsafe functions from the standard library, such as String.lowercase. Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com