From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id CFBA1BB83 for ; Sat, 19 Aug 2006 01:37:48 +0200 (CEST) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.184]) by nez-perce.inria.fr (8.13.6/8.13.6) with ESMTP id k7INbmba023560 for ; Sat, 19 Aug 2006 01:37:48 +0200 Received: by nf-out-0910.google.com with SMTP id g2so1671124nfe for ; Fri, 18 Aug 2006 16:37:48 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=C6uovotZhzda40meYtrP0BPTR4ec+X0QHhCbFi2vIWk1+0PybB79+kjW1GyA5lDSt9GkMVEASkrKTkPiHQo35377ayxPVVy40JAz+f68LjGfmwXbG5uSvWfWXmvyrzUZNlatGsna9DdCYvKkdx6SjpNxWdh0+R6gP7PkjKRX0f4= Received: by 10.49.10.3 with SMTP id n3mr4674640nfi; Fri, 18 Aug 2006 16:37:48 -0700 (PDT) Received: by 10.78.194.3 with HTTP; Fri, 18 Aug 2006 16:37:47 -0700 (PDT) Message-ID: Date: Sat, 19 Aug 2006 11:37:47 +1200 From: "Jonathan Roewen" To: OCaml Subject: [Caml-list] Supporting unicode in ocaml... MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Spam: no; 0.00; ocaml:01 ocaml:01 grammar:01 camomile:01 higher-level:01 camomile:01 byte:01 constants:01 constants:01 caml-list:01 strings:01 parse:02 encoding:02 string:02 string:02 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=RCVD_BY_IP autolearn=disabled version=3.0.3 Hi, Does the ocaml team ever plan on supporting unicode to some degree? What about being able to parse utf-8 encoded files, but keeping the ascii only grammar? Then with the only change that if it's a utf-8 file, that the utf-8 encoding of string constants are maintained. With this scheme, you could theoretically bail on non-ascii characters everywhere else. And a 3rd-party library like camomile could be used for higher-level processing of the utf8-encoded string constants (from the camomile docs, utf8 strings use the ocaml string type too). I must admit that I have no idea how complex even a seemingly small change like this may be, but at least the detection of the byte order mark should make it a compatible change... Jonathan