From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA16769; Tue, 5 Jun 2001 18:29:15 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA16713 for ; Tue, 5 Jun 2001 18:29:14 +0200 (MET DST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id f55GTAr10365; Tue, 5 Jun 2001 18:29:10 +0200 (MET DST) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA16680; Tue, 5 Jun 2001 18:29:09 +0200 (MET DST) Date: Tue, 5 Jun 2001 18:29:09 +0200 From: Xavier Leroy To: caml-list@inria.fr Cc: shawnw@speakeasy.org Subject: Re: [Caml-list] Ocaml interface to ctype.h functions Message-ID: <20010605182909.A16268@pauillac.inria.fr> References: <20010601232433.A22189@speakeasy.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <20010601232433.A22189@speakeasy.org>; from shawnw@speakeasy.org on Fri, Jun 01, 2001 at 11:24:33PM -0700 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > I've been working on some projects recently where it would be nice to have > access to the ctype.h character classification functions (isalpha(), > isspace(), etc.) in Ocaml, and couldn't find anything like them in a search > through the standard library. It's easy to whip up a library for this, but > before doing so, I thought I'd ask if there's any plans to put them in the > Character module or some other place it makes sense to have them. It would make sense to have classification functions in the Char module. The main issue is: what is a letter?, or: how to deal with character sets. If only one, fixed character set is supported (e.g. US-ASCII or Latin-1), it's truly easy, but will not satisfy everyone. OCaml has already been criticized for supporting ISO Latin-1 accented letters in identifiers! (Look at the caml-list archives if you don't believe me.) Building on the C functions isalpha(), etc, is a bit of a cop-out, because then we're dependent on what these functions actually do on a variety of Unix, Windows and Macintosh systems. In particular, we become dependent on the ISO C internationalization framework ("locales"), which I think is a mess because it relies too much on a global state (the current locale). To give an example of the kind of problems I fear, just doing setlocale(LC_ALL, "fr_FR") in an OCaml program causes float_of_string "3.14" to return 0.0. Guess why? float_of_string relies on the C function atof(), which is internationalized, and doesn't recognize "." as a decimal point -- French uses a "," instead... Finally, there's the Unicode approach. Letters, etc, are well defined without reference to a "locale" or whatever piece of state. But then we've just shifted the problem to a more general one: retrofitting Unicode into OCaml, which again has been the subject of lively discussions on this mailing list :-) > If it's > just a matter of waiting for someone to do it, I'm willing to volunteer, as > I'd probably be doing it anyways on my own. It's mostly a matter of knowing what we want these classification functions to do. Meanwhile, it might be easier to define your own isalpha, etc, predicates; at least you get to choose the encoding! Besides, it's really easy using pattern-matching, e.g. for ASCII: let isalpha = function 'A'..'Z'|'a'..'z' -> true | _ -> false - Xavier Leroy ------------------- To unsubscribe, mail caml-list-request@inria.fr. Archives: http://caml.inria.fr