caml-list - the Caml user's mailing list
From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: caml-list@inria.fr
Cc: shawnw@speakeasy.org
Subject: Re: [Caml-list] Ocaml interface to ctype.h functions
Date: Tue, 5 Jun 2001 18:29:09 +0200
Message-ID: <20010605182909.A16268@pauillac.inria.fr>
In-Reply-To: <20010601232433.A22189@speakeasy.org>; from shawnw@speakeasy.org on Fri, Jun 01, 2001 at 11:24:33PM -0700

> I've been working on some projects recently where it would be nice to have
> access to the ctype.h character classification functions (isalpha(),
> isspace(), etc.) in Ocaml, and couldn't find anything like them in a search
> through the standard library. It's easy to whip up a library for this, but
> before doing so, I thought I'd ask if there are any plans to put them in the
> Character module or some other place it makes sense to have them.

It would make sense to have classification functions in the Char
module.  The main issue is: what is a letter?  Or, put another way:
how do we deal with character sets?

If only one, fixed character set is supported (e.g. US-ASCII or
Latin-1), it's truly easy, but will not satisfy everyone.  OCaml has
already been criticized for supporting ISO Latin-1 accented letters in
identifiers!  (Look at the caml-list archives if you don't believe me.)
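To illustrate how easy the single-character-set case is, here is a minimal sketch assuming ISO Latin-1 (ISO 8859-1) as the one fixed encoding.  The name is_latin1_alpha is hypothetical, and the exact set of code points counted as letters is an assumption of this sketch: the ASCII letters plus 0xC0 to 0xFF, minus the multiplication sign (0xD7) and the division sign (0xF7).

```ocaml
(* Hypothetical letter predicate for ISO Latin-1 only.
   OCaml '\ddd' escapes are decimal: '\192' = 0xC0, '\214' = 0xD6,
   '\216' = 0xD8, '\246' = 0xF6, '\248' = 0xF8, '\255' = 0xFF.
   The gaps skip 0xD7 (multiplication sign) and 0xF7 (division sign).
   The ordinal indicators (0xAA, 0xBA) and the micro sign (0xB5)
   are deliberately not treated as letters here. *)
let is_latin1_alpha = function
  | 'A'..'Z' | 'a'..'z' -> true
  | '\192'..'\214' | '\216'..'\246' | '\248'..'\255' -> true
  | _ -> false
```

Since the function looks only at the byte, its answer never depends on the process locale; the price is that it misclassifies text in any other encoding, which is exactly why a fixed character set will not satisfy everyone.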

Building on the C functions isalpha(), etc., is a bit of a cop-out,
because then we're dependent on what these functions actually do on a
variety of Unix, Windows and Macintosh systems.  In particular, we
become dependent on the ISO C internationalization framework ("locales"),
which I think is a mess because it relies too much on global state
(the current locale).

To give an example of the kind of problems I fear, just doing
setlocale(LC_ALL, "fr_FR") in an OCaml program causes
float_of_string "3.14" to return 0.0.  Guess why?  float_of_string
relies on the C function atof(), which is internationalized, and
doesn't recognize "." as a decimal point -- French uses a "," instead...

Finally, there's the Unicode approach.  Letters, etc., are well defined
without reference to a "locale" or whatever piece of state.  But then
we've just shifted the problem to a more general one: retrofitting
Unicode into OCaml, which again has been the subject of lively
discussions on this mailing list :-)

> If it's
> just a matter of waiting for someone to do it, I'm willing to volunteer, as
> I'd probably be doing it anyways on my own.

It's mostly a matter of knowing what we want these classification
functions to do.  Meanwhile, it might be easier to define your own
isalpha, etc., predicates; at least you get to choose the encoding!
Besides, it's really easy using pattern matching, e.g. for ASCII:

let isalpha = function 'A'..'Z'|'a'..'z' -> true | _ -> false
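Extending that one-liner, a possible ASCII-only counterpart to the rest of ctype.h might look like the sketch below.  The function names simply mirror the C ones and are not an existing OCaml library; the isspace and ispunct sets follow what the C standard specifies for the "C" locale, and bytes above 127 are never classified.

```ocaml
(* ASCII-only classification predicates in the style of ctype.h,
   written with pattern matching so that no locale is involved. *)
let isalpha = function 'A'..'Z' | 'a'..'z' -> true | _ -> false
let isupper = function 'A'..'Z' -> true | _ -> false
let islower = function 'a'..'z' -> true | _ -> false
let isdigit = function '0'..'9' -> true | _ -> false
let isalnum c = isalpha c || isdigit c
let isxdigit = function '0'..'9' | 'A'..'F' | 'a'..'f' -> true | _ -> false

(* Space, tab, newline, vertical tab (11), form feed (12),
   carriage return: the "C" locale white-space set. *)
let isspace = function
  | ' ' | '\t' | '\n' | '\011' | '\012' | '\r' -> true
  | _ -> false

(* Control characters: codes 0..31 and DEL (127). *)
let iscntrl = function '\000'..'\031' | '\127' -> true | _ -> false

(* Printable, non-space characters that are not alphanumeric. *)
let ispunct c = c > ' ' && c < '\127' && not (isalnum c)
```

Each predicate is a pure function of its argument, so, unlike the C versions, calling setlocale elsewhere in the program cannot change its result.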

- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr


Thread overview: 9+ messages
2001-06-02  6:24 Shawn Wagner
2001-06-02 13:25 ` Michael Hicks
2001-06-02 21:04   ` Shawn Wagner
     [not found]     ` <shawnw@speakeasy.org>
2001-06-05  7:35       ` Luc MAZARDO
2001-06-05 13:59         ` Shawn Wagner
2001-06-05 16:29 ` Xavier Leroy [this message]
2001-06-05 16:44   ` Sylvain Kerjean
2001-06-05 18:17   ` Chris Hecker
2001-06-11 16:00   ` Shawn Wagner
