caml-list - the Caml user's mailing list
* [Caml-list] Ocaml interface to ctype.h functions
@ 2001-06-02  6:24 Shawn Wagner
  2001-06-02 13:25 ` Michael Hicks
  2001-06-05 16:29 ` Xavier Leroy
  0 siblings, 2 replies; 9+ messages in thread
From: Shawn Wagner @ 2001-06-02  6:24 UTC
  To: caml-list

I've been working on some projects recently where it would be nice to have
access to the ctype.h character classification functions (isalpha(),
isspace(), etc.) in Ocaml, and couldn't find anything like them in a search
through the standard library. It's easy to whip up a library for this, but
before doing so, I thought I'd ask if there are any plans to put them in the
Character module or some other place where it makes sense to have them. If it's
just a matter of waiting for someone to do it, I'm willing to volunteer, as
I'd probably be doing it anyway on my own.


-- 
Shawn Wagner
shawnw@speakeasy.org
-------------------
To unsubscribe, mail caml-list-request@inria.fr.  Archives: http://caml.inria.fr



* RE: [Caml-list] Ocaml interface to ctype.h functions
  2001-06-02  6:24 [Caml-list] Ocaml interface to ctype.h functions Shawn Wagner
@ 2001-06-02 13:25 ` Michael Hicks
  2001-06-02 21:04   ` Shawn Wagner
  2001-06-05 16:29 ` Xavier Leroy
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Hicks @ 2001-06-02 13:25 UTC
  To: Shawn Wagner, caml-list

Perhaps these are things that could/should be added to the Char module?
Mike


* Re: [Caml-list] Ocaml interface to ctype.h functions
  2001-06-02 13:25 ` Michael Hicks
@ 2001-06-02 21:04   ` Shawn Wagner
       [not found]     ` <shawnw@speakeasy.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Shawn Wagner @ 2001-06-02 21:04 UTC
  To: caml-list

On Sat, Jun 02, 2001 at 09:25:16AM -0400, Michael Hicks wrote:
> Perhaps these are things that could/should be added to the Char module?
> Mike

Yeah. Not sure why I said Character when I really meant Char. 


-- 
Shawn Wagner
shawnw@speakeasy.org

* Re: [Caml-list] Ocaml interface to ctype.h functions
       [not found]     ` <shawnw@speakeasy.org>
@ 2001-06-05  7:35       ` Luc MAZARDO
  2001-06-05 13:59         ` Shawn Wagner
  0 siblings, 1 reply; 9+ messages in thread
From: Luc MAZARDO @ 2001-06-05  7:35 UTC
  To: caml-list


See:
http://www.ocaml.org/bin/caml-bugs/feature%20wish?id=57;user=guest#followups

Actually, Char.uppercase and Char.lowercase seem to work only on ASCII systems. :(

Regards.

* Re: [Caml-list] Ocaml interface to ctype.h functions
  2001-06-05  7:35       ` Luc MAZARDO
@ 2001-06-05 13:59         ` Shawn Wagner
  0 siblings, 0 replies; 9+ messages in thread
From: Shawn Wagner @ 2001-06-05 13:59 UTC
  To: caml-list

On Tue, Jun 05, 2001 at 09:35:58AM +0200, Luc MAZARDO wrote:
> 
> You can see at :
> http://www.ocaml.org/bin/caml-bugs/feature%20wish?id=57;user=guest#followups
>

Hmm. Too bad that's over a year old, still not in the distribution (at
least, not the relevant bits), and works only with ASCII and English. At
least now I know where to send the patch when I find the time to actually do
it... hopefully sometime this week.

> Actually, Char.{upper,lowercase} seems to work only for ASCII systems. :(
> 

I'll probably re-do them to use the C toupper()/tolower() functions. Might
have to add Sys.set_locale as well while I'm at it.
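For comparison, case mapping for one fixed character set can be written entirely in OCaml, with no C locale involved. This is a minimal sketch assuming strings hold ISO 8859-1 (Latin-1) bytes; `latin1_uppercase` and `latin1_lowercase` are hypothetical names, not part of the standard Char module.

```ocaml
(* Sketch: Latin-1 case mapping in pure OCaml, assuming ISO 8859-1 bytes.
   No setlocale() is involved, so results never depend on the C locale.
   latin1_uppercase / latin1_lowercase are hypothetical names. *)

let latin1_uppercase c =
  match c with
  | 'a' .. 'z'            (* ASCII lowercase *)
  | '\224' .. '\246'      (* a-grave .. o-diaeresis *)
  | '\248' .. '\254' ->   (* o-slash .. thorn, skipping the division sign (247) *)
    Char.chr (Char.code c - 32)
  | _ -> c                (* unchanged, including sharp s (223) and y-diaeresis (255) *)

let latin1_lowercase c =
  match c with
  | 'A' .. 'Z'            (* ASCII uppercase *)
  | '\192' .. '\214'      (* A-grave .. O-diaeresis *)
  | '\216' .. '\222' ->   (* O-slash .. THORN, skipping the multiplication sign (215) *)
    Char.chr (Char.code c + 32)
  | _ -> c
```

The gaps in the ranges are deliberate: 215 and 247 are the multiplication and division signs, and sharp s (223) and y with diaeresis (255) have no uppercase form within Latin-1.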

-- 
Shawn Wagner
shawnw@speakeasy.org

* Re: [Caml-list] Ocaml interface to ctype.h functions
  2001-06-02  6:24 [Caml-list] Ocaml interface to ctype.h functions Shawn Wagner
  2001-06-02 13:25 ` Michael Hicks
@ 2001-06-05 16:29 ` Xavier Leroy
  2001-06-05 16:44   ` Sylvain Kerjean
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Xavier Leroy @ 2001-06-05 16:29 UTC
  To: caml-list; +Cc: shawnw

> I've been working on some projects recently where it would be nice to have
> access to the ctype.h character classification functions (isalpha(),
> isspace(), etc.) in Ocaml, and couldn't find anything like them in a search
> through the standard library. It's easy to whip up a library for this, but
> before doing so, I thought I'd ask if there's any plans to put them in the
> Character module or some other place it makes sense to have them.

It would make sense to have classification functions in the Char
module.  The main issue is: what is a letter?, or: how to deal with
character sets.

If only one, fixed character set is supported (e.g. US-ASCII or
Latin-1), it's truly easy, but will not satisfy everyone.  OCaml has
already been criticized for supporting ISO Latin-1 accented letters in
identifiers!  (Look at the caml-list archives if you don't believe me.)
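To make the "what is a letter?" question concrete, a classifier for one fixed set such as Latin-1 might look like the sketch below. It assumes ISO 8859-1 bytes and uses a hypothetical name. Note that even this small function embeds a policy decision: the micro sign (181) and the ordinal indicators (170, 186) are not counted as letters here.

```ocaml
(* Sketch: a letter test for ISO 8859-1 (Latin-1) bytes.
   Letters are the ASCII ranges plus the accented letters 192..255,
   excluding the multiplication sign (215) and the division sign (247).
   latin1_isalpha is a hypothetical name. *)
let latin1_isalpha = function
  | 'A' .. 'Z' | 'a' .. 'z' -> true
  | '\215' | '\247' -> false     (* must come before the range below *)
  | '\192' .. '\255' -> true
  | _ -> false
```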

Building on the C functions isalpha(), etc, is a bit of a cop-out,
because then we're dependent on what these functions actually do on a
variety of Unix, Windows and Macintosh systems.  In particular, we
become dependent on the ISO C internationalization framework ("locales"),
which I think is a mess because it relies too much on a global state
(the current locale).

To give an example of the kind of problems I fear, just doing
setlocale(LC_ALL, "fr_FR") in an OCaml program causes
float_of_string "3.14" to return 0.0.  Guess why?  float_of_string
relies on the C function atof(), which is internationalized, and
doesn't recognize "." as a decimal point -- French uses a "," instead...

Finally, there's the Unicode approach.  Letters, etc, are well defined
without reference to a "locale" or whatever piece of state.  But then
we've just shifted the problem to a more general one: retrofitting
Unicode into OCaml, which again has been the subject of lively
discussions on this mailing list :-)

> If it's
> just a matter of waiting for someone to do it, I'm willing to volunteer, as
> I'd probably be doing it anyways on my own.

It's mostly a matter of knowing what we want these classification
functions to do.  Meanwhile, it might be easier to define your own
isalpha, etc, predicates; at least you get to choose the encoding!
Besides, it's really easy using pattern-matching, e.g. for ASCII:

let isalpha = function 'A'..'Z'|'a'..'z' -> true | _ -> false
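Following the same pattern-matching approach, the rest of the `<ctype.h>` family can be sketched for plain US-ASCII. The names mirror the C functions, but this is an illustrative sketch, not a standard-library API; `isalpha` is repeated so the block stands alone, and `isspace` accepts the same six characters as C's `isspace()` in the "C" locale.

```ocaml
(* Sketch: <ctype.h>-style predicates for plain US-ASCII only.
   Bytes 128..255 are never letters, digits, spaces or printable here. *)
let isalpha = function 'A' .. 'Z' | 'a' .. 'z' -> true | _ -> false
let isdigit = function '0' .. '9' -> true | _ -> false
let isalnum c = isalpha c || isdigit c
let isupper = function 'A' .. 'Z' -> true | _ -> false
let islower = function 'a' .. 'z' -> true | _ -> false
let isxdigit = function '0' .. '9' | 'A' .. 'F' | 'a' .. 'f' -> true | _ -> false

(* space, tab, newline, vertical tab (11), form feed (12), carriage return *)
let isspace = function
  | ' ' | '\t' | '\n' | '\011' | '\012' | '\r' -> true
  | _ -> false

(* control characters are 0..31 and DEL (127) *)
let iscntrl c = Char.code c < 32 || Char.code c = 127

(* printable characters are 32..126, space included *)
let isprint c = Char.code c >= 32 && Char.code c < 127
let isgraph c = isprint c && c <> ' '
let ispunct c = isgraph c && not (isalnum c)
```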

- Xavier Leroy

* Re: [Caml-list] Ocaml interface to ctype.h functions
  2001-06-05 16:29 ` Xavier Leroy
@ 2001-06-05 16:44   ` Sylvain Kerjean
  2001-06-05 18:17   ` Chris Hecker
  2001-06-11 16:00   ` Shawn Wagner
  2 siblings, 0 replies; 9+ messages in thread
From: Sylvain Kerjean @ 2001-06-05 16:44 UTC
  To: caml-list

Xavier Leroy wrote:
> 
> It's mostly a matter of knowing what we want these classification
> functions to do.  Meanwhile, it might be easier to define your own
> isalpha, etc, predicates; at least you get to choose the encoding!
> Besides, it's really easy using pattern-matching, e.g. for ASCII:
> 
> let isalpha = function 'A'..'Z'|'a'..'z' -> true | _ -> false
>

I did it at the compiler level, bypassing the call to the C isalpha function
in the lexer and coding my own in order to support accented letters in my
compiler (it worked badly on BeOS). But I realized I couldn't share my
programs anymore with an unmodified compiler :/

So I think it is a good idea to have a standard character set, say plain
ASCII, and let users re-implement an isalpha() predicate at the user level...
And if there are some good Unicode coders here, I am very interested!
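One way to let users supply their own `isalpha()` without patching the compiler or the standard library is to take the predicate as a functor argument. The sketch below is illustrative only: the signature `LETTERS`, the functor `Words` and its `split` function are hypothetical names.

```ocaml
(* Sketch: code that needs "is this a letter?" takes the predicate as a
   parameter, so each user plugs in the definition matching their encoding.
   LETTERS, Words and Ascii_words are hypothetical names. *)
module type LETTERS = sig
  val is_alpha : char -> bool
end

module Words (L : LETTERS) = struct
  (* Split a string into maximal runs of letters, dropping everything else. *)
  let split s =
    let buf = Buffer.create 16 in
    let words = ref [] in
    let flush () =
      if Buffer.length buf > 0 then begin
        words := Buffer.contents buf :: !words;
        Buffer.clear buf
      end
    in
    String.iter
      (fun c -> if L.is_alpha c then Buffer.add_char buf c else flush ())
      s;
    flush ();
    List.rev !words
end

(* An ASCII-only instance, using the pattern-matching isalpha above. *)
module Ascii_words = Words (struct
  let is_alpha = function 'A' .. 'Z' | 'a' .. 'z' -> true | _ -> false
end)
```

With the ASCII instance, `Ascii_words.split "caf\233 ok"` yields `["caf"; "ok"]`, which is exactly where a user working in Latin-1 would plug in a different predicate.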


-- 
Sylvain Kerjean
IRISA-INRIA, Campus de Beaulieu, 35042 Rennes cedex, France
Tel: +33 (0) 2 99 84 75 99, Fax: +33 (0) 2 99 84 71 71

* Re: [Caml-list] Ocaml interface to ctype.h functions
  2001-06-05 16:29 ` Xavier Leroy
  2001-06-05 16:44   ` Sylvain Kerjean
@ 2001-06-05 18:17   ` Chris Hecker
  2001-06-11 16:00   ` Shawn Wagner
  2 siblings, 0 replies; 9+ messages in thread
From: Chris Hecker @ 2001-06-05 18:17 UTC
  To: Xavier Leroy, caml-list; +Cc: shawnw


>It's mostly a matter of knowing what we want these classification
>functions to do.  Meanwhile, it might be easier to define your own
>isalpha, etc, predicates; at least you get to choose the encoding!
>Besides, it's really easy using pattern-matching, e.g. for ASCII:

But we've got someone willing to do the work for everybody, so it seems a
shame not to use it just because we can't figure out what the 100% Right
Thing is.

How about putting a nested ASCII (or ISO_Latin_1, or whatever) module inside
Char, and putting the ASCII-specific functions in there?  That way it's clear
what they're doing and what they support, and we still get to use the fruits
of his free labor.  ;)

And then you can add a Unicode one, when we find somebody to do that, etc.
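A minimal sketch of that nested-module layout, assuming user code that shadows the standard `Char` with an extended copy. The submodule names `Ascii` and `Latin1` are hypothetical, and only two predicates are shown.

```ocaml
(* Sketch: extend the standard Char module with one submodule per
   character set, so each predicate's encoding is explicit in its name.
   Ascii and Latin1 are hypothetical names. *)
module Char = struct
  include Char  (* keep Char.code, Char.chr, etc. from the standard library *)

  module Ascii = struct
    let is_alpha = function 'A' .. 'Z' | 'a' .. 'z' -> true | _ -> false
    let is_digit = function '0' .. '9' -> true | _ -> false
  end

  module Latin1 = struct
    (* ASCII letters plus the accented letters 192..255, excluding the
       multiplication (215) and division (247) signs *)
    let is_alpha = function
      | '\215' | '\247' -> false
      | c -> Ascii.is_alpha c || Char.code c >= 192
    let is_digit = Ascii.is_digit
  end
end
```

Callers then write `Char.Latin1.is_alpha c` or `Char.Ascii.is_alpha c`, so the encoding assumption is visible at every call site, and a `Unicode` submodule could be added later without disturbing either.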

Chris



* Re: [Caml-list] Ocaml interface to ctype.h functions
  2001-06-05 16:29 ` Xavier Leroy
  2001-06-05 16:44   ` Sylvain Kerjean
  2001-06-05 18:17   ` Chris Hecker
@ 2001-06-11 16:00   ` Shawn Wagner
  2 siblings, 0 replies; 9+ messages in thread
From: Shawn Wagner @ 2001-06-11 16:00 UTC
  To: caml-list

On Tue, Jun 05, 2001 at 06:29:09PM +0200, Xavier Leroy wrote:
> > I've been working on some projects recently where it would be nice to have
> > access to the ctype.h character classification functions (isalpha(),
> > isspace(), etc.) in Ocaml, and couldn't find anything like them in a search
> > through the standard library. It's easy to whip up a library for this, but
> > before doing so, I thought I'd ask if there's any plans to put them in the
> > Character module or some other place it makes sense to have them.
> 
> It would make sense to have classification functions in the Char
> module.  The main issue is: what is a letter?, or: how to deal with
> character sets.
> 
> If only one, fixed character set is supported (e.g. US-ASCII or
> Latin-1), it's truly easy, but will not satisfy everyone.  OCaml has
> already been criticized for supporting ISO Latin-1 accented letters in
> identifiers!  (Look at the caml-list archives if you don't believe me.)
> 
> Building on the C functions isalpha(), etc, is a bit of a cop-out,
> because then we're dependent on what these functions actually do on a
> variety of Unix, Windows and Macintosh systems.  In particular, we
> become dependent on the ISO C internationalization framework ("locales"),
> which I think is a mess because it relies too much on a global state
> (the current locale).

Okay, I've done the isFOO() and setlocale() interface as a separate library
for now, and will release it soon (like, tonight). Am I correct in assuming,
based on the above, that it's not likely to make it into the standard
library, though?

I've discovered that setlocale() of LC_CTYPE is already done by the runtime,
by a function used in Char.escaped... so if locales are a mess, they're a
mess OCaml is already stuck with.  Also, someone else is asking about a way
to set LC_NUMERIC from within OCaml, so I'm not alone in needing
setlocale, at least.


> 
> To give an example of the kind of problems I fear, just doing
> setlocale(LC_ALL, "fr_FR") in an OCaml program causes
> float_of_string "3.14" to return 0.0.  Guess why?  float_of_string
> relies on the C function atof(), which is internationalized, and
> doesn't recognize "." as a decimal point -- French uses a "," instead...

This is why LC_ALL is bad, and why it's better to just use the specific
locale categories you want.


-- 
Shawn Wagner
shawnw@speakeasy.org
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
