caml-list - the Caml user's mailing list
* [Caml-list] Supporting unicode in ocaml...
@ 2006-08-18 23:37 Jonathan Roewen
  2006-08-19  9:05 ` Richard Jones
  0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Roewen @ 2006-08-18 23:37 UTC (permalink / raw)
  To: OCaml

Hi,

Does the OCaml team ever plan on supporting Unicode to some degree?

What about being able to parse UTF-8 encoded files while keeping the
ASCII-only grammar? The only change would be that, in a UTF-8 file,
the UTF-8 encoding of string constants is preserved. With this scheme,
the compiler could in theory reject non-ASCII characters everywhere
else. A third-party library like Camomile could then be used for
higher-level processing of the UTF-8 encoded string constants
(according to the Camomile docs, UTF-8 strings use the OCaml string
type too).
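
To illustrate what "UTF-8 strings use the OCaml string type" means in
practice, here is a minimal sketch using only the standard library. It
assumes the input is well-formed UTF-8; utf8_length is a hypothetical
helper written for this example, not a Camomile function.

```ocaml
(* Count code points in a string assumed to hold well-formed UTF-8.
   Every byte that is not a continuation byte (bit pattern 10xxxxxx)
   starts a new code point. No validation is performed. *)
let utf8_length (s : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n

(* "café", with é written as its two UTF-8 bytes. *)
let sample = "caf\xC3\xA9"

let () =
  Printf.printf "%d bytes, %d code points\n"
    (String.length sample) (utf8_length sample)
```

String.length reports bytes, so anything that needs characters rather
than bytes has to decode the string, which is the job a library like
Camomile takes on.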

I must admit that I have no idea how complex even a seemingly small
change like this may be, but at least the detection of the byte order
mark should make it a compatible change...

Jonathan



* Re: [Caml-list] Supporting unicode in ocaml...
  2006-08-18 23:37 [Caml-list] Supporting unicode in ocaml Jonathan Roewen
@ 2006-08-19  9:05 ` Richard Jones
  2006-08-19 11:02   ` Jonathan Roewen
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Jones @ 2006-08-19  9:05 UTC (permalink / raw)
  To: Jonathan Roewen; +Cc: OCaml

On Sat, Aug 19, 2006 at 11:37:47AM +1200, Jonathan Roewen wrote:
> Does the OCaml team ever plan on supporting Unicode to some degree?
>
> What about being able to parse UTF-8 encoded files while keeping the
> ASCII-only grammar? The only change would be that, in a UTF-8 file,
> the UTF-8 encoding of string constants is preserved. With this scheme,
> the compiler could in theory reject non-ASCII characters everywhere
> else. A third-party library like Camomile could then be used for
> higher-level processing of the UTF-8 encoded string constants
> (according to the Camomile docs, UTF-8 strings use the OCaml string
> type too).

Have a look at Camomile:

http://camomile.sourceforge.net/

Generally speaking, though, I just always use string == UTF-8 string
and avoid the standard library functions that are not UTF-8 safe,
such as String.lowercase.
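
A rough sketch of why a byte-wise, Latin-1-aware lowercase is unsafe on
UTF-8 data. latin1_lowercase below is a hand-written stand-in for this
example (an assumption, not the standard library's code); it mimics
treating each byte as an ISO 8859-1 character.

```ocaml
(* Hand-written stand-in for a Latin-1-aware lowercase: each byte is
   treated as one ISO 8859-1 character, and uppercase letters
   (including accented ones) are shifted by 32. *)
let latin1_lowercase_char (c : char) : char =
  match c with
  | 'A' .. 'Z' | '\192' .. '\214' | '\216' .. '\222' ->
      Char.chr (Char.code c + 32)
  | _ -> c

let latin1_lowercase (s : string) : string =
  String.map latin1_lowercase_char s

(* U+00C9 (E with acute) encoded in UTF-8 is the two bytes C3 89. *)
let e_acute_upper = "\xC3\x89"

(* The lead byte C3 looks like a Latin-1 uppercase letter and becomes
   E3, while 89 is untouched. E3 announces a three-byte sequence, but
   only one continuation byte follows, so the result is not valid
   UTF-8. *)
let mangled = latin1_lowercase e_acute_upper
```

More recent OCaml releases provide String.lowercase_ascii, which only
touches the bytes 'A' to 'Z' and so leaves multi-byte UTF-8 sequences
intact.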

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com



* Re: [Caml-list] Supporting unicode in ocaml...
  2006-08-19  9:05 ` Richard Jones
@ 2006-08-19 11:02   ` Jonathan Roewen
  2006-08-19 12:41     ` Peter Jolly
  0 siblings, 1 reply; 4+ messages in thread
From: Jonathan Roewen @ 2006-08-19 11:02 UTC (permalink / raw)
  To: Richard Jones; +Cc: OCaml

> Have a look at Camomile:
>
> http://camomile.sourceforge.net/
>
> Generally speaking, though, I just always use string == UTF-8 string
> and avoid the standard library functions that are not UTF-8 safe,
> such as String.lowercase.

Yes, I did (if you read more closely, I mentioned it). But what would
be nice is being able to write actual UTF-8 in string constants rather
than manually encoding the characters. Maybe I just need a script to
preprocess sources in UTF-8...
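
A minimal sketch of what the core of such a preprocessing step might
look like. escape_non_ascii is a hypothetical helper for this example;
it handles only the contents of one string literal, and finding the
literals in a source file is left out.

```ocaml
(* Rewrite every non-ASCII byte as an OCaml decimal escape (\ddd), so
   the result is pure ASCII but denotes the same bytes when used as the
   body of a string literal. ASCII bytes are copied unchanged; existing
   backslashes and quotes are NOT re-escaped in this sketch. *)
let escape_non_ascii (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if code < 128 then Buffer.add_char buf c
      else Buffer.add_string buf (Printf.sprintf "\\%03d" code))
    s;
  Buffer.contents buf

(* "café" in UTF-8 becomes the ASCII text caf\195\169. *)
let escaped = escape_non_ascii "caf\xC3\xA9"
```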



* Re: [Caml-list] Supporting unicode in ocaml...
  2006-08-19 11:02   ` Jonathan Roewen
@ 2006-08-19 12:41     ` Peter Jolly
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Jolly @ 2006-08-19 12:41 UTC (permalink / raw)
  To: Jonathan Roewen; +Cc: caml-list

Jonathan Roewen wrote:
> Yes, I did (if you read more closely, I mentioned it). But what would
> be nice is being able to write actual UTF-8 in string constants rather
> than manually encoding the characters. Maybe I just need a script to
> preprocess sources in UTF-8...

AFAIK, all you need to do is get rid of the BOM, which is redundant in
UTF-8 anyway.  I have no problems at all using UTF-8 encoded string
constants in OCaml, and Camomile's UTF8 module then handles them
seamlessly.  YMMV.
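
Getting rid of the BOM could be sketched like this, assuming the source
text has already been read into a string. strip_bom is a hypothetical
name for this example; the UTF-8 BOM is the three bytes EF BB BF.

```ocaml
(* The UTF-8 encoding of U+FEFF, used by some editors as a byte order
   mark at the start of a file. *)
let utf8_bom = "\xEF\xBB\xBF"

(* Return the input without a leading UTF-8 BOM; strings that do not
   start with one are returned unchanged. *)
let strip_bom (s : string) : string =
  let n = String.length utf8_bom in
  if String.length s >= n && String.sub s 0 n = utf8_bom
  then String.sub s n (String.length s - n)
  else s
```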

(There is one minor issue, which is that there are legal OCaml
identifiers which become illegal when the file is encoded in UTF-8 --
namely those that use accented letters.)


