caml-list - the Caml user's mailing list
From: David Allsopp <dra-news@metastack.com>
To: "'Christophe TROESTLER'" <Christophe.Troestler@umons.ac.be>,
	"'OCaml Mailing List'" <caml-list@inria.fr>
Subject: RE: [Caml-list] GSoC: better UTF-8 support
Date: Mon, 28 Feb 2011 10:07:10 +0000
Message-ID: <E51C5B015DBD1348A1D85763337FB6D949100C3C@Remus.metastack.local>
In-Reply-To: <20110228.093528.996524125295855263.Christophe.Troestler@umons.ac.be>

Christophe TROESTLER wrote:
> - UTF8.Char and UTF8.String modules should be written with the same
>   interface as Char and String.  [Camomile should be adapted
>   consequently.]

Thinking of conventions like Unix/Pervasives.LargeFile, Bigarray.Genarray, Bigarray.Array1, etc., wouldn't it be better for these to be Char.UTF8 and String.UTF8?
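
As a rough sketch of that nesting convention (not a proposed interface), the following wraps the existing String module and adds a UTF8 submodule alongside it. The names MyString and UTF8.length are hypothetical and chosen only for illustration, and the code assumes its input is well-formed UTF-8:

```ocaml
(* Hypothetical layout: a UTF8 submodule nested inside the string module,
   the way Array1 is nested inside Bigarray. MyString stands in for String. *)
module MyString = struct
  include String

  module UTF8 = struct
    type t = string

    (* Number of code points, assuming well-formed UTF-8: count every byte
       that is not a continuation byte (bit pattern 10xxxxxx). *)
    let length (s : t) : int =
      let n = ref 0 in
      String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
      !n
  end
end

let () =
  (* "café" with the final character encoded as the two bytes C3 A9 *)
  let s = "caf\xc3\xa9" in
  Printf.printf "%d bytes, %d code points\n"
    (MyString.length s) (MyString.UTF8.length s)
(* prints "5 bytes, 4 code points" *)
```

With this layout the byte-oriented functions keep their existing names and the UTF-8 variants sit one level down under the same function names.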

> - Printf/Scanf: %U of %cu for UTF8.Char.t
> 
> - Graphics: UTF-8 text printing
> 
> - Str: (character ranges)

If UTF-8 support is added to the standard library then it should be added everywhere that strings are manipulated or used - which raises the potentially ugly prospect of the Unix module?

> The questions are: would such changes be beneficial to you?

Personally, yes - it's an annoying limitation that you have to pull in a 3rd party library when all you want to do is handle a couple of accented characters accurately. My point is that not every application which needs UTF-8 needs it as a priority feature, nor is every such application manipulating terabytes of data and therefore in need of completely optimised processing.
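
To give a sense of how little the simple case needs, here is a minimal decoding sketch. The function names decode and code_points are hypothetical, and the code assumes the input is already well-formed UTF-8 (it performs no validation, which a real library would have to do):

```ocaml
(* Decode the code point starting at byte offset i of s and return
   (code_point, next_offset). Assumes well-formed UTF-8; no validation. *)
let decode (s : string) (i : int) : int * int =
  (* Payload (low six bits) of the k-th byte after the lead byte. *)
  let b k = Char.code s.[i + k] land 0x3F in
  let c = Char.code s.[i] in
  if c < 0x80 then (c, i + 1)
  else if c < 0xE0 then (((c land 0x1F) lsl 6) lor b 1, i + 2)
  else if c < 0xF0 then
    (((c land 0x0F) lsl 12) lor (b 1 lsl 6) lor b 2, i + 3)
  else
    (((c land 0x07) lsl 18) lor (b 1 lsl 12) lor (b 2 lsl 6) lor b 3, i + 4)

(* All code points of s, in order. *)
let code_points (s : string) : int list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let cp, next = decode s i in
      go next (cp :: acc)
  in
  go 0 []

let () =
  (* "café": the accented character occupies two bytes but is one code point *)
  List.iter (Printf.printf "U+%04X ") (code_points "caf\xc3\xa9");
  print_newline ()
```

Indexing the same string with s.[3] would return only the first byte of the accented character, which is the kind of inaccuracy at issue here.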

IMO it'd be better to have a standard library supporting only one particular Unicode encoding, with a perhaps imperfect interface over a non-optimal storage representation, than to have no support whatsoever, especially given that there are very good 3rd party libraries which provide the optimal (and with it, slightly more complex) implementations.

> Are there other issues to address?

I found this very old archive thread, but it still raises some potentially relevant points: http://caml.inria.fr/pub/old_caml_site/caml-list/1224.html

> Is this enough for a GSoc proposal (seems a little light to me)?

I would posit that if this included the Unix module then it's a very big proposal!

> If it is done, is there a chance to have this work included in the standard distribution?

If the patches themselves are as potentially small as suggested (so maintenance issues aren't vastly increased) and the interfaces remain compatible (so nothing breaks) then it seems reasonable to hope, doesn't it?


David

