From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p1SAC7M9022623 for ; Mon, 28 Feb 2011 11:12:07 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvQBAL4Da01RZ90xkWdsb2JhbACmRRUBAQEBCQsKBxEDIrtehWEEjB2MEQ X-IronPort-AV: E=Sophos;i="4.62,238,1297033200"; d="scan'208";a="88818987" Received: from mtaout03-winn.ispmail.ntl.com ([81.103.221.49]) by mail4-smtp-sop.national.inria.fr with ESMTP; 28 Feb 2011 11:12:02 +0100 Received: from aamtaout03-winn.ispmail.ntl.com ([81.103.221.35]) by mtaout03-winn.ispmail.ntl.com (InterMail vM.7.08.04.00 201-2186-134-20080326) with ESMTP id <20110228101201.YFUK23441.mtaout03-winn.ispmail.ntl.com@aamtaout03-winn.ispmail.ntl.com>; Mon, 28 Feb 2011 10:12:01 +0000 Received: from romulus.metastack.com ([81.102.132.77]) by aamtaout03-winn.ispmail.ntl.com (InterMail vG.3.00.04.00 201-2196-133-20080908) with ESMTP id <20110228101201.YECG28282.aamtaout03-winn.ispmail.ntl.com@romulus.metastack.com>; Mon, 28 Feb 2011 10:12:01 +0000 Received: from remus.metastack.local ([172.16.0.1]) by romulus.metastack.com (8.14.2/8.14.2) with ESMTP id p1SABw4P003367 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=FAIL); Mon, 28 Feb 2011 10:11:58 GMT Received: from Remus.metastack.local ([fe80::547c:3c42:e1da:eda2]) by Remus.metastack.local ([fe80::547c:3c42:e1da:eda2%11]) with mapi; Mon, 28 Feb 2011 10:07:11 +0000 From: David Allsopp To: "'Christophe TROESTLER'" , "'OCaml Mailing List'" Thread-Topic: [Caml-list] GSoC: better UTF-8 support Thread-Index: AQHL1yHbpO6NgLMzxEmbDuJZqeBtYpQWq6YQ Date: Mon, 28 Feb 2011 10:07:10 +0000 Message-ID: References: <20110228.093528.996524125295855263.Christophe.Troestler@umons.ac.be> In-Reply-To: <20110228.093528.996524125295855263.Christophe.Troestler@umons.ac.be> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Organization: MetaStack Solutions Ltd. X-Scanned-By: MIMEDefang 2.65 on 81.102.132.77 X-Cloudmark-Analysis: v=1.1 cv=JvdXmxIgLJv2/GthKqHpGJEEHukvLcvELVXUanXFreg= c=1 sm=0 a=fAv-q_DikvIA:10 a=kj9zAlcOel0A:10 a=xqWC_Br6kY4A:10 a=ZOzjf2MOAAAA:8 a=uTjX2-eT7FZk5MFd6zgA:9 a=1MpDasaEKNaETqzqt9IA:7 a=JECzj2uYMXN6EWKr0c5eGgoDQ-MA:4 a=CjuIK1q_8ugA:10 a=gUSgYEiD6poA:10 a=HpAAvcLHHh0Zw7uRqdWCyQ==:117 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by walapai.inria.fr id p1SAC7M9022623 Subject: RE: [Caml-list] GSoC: better UTF-8 support Christophe TROESTLER wrote: > - UTF8.Char and UTF8.String modules should be written with the same > interface as Char and String. [Camomile should be adapted > consequently.] Thinking of conventions like Unix/Pervasives.LargeFile, Bigarray.Genarray, Bigarray.Array1, etc. wouldn't it be better for these to be Char.UTF8 and String.UTF8? > - Printf/Scanf: %U of %cu for UTF8.Char.t > > - Graphics: UTF-8 text printing > > - Str: (character ranges) If UTF-8 support is added to the standard library then it should be added everywhere where strings are manipulated or used - which rears the potentially ugly prospect of the Unix module? > The questions are: would such changes be beneficial to you? Personally, yes - it's an annoying limitation that you have to pull in a 3rd party library when all you want to do is handle a couple of accented characters accurately (my point being that not every application which needs UTF-8 needs it as a priority feature and isn't necessarily manipulating terabytes of data so requires completely optimised processing). IMO it'd be better to have a standard library only supporting one particular Unicode encoding with a perhaps imperfect interface over a non-optimal storage representation than to have no support whatsoever, especially given that there are very good 3rd party libraries which provide the optimal (and with it, slightly more complex) implementations. > Are there other issues to address? I found this very old archive thread but it still poses some potentially relevant points: http://caml.inria.fr/pub/old_caml_site/caml-list/1224.html > Is this enough for a GSoc proposal (seems a little light to me)? I would posit that if this included the Unix module then it's a very big proposal! > If it is done, is there a chance to have this work included in the standard distribution? If the patches themselves are as potentially small as suggested (so maintenance issues aren't vastly increased) and the interfaces remain compatible (so nothing breaks) then it seems reasonable to hope, doesn't it? David