From: Dario Teixeira
Date: Thu, 13 Aug 2009 06:14:39 -0700 (PDT)
Subject: Re: [Caml-list] Storing UTF-8 in plain strings
To: caml-list@yquem.inria.fr

Hi,

> Thank you all for your comments.  Ulex has caught all the intentionally
> malformed code points I've inserted in the stream, so I'm fairly confident
> it's up to the task.  But if I find a problem I'll keep Netconversion's
> and Extlib's validation functions in mind...

By the way, I just noticed that the 'validate' function in Extlib's UTF8
module accepts 5-byte and 6-byte sequences.  Though these were part of
UTF-8's original specification, they have been deprecated by RFC 3629.
Perhaps adding a 'Deprecated_code' exception for these cases is in order?
(Or just raise the existing 'Malformed_code' exception).  Note that Ulex
correctly raises an exception if any of these deprecated sequences are
found.

Cheers,
Dario Teixeira
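
P.S. To make the suggestion concrete, here is a minimal sketch of a stricter
check that follows RFC 3629 and uses only the standard library.  It is not
Extlib's code: the names 'seq_length', 'is_continuation', 'validate' and
'is_valid' are hypothetical, and the 'Malformed_code' exception is declared
locally only to mirror the one mentioned above.  Lead bytes 0xF8..0xFD, which
used to introduce 5-byte and 6-byte sequences, are rejected outright.

```ocaml
(* Local stand-in for the exception discussed above; an assumption,
   not Extlib's own definition. *)
exception Malformed_code

(* Expected sequence length for a lead byte under RFC 3629.
   0xC0 and 0xC1 (always overlong) and 0xF5..0xFF are rejected; the
   latter range covers the old 5-byte (0xF8..0xFB) and 6-byte
   (0xFC..0xFD) lead bytes. *)
let seq_length (c : char) : int =
  match Char.code c with
  | n when n <= 0x7F -> 1
  | n when n >= 0xC2 && n <= 0xDF -> 2
  | n when n >= 0xE0 && n <= 0xEF -> 3
  | n when n >= 0xF0 && n <= 0xF4 -> 4
  | _ -> raise Malformed_code

(* Continuation bytes have the bit pattern 10xxxxxx. *)
let is_continuation (c : char) : bool =
  Char.code c land 0xC0 = 0x80

(* Raise Malformed_code unless [s] is well-formed UTF-8 per RFC 3629. *)
let validate (s : string) : unit =
  let len = String.length s in
  let rec loop i =
    if i < len then begin
      let n = seq_length s.[i] in
      (* The sequence must not run past the end of the string. *)
      if i + n > len then raise Malformed_code;
      for k = 1 to n - 1 do
        if not (is_continuation s.[i + k]) then raise Malformed_code
      done;
      (* Extra constraints on the second byte of 3- and 4-byte forms. *)
      if n >= 3 then begin
        let b0 = Char.code s.[i] and b1 = Char.code s.[i + 1] in
        if (b0 = 0xE0 && b1 < 0xA0)      (* overlong 3-byte encoding *)
        || (b0 = 0xED && b1 > 0x9F)      (* UTF-16 surrogates *)
        || (b0 = 0xF0 && b1 < 0x90)      (* overlong 4-byte encoding *)
        || (b0 = 0xF4 && b1 > 0x8F)      (* beyond U+10FFFF *)
        then raise Malformed_code
      end;
      loop (i + n)
    end
  in
  loop 0

(* Boolean wrapper around [validate]. *)
let is_valid (s : string) : bool =
  try validate s; true with Malformed_code -> false
```

With this sketch, 'is_valid "\xF8\x88\x80\x80\x80"' (an old-style 5-byte
sequence) returns false, while ordinary text such as "caf\xC3\xA9" passes.
A separate 'Deprecated_code' exception could be raised from the last case of
'seq_length' instead, if distinguishing the two situations is useful.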