From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.7 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE, DNS_FROM_RFC_POST autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 52D41BBAF for ; Thu, 13 Aug 2009 11:37:10 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvYCAJt7g0pDww+jmWdsb2JhbACSOYgnAQEBAQEICwoHE6YRgSKRGAEFAwGEFQU X-IronPort-AV: E=Sophos;i="4.43,373,1246831200"; d="scan'208";a="34354772" Received: from web111506.mail.gq1.yahoo.com ([67.195.15.163]) by mail1-smtp-roc.national.inria.fr with SMTP; 13 Aug 2009 11:37:09 +0200 Received: (qmail 74845 invoked by uid 60001); 13 Aug 2009 09:37:08 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1250156228; bh=M8q27ZYxdBIVY2o/nazcs2IJGKxEYHIaxJ6PeOjhgPY=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=frEI27c9XT9vaB/3HsHRLZWBrvY5AvuVLgOXRf4IKfUqB7VlYIrY7CU7CWa4hCE/0NHm2gZ/gLjO0m7vtxcqoS2SWcUvqjhB9w8q9fJt6ZbFoRHKEjKP2yM+hY5TGTyAAoGydT98NZpNjioO9OxHtHXJnSvIzWla5DWrOJGRdqM= DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=ifgQHJ/a1LTmBD+eUgaVMeZlAsx/GACwYCXHifgjhQrhzsqha1i6AtRyK89jjIW9xk27iSVMgXAnhlGHBpJqe3cl+fzEjlEnWYzWoq4QDbgmIPVB9iJpvJ1OwcnbSjTfL6h5iXMQt2f/lKMu7oEPJSqzahb4J49ym/zE7b215SM=; Message-ID: <570438.74652.qm@web111506.mail.gq1.yahoo.com> X-YMail-OSG: 6TAy.UcVM1l_OpfBl0A7UAepfJHYgBkDpW8kOQ5E1CZdac8pBsN9Ohb8nGlZUsOYWDLp3E_L9ibYS0UH3Qb25cc8gD21bXCxFv1iORDTo.gPaWcTr48xOHyeLB60MY684Z0wjLtCI4_f0ZbpQvcewFKSchjiJgWKTfolqyDhpX2enibNNeF4l6oOFGdOXU01kPfMRrYORayqlb.HxzmUkWOS6Qasg5Q0P8A35aonGDQBuQJI3GjU81w824rF9XjitqUClq.fENuMQQ5U6mDUMS7FvB9TdX4NyHBIO0dv98Hld7B7eiq.CB6C9fGj6GUc7RhAfyo- Received: from [213.205.71.53] by web111506.mail.gq1.yahoo.com via HTTP; Thu, 13 Aug 2009 02:37:08 PDT X-Mailer: YahooMailClassic/6.1.2 YahooMailWebService/0.7.338.1 Date: Thu, 13 Aug 2009 02:37:08 -0700 (PDT) From: Dario Teixeira Subject: Re: [Caml-list] Storing UTF-8 in plain strings To: caml-list@yquem.inria.fr In-Reply-To: <51021.92432.qm@web111506.mail.gq1.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; caml-list:01 functions:01 strings:01 strings:01 data:02 encoding:02 parse:02 string:02 module:03 stream:04 problem:05 long:06 storing:08 storing:08 i'm:09 Hi,=0A=0A> I'm using Ulex + Menhir to parse UTF-8 encoded source code, and = I'm relying=0A> on plain strings for processing and storing data.=A0 I *thi= nk* I can get away=0A> with using only the String module to handle this var= iable-length encoding=0A> as long as I am careful with the way I treat thes= e strings.=A0 Here are the=0A> assumptions I am making:=0A=0AThank you all = for your comments. Ulex has caught all the intentionally=0Amalformed code = points I've inserted in the stream, so I'm fairly confident=0Ait's up to th= e task. But if I find a problem I'll keep Netconversion's=0Aand Extlib's v= alidation functions in mind...=0A=0ABest regards,=0ADario Teixeira=0A=0A=0A= =0A