From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 89E9A7FAE1 for ; Thu, 18 Dec 2014 17:02:45 +0100 (CET) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of n.oje.bar@gmail.com) identity=pra; client-ip=209.85.212.179; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="n.oje.bar@gmail.com"; x-sender="n.oje.bar@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of n.oje.bar@gmail.com designates 209.85.212.179 as permitted sender) identity=mailfrom; client-ip=209.85.212.179; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="n.oje.bar@gmail.com"; x-sender="n.oje.bar@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-wi0-f179.google.com) identity=helo; client-ip=209.85.212.179; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="n.oje.bar@gmail.com"; x-sender="postmaster@mail-wi0-f179.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhUBAJz5klTRVdSzm2dsb2JhbABag1hYBIMCsFYOd4IAj0CFawMCgRgHFgEBAQEBEQEBAQEBBgsLCRQuhAwBAQEDARIRHQE4AQMBCwEFBQsNAgImAgIiEgEFARwGDAcih3YDCQgNsj4+MYsuhGKLTycNRYUVAQUOgROEX4ZEgnuDIoFBBZFIhTKBDTCEYoMxhDqBbhIjgRVbgVWBeCQxAYJCAQEB X-IPAS-Result: AhUBAJz5klTRVdSzm2dsb2JhbABag1hYBIMCsFYOd4IAj0CFawMCgRgHFgEBAQEBEQEBAQEBBgsLCRQuhAwBAQEDARIRHQE4AQMBCwEFBQsNAgImAgIiEgEFARwGDAcih3YDCQgNsj4+MYsuhGKLTycNRYUVAQUOgROEX4ZEgnuDIoFBBZFIhTKBDTCEYoMxhDqBbhIjgRVbgVWBeCQxAYJCAQEB X-IronPort-AV: E=Sophos;i="5.07,601,1413237600"; d="scan'208";a="114015459" Received: from mail-wi0-f179.google.com ([209.85.212.179]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 18 Dec 2014 17:02:44 +0100 Received: by mail-wi0-f179.google.com with SMTP id ex7so2317532wid.12; Thu, 18 Dec 2014 08:02:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=+BPvF/k/Yuc2+KSZ1rKo6W0kMvlUIJb/xp1TLcJ9Z0g=; b=vFJiRekE4tt5BvtxBYinqeyPjSHmASitHh6NU0BZGEqeoY0CA9YkSNh2wBxGiI18jS mDIMKisbF4CxRMhxa0Z6LcCprfgcDYIebjCp04FvE9iX7v4C+QkTehhV85HT2CDc0vdA C2Fw8y4tz2oVUF9u7TH4mbjE2leEZ153eKo/YW7b5JrltUrjI79nHqmTC4dCW0r6CCgT 6/Ef6smuIkqvnQh2bq/48DBNovlMCtgxYc14SJ5BIPpXmQkJMLJtM2VASzT3zBaj8J3h S2S7Z1MslybiS16TALsSPrsDmaGxGclPnm+gbNQ99JPa44X0AA2iBqmwAjf3i/UBCGD5 Y3yw== MIME-Version: 1.0 X-Received: by 10.194.187.235 with SMTP id fv11mr5962149wjc.16.1418918564708; Thu, 18 Dec 2014 08:02:44 -0800 (PST) Sender: n.oje.bar@gmail.com Received: by 10.27.45.205 with HTTP; Thu, 18 Dec 2014 08:02:44 -0800 (PST) In-Reply-To: <20141218153426.GI4679@emmental.inria.fr> References: <20141217201448.GA27253@yquem.inria.fr> <1418906703.5445.5.camel@e130.lan.sumadev.de> <20141218153426.GI4679@emmental.inria.fr> Date: Thu, 18 Dec 2014 13:02:44 -0300 X-Google-Sender-Auth: YtSyitY1Um33TwhDPKxL6A_zarI Message-ID: From: Nicolas Ojeda Bar To: Simon Cruanes Cc: =?UTF-8?Q?Daniel_B=C3=BCnzli?= , Gerd Stolpmann , Francois Pottier , caml-list@inria.fr Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Caml-list] New release of Menhir (20141215) Hi Simon, On Thu, Dec 18, 2014 at 12:34 PM, Simon Cruanes wrote: > I saw this on mirage's wiki: > https://github.com/mirage/mirage-www/wiki/Pioneer-Projects#bigarray-parse= r-generator > > Apart from using bigarrays for performance, the idea of using a ppx to > make the combinators more efficient looks interesting. If I understand correctly the optimizations done by the FastParser parsers are essentially a form of aggresive inlining. This goes hand-in-hand with encoding your parser as a set of mutually recursive functions. This type of encoding does not lend itself easily to incremental parsing (where the parser can be stopped and resumed as/when needed). That apart, for other use-cases it looks pretty nice. Best wishes, Nicolas > I also plan to try to mix recursive descent with a monadic "refill" > function for simple parsers (S-expressions, Bencode, etc.) It will be > interesting to compare that with Daniel's manual CPS. > > Cheers! > > Le Thu, 18 Dec 2014, Daniel B=C3=BCnzli a =C3=A9crit : >> Le jeudi, 18 d=C3=A9cembre 2014 =C3=A0 15:19, Nicolas Ojeda Bar a =C3=A9= crit : >> > When programming monadically you are reifying the continuation at >> > every `bind` point so you get the incremental bit for free (I think >> > this goes by the fancy name of `iteratees` nowadays). On the other >> > hand it suffers from poor time/space performance common to this type >> > of parser. Also, while combinator parsers can handle arbitrary >> > backtracking (and you end up paying for this), IMAP itself requires >> > very little, so it would seem that the full flexibility of combinators >> > are not needed. >> >> For a long time I played with combinator parsers but I never got to the = point of being satisfied with the result. I also played a little bit with i= teratees but couldn't get the performance I wanted with the full model in a= ll its compositional glory. >> >> Nowadays I simply write my streaming lexers/parsers manually in CPS and = you can drive them in non-blocking mode with a single, fixed size, buffer. = See here [1] for an interface and an implementation on a toy example. >> >> If you get the lowest continuation decoding bits correctly (e.g. don't b= ind on each byte) you can actually get good time and space performance. Onc= e you have handled that (mainly the painful bits about read overlapping two= input buffers) you get very nice parsing flexibility, e.g. for things like= text encoding discovery where you need to patch the continuation with the = appropriate character decoder. If you care for your users you can also get = very good error reporting and error recovery capabilities by applying knowl= edge specific to the decoded protocol. That's the way Uutf [2], Jsonm [3] (= on top of Uutf) and Dicomm [4] are programmed. > > -- > Simon > > http://weusepgp.info/ > key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3 7D8D 4AC0 1D08 49AA 6= 2B6