From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 43BBB7FAE1 for ; Thu, 18 Dec 2014 16:34:39 +0100 (CET) Received-SPF: Neutral (mail3-smtp-sop.national.inria.fr: domain of simon.cruanes.2007@m4x.org does not assert whether or not 129.104.30.34 is permitted sender) identity=pra; client-ip=129.104.30.34; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="SRS0=BKdS=BG=m4x.org=simon.cruanes.2007@bounces.m4x.org"; x-sender="simon.cruanes.2007@m4x.org"; x-conformance=sidf_compatible; x-record-type="spf2.0" Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of SRS0=BKdS=BG=m4x.org=simon.cruanes.2007@bounces.m4x.org designates 129.104.30.34 as permitted sender) identity=mailfrom; client-ip=129.104.30.34; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="SRS0=BKdS=BG=m4x.org=simon.cruanes.2007@bounces.m4x.org"; x-sender="SRS0=BKdS=BG=m4x.org=simon.cruanes.2007@bounces.m4x.org"; x-conformance=sidf_compatible; x-record-type="spf2.0" Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of postmaster@mx1.polytechnique.org designates 129.104.30.34 as permitted sender) identity=helo; client-ip=129.104.30.34; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="SRS0=BKdS=BG=m4x.org=simon.cruanes.2007@bounces.m4x.org"; x-sender="postmaster@mx1.polytechnique.org"; x-conformance=sidf_compatible; x-record-type="v=spf1" X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Am8AALzzklSBaB4inGdsb2JhbABag1iDXsMbhw8WAQEBAQERAQEBAQEICwkJFC6EDQEFI1YQCyEhAgIPBUmIPw2+SpZJjESGHYFBBYwzhRWFMYEOMIRigzGEOoM4gjCBX4MxAQEB X-IPAS-Result: Am8AALzzklSBaB4inGdsb2JhbABag1iDXsMbhw8WAQEBAQERAQEBAQEICwkJFC6EDQEFI1YQCyEhAgIPBUmIPw2+SpZJjESGHYFBBYwzhRWFMYEOMIRigzGEOoM4gjCBX4MxAQEB X-IronPort-AV: E=Sophos;i="5.07,601,1413237600"; d="scan'208";a="94030290" Received: from mx1.polytechnique.org ([129.104.30.34]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/ADH-AES256-SHA; 18 Dec 2014 16:34:29 +0100 Received: from emmental.inria.fr (emmental.inria.fr [128.93.0.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ssl.polytechnique.org (Postfix) with ESMTPSA id 671D3140C5BA7; Thu, 18 Dec 2014 16:34:28 +0100 (CET) Date: Thu, 18 Dec 2014 16:34:27 +0100 From: Simon Cruanes To: Daniel =?utf-8?Q?B=C3=BCnzli?= Cc: Nicolas Ojeda Bar , Gerd Stolpmann , Francois.Pottier@inria.fr, caml-list@inria.fr Message-ID: <20141218153426.GI4679@emmental.inria.fr> References: <20141217201448.GA27253@yquem.inria.fr> <1418906703.5445.5.camel@e130.lan.sumadev.de> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="VSVNCtZB1QZ8vhj+" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) X-AV-Checked: ClamAV using ClamSMTP at svoboda.polytechnique.org (Thu Dec 18 16:34:28 2014 +0100 (CET)) X-Spam-Flag: No, tests=bogofilter, spamicity=0.095826, queueID=85CFF140C5BAB X-Org-Mail: simon.cruanes.2007@polytechnique.org Subject: Re: [Caml-list] New release of Menhir (20141215) --VSVNCtZB1QZ8vhj+ Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable I saw this on mirage's wiki: https://github.com/mirage/mirage-www/wiki/Pioneer-Projects#bigarray-parser-= generator Apart from using bigarrays for performance, the idea of using a ppx to make the combinators more efficient looks interesting. I also plan to try to mix recursive descent with a monadic "refill" function for simple parsers (S-expressions, Bencode, etc.) It will be interesting to compare that with Daniel's manual CPS. Cheers! Le Thu, 18 Dec 2014, Daniel B=C3=BCnzli a =C3=A9crit : > Le jeudi, 18 d=C3=A9cembre 2014 =C3=A0 15:19, Nicolas Ojeda Bar a =C3=A9c= rit : > > When programming monadically you are reifying the continuation at > > every `bind` point so you get the incremental bit for free (I think > > this goes by the fancy name of `iteratees` nowadays). On the other > > hand it suffers from poor time/space performance common to this type > > of parser. Also, while combinator parsers can handle arbitrary > > backtracking (and you end up paying for this), IMAP itself requires > > very little, so it would seem that the full flexibility of combinators > > are not needed. >=20 > For a long time I played with combinator parsers but I never got to the p= oint of being satisfied with the result. I also played a little bit with it= eratees but couldn't get the performance I wanted with the full model in al= l its compositional glory.=20=20=20 >=20 > Nowadays I simply write my streaming lexers/parsers manually in CPS and y= ou can drive them in non-blocking mode with a single, fixed size, buffer. S= ee here [1] for an interface and an implementation on a toy example.=20=20 >=20 > If you get the lowest continuation decoding bits correctly (e.g. don't bi= nd on each byte) you can actually get good time and space performance. Once= you have handled that (mainly the painful bits about read overlapping two = input buffers) you get very nice parsing flexibility, e.g. for things like = text encoding discovery where you need to patch the continuation with the a= ppropriate character decoder. If you care for your users you can also get v= ery good error reporting and error recovery capabilities by applying knowle= dge specific to the decoded protocol. That's the way Uutf [2], Jsonm [3] (o= n top of Uutf) and Dicomm [4] are programmed.=20=20 --=20 Simon http://weusepgp.info/ key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3 7D8D 4AC0 1D08 49AA 62B6 --VSVNCtZB1QZ8vhj+ Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJUkvQCAAoJEErAHQhJqmK2zT8P/1vI2bf11xf3qz2Xr4oAgmS6 6+hxyvCvpU0GFlkdUw3i8eu7UqB+xvPupqyphfe/F6l3sWZ3L3PGy392OMf9BLyq pSvLHUByrg1a2wzWIofLK5h23Py1WZISx2lfWDd7mvsoeS+aiy12zMJwbQ1A6LNc T7mSjWKY/qBnqOwbyLOJir7/VFGJ8/eoKtUMo9w5uz/GiyTkBHyjnWiBa0JpdlNz JI6fQZJfHAW3XbuTCVsvG8uK/r4SQY0Oo28T9r9OfxN2gD3C0Vu00AZ0wAUSBlo8 5V2273X0gZQiPpTIoHGUWmuZGyIBSc5MW55pwxcb4MADhTC8pFK401/bcGuPzPWf a8dkpTzijhVFHQVY51m7vQcADcVMlpGS/vOFcXJQxi1xSowrn21o3v7CGoGRweey z+R3hbfdh8O8n4MJJnBHNI6sB8bIqYmldMgX7ZF7rGg/jU2mSzf10z/o04st76X9 kjEE8GnGUm6mC/xkC1havreZE76lAMzE6fSpHV4b500s3SFlddqyhqnA/sWrOi/f MpPw3rUM3fgsqSMj3yM283nbihUEKPMDjD2hJR9vrXz7VurXK/HZi1zaXM2uqsrz PjIqFK9aibw/qp04sW//ByNF8XJ7GSVSrHepmLf0dZT0eUb6/U4DZASLrIZCPxEl KiLaqlQMA9EzbI1nOAI0 =iQGj -----END PGP SIGNATURE----- --VSVNCtZB1QZ8vhj+--