From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 60DE77FAE1 for ; Thu, 18 Dec 2014 16:25:39 +0100 (CET) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of info@gerd-stolpmann.de) identity=pra; client-ip=212.227.126.130; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="info@gerd-stolpmann.de"; x-sender="info@gerd-stolpmann.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of info@gerd-stolpmann.de) identity=mailfrom; client-ip=212.227.126.130; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="info@gerd-stolpmann.de"; x-sender="info@gerd-stolpmann.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mout.kundenserver.de) identity=helo; client-ip=212.227.126.130; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="info@gerd-stolpmann.de"; x-sender="postmaster@mout.kundenserver.de"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AnAAAFfxklTU436Cm2dsb2JhbABaDgiCTnRYvGyJKQqFcgKBHBYBAQEBAREBAQEBAQYLCwkULoQMAQEBBAxJJBALEQMBAgEuTwgGEwkRiBMDCdRrAQEBAQEBBAEBAQEBAQEBGooKgjqCZSMVEQcGgidMgTAFhSWJGwGDB4Nxgk4whD6DVYQ6gziCIBAOgRM+bgGBA4E/AQEB X-IPAS-Result: AnAAAFfxklTU436Cm2dsb2JhbABaDgiCTnRYvGyJKQqFcgKBHBYBAQEBAREBAQEBAQYLCwkULoQMAQEBBAxJJBALEQMBAgEuTwgGEwkRiBMDCdRrAQEBAQEBBAEBAQEBAQEBGooKgjqCZSMVEQcGgidMgTAFhSWJGwGDB4Nxgk4whD6DVYQ6gziCIBAOgRM+bgGBA4E/AQEB X-IronPort-AV: E=Sophos;i="5.07,601,1413237600"; d="asc'?scan'208";a="114006607" Received: from mout.kundenserver.de ([212.227.126.130]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 18 Dec 2014 16:25:38 +0100 Received: from office1.lan.sumadev.de ([188.107.171.165]) by mrelayeu.kundenserver.de (mreue003) with ESMTPSA (Nemesis) id 0LpAdk-1XOykX2ToK-00exAJ; Thu, 18 Dec 2014 16:25:37 +0100 Received: from [IPv6:fd55:cf:6598:7f:b598:309f:12ce:780b] (unknown [IPv6:fd55:cf:6598:7f:b598:309f:12ce:780b]) by office1.lan.sumadev.de (Postfix) with ESMTPSA id E2F84DC05D; Thu, 18 Dec 2014 16:25:36 +0100 (CET) Message-ID: <1418916329.21394.62.camel@thinkpad.lan.sumadev.de> From: Gerd Stolpmann To: Nicolas Ojeda Bar Cc: Francois.Pottier@inria.fr, caml-list@inria.fr Date: Thu, 18 Dec 2014 16:25:29 +0100 In-Reply-To: References: <20141217201448.GA27253@yquem.inria.fr> <1418906703.5445.5.camel@e130.lan.sumadev.de> Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="=-ck/w2kitZJ+j+3Qdj/mU" X-Mailer: Evolution 3.10.4-0ubuntu2 Mime-Version: 1.0 X-Provags-ID: V03:K0:WKn5hmeLgwFZiy2UlDuhiT4zFuRFjaQgaySvGfp+jV8xRxeD6qb wGlp0aAH/e9lF6XeqjEacM6mbv2YMeaG/a21UORS/thxJG6u/ZQHMnLyFRbrtgPyZwdhTXk Lf1l4VELZ2W0gEk5MH/Q+nx8EJuALQavXKCykxtS2GsrgqrW5+X3XAGZAyvzrATLukj2syN s8Ae1hWj618ZLaUszoN8Q== X-UI-Out-Filterresults: notjunk:1; Subject: Re: [Caml-list] New release of Menhir (20141215) --=-ck/w2kitZJ+j+3Qdj/mU Content-Type: text/plain; charset="ISO-8859-15" Content-Transfer-Encoding: quoted-printable Hi, thanks for pointing me to your project. Maybe I'm just going to use it (well, I'd need it to make it compatible with Ocamlnet, and I also need TLS and SASL). Regarding the question of whether Menhir is usable:=20 Part of the difficulty are the BLOBs that can occur anywhere. Here is an example from the RFC (C =3D sent by client, S =3D sent by server, i.e. what we need to parse): C: a004 fetch 12 body[header] S: * 12 FETCH (BODY[HEADER] {342} S: Date: Wed, 17 Jul 1996 02:23:25 -0700 (PDT) S: From: Terry Gray S: Subject: IMAP4rev1 WG mtg summary and minutes S: To: imap@cac.washington.edu S: cc: minutes@CNRI.Reston.VA.US, John Klensin S: Message-Id: S: MIME-Version: 1.0 S: Content-Type: TEXT/PLAIN; CHARSET=3DUS-ASCII S: S: ) S: a004 OK FETCH completed The {342} means that a BLOB with 342 bytes follows. This could be completely harmless if we could recognize such tokens in the lexer, and switch the lexer to a BLOB lexer before the tokens reach the parser. However, this is actually not as easy, as the grammar also allows such strings at other places where they do not prefix BLOBs. My idea was that the lexer marks such tokens as "maybe-prefixes", and we initially only parse to that point. These maybe-prefixes are always followed by a line ending, so either the response ends and the parser finishes (meaning that the maybe-prefix isn't a BLOB prefix), or the parser needs more tokens, in which case it is either a BLOB prefix followed by a BLOB or completely invalid. I haven't checked yet: could we inspect Menhir's state and see whether the only possible interpretation is a BLOB prefix? I.e. whether it is in the middle of a production blob :=3D blob-prefix blob-data and there is no other way of interpreting the tokens? This would make the decision somewhat safer how such context-dependent tokens are handled. Gerd Am Donnerstag, den 18.12.2014, 11:19 -0300 schrieb Nicolas Ojeda Bar: > Hi Gerd, >=20 > I wrote an IMAP client library in pure OCaml > (https://github.com/nojb/ocaml-imap). It can handle the full RFC 3501 > grammar. This library itself is not yet ready for prime-time, but I > spent quite a bit of time thinking about the question of parsing since > it is such a big part of the protocol. >=20 > While the incremental parsing in the new Menhir is great I do not > think it will help to parse IMAP because the token types of IMAP > grammar are highly context dependent. By this I mean that there are > many overlapping token definitions that are valid in different parts > of the grammar. While this could be hacked around (by changing the > lexer used in dummy non-terminals) the complexity of doing so quickly > gets out of hand. In my view this almost requires using a scannerless > parser, which rules out ocamlyacc/menhir-type parsers. >=20 > A parser generator that could handle the IMAP grammar nicely is Dypgen > (which generates GLR parsers). You can almost copy the IMAP grammar > verbatim from RFC 3501. However it can not be made incremental. This > means that you need to have all the input available before starting > the parse. Since the amount of data is potentially very big, this > rules using Dypgen for anything else than toy applications. >=20 > In my library I use a monadic combinator parser > (https://github.com/nojb/ocaml-imap/blob/master/imap/imapParser.ml). > When programming monadically you are reifying the continuation at > every `bind` point so you get the incremental bit for free (I think > this goes by the fancy name of `iteratees` nowadays). On the other > hand it suffers from poor time/space performance common to this type > of parser. Also, while combinator parsers can handle arbitrary > backtracking (and you end up paying for this), IMAP itself requires > very little, so it would seem that the full flexibility of combinators > are not needed. >=20 > It seems that the "scannerless, incremental" space in parser > technology is not very developed (in OCaml; maybe other languages is > different ?). One intriguing possibility that I am exploring (but > this is a much longer term project) is to use PEG parsers for this. > It would be easy to write an IMAP parser with PEG. The main problem is > that the usual implementation of PEG parsers uses memoization to keep > parsing time from blowing up exponentially on the length of input ( > (in OCaml the Aurochs parser generator is of this type). This > requires storage proportional to input length, which is a problem when > parsing large amounts of data. But for grammars that don't need a lot > of backtracking (like IMAP) the memoization is not actually required, > so I think there is an interesting space to explore between all these > constraints. >=20 > In any case, I would love to hear about your experience if you do try > to parse IMAP using Menhir. >=20 > Best wishes, > Nicolas >=20 > On Thu, Dec 18, 2014 at 9:45 AM, Gerd Stolpmann = wrote: > > Thanks for this, it is great news. A couple of months ago I tried to > > develop a monadic parser for the IMAP protocol, which has a weird > > grammar and needs some strange interactions between lexing and parsing. > > Lacking a parser generator I started to write the parser manually, but > > it never made good progress. It looks like Menhir can now be used for > > this case, and I'll try it. > > > > Gerd > > > > Am Mittwoch, den 17.12.2014, 21:14 +0100 schrieb Francois Pottier: > >> Dear OCaml users, > >> > >> I have recently started making a series of changes in Menhir, inspired= by > >> the work of Fr=E9d=E9ric Bour on Merlin (a smart emacs-mode for OCaml,= which > >> uses a modified version of Menhir). Thanks to Fr=E9d=E9ric for his sti= mulating > >> ideas, and thanks to Gabriel Scherer for not letting me sleep until I > >> promised I would do something about them! :-) > >> > >> I have made a first release a couple days ago. The relevant chunk of t= he > >> CHANGES file is appended below. In summary, a new incremental API is > >> available; Menhir now requires ocaml 4.02; and a couple of obscure fea= tures > >> (--error-recovery and $previouserror) have been removed in the interes= t of > >> speed and simplicity. > >> > >> The incremental API means that you can take a snapshot of the parser's= state, > >> essentially at no cost, at any time. It also means that the parser no = longer > >> drives the lexer; you drive the lexer, and you provide tokens to the p= arser > >> when it requests them. This can be convenient if the lexer is in a mon= ad (the > >> Lwt monad, for instance). > >> > >> More changes are planned. The type "env" exposed by the new incrementa= l API is > >> currently opaque. We are planning to offer a range of functions that a= llow > >> inspecting and building values of type "env". This should allow the us= er to > >> implement new error handling and error recovery strategies outside of = Menhir. > >> > >> The new release is available now as a .tar.gz archive: > >> > >> http://gallium.inria.fr/~fpottier/menhir/menhir-20141215.tar.gz > >> > >> It is also available via opam ("opam install menhir"). > >> > >> Cheers, > >> > >> -- > >> Fran=E7ois Pottier > >> Francois.Pottier@inria.fr > >> http://gallium.inria.fr/~fpottier/ > >> > >> 2014/12/15: > >> New incremental API (in --table mode only), inspired by Fr=E9d=E9ric B= our. > >> > >> 2014/12/11: > >> Menhir now reports an error if one of the start symbols produces > >> either the empty language or the singleton language {epsilon}. > >> > >> Although some people out there actually define a start symbol that rec= ognizes > >> {epsilon} (and use it as a way of initializing or re-initializing some= global > >> state), this is considered bad style. Furthermore, by ruling out this = case, we > >> are able to simplify the table back-end a little bit. > >> > >> 2014/12/12: > >> A speed improvement in the code back-end. > >> > >> 2014/12/08: > >> Menhir now requires OCaml 4.02 (instead of 3.09). > >> > >> 2014/12/02: > >> Removed support for the $previouserror keyword. > >> Removed support for --error-recovery mode. > >> > > > > -- > > ------------------------------------------------------------ > > Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de > > My OCaml site: http://www.camlcity.org > > Contact details: http://www.camlcity.org/contact.html > > Company homepage: http://www.gerd-stolpmann.de > > ------------------------------------------------------------ > > --=20 ------------------------------------------------------------ Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de My OCaml site: http://www.camlcity.org Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de ------------------------------------------------------------ --=-ck/w2kitZJ+j+3Qdj/mU Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUkvHpAAoJEAaM4b9ZLB5T3fEH/RtUi6Xi+3jjiZ56wqQGlJGR B1rs7qbRRnqqJeciAVpXqG10TmNBvBX1YB/Lg11UM5lqCh2FYRu6ZVZPKzipV+JE gErM9ssV/2T22uW2wDCznTNDx/5cuUlsl5uhet6pGqn1y5vWHkY6fhCD9y6/pcBq eGz3zMXoSEVprJwilIxXIgmApd89bvhnBl3s1xL7/YdqrQifeO5lWOyOr1C/fUP3 OTCJ2VWQ/uwAl219hqiqmGfi+kfqfSroJzTTBSBoosKC+Nn7xlA7ER1aF8xyflCE kmyZZVTbKa4fAOVfgLKAEHsvAiqryANtvorRTzLIiRTXFDE1AIznJ3fefi3qIlk= =+wbZ -----END PGP SIGNATURE----- --=-ck/w2kitZJ+j+3Qdj/mU--