From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 448A17FAE1 for ; Thu, 18 Dec 2014 15:19:57 +0100 (CET) Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of n.oje.bar@gmail.com) identity=pra; client-ip=74.125.82.48; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="n.oje.bar@gmail.com"; x-sender="n.oje.bar@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of n.oje.bar@gmail.com designates 74.125.82.48 as permitted sender) identity=mailfrom; client-ip=74.125.82.48; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="n.oje.bar@gmail.com"; x-sender="n.oje.bar@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-wg0-f48.google.com) identity=helo; client-ip=74.125.82.48; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="n.oje.bar@gmail.com"; x-sender="postmaster@mail-wg0-f48.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjIBACTiklRKfVIwm2dsb2JhbABaFoNCWASCOkiwVw53ggCGC4kzhW0DAoEXBxYBAQEBAREBAQEBAQYLCwkULoQNAQEEDAYRHQEUJAEDDAEFAwILAwoCAiYCAiISAQUBHAYMBwkRCId2AxENoWKQSz4xiy6EYotSJw1FhQYMGgEFDoEThF+ECoI6gmUWgyKBQQWFJYwjg3GBQYENMIgThDqBbhIjgRVbgVWBeCQxAYEDgT8BAQE X-IPAS-Result: AjIBACTiklRKfVIwm2dsb2JhbABaFoNCWASCOkiwVw53ggCGC4kzhW0DAoEXBxYBAQEBAREBAQEBAQYLCwkULoQNAQEEDAYRHQEUJAEDDAEFAwILAwoCAiYCAiISAQUBHAYMBwkRCId2AxENoWKQSz4xiy6EYotSJw1FhQYMGgEFDoEThF+ECoI6gmUWgyKBQQWFJYwjg3GBQYENMIgThDqBbhIjgRVbgVWBeCQxAYEDgT8BAQE X-IronPort-AV: E=Sophos;i="5.07,601,1413237600"; d="scan'208";a="94015975" Received: from mail-wg0-f48.google.com ([74.125.82.48]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 18 Dec 2014 15:19:56 +0100 Received: by mail-wg0-f48.google.com with SMTP id y19so1737233wgg.35; Thu, 18 Dec 2014 06:19:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=/Ls3+facSKaNFXyUAcCzAAHFkCADH940CHgVS1H/GDI=; b=ib6RtoIC39ej+ziKq2/80PFRIfimtsi4nEL2myt+rrwEE7dFIjdvQVDRiyntpftzJz ZYNuDMVuqXx/WSpr+wc1lKXik9PBDrgSUp1zdHi13LbJhnxibjI79pwfgdh68LOwwuhh bVoijejC1yAaNAngBbTVW6ocFkVFlye4HaZOhrx+lgQFVNkxEo7lorsRUNe5pDGOjkVt WFV67t1ngfNJ6ib2xhqCR1ORUJfJ4TWdawaZLVEl2r2ams4PHPczxU0uvkDsHHi8q38v FV+uuD1CILAnv/Lg3SZaDkzYIoBFL0MQPK+RxFnw0ncxu3roTofx/h+zB89SypNG82yJ gWQQ== MIME-Version: 1.0 X-Received: by 10.194.108.202 with SMTP id hm10mr5090413wjb.72.1418912395686; Thu, 18 Dec 2014 06:19:55 -0800 (PST) Sender: n.oje.bar@gmail.com Received: by 10.27.45.205 with HTTP; Thu, 18 Dec 2014 06:19:55 -0800 (PST) In-Reply-To: <1418906703.5445.5.camel@e130.lan.sumadev.de> References: <20141217201448.GA27253@yquem.inria.fr> <1418906703.5445.5.camel@e130.lan.sumadev.de> Date: Thu, 18 Dec 2014 11:19:55 -0300 X-Google-Sender-Auth: zydWR8KO45B4E8GQ8KpFsuskNng Message-ID: From: Nicolas Ojeda Bar To: Gerd Stolpmann Cc: Francois.Pottier@inria.fr, caml-list@inria.fr Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Caml-list] New release of Menhir (20141215) Hi Gerd, I wrote an IMAP client library in pure OCaml (https://github.com/nojb/ocaml-imap). It can handle the full RFC 3501 grammar. This library itself is not yet ready for prime-time, but I spent quite a bit of time thinking about the question of parsing since it is such a big part of the protocol. While the incremental parsing in the new Menhir is great I do not think it will help to parse IMAP because the token types of IMAP grammar are highly context dependent. By this I mean that there are many overlapping token definitions that are valid in different parts of the grammar. While this could be hacked around (by changing the lexer used in dummy non-terminals) the complexity of doing so quickly gets out of hand. In my view this almost requires using a scannerless parser, which rules out ocamlyacc/menhir-type parsers. A parser generator that could handle the IMAP grammar nicely is Dypgen (which generates GLR parsers). You can almost copy the IMAP grammar verbatim from RFC 3501. However it can not be made incremental. This means that you need to have all the input available before starting the parse. Since the amount of data is potentially very big, this rules using Dypgen for anything else than toy applications. In my library I use a monadic combinator parser (https://github.com/nojb/ocaml-imap/blob/master/imap/imapParser.ml). When programming monadically you are reifying the continuation at every `bind` point so you get the incremental bit for free (I think this goes by the fancy name of `iteratees` nowadays). On the other hand it suffers from poor time/space performance common to this type of parser. Also, while combinator parsers can handle arbitrary backtracking (and you end up paying for this), IMAP itself requires very little, so it would seem that the full flexibility of combinators are not needed. It seems that the "scannerless, incremental" space in parser technology is not very developed (in OCaml; maybe other languages is different ?). One intriguing possibility that I am exploring (but this is a much longer term project) is to use PEG parsers for this. It would be easy to write an IMAP parser with PEG. The main problem is that the usual implementation of PEG parsers uses memoization to keep parsing time from blowing up exponentially on the length of input ( (in OCaml the Aurochs parser generator is of this type). This requires storage proportional to input length, which is a problem when parsing large amounts of data. But for grammars that don't need a lot of backtracking (like IMAP) the memoization is not actually required, so I think there is an interesting space to explore between all these constraints. In any case, I would love to hear about your experience if you do try to parse IMAP using Menhir. Best wishes, Nicolas On Thu, Dec 18, 2014 at 9:45 AM, Gerd Stolpmann wr= ote: > Thanks for this, it is great news. A couple of months ago I tried to > develop a monadic parser for the IMAP protocol, which has a weird > grammar and needs some strange interactions between lexing and parsing. > Lacking a parser generator I started to write the parser manually, but > it never made good progress. It looks like Menhir can now be used for > this case, and I'll try it. > > Gerd > > Am Mittwoch, den 17.12.2014, 21:14 +0100 schrieb Francois Pottier: >> Dear OCaml users, >> >> I have recently started making a series of changes in Menhir, inspired by >> the work of Fr=C3=A9d=C3=A9ric Bour on Merlin (a smart emacs-mode for OC= aml, which >> uses a modified version of Menhir). Thanks to Fr=C3=A9d=C3=A9ric for his= stimulating >> ideas, and thanks to Gabriel Scherer for not letting me sleep until I >> promised I would do something about them! :-) >> >> I have made a first release a couple days ago. The relevant chunk of the >> CHANGES file is appended below. In summary, a new incremental API is >> available; Menhir now requires ocaml 4.02; and a couple of obscure featu= res >> (--error-recovery and $previouserror) have been removed in the interest = of >> speed and simplicity. >> >> The incremental API means that you can take a snapshot of the parser's s= tate, >> essentially at no cost, at any time. It also means that the parser no lo= nger >> drives the lexer; you drive the lexer, and you provide tokens to the par= ser >> when it requests them. This can be convenient if the lexer is in a monad= (the >> Lwt monad, for instance). >> >> More changes are planned. The type "env" exposed by the new incremental = API is >> currently opaque. We are planning to offer a range of functions that all= ow >> inspecting and building values of type "env". This should allow the user= to >> implement new error handling and error recovery strategies outside of Me= nhir. >> >> The new release is available now as a .tar.gz archive: >> >> http://gallium.inria.fr/~fpottier/menhir/menhir-20141215.tar.gz >> >> It is also available via opam ("opam install menhir"). >> >> Cheers, >> >> -- >> Fran=C3=A7ois Pottier >> Francois.Pottier@inria.fr >> http://gallium.inria.fr/~fpottier/ >> >> 2014/12/15: >> New incremental API (in --table mode only), inspired by Fr=C3=A9d=C3=A9r= ic Bour. >> >> 2014/12/11: >> Menhir now reports an error if one of the start symbols produces >> either the empty language or the singleton language {epsilon}. >> >> Although some people out there actually define a start symbol that recog= nizes >> {epsilon} (and use it as a way of initializing or re-initializing some g= lobal >> state), this is considered bad style. Furthermore, by ruling out this ca= se, we >> are able to simplify the table back-end a little bit. >> >> 2014/12/12: >> A speed improvement in the code back-end. >> >> 2014/12/08: >> Menhir now requires OCaml 4.02 (instead of 3.09). >> >> 2014/12/02: >> Removed support for the $previouserror keyword. >> Removed support for --error-recovery mode. >> > > -- > ------------------------------------------------------------ > Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de > My OCaml site: http://www.camlcity.org > Contact details: http://www.camlcity.org/contact.html > Company homepage: http://www.gerd-stolpmann.de > ------------------------------------------------------------ >