caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Nicolas Ojeda Bar <no263@dpmms.cam.ac.uk>
To: Simon Cruanes <simon.cruanes.2007@m4x.org>
Cc: "Daniel Bünzli" <daniel.buenzli@erratique.ch>,
	"Gerd Stolpmann" <info@gerd-stolpmann.de>,
	"Francois Pottier" <Francois.Pottier@inria.fr>,
	caml-list@inria.fr
Subject: Re: [Caml-list] New release of Menhir (20141215)
Date: Thu, 18 Dec 2014 13:02:44 -0300	[thread overview]
Message-ID: <CAPunWhBggDkB_Ou8+qsO==xAc6U=0dk4UwJMOjbPquTK1fk=Qg@mail.gmail.com> (raw)
In-Reply-To: <20141218153426.GI4679@emmental.inria.fr>

Hi Simon,

On Thu, Dec 18, 2014 at 12:34 PM, Simon Cruanes
<simon.cruanes.2007@m4x.org> wrote:
> I saw this on mirage's wiki:
> https://github.com/mirage/mirage-www/wiki/Pioneer-Projects#bigarray-parser-generator
>
> Apart from using bigarrays for performance, the idea of using a ppx to
> make the combinators more efficient looks interesting.

If I understand correctly the optimizations done by the FastParser
parsers are essentially a form of aggresive inlining.  This goes
hand-in-hand with encoding your parser as a set of mutually recursive
functions.  This type of encoding does not lend itself easily to
incremental parsing (where the parser can be stopped and resumed
as/when needed).

That apart, for other use-cases it looks pretty nice.

Best wishes,
Nicolas

> I also plan to try to mix recursive descent with a monadic "refill"
> function for simple parsers (S-expressions, Bencode, etc.) It will be
> interesting to compare that with Daniel's manual CPS.
>
> Cheers!
>
> Le Thu, 18 Dec 2014, Daniel Bünzli a écrit :
>> Le jeudi, 18 décembre 2014 à 15:19, Nicolas Ojeda Bar a écrit :
>> > When programming monadically you are reifying the continuation at
>> > every `bind` point so you get the incremental bit for free (I think
>> > this goes by the fancy name of `iteratees` nowadays). On the other
>> > hand it suffers from poor time/space performance common to this type
>> > of parser. Also, while combinator parsers can handle arbitrary
>> > backtracking (and you end up paying for this), IMAP itself requires
>> > very little, so it would seem that the full flexibility of combinators
>> > are not needed.
>>
>> For a long time I played with combinator parsers but I never got to the point of being satisfied with the result. I also played a little bit with iteratees but couldn't get the performance I wanted with the full model in all its compositional glory.
>>
>> Nowadays I simply write my streaming lexers/parsers manually in CPS and you can drive them in non-blocking mode with a single, fixed size, buffer. See here [1] for an interface and an implementation on a toy example.
>>
>> If you get the lowest continuation decoding bits correctly (e.g. don't bind on each byte) you can actually get good time and space performance. Once you have handled that (mainly the painful bits about read overlapping two input buffers) you get very nice parsing flexibility, e.g. for things like text encoding discovery where you need to patch the continuation with the appropriate character decoder. If you care for your users you can also get very good error reporting and error recovery capabilities by applying knowledge specific to the decoded protocol. That's the way Uutf [2], Jsonm [3] (on top of Uutf) and Dicomm [4] are programmed.
>
> --
> Simon
>
> http://weusepgp.info/
> key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3  7D8D 4AC0 1D08 49AA 62B6

  reply	other threads:[~2014-12-18 16:02 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-17 20:14 Francois Pottier
2014-12-18 12:45 ` Gerd Stolpmann
2014-12-18 14:19   ` Nicolas Ojeda Bar
2014-12-18 15:20     ` Daniel Bünzli
2014-12-18 15:34       ` Simon Cruanes
2014-12-18 16:02         ` Nicolas Ojeda Bar [this message]
2014-12-18 15:25     ` Gerd Stolpmann
2014-12-18 17:25       ` Francois Pottier
2014-12-22 11:13     ` oleg
2014-12-22 18:40 ` Dario Teixeira
2014-12-24 23:30   ` Francois Pottier
2014-12-26 11:13     ` Dario Teixeira
2014-12-26 11:31       ` Frédéric Bour
2014-12-26 12:16         ` Dario Teixeira

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPunWhBggDkB_Ou8+qsO==xAc6U=0dk4UwJMOjbPquTK1fk=Qg@mail.gmail.com' \
    --to=no263@dpmms.cam.ac.uk \
    --cc=Francois.Pottier@inria.fr \
    --cc=caml-list@inria.fr \
    --cc=daniel.buenzli@erratique.ch \
    --cc=info@gerd-stolpmann.de \
    --cc=simon.cruanes.2007@m4x.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).