I saw this on mirage's wiki:
https://github.com/mirage/mirage-www/wiki/Pioneer-Projects#bigarray-parser-generator

Apart from using bigarrays for performance, the idea of using a ppx to
make the combinators more efficient looks interesting. I also plan to
try mixing recursive descent with a monadic "refill" function for
simple parsers (S-expressions, Bencode, etc.). It will be interesting
to compare that with Daniel's manual CPS.

Cheers!

On Thu, 18 Dec 2014, Daniel Bünzli wrote:

> On Thursday, 18 December 2014 at 15:19, Nicolas Ojeda Bar wrote:
> > When programming monadically you are reifying the continuation at
> > every `bind` point so you get the incremental bit for free (I think
> > this goes by the fancy name of `iteratees` nowadays). On the other
> > hand it suffers from poor time/space performance common to this type
> > of parser. Also, while combinator parsers can handle arbitrary
> > backtracking (and you end up paying for this), IMAP itself requires
> > very little, so it would seem that the full flexibility of
> > combinators is not needed.
>
> For a long time I played with combinator parsers but I never got to
> the point of being satisfied with the result. I also played a little
> bit with iteratees but couldn't get the performance I wanted with the
> full model in all its compositional glory.
>
> Nowadays I simply write my streaming lexers/parsers manually in CPS
> and you can drive them in non-blocking mode with a single, fixed size,
> buffer. See here [1] for an interface and an implementation on a toy
> example.
>
> If you get the lowest continuation decoding bits correctly (e.g. don't
> bind on each byte) you can actually get good time and space
> performance. Once you have handled that (mainly the painful bits about
> read overlapping two input buffers) you get very nice parsing
> flexibility, e.g. for things like text encoding discovery where you
> need to patch the continuation with the appropriate character decoder.
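
(For anyone who has not seen the style, here is a minimal sketch of what
a hand-written CPS decoder can look like. It is not the code behind [1],
Uutf or Jsonm. Every name in it (`event`, `decoder`, `supply`, `decode`,
`decode_chunks`) is made up for illustration, and it uses plain strings
as input chunks rather than a fixed-size buffer. The toy format is
unsigned decimal integers separated by whitespace. The inner loops run
straight over the current chunk and only store a continuation when the
chunk runs out, which is one way to read "don't bind on each byte".)

```ocaml
(* Hypothetical toy decoder: all names below are illustrative only. *)
type event = Await | Int of int | Error of char | End

type decoder = {
  mutable src : string;           (* current input chunk *)
  mutable pos : int;              (* next byte to read in [src] *)
  mutable eoi : bool;             (* true once the empty chunk was supplied *)
  mutable k : decoder -> event;   (* continuation: what to do on resume *)
}

let is_digit c = '0' <= c && c <= '9'
let is_space c = c = ' ' || c = '\n' || c = '\t' || c = '\r'

(* Suspend: store the continuation and hand control back to the caller. *)
let await k d = (d.k <- k; Await)

(* Skip separators, then start decoding a number. *)
let rec skip d =
  if d.pos >= String.length d.src then
    (if d.eoi then (d.k <- skip; End) else await skip d)
  else
    let c = d.src.[d.pos] in
    if is_space c then (d.pos <- d.pos + 1; skip d)
    else if is_digit c then number 0 d
    else (d.pos <- d.pos + 1; d.k <- skip; Error c)

(* Accumulate digits. [acc] is captured by the stored continuation, so a
   number split across two chunks resumes exactly where it stopped. *)
and number acc d =
  if d.pos >= String.length d.src then
    (if d.eoi then (d.k <- skip; Int acc) else await (number acc) d)
  else
    let c = d.src.[d.pos] in
    if is_digit c then begin
      d.pos <- d.pos + 1;
      number (acc * 10 + Char.code c - Char.code '0') d
    end else (d.k <- skip; Int acc)

let decoder () = { src = ""; pos = 0; eoi = false; k = skip }

(* Resume the decoder until it produces the next event. *)
let decode d = d.k d

(* Provide the next chunk; the empty string signals end of input. *)
let supply d chunk =
  d.src <- chunk;
  d.pos <- 0;
  if chunk = "" then d.eoi <- true

(* Blocking driver for testing: feed a list of chunks, collect events. *)
let decode_chunks chunks =
  let d = decoder () in
  let rec loop chunks acc =
    match decode d with
    | Await ->
      (match chunks with
       | c :: rest -> supply d c; loop rest acc
       | [] -> supply d ""; loop [] acc)
    | End -> List.rev acc
    | (Int _ | Error _) as e -> loop chunks (e :: acc)
  in
  loop chunks []
```

With this sketch, `decode_chunks ["12 3"; "4 5"; "6"]` gives
`[Int 12; Int 34; Int 56]`: the numbers 34 and 56 each straddle a chunk
boundary and are reassembled by the stored continuation. A non-blocking
caller would call `decode` itself and only call `supply` when it sees
`Await`.
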
> If you care for your users you can also get very good error reporting
> and error recovery capabilities by applying knowledge specific to the
> decoded protocol. That's the way Uutf [2], Jsonm [3] (on top of Uutf)
> and Dicomm [4] are programmed.

--
Simon

http://weusepgp.info/
key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3 7D8D 4AC0 1D08 49AA 62B6