From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 4E2C17FB38 for ; Mon, 22 Dec 2014 12:13:50 +0100 (CET) Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of oleg@okmij.org) identity=pra; client-ip=66.39.3.115; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="oleg@okmij.org"; x-sender="oleg@okmij.org"; x-conformance=sidf_compatible Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of oleg@okmij.org designates 66.39.3.115 as permitted sender) identity=mailfrom; client-ip=66.39.3.115; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="oleg@okmij.org"; x-sender="oleg@okmij.org"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@www1.g3.pair.com) identity=helo; client-ip=66.39.3.115; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="oleg@okmij.org"; x-sender="postmaster@www1.g3.pair.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjkBAOP7l1RCJwNzZ2dsb2JhbABbDoNKWII1xA+FcAKBGRYBAQEBARENBw9JhA0GeRshEyEdKxSIMQ3PGQEBAQEBBQEBAQEBHYxEgSGCDQcWgwCBEwWJR4gIhTMBgT2IGYd1g1NbIDGCQwEBAQ X-IPAS-Result: AjkBAOP7l1RCJwNzZ2dsb2JhbABbDoNKWII1xA+FcAKBGRYBAQEBARENBw9JhA0GeRshEyEdKxSIMQ3PGQEBAQEBBQEBAQEBHYxEgSGCDQcWgwCBEwWJR4gIhTMBgT2IGYd1g1NbIDGCQwEBAQ X-IronPort-AV: E=Sophos;i="5.07,623,1413237600"; d="scan'208";a="94442855" Received: from www1.g3.pair.com ([66.39.3.115]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/ADH-AES256-SHA; 22 Dec 2014 12:13:49 +0100 Received: by www1.g3.pair.com (Postfix, from userid 9370) id 68AD4C3829; Mon, 22 Dec 2014 06:13:46 -0500 (EST) From: oleg@okmij.org To: no263@dpmms.cam.ac.uk CC: Francois.Pottier@inria.fr, caml-list@inria.fr, daniel.buenzli@erratique.ch In-reply-to: Message-Id: <20141222111346.68AD4C3829@www1.g3.pair.com> Date: Mon, 22 Dec 2014 06:13:46 -0500 (EST) Subject: Re: [Caml-list] New release of Menhir (20141215) Regarding incremental parsing of protocols like IMAP: I have successfully (as in successfully deployed in production and being used around the clock, since about 2010 or so) used iteratees for incremental parsing of full XML, including CDATA, parsed entities and namespaces. The full XML is actually quite difficult to parse: for example, parsed entity references like & are not recognized within CDATA blocks; the content of attributes has its own whitespace handling rules. The parser is used for handling sometimes quite large XML documents. The parser is incremental and so can work in constant memory. http://okmij.org/ftp/Streams.html#xml I have also used iteratees to parse HTTP Log files, also incrementally. The log files have an (unintended, I hope) complication: the user-agent string (quoted in the log) may, according to RFC, itself contain quotes. Since the embedded quotes are not escaped (again, according to RFC), we may end up with quoted strings containing unescaped quote characters. Parsing will require unbounded look-ahead then. Iteratees can handle that -- and report errors precisely and recover. http://okmij.org/ftp/Streams.html#good-error http://okmij.org/ftp/Streams.html#fork Incidentally, there are quite many iteratee libraries. Some, like pipes, emphasize apparent simplicity and do no input buffering. The performance is indeed pretty bad then. I should also mention that a parser with a call-back interface and the absence of visible side-effects can _automatically_ be made incremental. The following web page describes incrementalization of stdlib's Genlex lexer. http://okmij.org/ftp/continuations/differentiating-parsers.html