From: Jean-Marie Gaillourdet
To: Markus Mottl
Cc: ocaml
Subject: Re: [Caml-list] Partial parsing
Date: Fri, 28 Jul 2006 11:58:59 +0200
Message-Id: <244C09B7-E152-4B9E-8B7E-A27AC225B5E7@gaillourdet.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

On 28.07.2006, at 00:37, Markus Mottl wrote:

> Hi,
>
> one feature that I'd love to see with our lexing/parsing tools for
> OCaml would be the possibility to perform partial lexing / parsing.
>
> For example, imagine that you read a message from a socket into a
> string buffer, but it is not yet complete. You may not be able to
> know that in advance, because that would require parsing it.
>
> What I'd like to be able to do is to e.g. call a lexer or parser
> function on this buffer, and it will recognize as much as possible
> with the option to continue parsing in another buffer. For this we
> could e.g. define a type "parse_result":
>
>   type parse_result =
>     | Done of result * int
>     | Cont of parse_fun
>   and parse_fun = pos : int -> len : int -> string -> parse_result
>
> If a full parse was detected, it would return e.g. "Done (result,
> next_pos)", where "result" is the result of the successful parse, and
> "next_pos" is the position in the buffer that contains the character
> following the successfully parsed text.
>
> If the buffer did not contain enough data to finish the parse, the
> function would return e.g. "Cont f", where "f" can be called to
> continue parsing in a new buffer starting at position "pos" and
> reading at most "len" characters, e.g. in the case when the rest of
> the message arrives in the input buffer.
>
> So far this kind of parsing can only be implemented by hand, which is
> very tedious to do and inevitable if one wants to read self-delimiting
> messages of variable size from sources that return partial data.
> I don't see any inherent problem extending the code generators for
> ocamllex and ocamlyacc to support this feature.
>
> Has anybody already implemented a generic solution to this kind of
> parsing problem? Maybe Menhir, which seems like an ambitious parser
> project, could add this feature to support partial parsing (of course,
> this would require cooperation from the lexer, too)?

You could implement such a feature on top of the existing lexer and
parser technologies by using threads. Dedicate one thread to the lexer
and parser, make reads from the input buffer blocking, and make every
suitable intermediate result available to the main thread.

I think it should also be possible to build such a feature with the
resumable exceptions module posted on this list some weeks ago, and
likewise with the continuations posted here. But I do not have much
experience with continuations, and I am not sure whether that really
fits into the framework of ocamllex/ocamlyacc.

As Fermin Reig pointed out, parser combinators provide that feature out
of the box. AFAIK, they depend on lazy evaluation. I am not sure
whether libraries that hide this are available for OCaml.

Best regards,
Jean-Marie

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEyd/nNIUNP/I5YOgRAmWJAJ9WijR0B0f1FT/4smsum7Ln4qWXPACeLxkB
4Hp1UPcEseMzFYzGZfLhbYc=
=IqFi
-----END PGP SIGNATURE-----
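
For illustration only, here is a minimal hand-written sketch of the
Done/Cont protocol from the quoted message. It is not generated by
ocamllex or ocamlyacc, and it makes two assumptions that are not in the
original proposal: the result type is fixed to a plain string, and a
"message" is simply a line terminated by '\n'. The names parse_from and
parse_line are hypothetical.

```ocaml
(* The types from the quoted proposal, with "result" fixed to string
   for this sketch (an assumption). *)
type parse_result =
  | Done of string * int   (* parsed message, position just after it *)
  | Cont of parse_fun      (* input exhausted; call this on the next buffer *)
and parse_fun = pos:int -> len:int -> string -> parse_result

(* Hypothetical hand-written parser: a message is everything up to the
   first '\n'. [acc] holds the part of the message seen in earlier
   buffers. The caller must ensure pos + len <= String.length buf. *)
let rec parse_from (acc : string) : parse_fun = fun ~pos ~len buf ->
  match String.index_from_opt buf pos '\n' with
  | Some i when i < pos + len ->
    (* Terminator found inside the window: the message is complete. *)
    Done (acc ^ String.sub buf pos (i - pos), i + 1)
  | _ ->
    (* No terminator yet: remember what we saw and ask for more input. *)
    Cont (parse_from (acc ^ String.sub buf pos len))

(* Entry point: start with nothing accumulated. *)
let parse_line : parse_fun = parse_from ""

(* Usage: the message "hello\n" arrives split across two buffers. *)
let () =
  match parse_line ~pos:0 ~len:3 "hel" with
  | Done _ -> assert false
  | Cont k ->
    (match k ~pos:0 ~len:8 "lo\nworld" with
     | Done (msg, next) -> Printf.printf "%s (next position: %d)\n" msg next
     | Cont _ -> assert false)
```

Because the accumulated prefix is an immutable string captured in the
closure, a continuation can be called more than once without
interference. Doing this by hand for a realistic grammar is exactly the
tedium described in the quoted message.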