From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 866A4BBB7 for ; Fri, 28 Jul 2006 00:37:55 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id k6RMbtHt021841 for ; Fri, 28 Jul 2006 00:37:55 +0200 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id AAA18226 for ; Fri, 28 Jul 2006 00:37:54 +0200 (MET DST) Received: from wx-out-0102.google.com (wx-out-0102.google.com [66.249.82.207]) by nez-perce.inria.fr (8.13.6/8.13.6) with ESMTP id k6RMbrs9009480 for ; Fri, 28 Jul 2006 00:37:54 +0200 Received: by wx-out-0102.google.com with SMTP id i27so1362568wxd for ; Thu, 27 Jul 2006 15:37:53 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=XXFEn1snq5a5ROPx4b7qU8wtvqR81OhkSHbW2BUgC+HiXVbjRUPxPyGOn00CU16beaNNQG5IPEKRZJERXjci4BBZabQwtgo/25NHMpWrBEmkx3seoTscfu92cTFhQrfeZ02oBH/gAMvj/a9tBahsiBNPiU3ENRPIV8j9fleEpMg= Received: by 10.70.8.6 with SMTP id 6mr1392474wxh; Thu, 27 Jul 2006 15:37:53 -0700 (PDT) Received: by 10.70.57.2 with HTTP; Thu, 27 Jul 2006 15:37:53 -0700 (PDT) Message-ID: Date: Thu, 27 Jul 2006 18:37:53 -0400 From: "Markus Mottl" To: ocaml Subject: Partial parsing MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline X-j-chkmail-Score: MSGID : 44C94041.002 on nez-perce : j-chkmail score : X : 0/20 1 X-Miltered: at concorde with ID 44C94043.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 44C94041.002 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; markus:01 mottl:01 markus:01 mottl:01 parsing:01 lexing:01 parsing:01 ocaml:01 lexing:01 buffer:01 lexer:01 parser:01 buffer:01 ocamllex:01 ocamlyacc:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.5 required=5.0 tests=INFO_TLD,RCVD_BY_IP autolearn=disabled version=3.0.3 Hi, one feature that I'd love to see with our lexing/parsing tools for OCaml would be the possibility to perform partial lexing / parsing. For example, imagine that you read a message from a socket into a string buffer, but it is not yet complete. You may not be able to know that in advance, because that would require parsing it. What I'd like to be able to do is to e.g. call a lexer or parser function on this buffer, and it will recognize as much as possible with the option to continue parsing in another buffer. For this we could e.g. define a type "parse_result": type parse_result = | Done of result * int | Cont of parse_fun and parse_fun = pos : int -> len : int -> string -> parse_result If a full parse was detected, it would return e.g. "Done (result, next_pos)", where "result" is the result of the successful parse, and "next_pos" is the position in the buffer that contains the character following the successfully parsed text. If the buffer did not contain enough data to finish the parse, the function would return e.g. "Cont f", where "f" can be called to continue parsing in a new buffer starting at position "pos" and reading at most "len" characters, e.g. in the case when the rest of the message arrives in the input buffer. So far this kind of parsing can only be implemented by hand, which is very tedious to do and inevitable if one wants to read self-delimiting messages of variable size from sources that return partial data. I don't see any inherent problem extending the code generators for ocamllex and ocamlyacc to support this feature. Has anybody already implemented a generic solution to this kind of parsing problem? Maybe Menhir, which seems like an ambitious parser project, could add this feature to support partial parsing (of course, this would require cooperation from the lexer, too)? Regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com