From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id BAA00020; Wed, 9 Oct 2002 01:21:28 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id BAA11172 for ; Wed, 9 Oct 2002 01:21:27 +0200 (MET DST) Received: from athlon.baretta.com (host37-68.pool80116.interbusiness.it [80.116.68.37]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g98NLP520461; Wed, 9 Oct 2002 01:21:25 +0200 (MET DST) Received: from baretta.com (localhost.localdomain [127.0.0.1]) by athlon.baretta.com (Postfix) with ESMTP id 32151273AE; Wed, 9 Oct 2002 01:31:39 +0200 (CEST) Message-ID: <3DA36ADA.9050507@baretta.com> Date: Wed, 09 Oct 2002 01:31:38 +0200 From: Alessandro Baretta Organization: Baretta srl -- www.baretta.com User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826 X-Accept-Language: it, en MIME-Version: 1.0 To: Pierre Weis , Ocaml Subject: Re: [Caml-list] Bug somewhere... References: <200210082007.WAA27767@pauillac.inria.fr> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Pierre Weis wrote: > > A lot of problems in here: some are due to the semantics of the Scanf > module some are due to the implementation, some are even deeper than > those two! > > Indeed the two programs are not equivalent (and their behaviour are > indeed different!). They are meant to be equivalent under the following assumption: the input file is divided in lines which are terminated by either '\n' or '\r'. The difference is mostly due to the fact that Scanf 3.06 reads an extra character with respect to the specified format string. Any other differences are attributable to faulty connections in my brain. > The first reason is that you cannot match eof (as you did with your > lexer) using Scanf. This could be considered as a missing feature and > we may add a convention to match end of file (either ``@.'', ``@$'', > or ``$'' ?). I can live with this. What Scanf *really lacks* is a C-equivalent support for partial matches. If a C-format matches only partially, only the conversions specified in the matched prefix are performed. In O'Caml, Scanf throws an exception. A better solution would be for Scanf.scanf to have type : ('a, Scanning.scanbuf, 'b) format -> 'a option -> 'b If a conversion is performed then the callback function is passed Some(); otherwise, in a partial match f gets a number of None actual parameters from scanf. This approach would make Scanf much more useful. We would be able to explicitly code simple parsers in Ocaml logic and Scanf formats, when, at present, we would be forced to go with Ocamllex/yacc. Take my case, for example. > Second, your lexer uses an explicitely allocated buffer lexbuf, while > the scanf corresponding call allocates a new input buffer for each > invocation; but the semantics of Scanf imposes a look ahead of 1 > character to check that no other \n follows the \n that ends your > pattern (the semantics of \n being to match 0 or more \n, space, tab, > or return). For each line Scanf reads an extra character after the end > of line; it stores this character (wihch is a '(' by the way) in the > input buffer; but note that the character has been read from the > in_channel; now the next scanf invocation will allocate a new input > buffer that reads from stdin starting after the last character read by > the preceding invocation (the '(' looahead character). Hence you > see that a '(' is missing at the beginning of each line after the > first one! This behaviour is couterintuitive, and should be considered buggy. > To solve this problem, you should use bscanf and an explicitely > allocated input buffer that would survive from one call to scanf to > the next one. Considering that this phenomenon is general concerning > stdin and scanf, I rewrote the scanf code such that it allocates a > buffer once and for all. Hence this problem is solved in the working > sources. Very good. Thank you very much. > ... > Another semantical question is: should the call > > sscanf "" "%[^\n\r]\n" (fun x -> x) > > be successful or not ? If yes, what happens to your problem ? With the present semantics, it should raise an exception. With the semantics of partial matches it should succeed. > An interesting example indeed that helps precising the semantics of > Scanf patterns and functions, thank you very much! > > Pierre Weis I humbly bow to your kindness. Thank you very much for sharing your work with all of us. Alex ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners