Date: Mon, 10 Sep 2001 17:31:18 +0200
From: Xavier Leroy
To: Michael Leary
Cc: caml
Subject: Re: [Caml-list] lexer disambiguation?
Message-ID: <20010910173118.A12822@pauillac.inria.fr>
In-Reply-To: <20010831140925.U2959@ip178.usw22.rb1.bel.nwlink.com>; from leary@nwlink.com on Fri, Aug 31, 2001 at 02:09:25PM -0700

> since the lexer looks like an ordinary ocaml function (more or less), does
> the disambiguation boil down to:
>
> 1. the longest series of bytes that matches a single rule
> 2. match the first rule in the function that matches #1

I'm not sure which lexer you're talking about.

Lexers generated by ocamllex do indeed implement the behavior you describe: longest match + first rule if several rules match the same maximal-length substring. (But they sure don't look like ordinary OCaml functions: they just call an underlying table-driven DFA engine that does all the hard work!)

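The "longest match, then first rule" disambiguation can be sketched in plain OCaml. This is only an illustration of the rule, not ocamllex output and not how its table-driven DFA engine works; the names `rule`, `keyword`, `ident`, `rules`, and `next_token` are hypothetical and chosen for this example.

```ocaml
(* A rule pairs a token name with a matcher. The matcher returns the length
   of the longest prefix of [s] starting at [pos] that it accepts, if any. *)
type rule = { name : string; matcher : string -> int -> int option }

let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* Matches exactly the literal [kw]. *)
let keyword kw s pos =
  let n = String.length kw in
  if pos + n <= String.length s && String.sub s pos n = kw then Some n
  else None

(* Matches a non-empty run of letters. *)
let ident s pos =
  let len = String.length s in
  let rec go i = if i < len && is_letter s.[i] then go (i + 1) else i in
  let stop = go pos in
  if stop > pos then Some (stop - pos) else None

(* Rule order matters only when two rules match the same length. *)
let rules = [
  { name = "IF"; matcher = keyword "if" };
  { name = "IDENT"; matcher = ident };
]

(* Pick the longest match; on a tie, keep the earlier rule
   (a later rule replaces the current best only if strictly longer). *)
let next_token rules s pos =
  List.fold_left
    (fun best r ->
      match r.matcher s pos, best with
      | Some n, Some (_, m) when n > m -> Some (r.name, n)
      | Some n, None -> Some (r.name, n)
      | _ -> best)
    None rules
```

On the input "if", both rules match two characters, so the earlier rule IF is chosen; on "iffy", the IDENT rule matches four characters and wins because its match is longer.
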
Lexers written using stream parsers behave like all stream parsers: they select the first pattern that matches the beginning of the stream, then "commit" to this pattern, matching the remainder of the pattern without backtracking. This "commit" behavior is different from regular pattern-matching on (say) lists, which backtracks as necessary.

The OCaml lexer (used by the compilers and the toplevel), as well as the generic lexer in module Genlex, also implements the longest-match rule, so that for instance abcd is one identifier, not four identifiers a, b, c, and d.

I hope this answers your question.

- Xavier Leroy
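
The difference between the "commit" behavior of stream parsers and backtracking pattern-matching on lists can be sketched in plain OCaml. This is a hand-written imitation under stated assumptions, not the stream parser syntax extension itself: the functions `with_lists` and `with_commit` and the exceptions `No_match` and `Committed_failure` are hypothetical names, with the two exceptions standing in for the roles played by `Stream.Failure` and `Stream.Error`.

```ocaml
(* Hypothetical exceptions, mirroring the roles of Stream.Failure
   (no pattern matches the first element) and Stream.Error
   (a pattern was selected but its remainder did not match). *)
exception No_match
exception Committed_failure

(* Ordinary pattern matching on lists: if the first case fails on the
   second element, the second case is still tried. *)
let with_lists = function
  | 'a' :: 'b' :: _ -> "ab"
  | 'a' :: 'c' :: _ -> "ac"
  | _ -> raise No_match

(* Hand-written imitation of a committing parser with the same two
   alternatives, reading from a mutable cursor over the input. *)
let with_commit input =
  let rest = ref input in
  let peek () = match !rest with c :: _ -> Some c | [] -> None in
  let junk () = match !rest with _ :: tl -> rest := tl | [] -> () in
  match peek () with
  | Some 'a' ->
      junk ();  (* first alternative selected: committed from here on *)
      (match peek () with
       | Some 'b' -> junk (); "ab"
       | _ -> raise Committed_failure)  (* the "ac" alternative is never tried *)
  | _ -> raise No_match
```

On the input `['a'; 'c']`, `with_lists` falls through to its second case and returns "ac", whereas `with_commit` has already consumed 'a' for the first alternative and raises `Committed_failure`. A grammar written for a committing parser would normally factor the common prefix, reading 'a' once and then choosing between 'b' and 'c'.
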