From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id LAA14842; Thu, 17 Jun 2004 11:36:46 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id LAA11684; Thu, 17 Jun 2004 11:36:45 +0200 (MET DST) Received: from smtp1.adl2.internode.on.net (smtp1.adl2.internode.on.net [203.16.214.181]) by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i5H9afSH002929; Thu, 17 Jun 2004 11:36:43 +0200 Received: from [192.168.1.200] (ppp219-142.lns2.syd3.internode.on.net [203.122.219.142]) by smtp1.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id i5H9aZ4Y029311; Thu, 17 Jun 2004 19:06:36 +0930 (CST) Subject: Re: [Caml-list] ocamllex/yacc and camlp4 From: skaller Reply-To: skaller@users.sourceforge.net To: Pierre Weis Cc: Luc Maranget , jgoerzen@complete.org, caml-list In-Reply-To: <200406162248.AAA11976@pauillac.inria.fr> References: <200406162248.AAA11976@pauillac.inria.fr> Content-Type: text/plain Message-Id: <1087464994.16811.1476.camel@pelican.wigram> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 17 Jun 2004 19:36:35 +1000 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 40D1662A.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 camlp:01 sourceforge:01 2004:99 pierre:01 weis:01 damned:01 lalr:01 lalr:01 coupling:01 camlp:01 examples'':01 camomile:01 distro:01 pcre:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Thu, 2004-06-17 at 08:48, Pierre Weis wrote: > 1) ocamllex and ocamlyacc implementation technologies are damned fast > and it is difficult to compete with them using streams. They're not so fast when your problem exceeds the constraints which determine what they're good at. All my lexers generate an in memory token list for this reason. > Last but not least, the actual ocamllex/ocamlyacc implementations work > pretty well, so that there is no clear necessity to rewrite them. I think there is: they haven't worked so well for any of the parsers I've had to write -- not even the Felix parser which is specifically designed to be unambiguious LALR(1) and Ocamlyaccable. The parser can't take a state argument, it can't accept a token type, the generated interface can't be added to by the client which is necessary when you need to define a function callable by the lexer and parser which depends on the type of a token, you can't use a meta-grammar notation with the obvious interpretation (a* makes a list, a? an option). Considerable effort is required to decouple the faulty interface which makes the parser depend on a lexbuf. LALR(1) is very hard to work with, and often the easiest workaround is by doing some lookahead in the tokeniser: the coupling of the parser and lexer make this difficult. Felix lexer/parser needs about 8 files: more than any other part of the compiler. Also Ocamllex is only an 8 bit lexer which isn't that useful these days where XML/Web stuff demands UTF-8 encoded Unicode. > In conclusion: pure Camlp4 implementation of ocamllex/ocamlyacc is > still an interesting and challenging progamming task for the next few > years, if you (or someone else) had the will and time to provide two > ``great camlp4 examples'' to the rest of us... > > Happy hacking :) Ulex already integrates lexing and provides UTF-8, Camomile already provides 32 bit lexers. Code exists. The problem here isn't hacking the code, but getting INRIA to agree to sit down and work with the community on designing an interface specification for a facility good enough to put directly in the standard distro. Once that were agreed I'm quite sure the non-INRIA community would rapidly provide an implementation. As an added incentive: an integrated lexer automatically provides a superior alternative to Str and PCRE, and if you do a parser as well, an even better system: quite a lot of 'regexp' stuff should actually be done by a real parser. -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners