From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id DAA20550; Sun, 23 Sep 2001 03:15:48 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id DAA20546 for ; Sun, 23 Sep 2001 03:15:47 +0200 (MET DST) Received: from lakeland.eecs.harvard.edu (lakeland.eecs.harvard.edu [140.247.62.173]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id f8N1Fkn17716 for ; Sun, 23 Sep 2001 03:15:46 +0200 (MET DST) Received: from garp.eecs.harvard.edu (h000a27dde966.ne.mediaone.net [24.91.4.184]) by lakeland.eecs.harvard.edu (Postfix) with ESMTP id 7281317F67; Sat, 22 Sep 2001 21:15:45 -0400 (EDT) Received: by garp.eecs.harvard.edu (Postfix, from userid 1000) id F179D42B8; Sat, 22 Sep 2001 21:10:13 -0400 (EDT) Date: Sat, 22 Sep 2001 21:10:13 -0400 From: Christian Lindig To: Vesa Karvonen Cc: caml-list@inria.fr Subject: Re: [Caml-list] On ocamlyacc and ocamllex Message-ID: <20010922211013.A580@eecs.harvard.edu> Mail-Followup-To: Christian Lindig , Vesa Karvonen , caml-list@inria.fr References: <000b01c143aa$d5218690$422aa8c0@housemarque.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000b01c143aa$d5218690$422aa8c0@housemarque.fi> User-Agent: Mutt/1.3.20i Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Sun, Sep 23, 2001 at 12:09:14AM +0300, Vesa Karvonen wrote: > I would like to make it so that the lexer would record the positions > of line breaks so that I could directly give line number and column > information in error messages. I agree that more flexible lexer and parser generators would be nice and have myself lobbied for them in the past. On the other hand I have always found my way with the existing ones which probably is the reason that we still use them. The particular problem can be solved outside of Lex and Yacc: in the Quick C-- compiler we have a mutable Sourcemap.map data type that records the connection between character positions and (file,line,column) triples. The scanner call a function Sourcemap.nl for every newline that it encounters and to build up the connection. Later the map can be used to find the (file,line,column) position for every character offset. This method has the advantage that it can deal with input streams that are created from different source files using a pre-processor. You can find the module as part of the ocamlerror tool that annotates stack traces with source code positions: http://www.eecs.harvard.edu/~lindig/software/download/ocamlerror.tar.gz > Another issue with ocamllex and ocamlyacc (and lex/flex and > yacc/bison) is that the dependencies between the generated lexer and > parser are not quite optimal. Currently the generated lexer is > dependent on the parser, because the parser generates the token type. > This means that each time the grammar is modified, but not the token > definitions, the lexer is recompiled. This could be avoided by making > it so that the token type is defined in a separate module. This is a general problem with make: when you edit a comment, a file is touched and all dependent files must be recompiled. Knowing nothing about OCaml, Make must assume that a touched file has changed in a significant way. You can help Make out by comparing files explicitly (untested - you get the idea): foo.mli is only updated, if the token type has changed. foo.mli foo.ml: bar.mly ocamlyacc bar.mly cp bar.ml foo.ml cmp -s bar.mli foo.mli || cp bar.mli foo.mli -- Christian -- Christian Lindig Harvard University - EECS lindig@eecs.harvard.edu 33 Oxford St, MD 242, Cambridge MA 02138 phone: (617) 496-7157 http://www.eecs.harvard.edu/~lindig/ ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr