Subject: Re: [Caml-list] line number information in abstract syntax trees
From: skaller
To: "Rafael 'Dido' Sevilla"
Cc: caml-list@inria.fr
Date: 17 Sep 2003 06:07:51 +1000

On Mon, 2003-09-15 at 17:53, Rafael 'Dido' Sevilla wrote:
> As some of you have suggested earlier, I have foregone doing some
> preliminary semantic analysis for my compiler in my ocamlyacc grammar,
> and instead am using the grammar solely to do syntactic analysis. Which
> then brings me to another problem.
> I've created an abstract syntax tree
> data type, but now I need to somehow embed line number information
> obtained from the syntactic analysis phase so that I can later do error
> reporting. I can't think of a clean way to do this. So far, I have a
> syntax tree data type that kind of looks like:
>
> type program = { impmodule: string ; tdecls: topdecl list; plineno:int}
> and topdecl =
>     Declaration of decl * int
> and decl = { idents: string list ; dtype: xtype ; dlineno:int}
> and xtype =
>     Data of datatype * int
>   | Func of fntype * int
>   | Alias of xtype * int
> and datatype =
>     Byte of int
>   | Int of int
>   | Big of int
>   | Real of int
>   | String of int
>   | Tuple of (datatype list * int)
>   | Array of (datatype * int)
>   | List of (datatype * int)
>   | Chan of (datatype * int)
>
> Note that all the record types have additional fields that look like
> 'plineno:int' and every variant type has an int tacked on somewhere.
> That int is supposed to contain the line number.
>
> This works just fine, but it just seems to me like such a grossly ugly
> hack into what is otherwise an elegant-looking data structure. Anyone
> have style guidelines

In Felix, every single node of the Abstract Syntax Tree contains
a source reference (except type expressions). Whilst it is painful
to construct this information, it is worthwhile. My nodes look like:

    | AST_name (sr,name)
    | AST_apply(sr,f1,e1)
    .
    | AST_literal (sr,9999)
    ...

where sr is the source reference. An alternative for expressions
is a dummy expression combinator:

    | AST_srcref (sr,e)

which can be put where needed. When the source is just a 'span'
of two nodes it can be elided, and you use a function

    let rec src x = match x with
    | AST_apply(f,e) -> range (src f) (src e)

to compute the source location.
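The span idea above can be sketched roughly as follows. This is only a minimal illustration and not Felix's actual code: the `srcref` record, the cut-down `expr` type, and the helper `range` are hypothetical names chosen for the example. Here only leaf nodes and the AST_srcref wrapper store a location, while AST_apply derives its span from its children.

```ocaml
(* Hypothetical, simplified source reference: file plus start/end positions. *)
type srcref = {
  file : string;
  l1 : int; c1 : int;   (* start line and column *)
  l2 : int; c2 : int;   (* end line and column *)
}

(* Assumed expression type: leaves carry a location, apply does not. *)
type expr =
  | AST_name of srcref * string
  | AST_literal of srcref * int
  | AST_apply of expr * expr       (* location elided: span of the children *)
  | AST_srcref of srcref * expr    (* dummy combinator that pins a location *)

(* Span from the start of [a] to the end of [b]; assumes both are in one file. *)
let range a b = { a with l2 = b.l2; c2 = b.c2 }

(* Compute the source location of any expression. *)
let rec src = function
  | AST_name (sr, _) | AST_literal (sr, _) -> sr
  | AST_apply (f, e) -> range (src f) (src e)
  | AST_srcref (sr, _) -> sr

(* Usage: the application "rev t" on line 1, columns 1 to 5. *)
let f = AST_name ({ file = "t.flx"; l1 = 1; c1 = 1; l2 = 1; c2 = 3 }, "rev")
let x = AST_name ({ file = "t.flx"; l1 = 1; c1 = 5; l2 = 1; c2 = 5 }, "t")
let sr = src (AST_apply (f, x))
```

The trade-off is that nodes without a stored location cost a tree walk whenever an error is reported, in exchange for a lighter constructor.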
My ranged source references have the type

    type range_srcref =
      string * (* filename *)
      int * (* start line number *)
      int * (* start column *)
      int * (* end line number *)
      int (* end column *)

Every token lexed contains the filename, line number, and start and
end columns of the lexeme the token was derived from.

The pain of carrying the source references around is lessened
when you consider that in any production quality compiler
70% of all the code is error reporting anyhow :-)

Here's an error diagnostic (this one is actually a compiler bug):

    CLIENT ERROR
    [bind_exe] LHS[t](List::list[]) of initialisation must have same type as RHS(List::list[])
    unfolded LHS = List::list[]
    In lpsrc/flx_lib.ipk: line 504, cols 19 to 20
    503:     | Empty => Empty
    504:     | Cons (?h, ?t) => Cons (h, rev t)
             **
    505:   endmatch
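A location tuple like the one above might be turned into the "In file: line N, cols A to B" line of such a diagnostic roughly like this. The function name `describe` is hypothetical, and the wording for ranges that cross lines is an assumption, since the sample diagnostic only shows the single-line case.

```ocaml
(* The tuple type from the message:
   filename, start line, start column, end line, end column. *)
type range_srcref = string * int * int * int * int

(* Hypothetical helper: render a location the way the sample diagnostic does.
   The multi-line format is an assumption; the sample shows one line only. *)
let describe ((file, l1, c1, l2, c2) : range_srcref) : string =
  if l1 = l2 then
    Printf.sprintf "In %s: line %d, cols %d to %d" file l1 c1 c2
  else
    Printf.sprintf "In %s: line %d col %d to line %d col %d" file l1 c1 l2 c2

let () = print_endline (describe ("lpsrc/flx_lib.ipk", 504, 19, 504, 20))
```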