caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] line number information in abstract syntax trees
@ 2003-09-15  7:53 Rafael 'Dido' Sevilla
  2003-09-15 11:14 ` Michal Moskal
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Rafael 'Dido' Sevilla @ 2003-09-15  7:53 UTC (permalink / raw)
  To: caml-list

As some of you have suggested earlier, I have foregone doing some
preliminary semantic analysis for my compiler in my ocamlyacc grammar,
and instead am using the grammar solely to do syntactic analysis.  Which
then brings me to another problem.  I've created an abstract syntax tree
data type, but now I need to somehow embed line number information
obtained from the syntactic analysis phase so that I can later do error
reporting.  I can't think of a clean way to do this.  So far, I have a
syntax tree data type that kind of looks like:

type program = { impmodule: string ; tdecls: topdecl list; plineno:int}
and topdecl =
    Declaration of decl * int
and decl = { idents: string list ; dtype: xtype ; dlineno:int}
and xtype =
    Data of datatype * int
  | Func of fntype * int
  | Alias of xtype * int
and datatype =
    Byte of int
  | Int of int
  | Big of int
  | Real of int
  | String of int
  | Tuple of (datatype list * int)
  | Array of (datatype * int)
  | List of (datatype * int)
  | Chan of (datatype * int)

Note that all the record types have additional fields that look like
'plineno:int' and every variant type has an int tacked on somewhere.
That int is supposed to contain the line number.

This works just fine, but it just seems to me like such a grossly ugly
hack into what is otherwise an elegant-looking data structure.  Anyone
have style guidelines 

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] line number information in abstract syntax trees
  2003-09-15  7:53 [Caml-list] line number information in abstract syntax trees Rafael 'Dido' Sevilla
@ 2003-09-15 11:14 ` Michal Moskal
  2003-09-16 20:07 ` skaller
  2003-09-17  8:44 ` Christian Lindig
  2 siblings, 0 replies; 6+ messages in thread
From: Michal Moskal @ 2003-09-15 11:14 UTC (permalink / raw)
  To: Rafael 'Dido' Sevilla; +Cc: caml-list

On Mon, Sep 15, 2003 at 03:53:39PM +0800, Rafael 'Dido' Sevilla wrote:
> Note that all the record types have additional fields that look like
> 'plineno:int' and every variant type has an int tacked on somewhere.
> That int is supposed to contain the line number.
> 
> This works just fine, but it just seems to me like such a grossly ugly
> hack into what is otherwise an elegant-looking data structure.  Anyone
> have style guidelines 

type location = { l_line : int; l_file : string; }

type decl = { decl_location : location; decl_raw : raw_decl; }
and raw_decl =
  | Foo
  | Bar of decl list
  ...

This is the way used in OCaml compiler itself and few my compilers.

-- 
: Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv
: When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] line number information in abstract syntax trees
  2003-09-15  7:53 [Caml-list] line number information in abstract syntax trees Rafael 'Dido' Sevilla
  2003-09-15 11:14 ` Michal Moskal
@ 2003-09-16 20:07 ` skaller
  2003-09-17  8:44 ` Christian Lindig
  2 siblings, 0 replies; 6+ messages in thread
From: skaller @ 2003-09-16 20:07 UTC (permalink / raw)
  To: Rafael 'Dido' Sevilla; +Cc: caml-list

On Mon, 2003-09-15 at 17:53, Rafael 'Dido' Sevilla wrote:
> As some of you have suggested earlier, I have foregone doing some
> preliminary semantic analysis for my compiler in my ocamlyacc grammar,
> and instead am using the grammar solely to do syntactic analysis.  Which
> then brings me to another problem.  I've created an abstract syntax tree
> data type, but now I need to somehow embed line number information
> obtained from the syntactic analysis phase so that I can later do error
> reporting.  I can't think of a clean way to do this.  So far, I have a
> syntax tree data type that kind of looks like:
> 
> type program = { impmodule: string ; tdecls: topdecl list; plineno:int}
> and topdecl =
>     Declaration of decl * int
> and decl = { idents: string list ; dtype: xtype ; dlineno:int}
> and xtype =
>     Data of datatype * int
>   | Func of fntype * int
>   | Alias of xtype * int
> and datatype =
>     Byte of int
>   | Int of int
>   | Big of int
>   | Real of int
>   | String of int
>   | Tuple of (datatype list * int)
>   | Array of (datatype * int)
>   | List of (datatype * int)
>   | Chan of (datatype * int)
> 
> Note that all the record types have additional fields that look like
> 'plineno:int' and every variant type has an int tacked on somewhere.
> That int is supposed to contain the line number.
> 
> This works just fine, but it just seems to me like such a grossly ugly
> hack into what is otherwise an elegant-looking data structure.  Anyone
> have style guidelines 

In Felix, every single node of the Abstract Syntax Tree contains
a source reference (except type expressions). Whilst it is
painful to construct this information, it is worthwhile.
My nodes look like:

	| AST_name (sr,name)
	| AST_apply(sr,f1,e1)
.	| AST_literal (sr,9999)
	...

where sr is the source reference.

An alternative for expressions is a dummy expression 
combinator:

	| AST_srcref (sr,e)

which can be put where needed. When the source
is just a 'span' of two nodes it can be elided,
and you use a function

	let rec src x = match x with
	| AST_apply(f,e) -> range (src f) (src e)

to compute the source location. My ranged source
references have the type

	type range_srcref  = 
		string * (* filename *)
		int * (* start line number *)
		int * (* start column *)
		int * (* end line number *)
		int  (* end column *)


Even token lexed contains the filename,
line number, and start and end columns
of the lexeme the token was derived from.

The pain of carrying the source references
around is lessened when you consider that 
in any production quality compiler 70%
of all the code is error reporting anyhow :-)

Here's an error diagnostic (this one is actually
a compiler bug)

CLIENT ERROR
[bind_exe] LHS[t](List::list[<T1128>]) of initialisation must have same
type as
RHS(List::list[<T1104>]) unfolded LHS = List::list[<T1128>]
In lpsrc/flx_lib.ipk: line 504, cols 19 to 20
503:       | Empty => Empty
504:       | Cons (?h, ?t) => Cons (h, rev t)
                       **
505:       endmatch


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] line number information in abstract syntax trees
  2003-09-15  7:53 [Caml-list] line number information in abstract syntax trees Rafael 'Dido' Sevilla
  2003-09-15 11:14 ` Michal Moskal
  2003-09-16 20:07 ` skaller
@ 2003-09-17  8:44 ` Christian Lindig
  2003-09-18 19:01   ` skaller
  2 siblings, 1 reply; 6+ messages in thread
From: Christian Lindig @ 2003-09-17  8:44 UTC (permalink / raw)
  To: Rafael 'Dido' Sevilla; +Cc: Caml Mailing List

On Mon, Sep 15, 2003 at 03:53:39PM +0800, Rafael 'Dido' Sevilla wrote:
> and datatype =
>     Byte of int
>   | Int of int
>   | Big of int
>   | Real of int
>   | String of int
>   | Tuple of (datatype list * int)
>   | Array of (datatype * int)
>   | List of (datatype * int)
>   | Chan of (datatype * int)
> 
> This works just fine, but it just seems to me like such a grossly ugly
> hack into what is otherwise an elegant-looking data structure.  Anyone
> have style guidelines 

I like to suggest a technique (that I learned from Norman Ramsey) that
makes line numbers optional in the abstract syntax tree (AST).  The
suggested style leads to a separation of concerns.

  and expr =
        ExprAt of (expr * region)                   (* <--- *)
      | Int of (StdPrims.std_string * ty option)
      | Float of (StdPrims.std_string * ty option)
      | Char of (StdPrims.std_int * ty option)
      | Fetch of (name_or_mem)
      | BinOp of (expr * op * expr)
      | UnOp of (op * expr)
      | PrimOp of (name * actual list)

An expression is an Int or a Float or something else and by itself does
not contain a line number. But it may be wrapped in an ExprAt tag that
contains the expression and the line number (here of type region). When
you build the abstract syntax tree in a parser, you usually create
ExprAt nodes that contain the line number (but you don't have to). You
can easily implement a policy that attaches line numbers only at the
outer level on an expression, if you want to save space.  This
representation comes in handy when you create an AST not by parsing,
but directly. In this case you don't have to invent line numbers because
you simply never would generate ExprAt nodes. 

-- Christian

-- 
Christian Lindig         http://www.st.cs.uni-sb.de/~lindig/

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] line number information in abstract syntax trees
  2003-09-17  8:44 ` Christian Lindig
@ 2003-09-18 19:01   ` skaller
  2003-09-19  6:50     ` Christian Lindig
  0 siblings, 1 reply; 6+ messages in thread
From: skaller @ 2003-09-18 19:01 UTC (permalink / raw)
  To: Christian Lindig; +Cc: Rafael 'Dido' Sevilla, Caml Mailing List

On Wed, 2003-09-17 at 18:44, Christian Lindig wrote:
> On Mon, Sep 15, 2003 at 03:53:39PM +0800, Rafael 'Dido' Sevilla wrote:

>   and expr =
>         ExprAt of (expr * region)                   (* <--- *)
>       | Int of (StdPrims.std_string * ty option)

>   This
> representation comes in handy when you create an AST not by parsing,
> but directly. In this case you don't have to invent line numbers because
> you simply never would generate ExprAt nodes. 

This is a false sense of security. Synthesised terms may
also contain errors (for example type errors). Therefore the
error reporting needs to reflect where the error occurred,
which means you need to synthesise an appropriate source reference.

The advantage of mandatory source references (as opposed to the
above style) is that you're *forced* as a programmer to consider
the issue at all times. I sometimes 'cheat' a bit, and dont give
a precise enough reference, but that's better than omiting
one: for every expression, mandatory source references *guarrantee*
the presence of a source reference -- and that assurance is useful
in a product which is 90% error reporting, all of which *should*
relate back to the input source.

In particular this means that term rewriting rules must attempt
to preserve source location data in some sensible way, which is often
somewhat more diffiult than the logic of the rewriting rules themselves.


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] line number information in abstract syntax trees
  2003-09-18 19:01   ` skaller
@ 2003-09-19  6:50     ` Christian Lindig
  0 siblings, 0 replies; 6+ messages in thread
From: Christian Lindig @ 2003-09-19  6:50 UTC (permalink / raw)
  To: skaller; +Cc: Caml Mailing List

On Fri, Sep 19, 2003 at 05:01:16AM +1000, skaller wrote:
> On Wed, 2003-09-17 at 18:44, Christian Lindig wrote:
> > On Mon, Sep 15, 2003 at 03:53:39PM +0800, Rafael 'Dido' Sevilla wrote:
> 
> >   and expr =
> >         ExprAt of (expr * region)                   (* <--- *)
> >       | Int of (StdPrims.std_string * ty option)
> 
> >   This representation comes in handy when you create an AST not by
> >   parsing, but directly. In this case you don't have to invent line
> >   numbers because you simply never would generate ExprAt nodes. 
> 
> This is a false sense of security. Synthesised terms may also contain
> errors (for example type errors). 

I should have made my example more precise. In the Quick C-- compiler we
have a pretty printer for the AST. An AST is normally produced by the
parser where we record line number information in *At nodes; in
addition, we can translate back our intermediate representation to
source code, via the AST and pretty printer. When we do so, we don't
bother with line number information because it is for debugging only. I
still consider this appropriate.

> The advantage of mandatory source references (as opposed to the
> above style) is that you're *forced* as a programmer to consider
> the issue at all times. 

Line number information become a dynamic property, and of course, they
are less secure than statically enforced line numbers. I guess everybody
here recognizes this as an instance of the familiar static vs. dynamic
discussion.

-- Christian

-- 
Christian Lindig         http://www.st.cs.uni-sb.de/~lindig/

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2003-09-19  6:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-09-15  7:53 [Caml-list] line number information in abstract syntax trees Rafael 'Dido' Sevilla
2003-09-15 11:14 ` Michal Moskal
2003-09-16 20:07 ` skaller
2003-09-17  8:44 ` Christian Lindig
2003-09-18 19:01   ` skaller
2003-09-19  6:50     ` Christian Lindig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).