Labelling trees

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Labelling trees
@ 2007-06-06 19:43 David Teller
  2007-06-06 21:22 ` [Caml-list] " Christophe Raffalli
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: David Teller @ 2007-06-06 19:43 UTC (permalink / raw)
  To: OCaml


   Hi everyone,
 I'm currently in the very first stages of writing down a prototype
implementation of a type checker. At the moment, I'm looking for "the
nicest" way of labelling my abstract syntax tree with type annotations
-- without corrupting the AST at all, if possible.


Say I have

type my_expression = ESomeConstructor  of ...
                   | ESomeConstructor2 of ...
                   | ESomeConstructor3 of my_function
and  my_function   = FSomeConstructor ...


A first idea would be to replace this structures with

type 'a my_expression = ESomeConstructor  of ...
                      | ESomeConstructor2 of ...
                      | ESomeConstructor3 of 'a my_function
and  'a my_function   = FSomeConstructor ...

That would let me annotate instances of my_expression or my_function
with informations of type 'a. However, this won't scale in case I decide
that my static checker will need annotations of different types for
my_expression and my_function. Of course, my AST is actually a tad more
complex and would require about 15 type arguments, which I don't
consider very nice. 

Intuitively, using functors will yield the same kind of half-satisfying
results.

Any suggestions ?

Thanks,
 David

-- 
David Teller ------------------------------------------
Security of Distributed Systems -----------------------
-- http://www.univ-orleans.fr/lifo/Members/David.Teller
----- Laboratoire d'Informatique Fondamentale d'Orleans


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Labelling trees
  2007-06-06 19:43 Labelling trees David Teller
@ 2007-06-06 21:22 ` Christophe Raffalli
  2007-06-07  1:00 ` skaller
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Christophe Raffalli @ 2007-06-06 21:22 UTC (permalink / raw)
  To: David Teller, caml-list

David Teller a écrit :
>    Hi everyone,
>  I'm currently in the very first stages of writing down a prototype
> implementation of a type checker. At the moment, I'm looking for "the
> nicest" way of labelling my abstract syntax tree with type annotations
> -- without corrupting the AST at all, if possible.
>
>
> Say I have
>
> type my_expression = ESomeConstructor  of ...
>                    | ESomeConstructor2 of ...
>                    | ESomeConstructor3 of my_function
> and  my_function   = FSomeConstructor ...
>
>
> A first idea would be to replace this structures with
>
> type 'a my_expression = ESomeConstructor  of ...
>                       | ESomeConstructor2 of ...
>                       | ESomeConstructor3 of 'a my_function
> and  'a my_function   = FSomeConstructor ...
>
> That would let me annotate instances of my_expression or my_function
> with informations of type 'a. However, this won't scale in case I decide
> that my static checker will need annotations of different types for
> my_expression and my_function. Of course, my AST is actually a tad more
> complex and would require about 15 type arguments, which I don't
> consider very nice. 
>
> Intuitively, using functors will yield the same kind of half-satisfying
> results.
>
> Any suggestions ?
>
>   
I can't ressit ... use pml 
(http://www.lama.univ-savoie.fr/~raffalli/repos/pml) in one week from 
now (I am currently implementing what you need,
this is why I am answering).

You will be able to write things like :

type AST =
  Nope[]
| App[AST,AST]

val rec decorate_with_size  ast =
  Nope[] -> 
    { Nope[] with val size = 0 } (* Here I am lying pml does not support 
native int yet, nor it will in one week *)
| App[u,v] ->
    let u' = decorate_with_size u and v' = decorate_with_size v in
    { App[decorate u, decorate v] with val size = 1 + u'.size  +  v'.size }

Moreover, the result type of decorate_with_size (which is {AST with val 
size : int }) is a subtype of AST
and this subtyping is completely implicit (no typecast needed) ...

OK, pml does not have native int (it has multiprecision nat) and no IO 
yet, it is interpreted (but not that slow),
probably buggy ... But you will soon be able to specify and prove your 
code too ...

Cheers,
Christophe



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Labelling trees
  2007-06-06 19:43 Labelling trees David Teller
  2007-06-06 21:22 ` [Caml-list] " Christophe Raffalli
@ 2007-06-07  1:00 ` skaller
  2007-06-07 14:26   ` Till Varoquaux
  2007-06-07 14:25 ` Christian Stork
  2007-06-07 23:48 ` Jeremy Yallop
  3 siblings, 1 reply; 9+ messages in thread
From: skaller @ 2007-06-07  1:00 UTC (permalink / raw)
  To: David Teller; +Cc: OCaml

On Wed, 2007-06-06 at 21:43 +0200, David Teller wrote:
> Hi everyone,

> That would let me annotate instances of my_expression or my_function
> with informations of type 'a. However, this won't scale in case I decide
> that my static checker will need annotations of different types for
> my_expression and my_function. Of course, my AST is actually a tad more
> complex and would require about 15 type arguments, which I don't
> consider very nice. 

This is indeed a nasty problem I have too. In theory, polymorphic
variants with open recursion support what you want by making a new
parametric term with a new case

	`TYPE_ast of typ * 'ast

and then closing the recursion. This is a covariant extension.

The problem is that whilst this is relatively easy to encode
for 1 parameter, it gets messy with 2 .. and if you have
15 or so as I do as well .. it's just easier to use a dummy
type or magic .. neither of which is satisfactory.

Nor is the above encoding very nice, since it doesn't ensure
each branch of the tree is labelled, instead it simply caches
values where you chose, to speed up the functional calculation.

I just do the boring thing. I make a new type with the term
plus annotations and translate from the untyped to typed terms.

You CAN solve this with a list or array mapping the physical 
term address to the type, with O(N) performance (YUK). 
You can't use a Map or Hashtbl because they require ordering, 
and Ocaml moves objects around so ordering on addresses isn't stable.

Double indirection works though: instead of terms, use an integer
which Maps to a term .. you can then also Map the integer to 
your type. Of course .. this isn't statically safe.

BTW: this is basically 'parallel hierarchies' problem of OO
on a value level.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Labelling trees
  2007-06-06 19:43 Labelling trees David Teller
  2007-06-06 21:22 ` [Caml-list] " Christophe Raffalli
  2007-06-07  1:00 ` skaller
@ 2007-06-07 14:25 ` Christian Stork
  2007-06-07 23:48 ` Jeremy Yallop
  3 siblings, 0 replies; 9+ messages in thread
From: Christian Stork @ 2007-06-07 14:25 UTC (permalink / raw)
  To: David Teller, caml-list

On Wed, Jun 06, 2007 at 09:43:02PM +0200, David Teller wrote:
> 
>    Hi everyone,
>  I'm currently in the very first stages of writing down a prototype
> implementation of a type checker. At the moment, I'm looking for "the
> nicest" way of labelling my abstract syntax tree with type annotations
> -- without corrupting the AST at all, if possible.
> 
> 
> Say I have
> 
> type my_expression = ESomeConstructor  of ...
>                    | ESomeConstructor2 of ...
>                    | ESomeConstructor3 of my_function
> and  my_function   = FSomeConstructor ...
> 
> 
> A first idea would be to replace this structures with
> 
> type 'a my_expression = ESomeConstructor  of ...
>                       | ESomeConstructor2 of ...
>                       | ESomeConstructor3 of 'a my_function
> and  'a my_function   = FSomeConstructor ...
> 
> That would let me annotate instances of my_expression or my_function
> with informations of type 'a. However, this won't scale in case I decide
> that my static checker will need annotations of different types for
> my_expression and my_function. Of course, my AST is actually a tad more
> complex and would require about 15 type arguments, which I don't
> consider very nice. 

Jacques Garrigue once mentioned the following trick on this list.
Instead of writing out 15 type parameters each time one can use a single
type paramenter with an object type constraint.  For example,

type 'a expr = E1 of 'e * 'a func
             | E2 of 'e * 'a foo
             | ...
             constraint 'a = <e:'e; f:'f; ...>
and 'a func = F1 of 'f * 'a bar
            | F2 of 'f * 'a bla
            | ...
            constraint 'a = <e:'e; f:'f; ...>
...

One could even think of reducing the remaining constraints with some
camlp4 magic to a single instance.

-- 
Chris Stork   <>  Support eff.org!  <>   http://www.ics.uci.edu/~cstork/
OpenPGP fingerprint:  B08B 602C C806 C492 D069  021E 41F3 8C8D 50F9 CA2F


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Labelling trees
  2007-06-07  1:00 ` skaller
@ 2007-06-07 14:26   ` Till Varoquaux
  2007-06-07 23:09     ` skaller
  0 siblings, 1 reply; 9+ messages in thread
From: Till Varoquaux @ 2007-06-07 14:26 UTC (permalink / raw)
  To: skaller; +Cc: David Teller, OCaml

> This is indeed a nasty problem I have too. In theory, polymorphic
> variants with open recursion support what you want by making a new
> parametric term with a new case
>
>         `TYPE_ast of typ * 'ast
>
> and then closing the recursion. This is a covariant extension.
>
> The problem is that whilst this is relatively easy to encode
> for 1 parameter, it gets messy with 2 .. and if you have
> 15 or so as I do as well .. it's just easier to use a dummy
> type or magic .. neither of which is satisfactory.

Well... this is what type constraints make this slightly less messy:

module Gram =
  struct
    type 'cst binop = Add | Sub | Mul | Div
    type 'cst constant = Int of int | Float of float
    type 'cst expr =
      Cst of 'constant
      | Ident of 'ident
      | Call of ('expr * ('expr list))
      | Binop of ('binop * 'expr * 'expr)
      | Fundecl of (('ident list) * 'instr)
      constraint 'cst =
        < instr : 'instr; ident : 'ident; expr : 'expr; constant : 'constant;
          binop : 'binop; ..
        >
    type 'cst ident = string
    type 'cst instr =
      Let of ((('ident * 'expr) list) * 'instr)
      | Exp of 'expr
      | Assign of ('ident * 'expr)
      | If of ('expr * 'instr * 'instr)
      | Bloc of 'instr list
      constraint 'cst = < instr : 'instr; ident : 'ident; expr : 'expr; .. >
  end
type cst =
  < instr : instr; ident : ident; expr : expr; constant : constant;
    binop : binop
  >
  and binop =
  cst Gram.binop
  and constant =
  cst Gram.constant
  and expr =
  cst Gram.expr
  and ident =
  cst Gram.ident
  and instr =
  cst Gram.instr

is an example. It still gets very boring and administartive so you
might want to harness Camlp4 here [1]. This file was actually
generated from a source file like this:

gram{
 ident := string;
 constant:= Int int | Float float;
 expr :=
|Cst constant
| Ident ident
| Call expr,[expr]
| Binop binop,expr,expr
| Fundecl [ident],instr;
 binop := Add | Sub | Mul | Div ;
 instr :=
| Let [(ident,expr)],instr
| Exp expr
| Assign ident,expr
| If expr,instr,instr
| Bloc [instr]
}

In this case you do not need polymorphic variant since you could
always generate types like:

type cst =
  < instr : instr; ident : ident; expr : expr; constant : constant;
    binop : binop
  >
  and binop = cst Gram.binop
  and constant = cst Gram.constant
  and expr = edec*cst Gram.expr
  and ident = cst Gram.ident
  and instr = idec*cst Gram.instr

where `idec` and `edec` are the decorations you want to add on the
nodes for the instructions and the expressions respectively.

>
> Nor is the above encoding very nice, since it doesn't ensure
> each branch of the tree is labelled, instead it simply caches
> values where you chose, to speed up the functional calculation.

> You CAN solve this with a list or array mapping the physical
> term address to the type, with O(N) performance (YUK).
> You can't use a Map or Hashtbl because they require ordering,
> and Ocaml moves objects around so ordering on addresses isn't stable.

Hashtbl don't require ordering they require hashing. Anyways I'm
pretty sure both the functions `Pervasives.compare` and `Hashtbl.hash`
actually work on the representation of the data, not on its physical
address (that's why compare can fail on recursive values) . You can
use both Hashtables and Maps but you might aim for a weak data
structure to limit memory leaks.... BTW: for some akward reason
Weak.Hashtbl is not a hashtable.

>
> Double indirection works though: instead of terms, use an integer
> which Maps to a term .. you can then also Map the integer to
> your type. Of course .. this isn't statically safe.

Huh????

Till Varoquaux
[1] This camlp4 extension is in pre-pre-pre-alpha state. It can be
browsed online:
http://till.varoquaux.free.fr/projects/mini_js/OpenTrees.ml.html


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Labelling trees
  2007-06-07 14:26   ` Till Varoquaux
@ 2007-06-07 23:09     ` skaller
  2007-06-08  9:52       ` Till Varoquaux
  0 siblings, 1 reply; 9+ messages in thread
From: skaller @ 2007-06-07 23:09 UTC (permalink / raw)
  To: Till Varoquaux; +Cc: David Teller, OCaml

On Thu, 2007-06-07 at 16:26 +0200, Till Varoquaux wrote:

> Hashtbl don't require ordering they require hashing.

True, but the hash required would be unstable.

>  Anyways I'm
> pretty sure both the functions `Pervasives.compare` and `Hashtbl.hash`
> actually work on the representation of the data,

Yes, which is why you can't use them. The idea is:

	map1: address1 -> AST
	map2: address1 -> decoration

where address1 is the address of the AST term. map1 is just the
function:

	let id x = x

map2 can be an association list, searching by address.

Hashing or comparing using the value of a term as a key
is no good. It's too slow.

> > Double indirection works though: instead of terms, use an integer
> > which Maps to a term .. you can then also Map the integer to
> > your type. Of course .. this isn't statically safe.
> 
> Huh????

The Felix compiler maps like:

	int -> term
	int -> decoration

This isn't statically safe. There's no assurance of a 1-1 correspondence
between terms and decorations.

The encoding is safe .. the semantics aren't. I use Hashtbl for this
and I get 'Not_found' exception regularly. Correctness here depends
on control flow -- dynamically maintaining the correspondence.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Labelling trees
  2007-06-06 19:43 Labelling trees David Teller
                   ` (2 preceding siblings ...)
  2007-06-07 14:25 ` Christian Stork
@ 2007-06-07 23:48 ` Jeremy Yallop
  3 siblings, 0 replies; 9+ messages in thread
From: Jeremy Yallop @ 2007-06-07 23:48 UTC (permalink / raw)
  To: caml-list

David Teller wrote:
 > I'm currently in the very first stages of writing down a prototype
 > implementation of a type checker. At the moment, I'm looking for "the
 > nicest" way of labelling my abstract syntax tree with type annotations
 > -- without corrupting the AST at all, if possible.

My favourite encoding of annotated ASTs uses recursive modules and
polymorphic variants.  Instead of supplying type parameters for the
annotations and open recursion you can tie the knot using a recursive
module and some `with'-constraints on the signature.  Closing the
recursion twice (with different constraints) gives you annotated and
unannotated trees.

Here's an example from an implementation of the core of SML.  There
are 7 mutually-recursive types and three annotation types.  I hope
it's clear enough how it'd scale up to a more complicated data
structure.  Also, there are some type definitions missing, but you
should be able to fill them in easily enough.

First, a module type definition, listing all of the types involved.
This will be used in lieu of the type parameters that you'd normally
use for encoding open recursion.

    module type ExpSig =
    sig
      type atexp
      type exprow
      type exp
      type valdec
      type match_
      type mrule
      type valbind
      type pat
    end

Now the actual datatype definitions.  Notice that we use `T.t' instead
of 't' at each recursive point.  The type `pat' is a parameter
although it's not defined here because we'll want to use annotated and
unannotated patterns inside annotated and unannotated expressions.
The module component (T) here is analagous to a functor argument of a
module.

    module type Exp =
    sig
      module T : ExpSig

      type atexp = [
        `scon     of scon
      | `vid      of vid
      | `record   of T.exprow option
      | `let_     of T.valdec * T.exp
      | `paren    of T.exp ]
      and exp = [
        `atexp    of T.atexp
      | `apply    of T.exp * T.atexp
      | `typed    of T.exp * ty
      | `fn       of T.match_ ]
      and mrule = T.pat * T.exp
      and valdec = [
        `val_     of tyvarseq * T.valbind
      | `local    of T.valdec * T.valdec
      | `empty
      | `valdecs  of T.valdec * T.valdec ]
      and valbind = [
        `bindings of (T.pat * T.exp * T.valbind option)
      | `rec_     of T.valbind ]
      and match_ = [`match_ of T.mrule * T.match_ option]
      and exprow = [`exprow of (label * T.exp * T.exprow option)]
      type pat = T.pat
    end

Finally, we tie the knot, first directly (to obtain unannotated
trees):

    module rec UntypedExp : ExpSig
      with type atexp   = UntypedExp'.atexp
       and type exprow  = UntypedExp'.exprow
       and type exp     = UntypedExp'.exp
       and type match_  = UntypedExp'.match_
       and type mrule   = UntypedExp'.mrule
       and type valdec  = UntypedExp'.valdec
       and type valbind = UntypedExp'.valbind
       and type pat     = UntypedPat.pat
    = UntypedExp
    and UntypedExp' : Exp with module T = UntypedExp = UntypedExp'

and then adding annotations for each node type

    module rec TypedExp : ExpSig
      with type atexp   = TypedExp'.atexp   * type_
       and type exprow  = TypedExp'.exprow  * rowtype
       and type exp     = TypedExp'.exp     * type_
       and type valdec  = TypedExp'.valdec  * valenv
       and type match_  = TypedExp'.match_  * type_
       and type mrule   = TypedExp'.mrule   * type_
       and type valbind = TypedExp'.valbind * valenv
       and type pat     = TypedPat.pat
    = TypedExp
    and TypedExp' : Exp with module T = TypedExp = TypedExp'

Using modules to collect the definitions together has various other
advantages; for example, it's easy(ish) to write an evaluator that
works for both annotated and unannotated expressions using functors.

Hope this helps,

Jeremy.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Labelling trees
  2007-06-07 23:09     ` skaller
@ 2007-06-08  9:52       ` Till Varoquaux
  2007-06-08 10:32         ` skaller
  0 siblings, 1 reply; 9+ messages in thread
From: Till Varoquaux @ 2007-06-08  9:52 UTC (permalink / raw)
  To: skaller; +Cc: David Teller, OCaml

On 6/8/07, skaller <skaller@users.sourceforge.net> wrote:
> On Thu, 2007-06-07 at 16:26 +0200, Till Varoquaux wrote:

> True, but the hash required would be unstable.
>
> Hashing or comparing using the value of a term as a key
> is no good. It's too slow.

Hashing doesn't read the whole structure, it can be parametrized to
read less data. Do you really see a performance boost using addresses
instead?

Regards,
Till

-- 
http://till-varoquaux.blogspot.com/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Labelling trees
  2007-06-08  9:52       ` Till Varoquaux
@ 2007-06-08 10:32         ` skaller
  0 siblings, 0 replies; 9+ messages in thread
From: skaller @ 2007-06-08 10:32 UTC (permalink / raw)
  To: Till Varoquaux; +Cc: David Teller, OCaml

On Fri, 2007-06-08 at 11:52 +0200, Till Varoquaux wrote:
> On 6/8/07, skaller <skaller@users.sourceforge.net> wrote:
> > On Thu, 2007-06-07 at 16:26 +0200, Till Varoquaux wrote:
> 
> > True, but the hash required would be unstable.
> >
> > Hashing or comparing using the value of a term as a key
> > is no good. It's too slow.
> 
> Hashing doesn't read the whole structure, it can be parametrized to
> read less data. Do you really see a performance boost using addresses
> instead?

I have no idea because you can't.. what I did see when i tried
map terms to values (using hashtable) .. was a slowdown ;(

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-06-08 10:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-06 19:43 Labelling trees David Teller
2007-06-06 21:22 ` [Caml-list] " Christophe Raffalli
2007-06-07  1:00 ` skaller
2007-06-07 14:26   ` Till Varoquaux
2007-06-07 23:09     ` skaller
2007-06-08  9:52       ` Till Varoquaux
2007-06-08 10:32         ` skaller
2007-06-07 14:25 ` Christian Stork
2007-06-07 23:48 ` Jeremy Yallop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).