Re: [Caml-list] Cyclic data structures: internal representation

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Gabriel Scherer <gabriel.scherer@gmail.com>
To: Jean-Baptiste Jeannin <jeannin@cs.cornell.edu>
Cc: "caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] Cyclic data structures: internal representation
Date: Mon, 12 Nov 2012 10:43:13 +0100	[thread overview]
Message-ID: <CAPFanBHH=f9nQEbWZTcvfrfDPUXXujF2KubvcTWpyjXAzESFrw@mail.gmail.com> (raw)
In-Reply-To: <A7FC6DE86AF33142A1FA4CE04EEBF4F18596A9DD62@gobo.cs.cornell.edu>

> I am wondering what the internal representation (in the AST) of such a data structure is.
> It could be a cyclic data structure on the AST itself, but I doubt it because it would make
> it difficult to manipulate. Another solution I see is keeping a data structure with free variables
> inside (or references, for that matter) and binding these free variables in some other place.
> l0 would then bounded directly to  [1 ; 2 ; l0 ] in the AST. Any idea what the internals of OCaml do?

During compilation, the syntactic AST explicitly represent the
recursion through the "let rec", with something to the effect of
(simplified form) Let(Recursive, "l0", Cons(1, Cons(2, Var "l0")).
At runtime this is represented as any OCaml list, but with a cyclic
reference. You have a memory organization that looks a bit like:

cell1: Block(variant_tag, int 1, addr cell2)
cell2: Block(variant_tag, int 2, addr cell1)

This is implemented by first allocation a dummy node for l0, then
evaluating the "real definition" of l0 (with recursive references
pointing to the dummy node), and backpatching the dummy node with the
result of the evaluation. Again in simplified form, this is translated
to:
  let l0 = dummy
  update l0 (1 :: 2 :: dummy)

 See for yourself:
  $ echo "let rec l0 = 1 :: 2 :: l0" > test.ml
  $ ocamlopt -dcmm -c test.ml
    (data int 1024 global "camlTest" "camlTest": skip 8)
    (function camlTest__entry ()
     (let l0/1008 (extcall "caml_alloc_dummy" 5 addr)
       (extcall "caml_update_dummy" l0/1008 (alloc 2048 3 (alloc 2048
5 l0/1008))
         unit)
       (store "camlTest" l0/1008))
     1a)

This is correct because, while "l0" may appear recursively in its
definition, its evaluation is never "forced": it appears only in
places that return its adress directly without accessing its
(yet-undefined) value.
  let rec l0 = List.length l0 :: l0
would be incorrect and, in the general case of a function application,
"let rec l0 = f l0" is rejected because the compiler cannot in general
inspect "f" to check that it never "forces" its argument.

The full definition of which recursive value definitions are accepted
(as an under-approximation of the set of definitions that are actually valid)
is described by the manual as an "extension of the OCaml language": the
core language supports only recursively-defined functions, which have
a well-defined semantics (you know the identifiers are not forced, as
a function definition of the form (fun x -> ...) performs no direct
evaluation).
  http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual021.html#toc70

On Mon, Nov 12, 2012 at 7:44 AM, Jean-Baptiste Jeannin
<jeannin@cs.cornell.edu> wrote:
> - is there any easy way to check if a list is cyclic or not? The only way I see is to go down the list, checking at every step if this particular sublist has already been seen. But it's rather heavy.

You have to use some cycle detection algorithm. Some of them are
rather funny (the Tortoise and Hare algorithm) and they are described
on Wikipedia:
  http://en.wikipedia.org/wiki/Cycle_detection

However, I think there is something wrong with your application if you
need such a test. Manipulating such cyclic values is dangerous
(because of the non-termination problems you mention in your question,
that are not limited to equality check: you will not like the result
of List.map (fun n -> n) l0), and I would rather advise you to have
value cyclicity appears explicitly in your program, if it needs to.
That is, define a type of "wrapped lists" that represent cyclic lists
by having just the cyclic part as data, and re-defining your
manipulating functions to do the wrapping/cycling explicitly on this
representation. It's better when you are forced to be aware of where
the loops are.

> - the documentation on the = sign (http://caml.inria.fr/pub/docs/manual-ocaml/libref/Pervasives.html) mentions that "Equality between cyclic data structures may not terminate." It seems to terminate if one or the other of the data structures is not cyclic. Does it ever terminate when both data sstructures are cyclic, or does it always loop?

I don't think that is specified, but I would expect it to always loop
(implementing a performance-hurting cycle detection for the few mad
scientists to play with infinite lists would be unreasonable). If you
have any form of implicit cyclicity in your data, you must avoid the
default comparison and hashing functions. One more reason to handle
cyclicity explicitly.

The Lisp world, where cons cells have long been mutable by default,
has played with the fire of delicately handling implicit cyclicity in
values. Matías Giovannini has some fairly interesting examples of
doing this in a dialect of OCaml extended with unsafe features (namely
the Obj module). I think it's a bad idea to actually use these games
in a real program, but it's still a good read.
  http://alaska-kamtchatka.blogspot.fr/2007/10/in-place-reversal-of-trees.html
  http://alaska-kamtchatka.blogspot.fr/2007/11/unsafe-clasp.html

On Mon, Nov 12, 2012 at 7:44 AM, Jean-Baptiste Jeannin
<jeannin@cs.cornell.edu> wrote:
> Good morning,
>
> One can create cyclic data structures in OCaml, using the keyword let rec. For example:
>
>   # let rec l0 = 1 :: 2 :: l0;;
>   var l0 : int list =
>     [1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 1; 2; 1; 2; ...]
>
> I am wondering what the internal representation (in the AST) of such a data structure is. It could be a cyclic data structure on the AST itself, but I doubt it because it would make it difficult to manipulate. Another solution I see is keeping a data structure with free variables inside (or references, for that matter) and binding these free variables in some other place. l0 would then bounded directly to  [1 ; 2 ; l0 ] in the AST. Any idea what the internals of OCaml do?
>
> Related questions:
> - is there any easy way to check if a list is cyclic or not? The only way I see is to go down the list, checking at every step if this particular sublist has already been seen. But it's rather heavy.
> - the documentation on the = sign (http://caml.inria.fr/pub/docs/manual-ocaml/libref/Pervasives.html) mentions that "Equality between cyclic data structures may not terminate." It seems to terminate if one or the other of the data structures is not cyclic. Does it ever terminate when both data sstructures are cyclic, or does it always loop?
>   # let l1 = [1; 2] in l0 = l1;;
>   - : bool = false
>   # l0 = l0;;
>   (* doesn't finish *)
>
> Thank you,
> Jean-Baptiste Jeannin
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

next prev parent reply	other threads:[~2012-11-12  9:44 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-12  6:44 Jean-Baptiste Jeannin
2012-11-12  8:55 ` Basile Starynkevitch
2012-11-12  9:43 ` Gabriel Scherer [this message]
2012-11-12 11:24 ` Dmitry Grebeniuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPFanBHH=f9nQEbWZTcvfrfDPUXXujF2KubvcTWpyjXAzESFrw@mail.gmail.com' \
    --to=gabriel.scherer@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=jeannin@cs.cornell.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).