caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: What exactly can be GC'ed?
@ 1998-01-06 13:28 Damien Doligez
  0 siblings, 0 replies; 4+ messages in thread
From: Damien Doligez @ 1998-01-06 13:28 UTC (permalink / raw)
  To: caml-list; +Cc: ctchou


This messages gives some details on the current implementation of
O'Caml.  Most users do not need to know such details and can skip this
message.


I wrote:
>   Caml is a language with static binding.  The new definition of x does
>   not replace the old one in any way, except in the table that maps
>   names to slots in the table of globals.  When you define the second x,
>   the first one only loses its name.  It still exists as a global
>   variable.

>From: Ching-Tsun Chou <ctchou@mipos2.intel.com>

>But an inaccessible global variable!  Indeed, since the new definition
>replaces the old one "in the table that maps names to slots in the
>table of globals", why is it difficult to figure out that the value
>object pointed to by the old definition is no longer reachable?

Each global binding is given a number by the compiler, its global
index.

There are two tables.  One (the global environment) maps variable
names to indices, the other (the table of globals) maps indices to
values (the value of the corresponding binding).

The global environment is a data structure of the compiler and
linker.  It doesn't exist in stand-alone programs.  It does exist in
the toplevel system because the compiler and linker are included in
the toplevel.

The table of globals exists in all programs.  When some piece of code
(e.g. the body of a function) refers to a global variable, this
reference is compiled into the variable's index.  At run time, the
index is used to get the value through the table of globals.

This means that references to global values (which are, conceptually,
pointers) can be embedded in the code, even though the GC doesn't (and
cannot) scan the code for pointers.  The drawback of this technique is
of course that the GC cannot deallocate the global values that are no
longer used (the whole table of globals is a root object, i.e. it is
never deallocated).

In most stand-alone programs, this is not a problem because all
globals are functions and global variables with the same lifetime as
the whole program.


>   Assume that you insert "let f () = x;;" anywhere between the two
>   bindings of x.  Then the old value is still reachable through f.
>
>But not through x.  In fact, suppose that the definition of f is:
>
>        let f () = String.sub x 0 1 ;;
>
>The old value of x, "aaa", should still be garbage-collectible after
>the re-definition of x.

In this case, no, because the old value of x could be changed by
another function, so that f() would give a result different from "a".

What happens here, is that "x" is given an global index, for example
42, and f is compiled to a piece of code that applies global 314
(String.sub) to (the value of global 42), 0, 1.


>Hence it is crucial that O'Caml does GC aggressively so that pointers
>from the O'Caml heap to those objects can be removed whenever possible
>to enable GC on those objects.  Many scientific applications of O'Caml
>can benefit from a more aggressive GC.

Memory leaks (pointers that live beyond their useful lifespan and
prevent the GC from collecting some objects) are a well-known
difficult problem for compiler implementors.  Of course we do our
best, but it is almost impossible to guarantee the absence of memory
leaks in the stack, independently of the problem with global
variables.


>There is another point that I'd like to make.  In O'Caml, GC *already*
>works as one (well, I anyway) would expect in most cases.
[examples omitted]

Yes, that's how it's supposed to work, and that's how it works in most
cases.  Except for global variables.


>I really don't see why it would be hard to "fix" it for top-level
>variables.

It could be done, for example by putting the referenced global into
the closures of the referencing functions.  However, this would add
some complexity to the compiler and linker, and it may add some
run-time overhead.  And, once again, this is not a big problem in
practice.

-- Damien







^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: What exactly can be GC'ed?
@ 1998-01-06  0:03 Ching-Tsun Chou
  0 siblings, 0 replies; 4+ messages in thread
From: Ching-Tsun Chou @ 1998-01-06  0:03 UTC (permalink / raw)
  To: caml-list; +Cc: Damien.Doligez


Damien Doligez <Damien.Doligez@inria.fr> wrote:

   ># let x = String.make 3 'a' ;;
   >val x : string = "aaa"
   ># let w = Weak.create 1 ;;
   >val w : '_a Weak.t = <abstr>
   ># Weak.set w 0 (Some (x)) ;;
   >- : unit = ()
   ># Weak.get w 0 ;;
   >- : string option = Some "aaa"
   ># let x = "bbb" ;;
   >val x : string = "bbb"
   ># Gc.full_major () ;;
   >- : unit = ()
   ># Weak.get w 0 ;;
   >- : string option = Some "aaa"

   >It seems to me that since x has been re-bound to "bbb", its old value
   >"aaa" is no longer reachable and hence can be GC'ed.  But clearly I
   >was wrong.  Where did I go wrong?  What exactly can be GC'ed?

   Caml is a language with static binding.  The new definition of x does
   not replace the old one in any way, except in the table that maps
   names to slots in the table of globals.  When you define the second x,
   the first one only loses its name.  It still exists as a global
   variable.

But an inaccessible global variable!  Indeed, since the new definition
replaces the old one "in the table that maps names to slots in the
table of globals", why is it difficult to figure out that the value
object pointed to by the old definition is no longer reachable?

   Assume that you insert "let f () = x;;" anywhere between the two
   bindings of x.  Then the old value is still reachable through f.

But not through x.  In fact, suppose that the definition of f is:

        let f () = String.sub x 0 1 ;;

The old value of x, "aaa", should still be garbage-collectible after
the re-definition of x.

It seems to me that the fact that O'Caml is a language with static
binding (and eager evaluation), if anything, should enable more
aggressive GC, not less.  This is important for the following reason.
O'Caml is ideally suitable as a "meta language" (indeed, what does ML
stand for?) for manipulating objects which themselves are not
necessarily implemented in O'Caml.  These objects can be very large.
Hence it is crucial that O'Caml does GC aggressively so that pointers
from the O'Caml heap to those objects can be removed whenever possible
to enable GC on those objects.  Many scientific applications of O'Caml
can benefit from a more aggressive GC.

There is another point that I'd like to make.  In O'Caml, GC *already*
works as one (well, I anyway) would expect in most cases.  Consider
the following two O'Caml sessions:

----------------------------------------------------------------------------
# let w = Weak.create 1 ;;
val w : '_a Weak.t = <abstr>
# Weak.set w 0 (Some (String.make 3 'a')) ;;
- : unit = ()
# Weak.get w 0 ;;
- : string option = Some "aaa"
# Gc.full_major () ;;
- : unit = ()
# Weak.get w 0 ;;
- : string option = None
----------------------------------------------------------------------------

------------------------------------------------------------------------
# let dummy () =
    let x = String.make 3 'a' in
    let w = Weak.create 1 in
    let _ = Weak.set w 0 (Some (x)) in
    w ;;
val dummy : unit -> string Weak.t = <fun>
# let w = dummy () ;;
val w : string Weak.t = <abstr>
# Weak.get w 0 ;;
- : string option = Some "aaa"
# Gc.full_major () ;;
- : unit = ()
# Weak.get w 0 ;;
- : string option = None
------------------------------------------------------------------------

I really don't see why it would be hard to "fix" it for top-level
variables.

Thanks!
- Ching Tsun

========================================================================
  Ching-Tsun Chou                       E-mail: ctchou@mipos2.intel.com 
  Intel Corporation, RN6-16             Tel: (408) 765-5468             
  2200 Mission College Blvd.            Fax: (408) 653-7933             
  Santa Clara, CA 95052, U.S.A.         Sec: (408) 653-8849             
========================================================================






^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re:  What exactly can be GC'ed?
@ 1997-12-27 13:37 Damien Doligez
  0 siblings, 0 replies; 4+ messages in thread
From: Damien Doligez @ 1997-12-27 13:37 UTC (permalink / raw)
  To: caml-list

>From: Ching-Tsun Chou <ctchou@mipos2.intel.com>

># let x = String.make 3 'a' ;;
>val x : string = "aaa"
># let w = Weak.create 1 ;;
>val w : '_a Weak.t = <abstr>
># Weak.set w 0 (Some (x)) ;;
>- : unit = ()
># Weak.get w 0 ;;
>- : string option = Some "aaa"
># let x = "bbb" ;;
>val x : string = "bbb"
># Gc.full_major () ;;
>- : unit = ()
># Weak.get w 0 ;;
>- : string option = Some "aaa"

>It seems to me that since x has been re-bound to "bbb", its old value
>"aaa" is no longer reachable and hence can be GC'ed.  But clearly I
>was wrong.  Where did I go wrong?  What exactly can be GC'ed?

Caml is a language with static binding.  The new definition of x does
not replace the old one in any way, except in the table that maps
names to slots in the table of globals.  When you define the second x,
the first one only loses its name.  It still exists as a global
variable.

Assume that you insert "let f () = x;;" anywhere between the two
bindings of x.  Then the old value is still reachable through f.

The compiler does not recycle globals table entries because it would
add useless complexity to the system (to determine whether an entry is
still in use or not), and it only makes sense under the toplevel
system, which is mostly intended for debugging anyway.

-- Damien





^ permalink raw reply	[flat|nested] 4+ messages in thread

* What exactly can be GC'ed?
@ 1997-12-23 18:57 Ching-Tsun Chou
  0 siblings, 0 replies; 4+ messages in thread
From: Ching-Tsun Chou @ 1997-12-23 18:57 UTC (permalink / raw)
  To: caml-list; +Cc: Pierre.Weis, doligez


I sent a message to caml-list asking what weak pointers are.  Pierre
Weis kindly pointed out that they had been extensively discussed in
caml-list.  I read those messages and realized that my question is
really about garbage collection (GC):  What exactly can be GC'ed?

Consider the following O'Caml session:

------------------------------------------------------------------------
# let x = ref (String.make 3 'a') ;;
val x : string ref = {contents="aaa"}
# let w = Weak.create 1 ;;
val w : '_a Weak.t = <abstr>
# Weak.set w 0 (Some (!x));;
- : unit = ()
# Weak.get w 0 ;;
- : string option = Some "aaa"
# x := "bbb" ;;
- : unit = ()
# Gc.full_major () ;;
- : unit = ()
# Weak.get w 0 ;;
- : string option = None
------------------------------------------------------------------------

As expected, the string "aaa" is GC'ed.

Now, consider the following (fresh) O'Caml session:

------------------------------------------------------------------------
# let x = String.make 3 'a' ;;
val x : string = "aaa"
# let w = Weak.create 1 ;;
val w : '_a Weak.t = <abstr>
# Weak.set w 0 (Some (x)) ;;
- : unit = ()
# Weak.get w 0 ;;
- : string option = Some "aaa"
# let x = "bbb" ;;
val x : string = "bbb"
# Gc.full_major () ;;
- : unit = ()
# Weak.get w 0 ;;
- : string option = Some "aaa"
------------------------------------------------------------------------

It seems to me that since x has been re-bound to "bbb", its old value
"aaa" is no longer reachable and hence can be GC'ed.  But clearly I
was wrong.  Where did I go wrong?  What exactly can be GC'ed?

I should mention that I'm using O'Caml v.1.05.

- Ching Tsun

========================================================================
  Ching-Tsun Chou                       E-mail: ctchou@mipos2.intel.com 
  Intel Corporation, RN6-16             Tel: (408) 765-5468             
  2200 Mission College Blvd.            Fax: (408) 653-7933             
  Santa Clara, CA 95052, U.S.A.         Sec: (408) 653-8849             
========================================================================






^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~1998-01-06 18:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-01-06 13:28 What exactly can be GC'ed? Damien Doligez
  -- strict thread matches above, loose matches on Subject: below --
1998-01-06  0:03 Ching-Tsun Chou
1997-12-27 13:37 Damien Doligez
1997-12-23 18:57 Ching-Tsun Chou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).