caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Weird GC behaviour
@ 2011-09-27 17:30 Thomas Fischbacher
  2011-09-28 11:19 ` Damien Doligez
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Fischbacher @ 2011-09-27 17:30 UTC (permalink / raw)
  To: OCaML Mailing List


Dear Camels,

The example below shows GC behaviour which in my view is somewhat
weird: I am building two very similar sorts of hash tables that map
variant types to strings. The parameter "works" (which gets passed
through from the "demo" to the "setup" function) allows one to choose,
when calling the script, whether it behaves properly or shows weird
behaviour.

For Rtag, inlining get_rtag does not make a difference, but for Ftag,
inlining the get_ftag call breaks things both with the interpreter and
compiler, but in different ways.

So... my question is: is there a documented reliable way to avoid this
sort of funny behaviour?!?

=================================
type funny_tag = Ftag of string;;

type funny_ref = Rtag of string ref;;

let get_ftag s = Ftag s;;

let get_rtag s = Rtag (ref s);;

let fin_str ht k =
   let Ftag s = k in
   let () = Printf.printf "Removing str '%s'\n%!" s in
   Hashtbl.remove ht k
;;

let fin_ref ht k =
   let Rtag rs = k in
   let () = Printf.printf "Removing ref '%s'\n%!" !rs in
   Hashtbl.remove ht k
;;


let setup works =
   let ht_str = Hashtbl.create 17 in
   let ht_ref = Hashtbl.create 17 in
   let (funny_str, funny_ref) =
   if works
     then (get_ftag "funny",get_rtag "funny")
     else (Ftag "funny", Rtag (ref "funny"))
   in
   (* Note that they keys used below are equal but not identical to the 
entities above *)
   let () = Hashtbl.add ht_str (Ftag "funny") "situation" in
   let () = Hashtbl.add ht_ref (Rtag (ref "funny")) "situation" in
   let () = Gc.finalise (fin_str ht_str) funny_str in
   let () = Gc.finalise (fin_ref ht_ref) funny_ref in
   ((ht_str,ht_ref),(funny_str,funny_ref))
;;

let left (p,q) = p;;

let demo works =
   let (ht_str,ht_ref) = left (setup works) in
   (* Note that I drop the references on the keys that have finalizers 
registered. *)
   let () = Gc.full_major () in
   let () = Gc.compact () in
   Printf.printf "ht_str entries: %d ht_ref entries: %d\n%!" 
(Hashtbl.length ht_str) (Hashtbl.length ht_ref)
;;

let () = demo (Array.length Sys.argv > 1 && Sys.argv.(1) = "magic");;


(* === Behaviour ===

$ ocaml gc_bug.ml
Removing ref 'funny'
ht_str entries: 1 ht_ref entries: 0

$ ocaml gc_bug.ml magic
Removing ref 'funny'
Removing str 'funny'
ht_str entries: 0 ht_ref entries: 0

$ ocamlopt -o gc_bug gc_bug.ml

$ ./gc_bug
Fatal error: exception Invalid_argument("Gc.finalise")

$ ./gc_bug magic
Removing ref 'funny'
Removing str 'funny'
ht_str entries: 0 ht_ref entries: 0

$ ocaml
         Objective Caml version 3.12.0

#

$ uname -prs
FreeBSD 9.0-BETA3 amd64

============ *)

=================================

-- 
best regards,
Thomas Fischbacher
t.fischbacher@soton.ac.uk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Weird GC behaviour
  2011-09-27 17:30 [Caml-list] Weird GC behaviour Thomas Fischbacher
@ 2011-09-28 11:19 ` Damien Doligez
  2011-09-28 11:53   ` Thomas Fischbacher
  0 siblings, 1 reply; 7+ messages in thread
From: Damien Doligez @ 2011-09-28 11:19 UTC (permalink / raw)
  To: Thomas Fischbacher; +Cc: OCaML Mailing List

Hi Thomas,

On 2011-09-27, at 19:30, Thomas Fischbacher wrote:

> For Rtag, inlining get_rtag does not make a difference, but for Ftag,
> inlining the get_ftag call breaks things both with the interpreter and
> compiler, but in different ways.

That's the key.

>  if works
>    then (get_ftag "funny",get_rtag "funny")
>    else (Ftag "funny", Rtag (ref "funny"))
>  in

In this expression, you have four interesting subexpressions.  Three of
them involve calling some function with "funny" as argument, and these
functions allocate values that point to the "funny" string.  These values
are heap-allocated and behave as you expect.

The fourth one is (Ftag "funny").  This is a constant expression and its
evaluation does not allocate in the heap.  Since it's constant, it's
allocated once by the compiler, and every time you execute this code
it's the same value that is returned.

In byte-code it's still allocated in the heap, but at load-time, and
a reference to it is hidden in the closure of the enclosing function.
Hence, it never gets deallocated:

> $ ocaml gc_bug.ml
> Removing ref 'funny'
> ht_str entries: 1 ht_ref entries: 0

You don't get the "Removing str 'funny'" message because the value
stays alive throughout program execution.

In native code, it's even more optimized: the value is "allocated" at
compile time as constant data outside the heap.  Hence, you can't even
register a finalizer for it:

> $ ocamlopt -o gc_bug gc_bug.ml
> 
> $ ./gc_bug
> Fatal error: exception Invalid_argument("Gc.finalise")


Only heap data can be finalized.

So you have to be aware of the difference between (get_ftag "funny") and
(Ftag "funny") because they don't have exactly the same semantics: a call
to get_ftag is guaranteed to allocate a new block each time.

-- Damien


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Weird GC behaviour
  2011-09-28 11:19 ` Damien Doligez
@ 2011-09-28 11:53   ` Thomas Fischbacher
  2011-09-28 12:12     ` Gabriel Scherer
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Thomas Fischbacher @ 2011-09-28 11:53 UTC (permalink / raw)
  To: Damien Doligez; +Cc: OCaML Mailing List


Dear Damien,

>>  if works
>>    then (get_ftag "funny",get_rtag "funny")
>>    else (Ftag "funny", Rtag (ref "funny"))
>>  in
> 
> In this expression, you have four interesting subexpressions.  Three of
> them involve calling some function with "funny" as argument, and these
> functions allocate values that point to the "funny" string.  These values
> are heap-allocated and behave as you expect.
> 
> The fourth one is (Ftag "funny").  This is a constant expression and its
> evaluation does not allocate in the heap.  Since it's constant, it's
> allocated once by the compiler, and every time you execute this code
> it's the same value that is returned.

How come (Ftag "funny") is regarded as constant while
(Rtag (ref "funny")) is not? After all, strings are mutable in OCaml,
so there really is not that much of a conceptual difference between a
string and a string ref in that respect:

http://caml.inria.fr/pub/docs/manual-ocaml/manual010.html#toc44

# type funny_str = Ftag of string;;
type funny_str = Ftag of string
# let s1 = Ftag "Hello";;
val s1 : funny_str = Ftag "Hello"
# let Ftag s = s1 in s.[0]<-'h';;
- : unit = ()
# s1;;
- : funny_str = Ftag "hello"

The way I read the spec, it nowhere says that variant values that use a
non-constant constructor plus a value can be treated as constant. I do
see that in a sense, this may be a similar issue as the one that would
arise with lisp code such as this:

(defun example ()
   (let ((text-segment-list '(1 2 3 4 5)))
     (nreverse text-segment-list)))

Calling (example) twice gives weird behaviour, as we are destructively
modifying a lisp that conceptually was constant. So, one should have
used:

(defun example2 ()
   (let ((text-segment-list (list 1 2 3 4 5)))
     (nreverse text-segment-list)))

But the problem I think I have with OCaml is: there just seems to be no
way to properly express the conceptual difference between '(1 2 3 4 5)
and (list 1 2 3 4 5): All I can say above is: Ftag "Hello".

-- 
best regards,
Thomas Fischbacher
t.fischbacher@soton.ac.uk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Weird GC behaviour
  2011-09-28 11:53   ` Thomas Fischbacher
@ 2011-09-28 12:12     ` Gabriel Scherer
  2011-09-28 13:07     ` John Carr
  2011-09-28 15:30     ` Damien Doligez
  2 siblings, 0 replies; 7+ messages in thread
From: Gabriel Scherer @ 2011-09-28 12:12 UTC (permalink / raw)
  To: Thomas Fischbacher; +Cc: OCaML Mailing List

On Wed, Sep 28, 2011 at 1:53 PM, Thomas Fischbacher
<t.fischbacher@soton.ac.uk> wrote:
> How come (Ftag "funny") is regarded as constant while
> (Rtag (ref "funny")) is not? After all, strings are mutable in OCaml,
> so there really is not that much of a conceptual difference between a
> string and a string ref in that respect:

Of course references are not constant, and calls to `ref` allocate
each time they're evaluated. Doing otherwise would completely break
the reference semantics.

Your remark that strings are also mutable is pertinent : indeed, in
this case, the semantics of mutable strings is broken and lead to
weird things. The canonical example -- very close to your lisp example
-- is:

  # let should_be_fresh () = "toto";;
  val should_be_fresh : unit -> string = <fun>
  # let t1 = should_be_fresh ();;
  val t1 : string = "toto"
  # t1.[1] <- 'u';;
  - : unit = ()
  # should_be_fresh();;
  - : string = "tuto"

The reason why string are still optimized as if they were immutable is
that they are generally not mutated in practice, and that this
constant optimization may save quite a lot of memory. I believe there
is a consensus that making strings mutable is an historical mistake
that should not be repeated, where you to design a programming
language in the present times.

If you want to force a string to be freshly allocated, an easy way is
to concatenate the string literal with the empty string: "toto" ^ "".
I believe (Ftag ("hello"^"")) would suit your purposes here.

>
> http://caml.inria.fr/pub/docs/manual-ocaml/manual010.html#toc44
>
> # type funny_str = Ftag of string;;
> type funny_str = Ftag of string
> # let s1 = Ftag "Hello";;
> val s1 : funny_str = Ftag "Hello"
> # let Ftag s = s1 in s.[0]<-'h';;
> - : unit = ()
> # s1;;
> - : funny_str = Ftag "hello"
>
> The way I read the spec, it nowhere says that variant values that use a
> non-constant constructor plus a value can be treated as constant. I do
> see that in a sense, this may be a similar issue as the one that would
> arise with lisp code such as this:
>
> (defun example ()
>  (let ((text-segment-list '(1 2 3 4 5)))
>    (nreverse text-segment-list)))
>
> Calling (example) twice gives weird behaviour, as we are destructively
> modifying a lisp that conceptually was constant. So, one should have
> used:
>
> (defun example2 ()
>  (let ((text-segment-list (list 1 2 3 4 5)))
>    (nreverse text-segment-list)))
>
> But the problem I think I have with OCaml is: there just seems to be no
> way to properly express the conceptual difference between '(1 2 3 4 5)
> and (list 1 2 3 4 5): All I can say above is: Ftag "Hello".
>
> --
> best regards,
> Thomas Fischbacher
> t.fischbacher@soton.ac.uk
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Weird GC behaviour
  2011-09-28 11:53   ` Thomas Fischbacher
  2011-09-28 12:12     ` Gabriel Scherer
@ 2011-09-28 13:07     ` John Carr
  2011-09-28 15:30     ` Damien Doligez
  2 siblings, 0 replies; 7+ messages in thread
From: John Carr @ 2011-09-28 13:07 UTC (permalink / raw)
  To: Thomas Fischbacher; +Cc: OCaML Mailing List


> How come (Ftag "funny") is regarded as constant while
> (Rtag (ref "funny")) is not? After all, strings are mutable in OCaml,
> so there really is not that much of a conceptual difference between a
> string and a string ref in that respect:

This is just one of the axioms of the language.  A single
string literal in OCaml source code yields the same value
each time it is evaluated, even though arguably analgous
constructs are handled differently:

 [1;2;3] == [1;2;3] may be true or false
 [|1;2|] == [|1;2|] will be false
 "const" == "const" will be false
 let f () = "const" in f () == f () will be true

I would rather have identical string constants merged, as many C
compilers do.  (I am aware that it is possible to write programs
with bugs.)  The compiler could also treat string constants like
array constants and generate a new value each time the expression
is evaluated.  In that case my last example would be false.

If you want to argue analogies, consider

  let f () = F "hello"

as an alias for something like

  let hello = [|'h';'e';'l';'l';'o'|] (* at top level *)

  let f () = F hello

The name hello is as constant as any other top level binding.

> But the problem I think I have with OCaml is: there just seems to be no
> way to properly express the conceptual difference between '(1 2 3 4 5)
> and (list 1 2 3 4 5): All I can say above is: Ftag "Hello".

You can use Obj.dup to force a copy.  I don't know whether there is
a type safe interface.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Weird GC behaviour
  2011-09-28 11:53   ` Thomas Fischbacher
  2011-09-28 12:12     ` Gabriel Scherer
  2011-09-28 13:07     ` John Carr
@ 2011-09-28 15:30     ` Damien Doligez
  2011-09-28 23:32       ` Philippe Wang
  2 siblings, 1 reply; 7+ messages in thread
From: Damien Doligez @ 2011-09-28 15:30 UTC (permalink / raw)
  To: Thomas Fischbacher; +Cc: OCaML Mailing List

Dear Thomas,

Others have mostly answered your questions, but to be complete I want
to add the following:

> How come (Ftag "funny") is regarded as constant while
> (Rtag (ref "funny")) is not?

The difference here is that (ref "funny") is a function call.  The
compiler doesn't know that the "ref" function just allocates a record.

> After all, strings are mutable in OCaml,
> so there really is not that much of a conceptual difference between a
> string and a string ref in that respect:

You should rather look at the analogy between strings and arrays, but
as Gabriel said, string literals are very rarely mutated in practice,
so it makes more sense to have "funny" return the same string instead
of allocating.

> But the problem I think I have with OCaml is: there just seems to be no
> way to properly express the conceptual difference between '(1 2 3 4 5)
> and (list 1 2 3 4 5): All I can say above is: Ftag "Hello".

You should say (Ftag (String.copy "Hello")) if you want a fresh mutable
string.  I wouldn't recommend appending "" or using Obj.dup (yuck!)

I agree it was a mistake to make strings mutable, but we have to live
with it for the time being.  If you want to be perfectly safe, you can
wrap all string literals with String.copy in your program.

-- Damien


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Weird GC behaviour
  2011-09-28 15:30     ` Damien Doligez
@ 2011-09-28 23:32       ` Philippe Wang
  0 siblings, 0 replies; 7+ messages in thread
From: Philippe Wang @ 2011-09-28 23:32 UTC (permalink / raw)
  To: Damien Doligez; +Cc: OCaML Mailing List

On Wed, Sep 28, 2011 at 5:30 PM, Damien Doligez <damien.doligez@inria.fr> wrote:

> You should say (Ftag (String.copy "Hello")) if you want a fresh mutable
> string.  I wouldn't recommend appending "" or using Obj.dup (yuck!)
>
> I agree it was a mistake to make strings mutable, but we have to live
> with it for the time being.  If you want to be perfectly safe, you can
> wrap all string literals with String.copy in your program.

If the world tended to be ideal at some point, I'd say it should be a
compiler option, like
   # ocamlopt -unshared-litteral-strings
 (or -shared-literral-strings)
Sometimes it's painful that strings are mutable, but otherwise, it
would be painful too (maybe less often, I don't know). I believe that
the worst is to have *both* mutable strings *and* the necessity to
copy them every time mutability makes it weird.

When I teach programming using OCaml, I really dislike having to
explain that we have to live with this awful semantics (fortunately
the language has stunning great qualities).

Well, I guess this discussion has occurred thousands of times already.
Sorry for that.

Cheers,

-- 
Philippe Wang
   mail@philippewang.info


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-09-28 23:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-27 17:30 [Caml-list] Weird GC behaviour Thomas Fischbacher
2011-09-28 11:19 ` Damien Doligez
2011-09-28 11:53   ` Thomas Fischbacher
2011-09-28 12:12     ` Gabriel Scherer
2011-09-28 13:07     ` John Carr
2011-09-28 15:30     ` Damien Doligez
2011-09-28 23:32       ` Philippe Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).