caml-list - the Caml user's mailing list
* [Caml-list] The OCaml garbage collector, finalisers, and the right way of disposing native pointers in C bindings
@ 2016-01-12  9:12 Neuhaeusser, Martin
From: Neuhaeusser, Martin @ 2016-01-12  9:12 UTC
  To: caml-list

Dear all,

during our work on some SMT solver bindings, a couple of questions came up regarding the behavior of the OCaml garbage collector (see also https://github.com/Z3Prover/z3/issues/411). Assume we have defined a record type containing a native pointer to some object from an external C DLL:

type ocaml_record = {
  np : native_ptr;
  (* ... some more fields ... *)
}

When creating an ocaml_record, we register a finalizer that makes the C library dispose the data belonging to the native pointer once the GC collects the OCaml record value:

let f ocaml_record = NativeLib.dispose ocaml_record.np

let create [...] =
  let new_ocaml_record = { ... } in
  Gc.finalise f new_ocaml_record;
  new_ocaml_record

When calling one of the C-stubs, we pass the native pointer from the OCaml record value:

let get_native_pointer ocaml_record = ocaml_record.np

NativeLib.native_function (get_native_pointer ocaml_record)

However, this has a problem: if ocaml_record has otherwise become unreachable, the GC might collect ocaml_record right after evaluating (get_native_pointer ocaml_record), triggering the finalizer, which disposes of the data pointed to by ocaml_record.np. The subsequent call to NativeLib.native_function (i.e. the C stub) then segfaults, because it accesses data that the finalizer has already freed.

I assume this is a common problem when writing interfaces to C libraries. If so, is there a preferred way to tackle it?
Two approaches came to mind:

1. One could design the C-stubs such that they accept values of type ocaml_record and extract the native pointer within the C stub. In the C-stubs, the GC must not collect an item that has been "pinned" by the CAMLparamX macros, right?
2. One could invent some function that takes an ocaml_record, but does nothing with it and whose calls do not get optimized away by the compiler... Evaluating such a function after the call to NativeLib.native_function would prevent the GC from collecting ocaml_record. However, this feels like a very ugly hack. 

Are there any better ideas? Any help and suggestions are highly appreciated.

Best,
Martin

* Re: [Caml-list] The OCaml garbage collector, finalisers, and the right way of disposing native pointers in C bindings
From: Kim Nguyễn @ 2016-01-12  9:37 UTC
  To: Neuhaeusser, Martin; +Cc: caml-list

Hi,

On Tue, Jan 12, 2016 at 10:12 AM, Neuhaeusser, Martin
<martin.neuhaeusser@siemens.com> wrote:
> Dear all,
>
> I assume this is a common problem when writing interfaces to C libraries. If so, is there a preferred way to tackle it?
> Two approaches came to mind:
>
> 1. One could design the C-stubs such that they accept values of type ocaml_record and extract the native pointer within the C stub. In the C-stubs, the GC must not collect an item that has been "pinned" by the CAMLparamX macros, right?

That would work, but in that case the approach I prefer is to put the
native pointer in a custom block. This way you can attach your
finalizer (written in C) directly to the block, and it will be called
when the (wrapped) pointer itself is reclaimed. This also allows you to
fine-tune the behaviour of the GC with respect to how you use such
pointers.

One orthogonal problem: if you create two custom blocks with the same
pointer inside, you must implement some other mechanism (such as
refcounting) to avoid double frees. But you already have this problem
with your OCaml record approach: if two distinct records can hold the
same native_ptr, the finalizer might get called twice.

The advantage of custom blocks is that they are self-sufficient, so you
can put such objects in other data structures (e.g. for debugging
purposes). Your C bindings will, of course, need to extract the pointer
from the custom block, and since they are given a heap-allocated OCaml
value (the custom block), it must be protected by the CAMLparam macros.
This might also make your code more future-proof: from reading this
mailing list, it seems that at some point there will be (or already is?)
a version of the OCaml runtime where naked pointers are forbidden on the
heap unless they are wrapped in a custom block.
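The refcounting mentioned above can be sketched roughly as follows. This is a hypothetical illustration written in OCaml for readability: pointers are modelled as nativeint, and the names Refcount, wrapper, wrap and dispose are all invented here. In a custom-block binding the same bookkeeping would more likely live inside the C finalizer.

```ocaml
(* Hypothetical: counts how many OCaml wrappers share each native pointer,
   so the real disposal function runs only when the last wrapper goes away. *)
module Refcount = struct
  let counts : (nativeint, int) Hashtbl.t = Hashtbl.create 16

  let acquire p =
    let n = match Hashtbl.find_opt counts p with Some n -> n | None -> 0 in
    Hashtbl.replace counts p (n + 1)

  (* Returns true when this release dropped the count to zero, i.e. the
     caller should now free the native object. *)
  let release p =
    match Hashtbl.find_opt counts p with
    | None -> false (* never acquired or already freed: do nothing *)
    | Some 1 -> Hashtbl.remove counts p; true
    | Some n -> Hashtbl.replace counts p (n - 1); false
end

type wrapper = { ptr : nativeint }

(* [dispose] stands in for the C library's free function (hypothetical). *)
let wrap ~dispose p =
  Refcount.acquire p;
  let w = { ptr = p } in
  Gc.finalise (fun w -> if Refcount.release w.ptr then dispose w.ptr) w;
  w
```

Two wrappers created for the same pointer then trigger a single call to dispose, whichever of them is finalised last.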


> 2. One could invent some function that takes an ocaml_record, but does nothing with it and whose calls do not get optimized away by the compiler... Evaluating such a function after the call to NativeLib.native_function would prevent the GC from collecting ocaml_record. However, this feels like a very ugly hack.

Very ugly indeed. And the OCaml compiler is getting better and better
at inlining, so it's quite hard to predict what is inlined and what
isn't (unless you write some "obviously" inefficient code that has no
chance whatsoever of being inlined … but still).
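For later compilers, a somewhat less fragile form of approach 2 is possible: OCaml 4.03 (released after this thread) added Sys.opaque_identity, which the optimiser must treat as an unknown function. A use of the record placed after the native call is therefore not removed, so the record stays live, and cannot be finalised, until that point. This is a sketch under assumptions: NativeLib is simulated in pure OCaml with a table of live handles, and with_native_ptr is a hypothetical helper name. A real binding would call C stubs instead.

```ocaml
(* Hypothetical stand-in for the C library: native pointers are modelled
   as nativeint keys into a table of live objects. *)
module NativeLib = struct
  type native_ptr = nativeint

  (* Pointers that have been allocated and not yet disposed. *)
  let live : (native_ptr, unit) Hashtbl.t = Hashtbl.create 16
  let next = ref 0n

  let alloc () =
    next := Nativeint.succ !next;
    Hashtbl.replace live !next ();
    !next

  let dispose p = Hashtbl.remove live p

  (* A real C stub would dereference [p]; the mock fails on freed pointers. *)
  let native_function p =
    if not (Hashtbl.mem live p) then failwith "use after free";
    Nativeint.to_int p
end

type ocaml_record = { np : NativeLib.native_ptr }

let create () =
  let r = { np = NativeLib.alloc () } in
  Gc.finalise (fun r -> NativeLib.dispose r.np) r;
  r

(* Run [f] on the native pointer, then "use" [r] via Sys.opaque_identity
   so that [r] remains reachable for the whole duration of [f]. *)
let with_native_ptr r f =
  let result = f r.np in
  ignore (Sys.opaque_identity r);
  result
```

The call from the original question then becomes with_native_ptr ocaml_record NativeLib.native_function. It is still a convention that every call site must follow, which is why wrapping the pointer in a custom block remains the more robust design.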

Cheers,
--
Kim

* Re: [Caml-list] The OCaml garbage collector, finalisers, and the right way of disposing native pointers in C bindings
From: Gerd Stolpmann @ 2016-01-12 10:33 UTC
  To: Neuhaeusser, Martin; +Cc: caml-list

Dealing with naked pointers from OCaml is notoriously difficult. As you
found out, you have no good way to control when GC cycles happen, or to
limit the bad effects of that. There is also a dangling-pointer problem:
between the time the memory has been freed and the time the naked
pointer is set to null, the naked pointer can be mistaken for a heap
pointer. This part is practically impossible to get right from OCaml
code, and even in a C function it is easy to get wrong, resulting in
random crashes that occur infrequently.

For these reasons naked pointers are strongly discouraged. The way to go
is to wrap pointers into custom blocks
(http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#sec458), and to
do all pointer management in C.

Gerd

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------

* Re: [Caml-list] The OCaml garbage collector, finalisers, and the right way of disposing native pointers in C bindings
From: Richard W.M. Jones @ 2016-01-12 13:06 UTC
  To: Neuhaeusser, Martin; +Cc: caml-list

The other replies have covered some of the problems.  You may also be
interested in example code, and we've got lots :-)  Most of the bugs
have now been fixed, after several iterations.  Here are some things
to get you started.

(1) Simple example of a finalizer attached to a custom block:

https://github.com/libguestfs/libguestfs/blob/master/mllib/progress-c.c
https://github.com/libguestfs/libguestfs/blob/master/mllib/progress.ml
https://github.com/libguestfs/libguestfs/blob/master/mllib/progress.mli

(2) A more complex example using generational roots to deal with
callbacks from C back into OCaml:

https://github.com/libguestfs/libguestfs/blob/master/ocaml/guestfs-c.c

This one had a major bug: we discovered that the root caused the
handle to be always reachable, so the finalizer was not called until
the program exited (the fact that we also have a #close method, which
we always called, made this less obvious than you might think at
first).  That was fixed in:

https://github.com/libguestfs/libguestfs/commit/8bbc5e73cb5b56b5cfbe979ac0e1c14d1701a0d8

(3) A tricky binding to libxml2.

Because libxml2 has objects containing pointers to other objects (at
the C level) we need to shadow these with OCaml structs, to ensure
that an OCaml object doesn't become unreachable when it is still
pointed to from another object.

https://github.com/libguestfs/libguestfs/blob/master/v2v/xml-c.c
https://github.com/libguestfs/libguestfs/blob/master/v2v/xml.ml
https://github.com/libguestfs/libguestfs/blob/master/v2v/xml.mli

If you look at the history of these files, you'll see we discovered
and fixed major bugs, e.g. this one concerning the order in which we
freed objects:

https://github.com/libguestfs/libguestfs/commit/3888582da89c757d0740d11c3a62433d748c85aa

Note that (3) is a counter-example to the idea that you should use
custom blocks.  Custom block finalizers in OCaml have no ordering
guarantee - if you need an ordering guarantee you must use an OCaml
finalizer.
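The shadowing idea from (3) can be sketched in OCaml roughly as below. This is not the libguestfs code: the types, the Xml_lib module and free_doc are hypothetical, and pointers are modelled as nativeint. Because every node record points at its document record, the document cannot become unreachable, so its Gc.finalise finaliser cannot run, while any node is still reachable from OCaml.

```ocaml
(* Hypothetical model of a libxml2-style library in which nodes live
   inside a document and are freed together with it. *)
module Xml_lib = struct
  let freed_docs : nativeint list ref = ref []
  let free_doc p = freed_docs := p :: !freed_docs
end

type doc = { doc_ptr : nativeint }

(* Each node shadows its C parent: holding [doc] keeps the document record,
   and therefore the underlying C document, alive as long as the node is. *)
type node = { node_ptr : nativeint; doc : doc }

let new_doc p =
  let d = { doc_ptr = p } in
  Gc.finalise (fun d -> Xml_lib.free_doc d.doc_ptr) d;
  d

(* Nodes get no finaliser of their own: the C library frees them with the
   document, so the OCaml record only has to keep [doc] reachable. *)
let node_of_doc d p = { node_ptr = p; doc = d }
```

The same reasoning does not carry over to a C custom-block finalizer, which is why the ordering has to be expressed through OCaml-level references like these.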

You can probably see this stuff gets complicated quickly.  Thank
goodness for valgrind!

Rich.
