caml-list - the Caml user's mailing list
* [Caml-list] Why is struct caml_ba_proxy allocated outside the GC heap and why doesn't it have a finalizer?
@ 2014-02-01 16:42 Goswin von Brederlow
  2014-02-01 20:49 ` Markus Mottl
  0 siblings, 1 reply; 5+ messages in thread
From: Goswin von Brederlow @ 2014-02-01 16:42 UTC (permalink / raw)
  To: Ocaml Mailing List

Hi,

In ZMQ (C library) I have zmq_msg_t structs that contain some metadata
(like the length) and potentially a pointer to data. For short
messages the data is part of the zmq_msg_t and for larger ones it is
allocated separately. The zmq_msg_t is an abstract type and I need to
call zmq_msg_close() when I no longer need the data of a message.

Now when I write bindings for this I would like to use Bigarray to
grant access to the data, and there I run into a problem. I need to
call zmq_msg_close() when the Bigarray is freed by the GC. Normally I
could use Gc.finalise to register a function that calls
zmq_msg_close(). But that does not work for Bigarrays because they can
be sliced. Slicing creates a new Bigarray that points to the same
data. The data can only be freed when every Bigarray pointing to it is
unreachable. The way this works now in OCaml is using a struct
caml_ba_proxy.

The problem for me now is that the caml_ba_proxy is allocated outside
the GC heap and not reachable from the ocaml side. But I would have to
call Gc.finalise for the caml_ba_proxy object instead of the Bigarray.


Currently a Bigarray is a custom block that optionally contains a
pointer to the caml_ba_proxy. The pointer is set when a Bigarray is
sliced for the first time. And the caml_ba_proxy does reference
counting. A Bigarray also has a flag to say that the memory it points
to is not to be freed by the GC, is to be freed or is to be
munmap()ed, hardcoding 3 options.
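
The scheme described above can be sketched in plain C. This is a simplified model for illustration only: the model_* names are hypothetical and are not the runtime's real definitions, and the munmap() case is reduced to a comment so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified model; these are NOT the runtime's real definitions. */
enum model_managed { MODEL_EXTERNAL, MODEL_MANAGED, MODEL_MAPPED_FILE };

struct model_proxy {
  long refcount;  /* number of arrays sharing the data */
  void *data;     /* base address of the shared data */
};

struct model_array {
  void *data;                  /* first element visible to this array */
  enum model_managed managed;  /* the hardcoded three-way choice */
  struct model_proxy *proxy;   /* NULL until the array is first sliced */
};

static int model_release_count = 0;  /* bookkeeping for the example only */

/* Release the data according to the three hardcoded options. */
static void model_release_data(enum model_managed managed, void *data) {
  switch (managed) {
  case MODEL_EXTERNAL:    return;             /* not ours to free */
  case MODEL_MANAGED:     free(data); break;
  case MODEL_MAPPED_FILE: /* real code would munmap() here */ break;
  }
  model_release_count++;
}

/* Slicing creates the proxy on first use and then shares it. */
static int model_slice(struct model_array *parent,
                       struct model_array *child, size_t offset) {
  if (parent->proxy == NULL) {
    struct model_proxy *p = malloc(sizeof *p);
    if (p == NULL) return -1;
    p->refcount = 1;           /* the parent itself */
    p->data = parent->data;
    parent->proxy = p;
  }
  parent->proxy->refcount++;   /* the new slice */
  child->data = (char *)parent->data + offset;
  child->managed = parent->managed;
  child->proxy = parent->proxy;
  return 0;
}

/* What the custom block finalizer does, in outline. */
static void model_finalize(struct model_array *a) {
  if (a->proxy == NULL) {
    model_release_data(a->managed, a->data);
  } else {
    assert(a->proxy->refcount > 0);
    if (--a->proxy->refcount == 0) {
      model_release_data(a->managed, a->proxy->data);
      free(a->proxy);
    }
  }
  a->data = NULL;
  a->proxy = NULL;
}
```

In this model the data is released only when the last array sharing the proxy is finalized, which is why a Gc.finalise on any single Bigarray would fire too early.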

I can see 3 possible improvements there:

1) add a "void (*free)(struct caml_ba_proxy *)" to the caml_ba_proxy
structure that, if not NULL, gets called when the caml_ba_proxy is
freed. In the case of GC managed memory this would be set to free the
memory. In the case of mmap it would be set to munmap. And for
unmanaged memory it would be NULL. C bindings using Bigarray could
pass in their own free function pointer.

2) Like 1 but also add a "void *private". Additional state for the
Bigarray can be stored there. In my case a pointer to the zmq_msg_t
would be stored. Actually forget about 1 and just do 2.

3) Bigarray becomes a normal OCaml record and the caml_ba_proxy
becomes a custom block. The finaliser, compare, serialise, ...
functions of the Bigarray and some flags move to the caml_ba_proxy,
which is no longer optional. Reference counting gets dropped since the
GC already covers that better now and the Bigarray module adds a
function to add finalisers to the caml_ba_proxy object.
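
As a rough sketch of what option 2 might look like, assuming a proxy that carries a free function and a private pointer: all names here are hypothetical (the field is called private_data rather than private so the code also compiles as C++), and struct fake_msg merely stands in for a zmq_msg_t so that the example does not depend on libzmq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical extended proxy; not the existing caml_ba_proxy. */
struct sketch_proxy {
  long refcount;
  void *data;
  size_t size;
  void (*free_fn)(struct sketch_proxy *);  /* option 1: custom release hook */
  void *private_data;                      /* option 2: state for the hook */
};

/* Drop one reference; run the hook when the last reference goes away. */
static void sketch_proxy_decr(struct sketch_proxy *p) {
  assert(p->refcount > 0);
  if (--p->refcount == 0) {
    if (p->free_fn != NULL) p->free_fn(p);
    free(p);
  }
}

/* Stand-in for zmq_msg_t and zmq_msg_close(), for illustration only. */
struct fake_msg { int closed; };
static void fake_msg_close(struct fake_msg *m) { m->closed = 1; }

/* The hook a binding would install: close the message kept in private_data. */
static void sketch_close_msg(struct sketch_proxy *p) {
  fake_msg_close((struct fake_msg *)p->private_data);
}

/* Create a proxy that owns the message and exposes its payload. */
static struct sketch_proxy *sketch_proxy_for_msg(struct fake_msg *m,
                                                 void *data, size_t size) {
  struct sketch_proxy *p = malloc(sizeof *p);
  if (p == NULL) return NULL;
  p->refcount = 1;
  p->data = data;
  p->size = size;
  p->free_fn = sketch_close_msg;
  p->private_data = m;
  return p;
}
```

With this shape the free, munmap and do-nothing cases become three values of free_fn, and a binding only has to supply its own function and state.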


So what do you think? Would it make sense to patch ocaml to support
option 2 or 3?

MfG
	Goswin


* Re: [Caml-list] Why is struct caml_ba_proxy allocated outside the GC heap and why doesn't it have a finalizer?
  2014-02-01 16:42 [Caml-list] Why is struct caml_ba_proxy allocated outside the GC heap and why doesn't it have a finalizer? Goswin von Brederlow
@ 2014-02-01 20:49 ` Markus Mottl
  2014-02-02  1:38   ` Goswin von Brederlow
  0 siblings, 1 reply; 5+ messages in thread
From: Markus Mottl @ 2014-02-01 20:49 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Ocaml Mailing List

I'd opt for 2).

1) seems problematic, e.g. what if several functions from possibly
independent libraries needed to be executed?  2) would allow you to
store a previous function pointer and the old contents of the private
field in the new private field so you can "chain" finalization calls.
This would also work seamlessly with "free" and "munmap".
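
The chaining idea could look roughly like the following. This is only a sketch under the assumption that the proxy has a free function and a private pointer as in option 2; the chain_* names and the note callback are made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical proxy reduced to the two fields that matter here. */
struct chain_proxy {
  void (*free_fn)(struct chain_proxy *);
  void *private_data;
};

/* One link remembers the hook and private data it displaced. */
struct chain_link {
  void (*prev_fn)(struct chain_proxy *);
  void *prev_private;
  void (*fn)(void *arg);  /* this link's own finalizer */
  void *arg;
};

/* Run the newest finalizer, restore the previous hook, then call it. */
static void chain_run(struct chain_proxy *p) {
  struct chain_link *link = (struct chain_link *)p->private_data;
  link->fn(link->arg);
  p->free_fn = link->prev_fn;
  p->private_data = link->prev_private;
  free(link);
  if (p->free_fn != NULL) p->free_fn(p);
}

/* Push a finalizer in front of whatever is already installed. */
static int chain_push(struct chain_proxy *p, void (*fn)(void *), void *arg) {
  struct chain_link *link = malloc(sizeof *link);
  if (link == NULL) return -1;
  link->prev_fn = p->free_fn;
  link->prev_private = p->private_data;
  link->fn = fn;
  link->arg = arg;
  p->free_fn = chain_run;
  p->private_data = link;
  return 0;
}

/* Example finalizer: append one character to a NUL-terminated log. */
struct chain_note { char *log; char tag; };
static void chain_note_cb(void *arg) {
  struct chain_note *n = (struct chain_note *)arg;
  size_t len = strlen(n->log);
  n->log[len] = n->tag;
  n->log[len + 1] = '\0';
}
```

Finalizers run newest first, and a previously installed plain hook (such as one wrapping free or munmap) is called last with its original private data restored.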

3) would seem less efficient.  Storing pointers to C-data in a normal
record could cause troubles with the GC so you might have to store
another unscanned abstract/custom block in the record, adding another
layer of indirection.

Regards,
Markus




-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


* Re: [Caml-list] Why is struct caml_ba_proxy allocated outside the GC heap and why doesn't it have a finalizer?
  2014-02-01 20:49 ` Markus Mottl
@ 2014-02-02  1:38   ` Goswin von Brederlow
  2014-02-02  2:06     ` Markus Mottl
  0 siblings, 1 reply; 5+ messages in thread
From: Goswin von Brederlow @ 2014-02-02  1:38 UTC (permalink / raw)
  To: caml-list

On Sat, Feb 01, 2014 at 03:49:55PM -0500, Markus Mottl wrote:
> I'd opt for 2).
> 
> 1) seems problematic, e.g. what if several functions from possibly
> independent libraries needed to be executed?  2) would allow you to
> store a previous function pointer and the old contents of the private
> field in the new private field so you can "chain" finalization calls.
> This would also work seamlessly with "free" and "munmap".

The free pointer would only be for the one finalize function needed to
free the blob of memory. Not a general Gc.finalise thing. So there
should never be more than one function.
 
> 3) would seem less efficient.  Storing pointers to C-data in a normal
> record could cause troubles with the GC so you might have to store
> another unscanned abstract/custom block in the record, adding another
> layer of indirection.

The GC knows which pointers point inside the GC heap and which point
outside. So there is no problem there. The pointer must just not exist
after the destination has been freed or it might accidentally point to
inside the next GC heap. Can't happen in this use case.
 


* Re: [Caml-list] Why is struct caml_ba_proxy allocated outside the GC heap and why doesn't it have a finalizer?
  2014-02-02  1:38   ` Goswin von Brederlow
@ 2014-02-02  2:06     ` Markus Mottl
  2014-02-04 15:32       ` Goswin von Brederlow
  0 siblings, 1 reply; 5+ messages in thread
From: Markus Mottl @ 2014-02-02  2:06 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: caml-list

On Sat, Feb 1, 2014 at 8:38 PM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> The GC knows which pointers point inside the GC heap and which point
> outside. So there is no problem there. The pointer must just not exist
> after the destination has been freed or it might accidentally point to
> inside the next GC heap. Can't happen in this use case.

Exactly.  Though the GC checks whether a pointer actually points into
the OCaml heap before following it, it does require some care to
guarantee that the pointer, if still reachable by the GC, is
invalidated (e.g. set to NULL) before associated memory is freed.
This may indeed not be that hard to achieve for the bigarray module.
I still prefer 2), which would also be easier to add to the current
implementation without breaking existing code.
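
The ordering constraint can be illustrated with a small C sketch. Here struct holder is a hypothetical stand-in for a block that the GC can still reach and that stores a pointer to memory outside the OCaml heap; the checking callback exists only to make the order observable.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a GC-reachable block holding an out-of-heap pointer. */
struct holder { void *ext; };

/* Clear the reachable pointer first, then release the memory. */
static void holder_release(struct holder *h, void (*release)(void *)) {
  void *p = h->ext;
  h->ext = NULL;               /* invalidate before freeing */
  if (p != NULL) release(p);
}

/* Example-only callback: records whether the holder was already cleared
   at the moment the memory is released. */
static struct holder *holder_watch = NULL;
static int holder_was_cleared = -1;
static void holder_checking_free(void *p) {
  holder_was_cleared = (holder_watch->ext == NULL);
  free(p);
}
```

If the order were reversed, there would be a window in which the holder still names memory that has already been returned to the allocator.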

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


* Re: [Caml-list] Why is struct caml_ba_proxy allocated outside the GC heap and why doesn't it have a finalizer?
  2014-02-02  2:06     ` Markus Mottl
@ 2014-02-04 15:32       ` Goswin von Brederlow
  0 siblings, 0 replies; 5+ messages in thread
From: Goswin von Brederlow @ 2014-02-04 15:32 UTC (permalink / raw)
  To: caml-list

Time to try out that new OCaml program for using GitHub for patches to
the compiler. I will go with option 2 because it will not change
anything in the compiler or the optimizations for bigarrays
themselves. It will only affect the C code for allocation and
finalization, so it should be far easier.

I think I will add a new flag to the Bigarray structure that says when
a "free" function was added to the proxy object. That way this will
still work even if some software creates the proxy object itself (and
so won't have the free_fn pointer).
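
A sketch of how such a flag might guard access to the new fields. Everything here is hypothetical, including the flag value and the struct names; the point is only that the array's flag decides whether the proxy may be treated as the larger, extended layout.

```c
#include <stddef.h>
#include <stdlib.h>

/* Layout that existing code may already allocate on its own. */
struct legacy_proxy { long refcount; void *data; size_t size; };

/* Extended layout: the legacy fields first, then the new ones. */
struct extended_proxy {
  struct legacy_proxy base;
  void (*free_fn)(struct extended_proxy *);
  void *private_data;
};

/* Hypothetical flag bit stored in the array, not in the proxy. */
#define FLAGGED_PROXY_HAS_FREE 0x1000L

struct flagged_array {
  long flags;
  struct legacy_proxy *proxy;
};

/* Drop this array's reference; only look at free_fn when the flag says
   the proxy really has the extended layout. */
static void flagged_release(struct flagged_array *a) {
  struct legacy_proxy *p = a->proxy;
  a->proxy = NULL;
  if (p == NULL) return;
  if (--p->refcount > 0) return;
  if (a->flags & FLAGGED_PROXY_HAS_FREE) {
    /* Valid cast: base is the first member of extended_proxy. */
    struct extended_proxy *e = (struct extended_proxy *)p;
    if (e->free_fn != NULL) e->free_fn(e);
  }
  free(p);
}

/* Example-only hook that counts how often it was called. */
static int flagged_calls = 0;
static void flagged_count_cb(struct extended_proxy *e) {
  (void)e;
  flagged_calls++;
}
```

A proxy allocated by older code is never read past its legacy fields, because arrays pointing at it do not carry the flag.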

MfG
	Goswin
