caml-list - the Caml user's mailing list
From: Malcolm Matalka <mmatalka@gmail.com>
To: Jeremy Yallop <yallop@gmail.com>
Cc: "David Sheets" <sheets@alum.mit.edu>,
	"Jeremie Dimino" <jdimino@janestreet.com>,
	"Christoph Höger" <christoph.hoeger@tu-berlin.de>,
	"caml users" <caml-list@inria.fr>
Subject: Re: [Caml-list] Save callbacks from OCaml to C
Date: Thu, 04 Feb 2016 07:26:06 +0000
Message-ID: <86io25q6lt.fsf@gmail.com>
In-Reply-To: <CAAxsn=H5OSbFmp=KB9QHXEuzHfXcUiNDmkD8=p7wUBXsR0HDeQ@mail.gmail.com> (Jeremy Yallop's message of "Wed, 3 Feb 2016 16:14:55 -0800")

Jeremy Yallop <yallop@gmail.com> writes:

> On 3 February 2016 at 12:15, Malcolm Matalka <mmatalka@gmail.com> wrote:
>> Jeremy Yallop <yallop@gmail.com> writes:
>>
>>> On 3 February 2016 at 05:44, David Sheets <sheets@alum.mit.edu> wrote:
>>>> On Wed, Feb 3, 2016 at 12:26 PM, Malcolm Matalka <mmatalka@gmail.com> wrote:
>>>>> Jeremie Dimino <jdimino@janestreet.com> writes:
>>>>>> You need to register [ml_t], [ml_x] and [ml_g] as GC roots. Otherwise,
>>>>>> if the GC runs in caml_ba_alloc, for instance, [ml_t] might end up
>>>>>> containing garbage even before reaching [caml_callback3]. You can use
>>>>>> the normal macros for that:
>>>>>>
>>>>> If one is using ctypes, is all of this taken care of?  I have a library
>>>>> that registers a bunch of OCaml functions in C code, which the C code
>>>>> calls.  I haven't experienced anything bad happening yet, but that
>>>>> doesn't mean much...
>>>>
>>>> If you use ctypes and pass OCaml closures to C, you *must* retain a
>>>> reference to the closure to avoid it being GCed. If you do not, you
>>>> may experience the exception CallToExpiredClosure sporadically.
>>>
>>> Besides David's caveat, the answer is yes: ctypes will take care of
>>> registering arguments as GC roots as necessary.
>>
>> Can you clarify this a bit?  I'm not that familiar with how the C FFI
>> works.  If I pass in a closure to a C function and it is registered as a
>> GC root, doesn't that mean it won't be GCed even if my OCaml program
>> forgets about it?
>
> That's how roots behave, yes: while a value is registered as a root,
> the value won't be collected.   There are (roughly speaking) two types
> of root in OCaml: local roots, which persist for the duration of a
> function call, and global roots, which persist until explicitly
> released.  A C function binding written by hand must ensure that OCaml
> values passed to it as arguments are registered as local roots, so
> that if a collection occurs while the function is running the values
> won't be prematurely collected.
>
> A C binding written using ctypes can generally ignore the matter of
> roots.  That's partly because ctypes takes care of root registration,
> but also because most types passed between OCaml and C in a ctypes
> binding are C values, not OCaml values.  For example, if you want to
> pass a structure with several fields between OCaml and C there are two
> approaches.  One approach is to represent the structure as an OCaml
> record, which involves accessing the fields of the value in your C
> binding using various macros, taking care to register values as roots
> to protect them from the GC.  The other approach is to represent the
> structure as a C struct, which involves accessing its fields in OCaml
> using the functions ctypes provides.  (If you enjoy programming in an
> untyped dialect of C with ubiquitous concurrency, you'll probably
> favour the first approach.  If you prefer programming in OCaml then
> the second approach might have some appeal.)
>
> Using the C value representation for values that cross the C-OCaml
> boundary generally works well, but when things become higher-order,
> the situation changes a bit.  When a C library expects to be given a
> first-order value such as a struct we have to give it a struct with
> the appropriate layout, since C functions can directly access the
> representation of values.  However, when the library expects a
> function pointer we have a bit more freedom, since the representation
> of functions isn't accessible -- in fact, the only thing that can be
> done with a function pointer, besides passing it from place to place,
> is calling it.  This freedom means that we can pass an OCaml function,
> suitably packaged up, where a C function pointer is expected.
>
> Passing OCaml functions to C as function pointers raises some
> interesting issues relating to object lifetime and the garbage
> collector.  The main difficulty arises from the fact that once you
> pass a function pointer to a C library there's no way of knowing how
> long the library holds on to it: for example, the library might
> discard the function pointer when the call into the library returns,
> or it might store the function pointer in a global variable to be
> retrieved and called later.  In order to prevent the associated
> function from being collected prematurely, some kind of action is
> needed on the OCaml side, whether registering a global root, or
> ensuring that the function is reachable from the OCaml program.
>
>> Also, David and I were talking about how to solve this on IRC.  In my
>> specific case, callbacks are one-shot, which means I know they need to
>> be remembered until they are called then they can (possibly) be freed.
>> Is there a nice solution here?  I'd prefer not to store them in some
>> other data structure and remove them later just to keep a reference
>> alive, if possible.
>
> Storing some kind of references to the functions in a place that the
> collector can see is essential to prevent the functions from being
> collected prematurely.  The situation is the same whether you use
> ctypes or write bindings by hand.
>
> Storing the functions in a table, and removing them automatically
> after they're called is one approach.  An alternative is to use the
> new Ctypes.Roots module, which will be available in the next release:
>
>    https://github.com/ocamllabs/ocaml-ctypes/blob/182a9e64src/ctypes/ctypes.mli#L419-L435

Thank you for the thorough response.  It seems like Ctypes.Roots might
solve my problem, although the URL gives me a 404.  Do you have an
estimate of when this will be released, or is there anything someone
like me can do to help?

>
>> That is overhead I'd prefer to avoid, if possible.
>> I plan on having possibly hundreds of thousands of these callbacks alive
>> at any point in time.
>
> In that case it sounds like there'll be an overhead of up to a few megabytes.

Any suggestions for a datatype to use here?  I do have a long-lived
object that represents the event loop I'm integrating against, so I can
store anything I want in there.  Last night I was really concerned about
storing this extra information in the loop, since it just seemed like a
waste, but in the morning light I'm less worried about it.  I guess I
could just use a Hashtbl that holds some reference to the closure.  My
current idea is to generate an integer id for each closure and wrap the
closure up in something like:

(fun () -> Hashtbl.remove t id; closure ())

What kind of sucks about that is that the wrapper needs to be unique to
each type of closure that gets called; there doesn't seem to be a really
generic way to do this wrapping.  Am I on the wrong track?
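
For concreteness, here is a minimal sketch of that Hashtbl idea using only
the standard library.  All names in it (entry, registry, create, one_shot,
pending) are made up for illustration and are not part of ctypes.  It
assumes that the value handed to the C binding is the wrapper, so the
wrapper is what the table keeps reachable, and it removes the entry before
running the callback, as in the one-liner above.  Whether that ordering is
safe for a given binding layer is an assumption that would need checking.
An existential constructor lets one table hold wrappers of any type:

```ocaml
(* Existential box: lets a single table hold callbacks of any type. *)
type entry = Entry : 'a -> entry

(* Hypothetical registry; it could live inside the event-loop object. *)
type registry = {
  table : (int, entry) Hashtbl.t;
  mutable next_id : int;
}

let create () = { table = Hashtbl.create 1024; next_id = 0 }

(* Wrap [f] so that the wrapper stays reachable from [reg] until its
   first call.  The entry is removed before [f] runs, mirroring the
   one-liner above. *)
let one_shot (reg : registry) (f : 'a -> 'b) : 'a -> 'b =
  let id = reg.next_id in
  reg.next_id <- id + 1;
  let wrapped x =
    Hashtbl.remove reg.table id;
    f x
  in
  (* Store the wrapper itself, since that is the value passed to C. *)
  Hashtbl.replace reg.table id (Entry wrapped);
  wrapped

(* Number of callbacks that have been registered but not yet called. *)
let pending reg = Hashtbl.length reg.table

let () =
  let reg = create () in
  let hits = ref 0 in
  let cb = one_shot reg (fun n -> hits := !hits + n) in
  assert (pending reg = 1);
  cb 5;
  assert (pending reg = 0 && !hits = 5)
```

Because one_shot is polymorphic in 'a and 'b, the same wrapper covers
callbacks of different types.  For a curried callback with several
arguments the entry is dropped as soon as the first argument is applied.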

Thanks again,
/Malcolm

Thread overview: 11+ messages
2016-02-03 10:54 Christoph Höger
2016-02-03 11:48 ` Jeremie Dimino
2016-02-03 12:26   ` Malcolm Matalka
2016-02-03 13:44     ` David Sheets
2016-02-03 18:02       ` Jeremy Yallop
2016-02-03 20:15         ` Malcolm Matalka
2016-02-04  0:14           ` Jeremy Yallop
2016-02-04  7:26             ` Malcolm Matalka [this message]
2016-02-04 19:29               ` Jeremy Yallop
     [not found]   ` <56B1EC33.2090303@tu-berlin.de>
2016-02-03 13:49     ` Jeremie Dimino
2016-02-03 14:38       ` Christoph Höger
