caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Global roots causing performance problems
@ 2008-03-06 23:51 Markus Mottl
  2008-03-07 14:10 ` [Caml-list] " Xavier Leroy
  0 siblings, 1 reply; 5+ messages in thread
From: Markus Mottl @ 2008-03-06 23:51 UTC (permalink / raw)
  To: ocaml; +Cc: ocaml-users

Hi,

this is actually a somewhat long-standing issue, but we have only
recently been hit by it in a way that makes working around it hard,
and we would like to know whether there is a reasonable chance that
this could be solved in a future OCaml-release:

Many C-bindings use C-datastructures that refer to OCaml-values.
This, of course, means that these OCaml-values must not be reclaimed
as long as they can still be referred to.  The current OCaml-FFI
already provides functions (caml_register_global_root, etc.) for
protecting such values.  Unfortunately, it seems that this feature was
not implemented with the possibility in mind that one day people might
want to refer to many thousands of OCaml-values (most often closures)
from C-data.  The garbage collector obviously scans all global roots
on every minor collection, which leads to a very substantial
performance loss in such situations.  The minor heap may often even be
almost empty, i.e. we might only need to chase a few live values,
while we would still have to scan through thousands of global roots.

A common and unfortunately quite clumsy trick to work around this
limitation is to use a hash table from integers to the values to be
protected.  Instead of storing the OCaml-value itself in the
C-datastructure and protecting it, we only need to store the integer
handle for the value, which does not need to be protected.  C-land
then only needs to perform a callback into OCaml-land with the integer
handle in order to use the value it refers to, or to remove it it from
the hash table if it isn't needed anymore.

The problem here is that this only makes sense when writing a new
library or when modifying a small existing one.  We have run into this
issue with LablGTK, which is just way too large and complex to make
this a feasible thing to do.

We therefore wonder whether it wouldn't be much more effective to fix
the runtime.  I don't know the exact details of how things currently
work, but I guess that it would be possible to have two separate sets
of global roots for the minor and major heap.  Then, once a value gets
oldified, the global root, too, could wander to the corresponding set.
 The set for the major heap could then be scanned only once per full
major cycle, maybe even in slices, too.  Would this suggestion be easy
to implement?

Best regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Global roots causing performance problems
  2008-03-06 23:51 Global roots causing performance problems Markus Mottl
@ 2008-03-07 14:10 ` Xavier Leroy
  2008-03-07 14:52   ` Berke Durak
                     ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Xavier Leroy @ 2008-03-07 14:10 UTC (permalink / raw)
  To: Markus Mottl; +Cc: ocaml, ocaml-users

> [GC overhead of having many global memory roots]
> We therefore wonder whether it wouldn't be much more effective to fix
> the runtime.  I don't know the exact details of how things currently
> work, but I guess that it would be possible to have two separate sets
> of global roots for the minor and major heap.  Then, once a value gets
> oldified, the global root, too, could wander to the corresponding set.
>  The set for the major heap could then be scanned only once per full
> major cycle, maybe even in slices, too.  Would this suggestion be easy
> to implement?

This "generational" approach is the natural solution to the problem
you mention.  However, it is not compatible with the current API for
global root registration: when a program registers a "value *" pointer
using caml_register_global_root(), the program is free to change the
value contained in that placeholder at any time without notifying the
Caml memory manager.  As a consequence, the minor GC has no choice but
scanning all global roots every time, because any of them could have
been overwritten with a freshly-allocated Caml block since the
previous minor GC.

There are 2 ways to go about this problem:

1- Change the specs of caml_register_global_root() to prohibit
in-place updates to the value contained in the registered value
pointer.  If programmers need to do this, they must un-register the
value pointer, update its contents, then re-register it.
How much existing code would that break?  I don't know.

2- Keep the current API for backward compatibility and add a
caml_register_global_immutable_root() function that would implement
generational scanning of global roots, in exchange for the
programmer's guarantee that the values contained in those roots are
never changed.  Then, convince authors of Caml-C bindings to use the
new API.

I'm willing to implement any of these 2 approaches, but it is not a
transparent change in either case.

- Xavier Leroy


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Global roots causing performance problems
  2008-03-07 14:10 ` [Caml-list] " Xavier Leroy
@ 2008-03-07 14:52   ` Berke Durak
  2008-03-07 16:45   ` Richard Jones
  2008-03-07 17:05   ` Markus Mottl
  2 siblings, 0 replies; 5+ messages in thread
From: Berke Durak @ 2008-03-07 14:52 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Markus Mottl, ocaml-users, ocaml

Xavier Leroy a écrit :

Hello,

> This "generational" approach is the natural solution to the problem
> you mention.  However, it is not compatible with the current API for
> global root registration: when a program registers a "value *" pointer
> using caml_register_global_root(), the program is free to change the
> value contained in that placeholder at any time without notifying the
> Caml memory manager.  As a consequence, the minor GC has no choice but
> scanning all global roots every time, because any of them could have
> been overwritten with a freshly-allocated Caml block since the
> previous minor GC.
> 
> There are 2 ways to go about this problem:
> 
> 1- Change the specs of caml_register_global_root() to prohibit
> in-place updates to the value contained in the registered value
> pointer.  If programmers need to do this, they must un-register the
> value pointer, update its contents, then re-register it.
> How much existing code would that break?  I don't know.

I'm using caml_register_global_root extensively in Aurochs when building
the parse tree, and updating it in-place (consing a list of children nodes,
actually.)

If you change the semantics of caml_register_global_root() it would
be nice to have a caml_modify_global_root() macro that does what is
needed.

> 2- Keep the current API for backward compatibility and add a
> caml_register_global_immutable_root() function that would implement
> generational scanning of global roots, in exchange for the
> programmer's guarantee that the values contained in those roots are
> never changed.  Then, convince authors of Caml-C bindings to use the
> new API.

That's the better solution IMHO, as it won't break existing code and gives
a clear migration path to better performance and improved customer satisfaction.

> I'm willing to implement any of these 2 approaches, but it is not a
> transparent change in either case.

-- 
Berke DURAK


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Global roots causing performance problems
  2008-03-07 14:10 ` [Caml-list] " Xavier Leroy
  2008-03-07 14:52   ` Berke Durak
@ 2008-03-07 16:45   ` Richard Jones
  2008-03-07 17:05   ` Markus Mottl
  2 siblings, 0 replies; 5+ messages in thread
From: Richard Jones @ 2008-03-07 16:45 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Markus Mottl, ocaml-users, ocaml

On Fri, Mar 07, 2008 at 03:10:05PM +0100, Xavier Leroy wrote:
> > [GC overhead of having many global memory roots]
> > We therefore wonder whether it wouldn't be much more effective to fix
> > the runtime.  I don't know the exact details of how things currently
> > work, but I guess that it would be possible to have two separate sets
> > of global roots for the minor and major heap.  Then, once a value gets
> > oldified, the global root, too, could wander to the corresponding set.
> >  The set for the major heap could then be scanned only once per full
> > major cycle, maybe even in slices, too.  Would this suggestion be easy
> > to implement?
> 
> This "generational" approach is the natural solution to the problem
> you mention.  However, it is not compatible with the current API for
> global root registration: when a program registers a "value *" pointer
> using caml_register_global_root(), the program is free to change the
> value contained in that placeholder at any time without notifying the
> Caml memory manager.  As a consequence, the minor GC has no choice but
> scanning all global roots every time, because any of them could have
> been overwritten with a freshly-allocated Caml block since the
> previous minor GC.
> 
> There are 2 ways to go about this problem:
> 
> 1- Change the specs of caml_register_global_root() to prohibit
> in-place updates to the value contained in the registered value
> pointer.  If programmers need to do this, they must un-register the
> value pointer, update its contents, then re-register it.
> How much existing code would that break?  I don't know.
> 
> 2- Keep the current API for backward compatibility and add a
> caml_register_global_immutable_root() function that would implement
> generational scanning of global roots, in exchange for the
> programmer's guarantee that the values contained in those roots are
> never changed.  Then, convince authors of Caml-C bindings to use the
> new API.

The second option is much preferable for two reasons:

(a) If libraries don't change then at least they don't break.

(b) It is possible to update a library by grepping through the source
for caml_register_global_root and then examining each call to see if
you can prove the new constraint.  If you can't be certain, well no
sweat, just leave it as it is.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Global roots causing performance problems
  2008-03-07 14:10 ` [Caml-list] " Xavier Leroy
  2008-03-07 14:52   ` Berke Durak
  2008-03-07 16:45   ` Richard Jones
@ 2008-03-07 17:05   ` Markus Mottl
  2 siblings, 0 replies; 5+ messages in thread
From: Markus Mottl @ 2008-03-07 17:05 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: ocaml, ocaml-users

On Fri, Mar 7, 2008 at 9:10 AM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>  1- Change the specs of caml_register_global_root() to prohibit
>  in-place updates to the value contained in the registered value
>  pointer.  If programmers need to do this, they must un-register the
>  value pointer, update its contents, then re-register it.
>  How much existing code would that break?  I don't know.

I have just looked at a fairly large number of bindings.  Only a few
actually register global roots, and those that do would be perfectly
safe - with the exception of LablGTK it seems.  I'm a bit wary of
changing the semantics of global root registration in a way that could
seriously break existing libraries even though almost all of them seem
fine.

>  2- Keep the current API for backward compatibility and add a
>  caml_register_global_immutable_root() function that would implement
>  generational scanning of global roots, in exchange for the
>  programmer's guarantee that the values contained in those roots are
>  never changed.  Then, convince authors of Caml-C bindings to use the
>  new API.

After having given it some thought, I think I prefer this approach.
As a maintainer of some C-bindings that would be affected by this
change, I'd be perfectly happy to upgrade them, which seems like
extremely little work.

I also second Berke Durak's proposal to add some kind of modification
macro/function.  Concerning the naming we should maybe avoid the name
"immutable" then, because one can obviously update the value, but has
to do so explicitly.  How about "caml_(un)register_generational_root"
and "caml_modify_generational_root"?

> I'm willing to implement any of these 2 approaches, but it is not a
> transparent change in either case.

Thanks a lot, this would be great!

Best regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-03-07 17:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-06 23:51 Global roots causing performance problems Markus Mottl
2008-03-07 14:10 ` [Caml-list] " Xavier Leroy
2008-03-07 14:52   ` Berke Durak
2008-03-07 16:45   ` Richard Jones
2008-03-07 17:05   ` Markus Mottl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).