caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Segfault in C++ stub with many 'new' allocations
@ 2012-11-17 18:29 Samuel Hornus
  2012-11-17 18:35 ` Kakadu
  2012-11-17 19:46 ` Markus Mottl
  0 siblings, 2 replies; 8+ messages in thread
From: Samuel Hornus @ 2012-11-17 18:29 UTC (permalink / raw)
  To: O Caml


Dear all,

I have two questions.

1/ I'm writing a stub to the C++ ANN library [1] to find geometric neighboring points in space.
The constructor of the main class in this library uses a lot of allocation with the "new" C++ keyword.
For small input point sets (e.g. 2500 points), it all seems to work fine.
For larger ones (50 K points), the C++ constructor crashes.
My question is : is it possible that the C++ "new" allocator differs sufficiently from the C-style malloc, that bad interactions with OCaml heap happen ?

(I'm passing the input points coordinates in a plain bigarray.)

2/ Regarding bigarray: before using them, I let the C++ constructor access, and keep pointers inside regular OCaml [float array] or [float array array]. It was working well (again, for small input point set), but is that safe ? Or can the garbage collector eventually relocate the content of a  [float array]  or of a [float array array] ? so that the pointer kept in the C++ class would become dangling ?

Thank you in advance,
Sam

[1] Approximate Nearest Neighbors http://www.cs.umd.edu/~mount/ANN/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Segfault in C++ stub with many 'new' allocations
  2012-11-17 18:29 [Caml-list] Segfault in C++ stub with many 'new' allocations Samuel Hornus
@ 2012-11-17 18:35 ` Kakadu
  2012-11-17 18:42   ` Samuel Hornus
  2012-11-17 19:46 ` Markus Mottl
  1 sibling, 1 reply; 8+ messages in thread
From: Kakadu @ 2012-11-17 18:35 UTC (permalink / raw)
  To: Samuel Hornus; +Cc: O Caml

[-- Attachment #1: Type: text/plain, Size: 1565 bytes --]

2) AFAIR there are functions like register_global_root which prevents GC
from moving values.

Best wishes,
Kakadu


On Sat, Nov 17, 2012 at 10:29 PM, Samuel Hornus <Samuel.Hornus@inria.fr>wrote:

>
> Dear all,
>
> I have two questions.
>
> 1/ I'm writing a stub to the C++ ANN library [1] to find geometric
> neighboring points in space.
> The constructor of the main class in this library uses a lot of allocation
> with the "new" C++ keyword.
> For small input point sets (e.g. 2500 points), it all seems to work fine.
> For larger ones (50 K points), the C++ constructor crashes.
> My question is : is it possible that the C++ "new" allocator differs
> sufficiently from the C-style malloc, that bad interactions with OCaml heap
> happen ?
>
> (I'm passing the input points coordinates in a plain bigarray.)
>
> 2/ Regarding bigarray: before using them, I let the C++ constructor
> access, and keep pointers inside regular OCaml [float array] or [float
> array array]. It was working well (again, for small input point set), but
> is that safe ? Or can the garbage collector eventually relocate the content
> of a  [float array]  or of a [float array array] ? so that the pointer kept
> in the C++ class would become dangling ?
>
> Thank you in advance,
> Sam
>
> [1] Approximate Nearest Neighbors http://www.cs.umd.edu/~mount/ANN/
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

[-- Attachment #2: Type: text/html, Size: 2285 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Segfault in C++ stub with many 'new' allocations
  2012-11-17 18:35 ` Kakadu
@ 2012-11-17 18:42   ` Samuel Hornus
  2012-11-17 19:10     ` Török Edwin
  0 siblings, 1 reply; 8+ messages in thread
From: Samuel Hornus @ 2012-11-17 18:42 UTC (permalink / raw)
  To: O Caml


On 17 nov. 2012, at 19:35, Kakadu wrote:

> 2) AFAIR there are functions like register_global_root which prevents GC from moving values.

Thank you Kakadu. It is interesting to know that one can use this function directly. I knew its existence only through the CAML[x]param* macros…

Regarding my first question, if it can help, here is the relevant part of my C++ code:
---8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<---
ANNbd_tree * create_bd_tree_flat(ANNcoord * arr, int n, int dim)
{
	// ANNcoord = double
	// ANNpoint = double*
	// ANNpointArray = double**

    ANNbd_tree * ptr;
    std::cerr << "\nCreating flat Bd_tree with " << n << " points of dim " << dim;

    ANNpointArray ptarr = new ANNpoint[n];
    for( int i = 0; i < n; ++i )
    {
        ptarr[i] = arr + dim * i;
    }
    std::cerr << " CHK ";
    ptr = new ANNbd_tree(ptarr,n,dim); // Crashes here when 'n' is big
    std::cerr << "done";
    return ptr;
}

extern "C" value ann_create_bd_tree_flat_ba (value ba, value n, value dim)
{
    CAMLparam3(ba,n,dim);
    ANNbd_tree * ptr = create_bd_tree_flat((ANNcoord*)Caml_ba_data_val(ba), Int_val(n), Int_val(dim));
    CAMLreturn ((value)ptr);
}
---8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<---

The second function is called from OCaml, and the crash happens in the first function, in the constructor ANNbd_tree() ("CHK" is printed on the terminal, "done" is not)
-- 
Sam

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Segfault in C++ stub with many 'new' allocations
  2012-11-17 18:42   ` Samuel Hornus
@ 2012-11-17 19:10     ` Török Edwin
  2012-11-17 19:18       ` Samuel Hornus
  2012-11-17 19:50       ` Markus Mottl
  0 siblings, 2 replies; 8+ messages in thread
From: Török Edwin @ 2012-11-17 19:10 UTC (permalink / raw)
  To: caml-list

On 11/17/2012 08:42 PM, Samuel Hornus wrote:
> 
> On 17 nov. 2012, at 19:35, Kakadu wrote:
> 
>> 2) AFAIR there are functions like register_global_root which prevents GC from moving values.
> 
> Thank you Kakadu. It is interesting to know that one can use this function directly. I knew its existence only through the CAML[x]param* macros…
> 
> Regarding my first question, if it can help, here is the relevant part of my C++ code:
> extern "C" value ann_create_bd_tree_flat_ba (value ba, value n, value dim)
> {
>     CAMLparam3(ba,n,dim);
>     ANNbd_tree * ptr = create_bd_tree_flat((ANNcoord*)Caml_ba_data_val(ba), Int_val(n), Int_val(dim));
>     CAMLreturn ((value)ptr);

I don't think this is safe. Shouldn't the return value be wrapped in a custom block?
If the GC tries to interpret this as an OCaml value (which it isn't) it might crash.

--Edwin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Segfault in C++ stub with many 'new' allocations
  2012-11-17 19:10     ` Török Edwin
@ 2012-11-17 19:18       ` Samuel Hornus
  2012-11-17 19:50       ` Markus Mottl
  1 sibling, 0 replies; 8+ messages in thread
From: Samuel Hornus @ 2012-11-17 19:18 UTC (permalink / raw)
  To: O Caml


On 17 nov. 2012, at 20:10, Török Edwin wrote:

> On 11/17/2012 08:42 PM, Samuel Hornus wrote:
>> 
>> Regarding my first question, if it can help, here is the relevant part of my C++ code:
>> extern "C" value ann_create_bd_tree_flat_ba (value ba, value n, value dim)
>> {
>>    CAMLparam3(ba,n,dim);
>>    ANNbd_tree * ptr = create_bd_tree_flat((ANNcoord*)Caml_ba_data_val(ba), Int_val(n), Int_val(dim));
>>    CAMLreturn ((value)ptr);
> 
> I don't think this is safe. Shouldn't the return value be wrapped in a custom block?
> If the GC tries to interpret this as an OCaml value (which it isn't) it might crash.

Currently, as advised in F. Monier webpage http://www.linux-nantes.org/~fmonnier/ocaml/ocaml-wrapping-c.php#ref_finalise, I wrap the pointer, in the OCaml code, inside an abstract OCaml type with a finalizer. Could there still be problems with the GC ?
That being said, my crash happens *inside* the C++ constructor, well before the garbage collection may have kicked in. The same code, compiled as a pure C++ program works without problem, even with millions of points.
Is it possible that C++ called from OCaml has a limited memory allocation space?
-- 
Sam

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Segfault in C++ stub with many 'new' allocations
  2012-11-17 18:29 [Caml-list] Segfault in C++ stub with many 'new' allocations Samuel Hornus
  2012-11-17 18:35 ` Kakadu
@ 2012-11-17 19:46 ` Markus Mottl
  2012-11-17 22:45   ` Samuel Hornus
  1 sibling, 1 reply; 8+ messages in thread
From: Markus Mottl @ 2012-11-17 19:46 UTC (permalink / raw)
  To: Samuel Hornus; +Cc: O Caml

On Sat, Nov 17, 2012 at 1:29 PM, Samuel Hornus <Samuel.Hornus@inria.fr> wrote:
> 1/ I'm writing a stub to the C++ ANN library [1] to find geometric neighboring points in space.
> The constructor of the main class in this library uses a lot of allocation with the "new" C++ keyword.
> For small input point sets (e.g. 2500 points), it all seems to work fine.
> For larger ones (50 K points), the C++ constructor crashes.
> My question is : is it possible that the C++ "new" allocator differs sufficiently from the C-style malloc, that bad interactions with OCaml heap happen ?

Having written quite some C++ bindings, I'm unaware of any such
problems.  Segfaults are typically due to incorrect interaction with
the OCaml runtime.

> (I'm passing the input points coordinates in a plain bigarray.)
>
> 2/ Regarding bigarray: before using them, I let the C++ constructor access, and keep pointers inside regular OCaml [float array] or [float array array]. It was working well (again, for small input point set), but is that safe ? Or can the garbage collector eventually relocate the content of a  [float array]  or of a [float array array] ? so that the pointer kept in the C++ class would become dangling ?

No, it's not safe to keep pointers into any standard OCaml array if
allocations can happen.  Kakadu mentions register_global_root in a
reply, but it is not correct that this prevents the GC from moving
values, it merely protects against their reclamation (i.e. it keeps
the values live).

Even if you want to use bigarrays, you will have to protect them from
being reclaimed while your C++ code is accessing them.  But otherwise
their contents will remain fixed in memory, because unlike with
standard arrays it lives outside the OCaml heap.

For numerical calculations I'd strongly suggest using bigarrays,
especially if the C/C++ functions can take a long time to run.  In
this case you could then release the OCaml runtime lock and benefit
from parallelism and/or improved latencies if your OCaml program needs
to react to the outside world.

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Segfault in C++ stub with many 'new' allocations
  2012-11-17 19:10     ` Török Edwin
  2012-11-17 19:18       ` Samuel Hornus
@ 2012-11-17 19:50       ` Markus Mottl
  1 sibling, 0 replies; 8+ messages in thread
From: Markus Mottl @ 2012-11-17 19:50 UTC (permalink / raw)
  To: Török Edwin; +Cc: caml-list

On Sat, Nov 17, 2012 at 2:10 PM, Török Edwin <edwin+ml-ocaml@etorok.net> wrote:
> I don't think this is safe. Shouldn't the return value be wrapped in a custom block?
> If the GC tries to interpret this as an OCaml value (which it isn't) it might crash.

The GC will never follow pointers that point outside the OCaml heap.
When done correctly (which requires great care), you can therefore
pass around pointers to C-land.  You'll, of course, then have to
manually deallocate C/C++-heap objects by making the appropriate calls
(free / delete).  Just make sure you never pass around pointers to
memory that has already been deallocated.  Otherwise OCaml might
eventually claim it and cause crashes when following pointers.

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Segfault in C++ stub with many 'new' allocations
  2012-11-17 19:46 ` Markus Mottl
@ 2012-11-17 22:45   ` Samuel Hornus
  0 siblings, 0 replies; 8+ messages in thread
From: Samuel Hornus @ 2012-11-17 22:45 UTC (permalink / raw)
  To: O Caml

On 17 nov. 2012, at 20:46, Markus Mottl wrote:

> On Sat, Nov 17, 2012 at 1:29 PM, Samuel Hornus <Samuel.Hornus@inria.fr> wrote:
>> 1/ I'm writing a stub to the C++ ANN library [1] to find geometric neighboring points in space.
>> The constructor of the main class in this library uses a lot of allocation with the "new" C++ keyword.
>> For small input point sets (e.g. 2500 points), it all seems to work fine.
>> For larger ones (50 K points), the C++ constructor crashes.
>> My question is : is it possible that the C++ "new" allocator differs sufficiently from the C-style malloc, that bad interactions with OCaml heap happen ?
> 
> Having written quite some C++ bindings, I'm unaware of any such
> problems.  Segfaults are typically due to incorrect interaction with
> the OCaml runtime.

It turns out I was putting a bit too much faith in libANN. I must use it wrongly & need to investigate more since, even after removing input points with same coordinates, it does enter an infinite loop while computing the nearest-neighbors data structure, and exhausts the stack, hence the crash.
I'm glad that this is not a C++::new problem, but a more easily debug-able one.

>> 2/ Regarding bigarray: before using them, I let the C++ constructor access, and keep pointers inside regular OCaml [float array] or [float array array]. It was working well (again, for small input point set), but is that safe ? Or can the garbage collector eventually relocate the content of a  [float array]  or of a [float array array] ? so that the pointer kept in the C++ class would become dangling ?
> 
> No, it's not safe to keep pointers into any standard OCaml array if
> allocations can happen.  Kakadu mentions register_global_root in a
> reply, but it is not correct that this prevents the GC from moving
> values, it merely protects against their reclamation (i.e. it keeps
> the values live).

OK ! I'll stick to bigarrays.

> Even if you want to use bigarrays, you will have to protect them from
> being reclaimed while your C++ code is accessing them.  But otherwise
> their contents will remain fixed in memory, because unlike with
> standard arrays it lives outside the OCaml heap.

So my Ann module defines

	type internal
	type t = internal * bigarray

and the Ann.mli abstracts the type t.

The creation of the ANN data structure returns a block of size 1 with tag Abstract_tag, containing the C++pointer, as suggested in the OCaml manual. The resulting value, call it 'ptr', is stored with the abstract type "internal" in the pair (ptr,ba) where ba is the bigarray containing the points (so that the big array lives at least as long as my ANN C++ object). A special destructor is attached to that pair with Gc.finalise. I believe (perhaps naively?) that this is sufficient to avoid memory corruption (unless I write stupid code in the Ann module itself).

> For numerical calculations I'd strongly suggest using bigarrays,
> especially if the C/C++ functions can take a long time to run.  In
> this case you could then release the OCaml runtime lock and benefit
> from parallelism and/or improved latencies if your OCaml program needs
> to react to the outside world.

Thank you Markus,
-- 
Sam

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-11-17 22:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-17 18:29 [Caml-list] Segfault in C++ stub with many 'new' allocations Samuel Hornus
2012-11-17 18:35 ` Kakadu
2012-11-17 18:42   ` Samuel Hornus
2012-11-17 19:10     ` Török Edwin
2012-11-17 19:18       ` Samuel Hornus
2012-11-17 19:50       ` Markus Mottl
2012-11-17 19:46 ` Markus Mottl
2012-11-17 22:45   ` Samuel Hornus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).