caml-list - the Caml user's mailing list
* Segfaults with Dynlink with OCaml 3.11
@ 2010-08-23 10:57 Paul Steckler
  2010-08-23 11:06 ` [Caml-list] " Stéphane Glondu
  2010-08-25  4:00 ` Paul Steckler
  0 siblings, 2 replies; 10+ messages in thread
From: Paul Steckler @ 2010-08-23 10:57 UTC
  To: caml-list

I'm getting segmentation faults when using dynamically linked native
code in 64-bit OCaml 3.11 running on Linux (Fedora 12 x64).

The .cmxs file loads fine.  There's a glue module that's "open"d in
the code for the dynamic module and linked against the main program.
The dynamic module calls functions that modify lists in the glue
module; the main code calls functions in the glue module that return
the current values of those lists.  The code that modifies the lists
seems to work OK, but the query functions reliably cause a crash.
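
A minimal single-file sketch of the structure described above, assuming the glue module keeps its state in a mutable list reference. The names `Glue`, `entries`, `register`, `current` and `Plugin` are hypothetical. `Plugin` only stands in for the code that the real program compiles to a `.cmxs` file and loads with `Dynlink.loadfile`, so this sketch shows the data flow but does not exercise Dynlink itself.

```ocaml
(* Hypothetical glue module, linked into the main program.  Dynamically
   loaded code that "open"s it mutates the shared list via [register]. *)
module Glue = struct
  let entries : string list ref = ref []

  (* Called by the dynamic module: prepend a new entry. *)
  let register name = entries := name :: !entries

  (* Called by the main program: entries in registration order. *)
  let current () = List.rev !entries
end

(* Stand-in for the dynamic module.  In the real program this code lives
   in a separate file compiled with [ocamlopt -shared], and its top-level
   effects run when the main program loads the resulting .cmxs. *)
module Plugin = struct
  open Glue

  let () =
    register "first";
    register "second"
end

(* The main program queries the glue module after the "load". *)
let () = List.iter print_endline (Glue.current ())
```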

I've written some small example programs with a similar structure,
and those work just fine.  In my real, large program, which pulls in
a lot of OCaml libraries, I get segfaults.

Any ideas what might be going wrong?  My code is not compiled with -nodynlink.

-- Paul



* Re: [Caml-list] Segfaults with Dynlink with OCaml 3.11
  2010-08-23 10:57 Segfaults with Dynlink with OCaml 3.11 Paul Steckler
@ 2010-08-23 11:06 ` Stéphane Glondu
  2010-08-23 11:47   ` Paul Steckler
  2010-08-25  4:00 ` Paul Steckler
  1 sibling, 1 reply; 10+ messages in thread
From: Stéphane Glondu @ 2010-08-23 11:06 UTC
  To: Paul Steckler; +Cc: caml-list

Le 23/08/2010 12:57, Paul Steckler a écrit :
> I'm getting segmentation faults when using dynamically linked native
> code in 64-bit OCaml 3.11 running on Linux (Fedora 12 x64). [...]
> I've written some small example programs with a similar structure,
> and those work just fine.  In my real, large program, which pulls in
> a lot of OCaml libraries, I get segfaults.

Does your real large program use C bindings? Are you able to reproduce
the segfaults with pure OCaml code?


Cheers,

-- 
Stéphane



* Re: [Caml-list] Segfaults with Dynlink with OCaml 3.11
  2010-08-23 11:06 ` [Caml-list] " Stéphane Glondu
@ 2010-08-23 11:47   ` Paul Steckler
  2010-08-23 12:05     ` Mark Shinwell
  0 siblings, 1 reply; 10+ messages in thread
From: Paul Steckler @ 2010-08-23 11:47 UTC
  To: Stéphane Glondu; +Cc: caml-list

On Mon, Aug 23, 2010 at 9:06 PM, Stéphane Glondu <steph@glondu.net> wrote:
> Does your real large program use C bindings? Are you able to reproduce
> the segfaults with pure OCaml code?

Yes, the large program has C bindings, including calls into dynamically loaded
.so files (Linux dynamic libraries).  Anything I should look into for those?

Thanks,
-- Paul



* Re: [Caml-list] Segfaults with Dynlink with OCaml 3.11
  2010-08-23 11:47   ` Paul Steckler
@ 2010-08-23 12:05     ` Mark Shinwell
  2010-08-23 12:12       ` Paul Steckler
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Shinwell @ 2010-08-23 12:05 UTC
  To: Paul Steckler; +Cc: caml-list

On Mon, Aug 23, 2010 at 09:47:43PM +1000, Paul Steckler wrote:
> On Mon, Aug 23, 2010 at 9:06 PM, Stéphane Glondu <steph@glondu.net> wrote:
> > Does your real large program use C bindings? Are you able to reproduce
> > the segfaults with pure OCaml code?
> 
> Yes, the large program has C bindings, including calls into dynamically loaded
> .so files (Linux dynamic libraries).  Anything I should look into for those?

It can be a time-consuming task, but double-check all the rules on
http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html are being followed.
For example, watch out for things like variables of type [value] that are
wrongly not protected with CAMLlocal/CAMLparam macros across allocation points
when the variable's contents are needed later, uses of the Field macro as an
lvalue in a situation where instead Store_field must be used, assigning to
generational global roots without using the correct function, etc.  Just one
minor transgression could be the cause of a hard-to-find error at some random
point during program execution.

Have you tried using gdb to determine the stack backtrace when it segfaults?
Also, if it can be done without disturbing too much code, it might be worth
trying to eliminate Dynlink from the program as a test.

Mark



* Re: [Caml-list] Segfaults with Dynlink with OCaml 3.11
  2010-08-23 12:05     ` Mark Shinwell
@ 2010-08-23 12:12       ` Paul Steckler
  2010-08-23 12:15         ` Daniel Bünzli
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Paul Steckler @ 2010-08-23 12:12 UTC
  To: Mark Shinwell; +Cc: caml-list

On Mon, Aug 23, 2010 at 10:05 PM, Mark Shinwell
<mshinwell@janestreet.com> wrote:
> It can be a time-consuming task, but double-check all the rules on
> http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html are being followed.
> For example, watch out for things like variables of type [value] that are
> wrongly not protected with CAMLlocal/CAMLparam macros across allocation points
> when the variable's contents are needed later, uses of the Field macro as an
> lvalue in a situation where instead Store_field must be used, assigning to
> generational global roots without using the correct function, etc.  Just one
> minor transgression could be the cause of a hard-to-find error at some random
> point during program execution.

OK, I'll do an audit of those calls.

> Have you tried using gdb to determine the stack backtrace when it segfaults?
> Also, if it can be done without disturbing too much code, it might be worth
> trying to eliminate Dynlink from the program as a test.

I've already tried gdb, which is how I learned that the segfault
occurs during a call to one of the query functions in my glue module.

Oh, we just added the Dynlink stuff.  There haven't been any recent
crashes until just now.  That could be an unhappy coincidence; the
real issue might lurk in unrelated code, as you point out.

-- Paul



* Re: [Caml-list] Segfaults with Dynlink with OCaml 3.11
  2010-08-23 12:12       ` Paul Steckler
@ 2010-08-23 12:15         ` Daniel Bünzli
  2010-08-23 12:28         ` Anil Madhavapeddy
  2010-08-23 15:48         ` Stéphane Glondu
  2 siblings, 0 replies; 10+ messages in thread
From: Daniel Bünzli @ 2010-08-23 12:15 UTC
  To: Paul Steckler; +Cc: Mark Shinwell, caml-list

Here are a few more tips for chasing that bug (see the comments too).

http://rwmj.wordpress.com/2010/01/22/tip-tracking-down-ocaml-heap-corruptors/

Daniel



* Re: [Caml-list] Segfaults with Dynlink with OCaml 3.11
  2010-08-23 12:12       ` Paul Steckler
  2010-08-23 12:15         ` Daniel Bünzli
@ 2010-08-23 12:28         ` Anil Madhavapeddy
  2010-08-23 15:48         ` Stéphane Glondu
  2 siblings, 0 replies; 10+ messages in thread
From: Anil Madhavapeddy @ 2010-08-23 12:28 UTC
  To: Paul Steckler; +Cc: Mark Shinwell, caml-list

On 23 Aug 2010, at 13:12, Paul Steckler wrote:

> On Mon, Aug 23, 2010 at 10:05 PM, Mark Shinwell
> <mshinwell@janestreet.com> wrote:
> 
>> Have you tried using gdb to determine the stack backtrace when it segfaults?
>> Also, if it can be done without disturbing too much code, it might be worth
>> trying to eliminate Dynlink from the program as a test.
> 
> I've already tried gdb, which is how I learned that the segfault
> occurs during a call to one of the query functions in my glue module.
> 
> Oh, we just added the Dynlink stuff.  There haven't been any recent
> crashes until just now.  That could be an unhappy coincidence; the
> real issue might lurk in unrelated code, as you point out.
> 

OCaml's runtime library also has a debug version which performs additional integrity checks on the heap during garbage collection and other operations. This can help catch problems much closer to their source than the production version of the library.  I build it by:

$ cd ocaml-3.11.2/
$ ./configure <usual args>
$ cd asmrun
$ make libasmrund.a
$ cp libasmrund.a /opt/local/lib/ocaml/

Then I swap the installed libasmrun.a with libasmrund.a when a debug version is needed, and restore the original when done.  Calling Gc.compact () regularly helps trigger the additional checks more often.

I'm not sure if there is a way of using the debug version more easily --- libasmrunp.a (the profiling version) is installed by default, but libasmrund.a is not.
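
One way to act on the Gc.compact tip is to wrap each suspect operation so that a full compaction runs right before and after it, giving the debug runtime extra chances to check the heap close to the code under suspicion. `with_heap_checks` is a hypothetical helper, not part of any library; with the normal runtime it only forces collections and performs no integrity checks of its own.

```ocaml
(* Hypothetical helper: force a full major GC plus compaction before and
   after [f].  Linked against the debug runtime (libasmrund.a), each
   compaction is an extra opportunity for the runtime's heap checks to
   fire near the wrapped call.  With the normal runtime this only costs
   time. *)
let with_heap_checks label f =
  Gc.compact ();
  let result = f () in
  Gc.compact ();
  Printf.eprintf "compacted heap around %s\n%!" label;
  result

let () =
  (* Usage: wrap a query that is suspected of touching a corrupted value. *)
  let total =
    with_heap_checks "sum" (fun () -> List.fold_left ( + ) 0 [1; 2; 3])
  in
  Printf.printf "total = %d\n" total
```

If an exception escapes `f`, the second compaction is skipped; for a debugging aid that is usually acceptable.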

-anil



* Re: [Caml-list] Segfaults with Dynlink with OCaml 3.11
  2010-08-23 12:12       ` Paul Steckler
  2010-08-23 12:15         ` Daniel Bünzli
  2010-08-23 12:28         ` Anil Madhavapeddy
@ 2010-08-23 15:48         ` Stéphane Glondu
  2 siblings, 0 replies; 10+ messages in thread
From: Stéphane Glondu @ 2010-08-23 15:48 UTC
  To: caml-list

Le 23/08/2010 14:12, Paul Steckler a écrit :
> [...]
> Oh, we just added the Dynlink stuff.  There haven't been any recent
> crashes until just now.  That could be an unhappy coincidence; the
> real issue might lurk in unrelated code, as you point out.

Note that Dynlink can expose bugs (even in your own code) that don't
show up without Dynlink: for example, code that implicitly assumes all
top-level declarations are executed while the program is loading.
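
A small illustration of that kind of assumption, using a hypothetical registry: the main program takes a snapshot at initialization time, which is only complete if every registering module has already run. Here `load_plugin` is a hypothetical stand-in for `Dynlink.loadfile`, whose top-level effects run after the main program has started. The sketch shows a stale value rather than a crash, but the hidden ordering dependency is the same.

```ocaml
(* Shared registry, as a glue module might provide. *)
let registry : string list ref = ref []
let register name = registry := name :: !registry

(* Fragile: evaluated once, during program initialization.  It is only
   complete if every module that calls [register] has already run its
   top-level code, which is not the case for dynamically loaded code. *)
let snapshot_at_startup = List.rev !registry

(* Robust: read the registry at the moment the value is needed. *)
let current () = List.rev !registry

(* Hypothetical stand-in for a plugin whose top-level code runs only when
   it is loaded, i.e. after the main program has started. *)
let load_plugin () = register "plugin-entry"

let () =
  load_plugin ();
  Printf.printf "snapshot: %d entries, current: %d entries\n"
    (List.length snapshot_at_startup) (List.length (current ()))
```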

-- 
Stéphane



* Re: Segfaults with Dynlink with OCaml 3.11
  2010-08-23 10:57 Segfaults with Dynlink with OCaml 3.11 Paul Steckler
  2010-08-23 11:06 ` [Caml-list] " Stéphane Glondu
@ 2010-08-25  4:00 ` Paul Steckler
  2010-09-03 14:59   ` [Caml-list] " Damien Doligez
  1 sibling, 1 reply; 10+ messages in thread
From: Paul Steckler @ 2010-08-25  4:00 UTC
  To: caml-list

On Mon, Aug 23, 2010 at 8:57 PM, Paul Steckler <steck@stecksoft.com> wrote:
> I'm getting segmentation faults when using dynamically linked native
> code in 64-bit OCaml 3.11 running on Linux (Fedora 12 x64).

Many thanks to all who gave useful advice on tracking down this problem.

We have three chunks of C code we're calling, so we went through those
and audited our use of the FFI conventions.  Indeed, we found a number
of instances where we used return instead of CAMLreturn, and so on.

But the segfaults were occurring before our C code was ever called,
and before any code was called in OCaml packages we use that are
linked against C code, such as sqlite3.  So the segfaults occurred
even after patching our C code.

Today, I found the culprit.  Here's the pattern:

   dynamically load .cmxs file
   query list mutated by .cmxs file      (* no problem *)
   Gc.set { (Gc.get()) with Gc.minor_heap_size         = ...};
   Gc.set { (Gc.get()) with Gc.major_heap_increment = ... };
   query mutated list   (* segfault! *)

If I move the Gc.set's to the program initialization code, before the
loading of dynamic code, no segfaults occur.
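
A sketch of the workaround described above: apply the GC settings during initialization, before any dynamic code is loaded. This avoided the crash in this program but is not a fix for the underlying problem. The heap size below is a hypothetical placeholder (the message elides the real values), the `major_heap_increment` change is omitted, and "plugin.cmxs" is a hypothetical file name.

```ocaml
(* Tune the GC once, at startup.  1_048_576 words is a placeholder,
   not the value used in the original program. *)
let configure_gc () =
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = 1_048_576 }

let () =
  configure_gc ();
  (* Only after this point would the program load dynamic code, e.g.
     Dynlink.loadfile (Dynlink.adapt_filename "plugin.cmxs"). *)
  Printf.printf "minor heap: %d words\n" (Gc.get ()).Gc.minor_heap_size
```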

Is this expected behavior?  I don't see caveats about interaction with
the garbage collector in the documentation for the Dynlink module, nor
anything about dynamic linking in the Gc module documentation.

-- Paul



* Re: [Caml-list] Re: Segfaults with Dynlink with OCaml 3.11
  2010-08-25  4:00 ` Paul Steckler
@ 2010-09-03 14:59   ` Damien Doligez
  0 siblings, 0 replies; 10+ messages in thread
From: Damien Doligez @ 2010-09-03 14:59 UTC
  To: caml-list caml-list; +Cc: Paul Steckler


On 2010-08-25, at 06:00, Paul Steckler wrote:

> Today, I found the culprit.  Here's the pattern:
> 
>   dynamically load .cmxs file
>   query list mutated by .cmxs file      (* no problem *)
>   Gc.set { (Gc.get()) with Gc.minor_heap_size         = ...};
>   Gc.set { (Gc.get()) with Gc.major_heap_increment = ... };
>   query mutated list   (* segfault! *)
> 
> If I move the Gc.set's to the program initialization code, before the
> loading of dynamic code, no segfaults occur.

I bet the second Gc.set doesn't matter, and if you replace them with
Gc.minor () you get the same behaviour.
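
Following the suggestion above, a reduced repro could replace both `Gc.set` calls with a single `Gc.minor ()`. In this skeleton `entries`, `load_plugin` and `query` are hypothetical stand-ins. A real repro would load a `.cmxs` with `Dynlink.loadfile` that mutates `entries`, so this file on its own is not expected to crash.

```ocaml
(* Hypothetical shared state that the dynamic module would mutate. *)
let entries : int list ref = ref []

(* Stand-in for loading the .cmxs; in a real repro this would be a
   Dynlink.loadfile call whose top-level code updates [entries]. *)
let load_plugin () = entries := [1; 2; 3] @ !entries

(* Stand-in for the glue module's query function. *)
let query () = List.fold_left ( + ) 0 !entries

let () =
  load_plugin ();
  let before = query () in  (* worked in the original report *)
  Gc.minor ();              (* replaces the two Gc.set calls *)
  let after = query () in   (* where the original report crashed *)
  Printf.printf "before=%d after=%d\n" before after
```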

> Is this expected behavior?

Definitely not.  You should file a bug report here, preferably with
a complete repro case:
> Bug reports: http://caml.inria.fr/bin/caml-bugs


-- Damien



end of thread

Thread overview: 10+ messages
2010-08-23 10:57 Segfaults with Dynlink with OCaml 3.11 Paul Steckler
2010-08-23 11:06 ` [Caml-list] " Stéphane Glondu
2010-08-23 11:47   ` Paul Steckler
2010-08-23 12:05     ` Mark Shinwell
2010-08-23 12:12       ` Paul Steckler
2010-08-23 12:15         ` Daniel Bünzli
2010-08-23 12:28         ` Anil Madhavapeddy
2010-08-23 15:48         ` Stéphane Glondu
2010-08-25  4:00 ` Paul Steckler
2010-09-03 14:59   ` [Caml-list] " Damien Doligez
