caml-list - the Caml user's mailing list
* segfault in 3.10.0
From: Andres Varon @ 2007-08-01 21:10 UTC
  To: OCaml List

Hello Everyone,

Has anyone observed sudden segfaults in OCaml 3.10.0 on AMD64 under
Linux? I have a program that has run for almost a year without a
single segfault. It may run for weeks at a time in parallel on our
cluster, using as many as 256 processors at once. We delayed updating
to 3.10.0 because of the changes in camlp4, but the day before
yesterday I worked on it, upgraded, and suddenly a lot of the nightly
unit tests on 64 bits fail with a segfault (a LOT of them), whereas
every test passed cleanly with 3.09.3. None of the tests on the other
architectures fail, though (Windows, Mac OS X Intel, and 32-bit PPC).
One downside is that we wrap C structures, so one may blame our
program.

However, efence and Valgrind show no sign of a problem. We have been
using those structures for a long time, many test iterations have
passed on many inputs, and not only we but many other people have
compiled and run our program on their own computers without
segfaults ... so I'm doubtful.

Unfortunately, I have been unable to compile 3.10.0 for 64 bits on our
G5 under Mac OS X, so that architecture remains untested. The
segfault occurs within the OCaml code. Any pointers would be greatly
appreciated, as would suggestions of tools that could help us hunt
this down ... I'm awfully clueless about what to do today ... and
it's been just a couple of days :-(

best,

Andres 



* Re: [Caml-list] segfault in 3.10.0
From: Andres Varon @ 2007-08-01 22:27 UTC
  To: OCaml List

Another bit of information: I can make the program segfault even on
the most trivial input, with minimal computation involved. When it
is run under gdb, the backtrace lists pure OCaml function calls, and
it is always the same one.

> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs



* Re: [Caml-list] segfault in 3.10.0
From: Yaron Minsky @ 2007-08-02  0:39 UTC
  To: OCaml List


It might be useful to post the stack backtrace.

y



* Re: [Caml-list] segfault in 3.10.0
From: Markus Mottl @ 2007-08-02  5:05 UTC
  To: Andres Varon; +Cc: OCaml List

On 8/1/07, Andres Varon <avaron@gmail.com> wrote:
> Has anyone observed sudden segfaults in OCaml 3.10.0 amd-64 under
> linux?

I would almost bet it's that one:

  http://caml.inria.fr/mantis/view.php?id=4300

Avoid enabling native stack backtraces (i.e. don't set
OCAMLRUNPARAM=b=1). I haven't seen any segfaults unless backtraces
were turned on, in which case you may see them often. Native stack
backtraces unfortunately still seem to be broken.
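A minimal sketch of acting on this advice, assuming a program that should warn at startup when native backtraces were requested through the environment. The helper names (backtraces_requested, runtime_params, warn_if_backtraces) are hypothetical, the parsing only approximates how the runtime reads the b option (it handles "b" and "b=N" entries in a comma-separated list), and the code targets a current OCaml: String.split_on_char does not exist in 3.10.0.

```ocaml
(* Approximate check of whether an OCAMLRUNPARAM-style string turns on
   backtrace recording. Hypothetical helper, not the runtime's own code:
   "b" or "b=N" with N <> 0 enables it, "b=0" disables it, and the last
   b entry wins. *)
let backtraces_requested (params : string) : bool =
  List.fold_left
    (fun enabled opt ->
      if String.length opt = 0 || opt.[0] <> 'b' then enabled
      else if String.length opt >= 2 && opt.[1] = '=' then
        (* An unparsable value is treated as "on", like a bare "b". *)
        (try int_of_string (String.sub opt 2 (String.length opt - 2)) <> 0
         with Failure _ -> true)
      else true)
    false
    (String.split_on_char ',' params)

(* Read OCAMLRUNPARAM, falling back to CAMLRUNPARAM, else the empty string. *)
let runtime_params () =
  try Sys.getenv "OCAMLRUNPARAM"
  with Not_found -> (try Sys.getenv "CAMLRUNPARAM" with Not_found -> "")

(* Print a warning on stderr if the environment asks for backtraces. *)
let warn_if_backtraces () =
  if backtraces_requested (runtime_params ()) then
    prerr_endline
      "warning: OCAMLRUNPARAM requests native backtraces \
       (see Mantis #4300)"
```

Calling warn_if_backtraces () at the start of the program's main function would flag runs that enable the feature Markus suspects. Later OCaml releases also offer Printexc.backtrace_status () to query the setting directly instead of parsing the environment.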

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com



* Re: [Caml-list] segfault in 3.10.0
From: Andres Varon @ 2007-08-02 15:41 UTC
  To: OCaml List

Hello Everyone,

This is what I have found:

1. The segfault is eliminated if I use position-dependent machine code
(-fno-PIC), which is the main difference I can see in the generated
code for amd-64 between 3.09.3 and 3.10.0 (-fPIC is now the default).
With -fno-PIC, there was little difference between the code dumped by
-dlinear for the two versions. (I could not get my program to compile
with the release310 branch (3.10.1+dev0 (2007-05-21)) due to a type
error that ... ehm ... isn't really a type error ;-) ).

2. The bug occurs even when I remove all calls to my C wrappers.
Therefore, I believe this is an OCaml issue.

3. The segfault always happens when switching between a pair of
modules that come from the same functor applied to slightly different
parameters.

I will continue trying to produce a small example to report to INRIA.
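The pattern in point 3 can be written down as a skeleton to reduce a test case toward. This is a hypothetical sketch: PARAM, Make, A, B and alternate are invented names, and nothing here is known to reproduce the crash. It only has the shape described above: one functor, two applications with slightly different parameters, and a driver that switches back and forth between the two instances.

```ocaml
(* Hypothetical skeleton for a reduced test case; not taken from the
   real program and not known to trigger the segfault. *)
module type PARAM = sig
  val weight : int
end

module Make (P : PARAM) = struct
  (* Each functor application builds its own lookup table. *)
  let table = Array.init 8 (fun i -> i * P.weight)
  let lookup i = table.(i mod Array.length table)
end

(* Two instances that differ only slightly in their parameter. *)
module A = Make (struct let weight = 1 end)
module B = Make (struct let weight = 2 end)

(* Alternate between the two instances on every iteration and sum the
   results; a correct build returns the same value every time. *)
let alternate n =
  let acc = ref 0 in
  for i = 0 to n - 1 do
    acc := !acc + (if i mod 2 = 0 then A.lookup i else B.lookup i)
  done;
  !acc

let () = Printf.printf "%d\n" (alternate 1000)
```

Building such a program with ocamlopt on the affected machine, both with the default flags and with -fno-PIC, and growing it toward the real module structure until one mode crashes, is one way to arrive at a report small enough to send.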

Finally, I have received some very nice suggestions in private and I
would like to thank those who have kindly sent them.

best,

Andres

On 8/2/07, Markus Mottl <markus.mottl@gmail.com> wrote:
> On 8/1/07, Andres Varon <avaron@gmail.com> wrote:
> > Has anyone observed sudden segfaults in OCaml 3.10.0 amd-64 under
> > linux?
>
> I would almost bet it's that one:
>
>   http://caml.inria.fr/mantis/view.php?id=4300
>
> Avoid enabling native stack backtraces (i.e. don't set
> OCAMLRUNPARAM=b=1).  I haven't seen any segfaults yet unless these
> were turned on in which case you may see them often.  Native stack
> backtraces unfortunately seem still broken.

I was hoping that would be the case, but no, I don't have them enabled.




* Re: [Caml-list] segfault in 3.10.0
From: Alain Frisch @ 2007-08-02 16:46 UTC
  To: Andres Varon, caml-list

Andres Varon wrote:
> 1. The segfault is eliminated if I use position-dependent machine code
> (-fno-PIC), which is the main difference that I can see in the
> generated code for amd-64 between 3.09.3 and 3.10.0 (-fPIC is the
> default now). When using -fno-PIC there was little difference between
> the code spilled using -dlinear between the two versions. (I could not
> get my program to compile with the branch release310 (3.10.1+dev0
> (2007-05-21)) due to a type error that ... ehm ... isn't really a type
> error ;-) ).

It would be very helpful if you could try to compile your code with the 
natdynlink branch (based on OCaml 3.10.0). The new -dlcode option 
triggers a different compilation mode (real PIC code, even for OCaml 
symbols).

-- Alain



* Re: [Caml-list] segfault in 3.10.0
From: Andres Varon @ 2007-08-02 17:31 UTC
  To: Alain Frisch; +Cc: caml-list


On Aug 2, 2007, at 12:46 PM, Alain Frisch wrote:

> It would be very helpful if you could try to compile your code with  
> the natdynlink branch (based on OCaml 3.10.0). The new -dlcode  
> option triggers a different compilation mode (real PIC code, even  
> for OCaml symbols).

OK, I ran cvs update -r natdynlink on top of the ocaml3100 branch. I
tried compiling native code with and without the -dlcode option, and
in both cases I got the same segfault, in the same place.

Let me know if I can do other tests here.

Andres




Thread overview: 7 messages
2007-08-01 21:10 segfault in 3.10.0 Andres Varon
2007-08-01 22:27 ` [Caml-list] " Andres Varon
2007-08-02  0:39   ` Yaron Minsky
2007-08-02  5:05 ` Markus Mottl
2007-08-02 15:41   ` Andres Varon
2007-08-02 16:46     ` Alain Frisch
2007-08-02 17:31       ` Andres Varon
