From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <20101124002030.GA9493@yeeloong>
References: <20101124002030.GA9493@yeeloong>
Date: Wed, 29 Jun 2011 12:52:30 +0400
From: SerP
To: rixed@happyleptic.org
Cc: caml-list@inria.fr
Content-Type: multipart/alternative; boundary=20cf303b397599bd4e04a6d5e77c
Subject: Re: [Caml-list] Segfault in ARM EABI for programm compiled with ocamlopt 3.12.0

--20cf303b397599bd4e04a6d5e77c
Content-Type: text/plain; charset=ISO-8859-1

It has been a long time. Were you able to find out why this bug happens?
On the iPhone I get the same bug with ocaml-3.12.

On Wed, Nov 24, 2010 at 3:20 AM, <rixed@happyleptic.org> wrote:

> For some time now I have been chasing a bug that hits one of my programs
> when it is compiled on ARM with ocaml 3.12.0.
> I initially thought my own C code was misbehaving, but the program keeps
> crashing, although not as early, if I comment out all calls to the C
> functions.
>
> The segfaults happen frequently during the GC, in oldify_one or
> oldify_mopup, but also in a few other places such as camlList__rev_append
> or caml__apply2, and elsewhere as well. In caml_oldify_one, for
> instance, the segfault always happens at the same location: the
> assertion that sz is not 0 (and of course when you read the code it's
> pretty clear that sz=0 corresponds to the "already forwarded" case that's
> handled at the beginning of the function).
>
> The pattern, then, is that a register (usually r0, r2 or r5) is
> restored from the stack after a call to a function that might call the
> GC (or after a call to the GC itself), then dereferenced. Inspecting the
> stack with gdb makes it obvious that this very word was changed during the
> call, so a value like 0, 3 or 1024 is read back into the register
> instead of an mlvalue.
>
> I haven't managed (yet) to reduce the program to a small
> showcase, and I am under the impression that all these components are
> required in order for the bug to happen 'fast enough':
>
> - threads
> - floats
> - calls to C functions (these greatly reduce the time to wait before the crash)
>
> I am also under the impression that the bug is affected by the new stack
> alignment requirement (because in one occurrence, whether or not a
> function hit by the bug called a do-nothing function drastically changed
> the probability of the bug, and the major difference I saw
> was that in one version of the function the stack size was 16 bytes and
> in the other 24 bytes (16+4 apparently for the address of a "module"
> structure, aligned up to 24 bytes).
> I thus manually checked the
> generated framesets but they were all right as far as I understand them.
>
> Now I'm a little desperate, since each recompile+test cycle takes about 20
> minutes and the bug is so erratic; so if someone here is familiar with the
> ARM architecture, and in particular the difference between the old and new
> ABI, please suggest what I should check, or give any hint whatsoever. I'd
> be very grateful, as this consumes a lot of my spare time.
>
> Also, I'm compiling ocaml with gcc 4.2.1 - do you think it may be a
> problem with gcc not following the very same ABI?
>
> Also, I've run the testsuite but it did not reveal anything.
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

--20cf303b397599bd4e04a6d5e77c
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

It has been a long time. Were you able to find out why this bug happens?
On the iPhone I get the same bug with ocaml-3.12.

On Wed, Nov 24, 2010 at 3:20 AM, <rixed@happyleptic.org> wrote:
For some time now I have been chasing a bug that hits one of my programs
when it is compiled on ARM with ocaml 3.12.0.
I initially thought my own C code was misbehaving, but the program keeps
crashing, although not as early, if I comment out all calls to the C
functions.

The segfaults happen frequently during the GC, in oldify_one or
oldify_mopup, but also in a few other places such as camlList__rev_append
or caml__apply2, and elsewhere as well. In caml_oldify_one, for
instance, the segfault always happens at the same location: the
assertion that sz is not 0 (and of course when you read the code it's
pretty clear that sz=0 corresponds to the "already forwarded" case that's
handled at the beginning of the function).
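
As a rough illustration of the sz=0 remark above, here is a minimal C sketch of how a 32-bit block header decodes. The layout (size in the top 22 bits, GC color in the next 2, tag in the low 8) and the all-zero forwarding marker are stated here as assumptions about the OCaml 3.12 runtime on 32-bit ARM; the helper names are hypothetical and are not the runtime's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit header layout:
   bits 31..10 = size in words, bits 9..8 = GC color, bits 7..0 = tag. */
typedef uint32_t header_t;

/* Hypothetical helper: extract the block size in words. */
static uint32_t wosize_of_header(header_t hd) { return hd >> 10; }

/* Hypothetical helper: extract the block tag. */
static uint32_t tag_of_header(header_t hd) { return hd & 0xFFu; }

/* Assumption: the minor GC marks an already-copied young block by
   overwriting its header with 0, so only hd == 0 means "forwarded". */
static int is_forwarded(header_t hd) { return hd == 0; }

/* Hypothetical helper: build a header from its three fields. */
static header_t make_header(uint32_t wosize, uint32_t color, uint32_t tag) {
    assert(wosize < (1u << 22) && color < 4 && tag < 256);
    return (wosize << 10) | (color << 8) | tag;
}
```

Under these assumptions a stray word such as 3 is not the forwarding marker, yet it decodes to a size of 0, which is one way the assertion could trip; 1024 would decode as a one-word block with tag 0.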

The pattern, then, is that a register (usually r0, r2 or r5) is
restored from the stack after a call to a function that might call the
GC (or after a call to the GC itself), then dereferenced. Inspecting the
stack with gdb makes it obvious that this very word was changed during the
call, so a value like 0, 3 or 1024 is read back into the register
instead of an mlvalue.

I haven't managed (yet) to reduce the program to a small
showcase, and I am under the impression that all these components are
required in order for the bug to happen 'fast enough':

- threads
- floats
- calls to C functions (these greatly reduce the time to wait before the crash)

I am also under the impression that the bug is affected by the new stack
alignment requirement (because in one occurrence, whether or not a
function hit by the bug called a do-nothing function drastically changed
the probability of the bug, and the major difference I saw
was that in one version of the function the stack size was 16 bytes and
in the other 24 bytes (16+4 apparently for the address of a "module"
structure, aligned up to 24 bytes)). I thus manually checked the
generated framesets but they were all right as far as I understand them.
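
The 16+4 to 24 arithmetic above is consistent with rounding the frame size up to the 8-byte stack alignment that the ARM EABI requires at public interfaces. A minimal sketch, assuming that rounding rule; align_up is a hypothetical helper, not something taken from the compiler:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round size up to the next multiple of alignment.
   alignment must be a non-zero power of two. */
static size_t align_up(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

With 8-byte alignment, align_up(16, 8) stays at 16 while align_up(20, 8) becomes 24, matching the two frame sizes reported; with 4-byte alignment the 20-byte frame would be left as is.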

Now I'm a little desperate, since each recompile+test cycle takes about 20
minutes and the bug is so erratic; so if someone here is familiar with the
ARM architecture, and in particular the difference between the old and new
ABI, please suggest what I should check, or give any hint whatsoever. I'd
be very grateful, as this consumes a lot of my spare time.

Also, I'm compiling ocaml with gcc 4.2.1 - do you think it may be a
problem with gcc not following the very same ABI?

Also I've run the testsuite but it did not reveal anything.

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

--20cf303b397599bd4e04a6d5e77c--