From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p5T8qcJG004977
	for <caml-list@sympa-roc.inria.fr>; Wed, 29 Jun 2011 10:52:38 +0200
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AhQCAKbmCk7RVdo2mGdsb2JhbAA8AQMSglGVWYcPAYgTCBQBAQEBAQgJDQcUJYh4ozOMGoJLhFE5iGgCAwaGKgSCSoRmimeMGDyDWw
X-IronPort-AV: E=Sophos;i="4.65,442,1304287200"; 
   d="scan'208";a="102182132"
Received: from mail-yi0-f54.google.com ([209.85.218.54])
  by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 29 Jun 2011 10:52:32 +0200
Received: by yic13 with SMTP id 13so617642yic.27
        for <caml-list@inria.fr>; Wed, 29 Jun 2011 01:52:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :cc:content-type;
        bh=kh52Ir73N7lx0lOdUsJxTIFyxdNfay3/7O45el0Y+P4=;
        b=QW3E4ugDBVspdPvp8TjVBhd7hu+ahlXyshWQ+AThLOywsEtzKzH/U/eOvbalKUFlqx
         fTSUCTju+xCGn87oCRhbwZ2xBekZWmho6UP0vyIlLW7OF5cqX6x6Bdv/jS17eel9GQPI
         Hsig/hB42GcPgiGG4AVtNp1NziiqlrFitxODY=
MIME-Version: 1.0
Received: by 10.236.176.38 with SMTP id a26mr453047yhm.410.1309337550699; Wed,
 29 Jun 2011 01:52:30 -0700 (PDT)
Received: by 10.236.157.73 with HTTP; Wed, 29 Jun 2011 01:52:30 -0700 (PDT)
In-Reply-To: <20101124002030.GA9493@yeeloong>
References: <20101124002030.GA9493@yeeloong>
Date: Wed, 29 Jun 2011 12:52:30 +0400
Message-ID: <BANLkTimth=5Wisg+U+TgFqoBu924bxT_tA@mail.gmail.com>
From: SerP <serp256@gmail.com>
To: rixed@happyleptic.org
Cc: caml-list@inria.fr
Content-Type: multipart/alternative; boundary=20cf303b397599bd4e04a6d5e77c
Subject: Re: [Caml-list] Segfault in ARM EABI for programm compiled with
 ocamlopt 3.12.0


--20cf303b397599bd4e04a6d5e77c
Content-Type: text/plain; charset=ISO-8859-1

It took a long time, could you understand why this bug happens.
On the iphone I get the same bug with ocaml-3.12?

On Wed, Nov 24, 2010 at 3:20 AM, <rixed@happyleptic.org> wrote:

> For some time now I'm after a bug hitting a program of mine when
> compiled on ARM with ocaml 3.12.0.
> I initially though my own C code was misbehaving but the program keep
> crashing, although not as early, if I comment out all calls to the C
> functions.
>
> The segfaults happen frequently during the GC, in oldify_one or
> oldify_mopup, but also in a few other places such as camlList__rev_append
> or caml__apply2 or any other places as well. In caml_oldify_one, for
> instance, the segfault always happen at the same location : the
> assertion that sz is not 0 (and of course when you read the code it's
> pretty clear that sz=0 correspond to the case "already forwarded" that's
> handled at the beginning of the function).
>
> The pattern, then, is that a register (usually r0, r2 or r5) is
> restored from the stack after a call to a function that might call the
> GC (or to a call to the GC itself), then dereferenced. It's obvious
> inspecting the stack with gdb that this very word was changed during the
> call and a value like 0, 3 or 1024 is read back into the register
> instead of an mlvalue.
>
> I didn't managed (yet) to reduce the size of the program to a small show
> case, and I am under the impression that all these components are
> required in order for the bug to happen 'fast enough' :
>
> - threads
> - floats
> - call to C function (greatly reduce the time to wait before the crash)
>
> I am also under the impression that the bug is affected by the new stack
> alignment requirement (because in one occurrence, calling or not a
> function that does nothing from within a function hit by the bug reduced
> drastically the probability of the bug, and the major difference I saw
> was that on one version of the function the stack size was 16 bytes and
> the other 24 bytes (16+4 apparently for the address of a "module"
> structure, aligned up to 24 bytes). I thus manually checked the
> generated framesets but they were allright as far as I understand them.
>
> Now I'm a little desperate since each recompile+test takes about 20
> minutes and the bug is so erratic ; so if someone here is familiar with
> ARM arch and in particular the difference between old and new ABI please
> suggest me what I should check, or any hint whatsoever. I'd be very much
> grateful as this consumes a lot of my spare time.
>
> Also, I'm compiling ocaml with gcc 4.2.1 - do you think it may be a
> problem with gcc not following the very same ABI ?
>
> Also I've run the testsuite but it did not reveal anything.
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

--20cf303b397599bd4e04a6d5e77c
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<meta charset=3D"utf-8"><span class=3D"Apple-style-span" style=3D"font-fami=
ly: arial, sans-serif; font-size: 16px; "><span class=3D"hps" title=3D"Clic=
k for alternate translations">It took a long</span>=A0<span class=3D"hps" t=
itle=3D"Click for alternate translations">time,</span>=A0<span class=3D"hps=
" title=3D"Click for alternate translations">could</span>=A0<span class=3D"=
hps" title=3D"Click for alternate translations">you understand</span>=A0<sp=
an class=3D"hps" title=3D"Click for alternate translations">why this bug ha=
ppens.</span><br>
<span class=3D"hps" title=3D"Click for alternate translations">On the</span=
>=A0<span class=3D"hps" title=3D"Click for alternate translations">iphone</=
span>=A0<span class=3D"hps" title=3D"Click for alternate translations">I ge=
t</span>=A0<span class=3D"hps" title=3D"Click for alternate translations">t=
he same</span>=A0b<span class=3D"hps" title=3D"Click for alternate translat=
ions">ug with ocaml-3.12?</span></span><br>
<br><div class=3D"gmail_quote">On Wed, Nov 24, 2010 at 3:20 AM,  <span dir=
=3D"ltr">&lt;<a href=3D"mailto:rixed@happyleptic.org">rixed@happyleptic.org=
</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margin=
:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
For some time now I&#39;m after a bug hitting a program of mine when<br>
compiled on ARM with ocaml 3.12.0.<br>
I initially though my own C code was misbehaving but the program keep<br>
crashing, although not as early, if I comment out all calls to the C<br>
functions.<br>
<br>
The segfaults happen frequently during the GC, in oldify_one or<br>
oldify_mopup, but also in a few other places such as camlList__rev_append<b=
r>
or caml__apply2 or any other places as well. In caml_oldify_one, for<br>
instance, the segfault always happen at the same location : the<br>
assertion that sz is not 0 (and of course when you read the code it&#39;s<b=
r>
pretty clear that sz=3D0 correspond to the case &quot;already forwarded&quo=
t; that&#39;s<br>
handled at the beginning of the function).<br>
<br>
The pattern, then, is that a register (usually r0, r2 or r5) is<br>
restored from the stack after a call to a function that might call the<br>
GC (or to a call to the GC itself), then dereferenced. It&#39;s obvious<br>
inspecting the stack with gdb that this very word was changed during the<br>
call and a value like 0, 3 or 1024 is read back into the register<br>
instead of an mlvalue.<br>
<br>
I didn&#39;t managed (yet) to reduce the size of the program to a small sho=
w<br>
case, and I am under the impression that all these components are<br>
required in order for the bug to happen &#39;fast enough&#39; :<br>
<br>
- threads<br>
- floats<br>
- call to C function (greatly reduce the time to wait before the crash)<br>
<br>
I am also under the impression that the bug is affected by the new stack<br>
alignment requirement (because in one occurrence, calling or not a<br>
function that does nothing from within a function hit by the bug reduced<br>
drastically the probability of the bug, and the major difference I saw<br>
was that on one version of the function the stack size was 16 bytes and<br>
the other 24 bytes (16+4 apparently for the address of a &quot;module&quot;=
<br>
structure, aligned up to 24 bytes). I thus manually checked the<br>
generated framesets but they were allright as far as I understand them.<br>
<br>
Now I&#39;m a little desperate since each recompile+test takes about 20<br>
minutes and the bug is so erratic ; so if someone here is familiar with<br>
ARM arch and in particular the difference between old and new ABI please<br>
suggest me what I should check, or any hint whatsoever. I&#39;d be very muc=
h<br>
grateful as this consumes a lot of my spare time.<br>
<br>
Also, I&#39;m compiling ocaml with gcc 4.2.1 - do you think it may be a<br>
problem with gcc not following the very same ABI ?<br>
<br>
Also I&#39;ve run the testsuite but it did not reveal anything.<br>
<br>
_______________________________________________<br>
Caml-list mailing list. Subscription management:<br>
<a href=3D"http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list" target=
=3D"_blank">http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list</a><br>
Archives: <a href=3D"http://caml.inria.fr" target=3D"_blank">http://caml.in=
ria.fr</a><br>
Beginner&#39;s list: <a href=3D"http://groups.yahoo.com/group/ocaml_beginne=
rs" target=3D"_blank">http://groups.yahoo.com/group/ocaml_beginners</a><br>
Bug reports: <a href=3D"http://caml.inria.fr/bin/caml-bugs" target=3D"_blan=
k">http://caml.inria.fr/bin/caml-bugs</a><br>
</blockquote></div><br>

--20cf303b397599bd4e04a6d5e77c--