From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 0FEBEBCAF for ; Sat, 11 Jun 2005 20:39:32 +0200 (CEST) Received: from ash25e.internode.on.net (ash25e.internode.on.net [203.16.214.182]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j5BIdTAf032606 for ; Sat, 11 Jun 2005 20:39:31 +0200 Received: from Rosella (ppp35-43.lns1.syd2.internode.on.net [59.167.35.43]) by ash25e.internode.on.net (8.12.9/8.12.6) with ESMTP id j5BIdNpI050820; Sun, 12 Jun 2005 04:09:24 +0930 (CST) (envelope-from skaller@users.sourceforge.net) Subject: Re: [Caml-list] Caml on intel-OSX From: John Skaller To: Jon Harrop Cc: caml-list@yquem.inria.fr In-Reply-To: <200506110039.18279.jon@ffconsultancy.com> References: <1118295206.7145.165.camel@rosella.wigram> <36973.131.254.50.45.1118334009.squirrel@mail.irisa.fr> <1118357500.8693.80.camel@rosella.wigram> <200506110039.18279.jon@ffconsultancy.com> Content-Type: text/plain Date: Sun, 12 Jun 2005 04:39:22 +1000 Message-Id: <1118515162.7212.39.camel@rosella.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-Miltered: at nez-perce with ID 42AB2FE1.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 ocaml:01 compiler:01 bug:01 bug:01 segfaults:01 endline:01 ocaml:01 gdb:01 compiler:01 bytecode:01 stack:01 binary:01 internals:01 ocaml's:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.2 X-Spam-Level: On Sat, 2005-06-11 at 00:39 +0100, Jon Harrop wrote: > On Thursday 09 June 2005 23:51, John Skaller wrote: > > Whilst I personally prefer the ocaml native code compiler, > > at present I can't use it on my AMD64 as there appears > Do you think I shouldn't release for AMD64? See PR#3640 (in Incoming in the Bug Tracker) first: My guess is if you code works it works... the bug is fairly rare and only triggered in unusual circumstances .. but that is just a guess. > How confident are you that this is > a codegen bug? It would help if someone tried to replicate the problem, to be sure it exists -- if could just be my machine. Try to build Felix: /http://felix.sourceforge.net/flx_1.0.25_src.tgz on my box it segfaults in every test. I can only guess it is a backend code generator bug from the following data: (a) it goes way if a print_endline " .. " is introduced in certain places but not others. (b) Using that technique I have isolated the problem to a few lines of Ocaml code (c) Something like 50 regression tests and tutorial examples all fail in the same place (d) gdb indicates a problem in caml_apply5() (e) The code runs on x86 with native code compiler and all architectures with bytecode compiler. (f) changing stack size from 8M to 16M makes no difference (g) The problem only occurs in one program in one place, other programs don't fail. (h) It fails on both my personal build from sourcecode of 3.08.3 and also a 3.08.2(ubuntu) binary package (which I guess is a copy of the Debian package). I would be a lot more confident if someone reproduced the bug and someone from INRIA said they'd found it :) I'm only guessing. I am not an expert on the internals of Ocaml. > Does your code use only OCaml's theoretically safe subset? Yes. The code also runs on x86 native code and bytecode. It does use Unix, Big_int, and Ocamlyacc/Ocamllex it might be a C compiler problem, but the location of the crash isn't using any of that and it isn't a random error but repeatable and consistent. However there is no Obj.magic, and no C, and no other third party code such as from Extlib that might use it: everything comes from the standard distro. [Actually ocs_scheme is linked in and might, but it isn't being called] There lots of things I could experiment with, for example it might be my linker, C compiler, or even Linux kernel... or it could be a bug in the Felix source which luckily didn't cause a problem in other situations (eg, a tail-recursion that isn't handled on one processor but is on another). But a simple bug in register allocation in the back end seems most likely to me at this time. -- John Skaller, skaller at users.sf.net PO Box 401 Glebe, NSW 2037, Australia Ph:61-2-96600850 Download Felix here: http://felix.sf.net