caml-list - the Caml user's mailing list
* Help finding possible bug in 3.09/ia64
@ 2005-11-28  6:25 skaller
  2005-11-28 16:20 ` [Caml-list] " Mike Furr
From: skaller @ 2005-11-28  6:25 UTC (permalink / raw)
  To: caml-list

I am wondering whether anyone has access to an ia64 machine
and is willing to help track down a possible bug in OCaml 3.09
for that platform. Details and background follow.
I would prefer confirmation before reporting a bug. It is
also kind of hard to report a bug on an architecture
I don't have access to .. :)

Mike Furr has found a problem with one of the Felix regression
tests on the ia64 platform. It appears to me this is
most likely a bug in the ia64 runtime, and next most likely
a bug in the native code generator for ia64. 

The code in question works fine on i386, amd64, ppc and
some other architectures. The program contains no uses
of any C bindings, no use of the Obj module, and no unsafe
array accesses.

The error manifests as:

PATH=bin:"$PATH" LD_LIBRARY_PATH=rtl:"$LD_LIBRARY_PATH" bin/flxg -Ilib tut/examples/mac126
  .. ERROR CODE 0xb
TESTFILE -- ERROR! tut/examples/mac126

during 'make test'.

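Error code 0xb is signal 11 (SIGSEGV). The segfault results
from what appears to be a runaway loop in the garbage
collector, with caml_call_gc and caml_empty_minor_heap
alternating in the backtrace: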

(gdb) bt
#0  0x40000000002a04e0 in caml_oldify_local_roots ()
#1  0x40000000002a5100 in caml_empty_minor_heap ()
#2  0x40000000002a5360 in caml_minor_collection ()
#3  0x40000000002a1b50 in caml_garbage_collection ()
#4  0x40000000002c5ca0 in caml_call_gc ()
#5  0x40000000002a5100 in caml_empty_minor_heap ()
#6  0x40000000002c5ca0 in caml_call_gc ()
#7  0x40000000002a5100 in caml_empty_minor_heap ()
#8  0x40000000002c5ca0 in caml_call_gc ()
#9  0x40000000002a5100 in caml_empty_minor_heap ()
#10 0x40000000002c5ca0 in caml_call_gc ()
#11 0x40000000002a5100 in caml_empty_minor_heap ()

I don't have access to an ia64, so I am unable to
do much about this.

The fault occurs in a (not uploaded) Debian packaging
for Felix 1.1.1. The original tarball is located here:

http://felix.sourceforge.net/flx_1.1.0_src.tgz

It should build on Unix (or Windows XP64 if OCaml
supports that, though I haven't tried it).

Yes, it IS possible there is a bug in the source
algorithm. In fact, there definitely used to be
an unchecked overrun. However, the test is deterministic,
so it should fail on all architectures with the same
word size at least, and it works fine on amd64.

[The algorithm DOES contain a potentially infinite
recursion which is supposed to be limited]
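
To illustrate what "potentially infinite recursion which is supposed to be limited" can look like, here is a minimal sketch of a depth-limited macro expander. It is not the flx_macro code: the `term` type, the `expand` function, the `Expansion_too_deep` exception and the `max_depth` value are all hypothetical names chosen for this example.

```ocaml
(* Hypothetical toy macro language; these are NOT the flx_macro types. *)
type term =
  | Lit of string        (* literal text *)
  | Call of string       (* reference to a macro by name *)
  | Seq of term list     (* concatenation of sub-terms *)

exception Expansion_too_deep of string

(* Assumed limit; the value Felix actually uses is not stated here. *)
let max_depth = 100

(* Expand [t] against the macro table [defs]. [depth] counts nested
   macro calls, so a self-referential macro raises instead of
   recursing until the stack is exhausted. *)
let rec expand defs depth t =
  match t with
  | Lit s -> s
  | Seq ts -> String.concat "" (List.map (expand defs depth) ts)
  | Call name ->
    if depth >= max_depth then raise (Expansion_too_deep name);
    (match List.assoc_opt name defs with
     | Some body -> expand defs (depth + 1) body
     | None -> failwith ("undefined macro " ^ name))

(* A small table: "loop" refers to itself and can only stop via the limit. *)
let defs =
  [ ("hello", Seq [ Lit "hi "; Call "who" ]);
    ("who", Lit "world");
    ("loop", Call "loop") ]
```

With this table, `expand defs 0 (Call "hello")` terminates normally, while `expand defs 0 (Call "loop")` raises `Expansion_too_deep` once the assumed limit is reached. If such a check were missing or set too high, the recursion would be bounded only by the stack size of the platform.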

The actual algorithm is probably part of the 
flx_macro module, since the test is exercising
the macro processor.

It is (just) possible that a deep recursion is overflowing
the stack, corrupting memory, and causing the gc to
get stuck. Exactly how this could happen I don't know,
since it doesn't happen on other platforms. The test
has been around for a long time (over a year, I think).
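
One way to probe the deep-recursion hypothesis without the Felix sources might be a small standalone program that allocates inside a deep, non-tail-recursive call chain, so that minor collections have to scan many live stack frames. The sketch below is only an assumption about what could be worth trying on ia64. The names `build` and `run` and the depth used are hypothetical, and nothing here is known to reproduce the fault.

```ocaml
(* Non-tail-recursive on purpose: each frame holds [cell] live across
   the recursive call, so a minor collection triggered deeper in the
   recursion has to treat it as a local root. *)
let rec build n =
  if n = 0 then []
  else
    let cell = (n, string_of_int n) in   (* fresh heap allocation per frame *)
    cell :: build (n - 1)

(* Build a list with [depth] nested calls and report its length together
   with the number of minor collections that ran meanwhile. *)
let run depth =
  let before = (Gc.quick_stat ()).Gc.minor_collections in
  let l = build depth in
  let after = (Gc.quick_stat ()).Gc.minor_collections in
  (List.length l, after - before)

let () =
  let len, minors = run 50_000 in
  Printf.printf "length=%d minor_collections=%d\n" len minors
```

Compiling this with ocamlopt on ia64 and raising the depth step by step would at least show whether allocation under deep recursion is enough to disturb the collector on that platform, or whether the problem needs something specific to the macro processor.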


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

