caml-list - the Caml user's mailing list
* Bug somewhere in Ocaml 3.09.3.rc1?
@ 2007-03-29 20:40 skaller
  2007-03-29 21:10 ` [Caml-list] " skaller
  2007-03-29 21:32 ` skaller
  0 siblings, 2 replies; 5+ messages in thread
From: skaller @ 2007-03-29 20:40 UTC (permalink / raw)
  To: caml-list

I have a weird bug where the Felix compiler is going haywire.
I need some ideas how to think about what it is. It appears
to be a bug in Ocaml, not my code.

One, and only one, regression test case is failing when
a piece of code processed by ALL test cases is added.

The test case consists of around 6000 Felix assert statements
partitioned into N small noinline procedures.

For some value of M < N, the code works, but if I add
just one more of the procedures the compiler diverges.
The divergence follows a known problem pattern in the
Felix inliner, and it is unrelated to any of the test
code (it occurs in an unused library routine).

So I think I'm overflowing some boundary, and the Ocaml
run time is corrupting something. The Felix compiler's fresh
symbol count is around 16,000 when this happens -- quite a small
number. The test code is around 500K of source characters,
or 12,000 lines (half the lines are #line directives).

Is there any known problem with this version of Ocaml that
might explain this? Because obviously I can't easily
submit that much test data, and the actual Ocaml code
is also quite large and there's no possible way to isolate
the bug to a simpler program (the routines causing the problem
can't be simply disconnected from the rest of the compiler,
and still process the test data).

My machine is 1G AMD64, I'm using Ocamlopt, and the Ocaml
was built by me directly from the Inria repository.

I might try 3.10 .. can someone tell me the CVS command
needed to update my repository image? (Sorry, I missed the
release announcement.)

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net



* Re: [Caml-list] Bug somewhere in Ocaml 3.09.3.rc1?
  2007-03-29 20:40 Bug somewhere in Ocaml 3.09.3.rc1? skaller
@ 2007-03-29 21:10 ` skaller
  2007-03-29 21:32 ` skaller
  1 sibling, 0 replies; 5+ messages in thread
From: skaller @ 2007-03-29 21:10 UTC (permalink / raw)
  To: caml-list

On Fri, 2007-03-30 at 06:40 +1000, skaller wrote:

> I might try 3.10 .. can someone tell me the CVS command
> needed to update my repository image (sorry, missed the
> release announcement).

Sorry, scratch that request, I figured it out .. :)

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net



* Re: [Caml-list] Bug somewhere in Ocaml 3.09.3.rc1?
  2007-03-29 20:40 Bug somewhere in Ocaml 3.09.3.rc1? skaller
  2007-03-29 21:10 ` [Caml-list] " skaller
@ 2007-03-29 21:32 ` skaller
  2007-03-31 19:32   ` skaller
  1 sibling, 1 reply; 5+ messages in thread
From: skaller @ 2007-03-29 21:32 UTC (permalink / raw)
  To: caml-list

On Fri, 2007-03-30 at 06:40 +1000, skaller wrote:
> I have a weird bug where the Felix compiler is going haywire.
> I need some ideas how to think about what it is. It appears
> to be a bug in Ocaml, not my code.
[]

> So I think I'm overflowing some boundary, and the Ocaml
> run time is corrupting something. The Felix compiler's fresh
> symbol count is around 16,000 when this happens -- quite a small
> number. The test code is around 500K of source characters,
> or 12,000 lines (half the lines are #line directives).

Ok, 3.10 fails too .. but I think it might be Marshal ..
I'm marshalling all the parsed code out. Here's the file:

1948 -rw-r--r-- 1 skaller skaller 1988162 2007-03-30 07:26 rt-1.01.01-0.par

So the data here is about 2 MB .. could this be breaking string
length limits and crashing something?
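
One way to check the string-length theory, as a minimal sketch: compare
the file size from the listing above against Sys.max_string_length, and
round-trip a comparably large value through Marshal. The names par_size,
fits_in_string and roundtrip_ok are made up for illustration, and the
array is only a stand-in for the real parsed code. The limit is about
16 MB on 32-bit systems and far larger on 64-bit ones, so a 2 MB string
should fit either way.

```ocaml
(* Size in bytes of rt-1.01.01-0.par, taken from the listing above. *)
let par_size = 1_988_162

(* True if a string of n bytes can exist on this platform. *)
let fits_in_string n = n <= Sys.max_string_length

(* Round-trip a stand-in value through Marshal via a string.
   The array is illustrative only, not the real parse tree. *)
let roundtrip_ok () =
  let data = Array.init 200_000 (fun i -> (i, string_of_int i)) in
  let s = Marshal.to_string data [] in
  let back : (int * string) array = Marshal.from_string s 0 in
  back = data

let () =
  Printf.printf "max string length: %d\n" Sys.max_string_length;
  Printf.printf "2 MB file fits in a string: %b\n" (fits_in_string par_size);
  Printf.printf "round trip ok: %b\n" (roundtrip_ok ())
```

If both checks print true, the size of the marshalled data alone is
unlikely to be the cause.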

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net



* Re: [Caml-list] Bug somewhere in Ocaml 3.09.3.rc1?
  2007-03-29 21:32 ` skaller
@ 2007-03-31 19:32   ` skaller
  2007-04-03 16:56     ` skaller
  0 siblings, 1 reply; 5+ messages in thread
From: skaller @ 2007-03-31 19:32 UTC (permalink / raw)
  To: caml-list

On Fri, 2007-03-30 at 07:32 +1000, skaller wrote:
> On Fri, 2007-03-30 at 06:40 +1000, skaller wrote:
> > I have a weird bug where the Felix compiler is going haywire.
> > I need some ideas how to think about what it is. It appears
> > to be a bug in Ocaml, not my code.
> []
> 
> > So I think I'm overflowing some boundary, and the Ocaml
> > run time is corrupting something. The Felix compiler's fresh
> > symbol count is around 16,000 when this happens -- quite a small
> > number. The test code is around 500K of source characters,
> > or 12,000 lines (half the lines are #line directives).
> 
> Ok, 3.10 fails too .. but I think it might be Marshal ..

It's not Marshal, I just turned it off.

Hmmm .. no hints from anyone? This problem isn't in
the code generator AFAIK, because it occurs on both x86 and x86_64.

I have a set of Felix code going like:

fun f1() { 
  assert(40+2==42);
  assert(40L+2L==42L);
  ... // many more occurrences
}
f1();

fun f2() { ....

for 6000 lines of code, all the same pattern. With a new
function added into the library which is not actually used
by any of this code and DOES work when tested .. the compiler
diverges trying to inline into this function (unrelated
to the test code!)

If I reduce the number of functions in the test code above
by 40% it stops diverging and produces the right answer.
Add one more test .. any one .. and it diverges.
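
If the failure really depends only on how many functions the test file
contains, the exact threshold can be found by bisection rather than by
hand. A minimal sketch, assuming a hypothetical predicate "diverges n"
that runs the compiler (with a timeout) on the first n functions and
reports whether it diverged. Neither that predicate nor find_threshold
exists in the Felix sources, and the stand-in threshold of 3600 below is
an arbitrary number, not the real one.

```ocaml
(* Find the smallest n in (lo, hi] for which [diverges n] is true.
   Precondition: not (diverges lo), diverges hi, and [diverges] is
   monotonic (once it fails, every larger n fails too). *)
let find_threshold ~diverges ~lo ~hi =
  let rec go lo hi =
    if hi - lo <= 1 then hi
    else
      let mid = lo + (hi - lo) / 2 in
      if diverges mid then go lo mid else go mid hi
  in
  go lo hi

let () =
  (* Stand-in predicate; the real one would invoke the compiler. *)
  let diverges n = n > 3600 in
  let m = find_threshold ~diverges ~lo:0 ~hi:6000 in
  Printf.printf "first diverging count: %d\n" m
```

With 6000 candidates this needs about 13 compiler runs. If the answer
shifts when the functions are reordered, the trigger is something other
than the raw count.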

So my theory is something is overflowing and corrupting
something.

Despite the fact that the inlining procedure is fragile
and has had the divergence problem before, legitimately,
and I have no particular reason to believe the current
version doesn't diverge .. there is no logical connection
between the divergence and the number of assertions
being checked in the test case: if the new library
function causes divergence, it should do so every time.
Over 100 tests compile this code and don't fail.
The only test to fail is the one described above,
and it is 100 times bigger than any other.

So I think this is likely to be a bug in Ocaml, probably
in the run time system, and most likely in the garbage
collector (but it could be in ocamllex/ocamlyacc).

I use ocamllex and ocamlyacc, but otherwise the whole
program is pure single-threaded Ocaml. No special compilation options
(like -unsafe) are used. The only 'constant' sized data structure
I use is Hashtbl, which is almost always initialised to size 97.
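
For what it's worth, the size passed to Hashtbl.create is only an
initial guess: the table grows as entries are added, so 97 is not a hard
limit. A minimal sketch (the function name fill and the int-to-string
contents are made up for illustration) that pushes roughly the fresh
symbol count mentioned earlier through such a table:

```ocaml
(* Build a table created with initial size 97 and insert n bindings. *)
let fill n =
  let tbl : (int, string) Hashtbl.t = Hashtbl.create 97 in
  for i = 1 to n do
    Hashtbl.replace tbl i (string_of_int i)
  done;
  tbl

let () =
  let tbl = fill 16_000 in
  Printf.printf "entries: %d\n" (Hashtbl.length tbl);
  Printf.printf "last: %s\n" (Hashtbl.find tbl 16_000)
```

Every binding stays retrievable after the table has resized, so the
initial size on its own should not cause an overflow.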

Yes, of course it can be my code, and probably is.. I just
have no idea what to look for.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net



* Re: [Caml-list] Bug somewhere in Ocaml 3.09.3.rc1?
  2007-03-31 19:32   ` skaller
@ 2007-04-03 16:56     ` skaller
  0 siblings, 0 replies; 5+ messages in thread
From: skaller @ 2007-04-03 16:56 UTC (permalink / raw)
  To: caml-list

On Sun, 2007-04-01 at 05:32 +1000, skaller wrote:

> > > So I think I'm overflowing some boundary, and the Ocaml
> > > run time is corrupting something. 

> Yes, of course it can be my code, and probably is.. I just
> have no idea what to look for.

I have found the problem! No bug in Ocaml :)

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

