2010/12/1 Romain Beauxis <toots@rastageeks.org>
Hi,

On Tuesday 30 November 2010 at 17:08:12, Philippe Veber wrote:
> The seg fault occurs during the call to this function with the button event
> retrieved by ocamlsdl. What's really weird is that if I comment the third
> case of the pattern matching, the seg fault does not occur. This is strange
> since with the "assert false" expression, I make sure this case is useless
> (I don't press the left button). Also, in the various tests I made, I
> obtained different errors, like segmentation fault in caml_absf_mask or
> invalid instruction error.

The function in which the segfault shows up may be misleading, in particular
in case of memory corruption, which I suspect here.
The pattern matching can crash because it operates on a value that is already
corrupted, and the third case happens to be the one that, under some
unpredictable conditions, touches the corrupted part of memory.

How is this possible if it is never reached (no left click)?

In this case, I would try to unroll the code and see where the value that is
used in this function was instantiated.

What do you mean by "unrolling the code"?

The main sources of corruption when using C bindings are most often either
the Gc or code executed while the global lock has been released.

In the case of a segfault happening during a Gc call, the crash site can be
completely unrelated to the cause: for instance, the instantiation of a new
value triggers a Gc collection to compact memory, which in turn triggers the
collection of a corrupted value, which causes a segfault.
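
One way to narrow this down, as a rough sketch only: force a full major
collection right after each suspect call into the bindings, so that a
corrupted value is visited close to where it was created instead of at some
later, unrelated allocation. The helper names checkpoint and with_checkpoint
are made up for illustration; only Gc.full_major comes from the standard
library.

```ocaml
(* Hypothetical debugging helpers; not part of ocamlsdl or any library. *)

(* Walk the whole heap now. If a value is already corrupted, the crash
   should happen here rather than at a random later allocation. *)
let checkpoint label =
  Gc.full_major ();
  prerr_endline ("GC checkpoint passed: " ^ label)

(* Run [f x], then force a collection before returning the result.
   Wrap each suspect binding call with this. *)
let with_checkpoint label f x =
  let result = f x in
  checkpoint label;
  result
```

If the last message printed before the crash is for call A, the corruption
was most likely introduced by the wrapped call that follows A.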

In the case of a segfault happening during a C call while the global lock has
been released, you may get more useful information through gdb, in particular
the backtrace of the C code running at the time of the segfault. You also need
to have the debugging symbols for the dynamic C libraries involved.

We experienced a couple of segfaults with ocaml SDL too, but in unrelated
parts (video). I do not mean to criticize upstream's work on ocaml SDL because
I know for a fact that these types of bindings are really hard to code.
However, I would suspect an issue there.

Finally, the best approach could be to actually look closely at the binding's
code and try to spot anything fishy there related to your issue. This has
generally worked better for me than trying to get information from gdb and
the like.

Many thanks for the clarification. Maybe I could (partially) "unplug" the GC
by setting space_overhead to 100? That could give an indication of the moment
at which the problem occurs.
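
Something like the following is what I have in mind. It is only a sketch, and
relax_gc is a name I made up. As far as I can tell from the Gc module
documentation, the default space_overhead is 80, so 100 would change very
little; a much larger value should make the major GC far lazier, and a
max_overhead of 1000000 or more is documented to disable heap compaction.
Minor collections would still run, so this can only partially "unplug" the GC.

```ocaml
(* Hypothetical helper: make the major GC as lazy as possible.
   Returns the previous parameters so they can be restored later. *)
let relax_gc () =
  let previous = Gc.get () in
  Gc.set { previous with
           Gc.space_overhead = 10_000;    (* major GC works much less eagerly *)
           Gc.max_overhead = 1_000_000 }; (* never trigger compaction *)
  previous
```

Calling relax_gc () at the start of the program, and Gc.set with the returned
record to restore the defaults, would show whether the crash moves or
disappears when the GC runs less often.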
ph.