Tips to find the cause of a seg fault

From: Philippe Veber @ 2010-11-30 23:08 UTC (permalink / raw)
  To: caml users

Short story (details below): I'm currently writing a program that relies on
react, lablgl and ocamlsdl. It segfaults on my laptop under two Linux
distributions (Ubuntu and Gentoo) but not on a PC running Ubuntu. The seg
fault occurs with both bytecode and native executables. I don't do any
marshaling or use any typing magic, and a stack overflow is unlikely. I
humbly ask this list for ways to improve the output of valgrind or gdb, which
currently shows no informative function names, or more generally for any tip
that could help me locate the origin of the problem.

More details
============

The seg fault occurs when using the mouse wheel in the application, and only
if it is rolled fast enough. By trial and error, I tracked the problem down
to one of my functions that handles mouse events: when a click occurs, the sdl
record describing the mouse event is passed to a callback function which
looks like this:

let picking_handler send = function
  | { mbe_button = BUTTON_WHEELDOWN ; mbe_state = RELEASED } ->
      send `zoom_out
  | { mbe_button = BUTTON_WHEELUP ; mbe_state = RELEASED } ->
      send `zoom_in
  | { mbe_button = BUTTON_LEFT ; mbe_state = RELEASED } ->
      assert false
  | _ -> ()

The seg fault occurs during the call to this function with the button event
retrieved by ocamlsdl. What's *really* weird is that if I comment out the
third case of the pattern matching, the seg fault does not occur. This is
strange because the "assert false" expression would raise if this case were
ever reached, so I know it is not (I don't press the left button). Also,
across the various tests I made, I obtained different errors, such as a
segmentation fault in caml_absf_mask or an invalid instruction error.
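
One way to check whether the handler alone can misbehave is to exercise it without SDL. The following is a minimal sketch, not the program's actual code: the types are stand-ins that only mirror the constructors and fields used above (the real ones come from ocamlsdl), and `replay` and `wheel` are hypothetical helpers introduced for illustration.

```ocaml
(* Stand-in types: assumptions mirroring only what the handler uses,
   not ocamlsdl's actual definitions. *)
type button =
  | BUTTON_LEFT
  | BUTTON_MIDDLE
  | BUTTON_RIGHT
  | BUTTON_WHEELUP
  | BUTTON_WHEELDOWN

type state = RELEASED | PRESSED

type mouse_button_event = {
  mbe_button : button;
  mbe_state : state;
  mbe_x : int;
  mbe_y : int;
}

(* Same shape as the handler in the message. *)
let picking_handler send = function
  | { mbe_button = BUTTON_WHEELDOWN; mbe_state = RELEASED; _ } ->
      send `zoom_out
  | { mbe_button = BUTTON_WHEELUP; mbe_state = RELEASED; _ } ->
      send `zoom_in
  | { mbe_button = BUTTON_LEFT; mbe_state = RELEASED; _ } ->
      assert false
  | _ -> ()

(* Hypothetical helper: feed a list of events to the handler and
   return the messages it sent, in order. *)
let replay events =
  let sent = ref [] in
  List.iter (picking_handler (fun msg -> sent := msg :: !sent)) events;
  List.rev !sent

(* Hypothetical helper: a released-wheel event at the origin. *)
let wheel button =
  { mbe_button = button; mbe_state = RELEASED; mbe_x = 0; mbe_y = 0 }

(* Simulate a fast wheel burst: every event should produce one message. *)
let () =
  let burst =
    List.init 10_000 (fun i ->
        wheel (if i mod 2 = 0 then BUTTON_WHEELUP else BUTTON_WHEELDOWN))
  in
  assert (List.length (replay burst) = 10_000)
```

If a burst like this runs cleanly with well-formed records, the pattern match by itself is less likely to be the culprit, which would shift suspicion toward how the event value reaches the handler. That is only a hypothesis to test, not a diagnosis.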

Of course I am not asking for a solution to my problem, but maybe this will
remind you of something. Any suggestion (other than "resign!" ;o))
will be greatly appreciated!

Philippe.

PS: Below is the output of valgrind.

~/hum 23:10:20 $ valgrind ./hum.native
==11306== Memcheck, a memory error detector
==11306== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==11306== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==11306== Command: ./hum.native
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x7ABC820: ??? (in /usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
==11306==    by 0x85AED3F: ???
==11306==    by 0x85AED3F: ???
==11306==    by 0x7FEFFFF1F: ???
==11306==    by 0x7FEFFFFDF: ???
==11306==    by 0x7FEFFFFE7: ???
==11306==    by 0xF2C04F7: ??? (in /dev/zero)
==11306==    by 0xF285FFF: ??? (in /usr/lib/libXfixes.so.3.1.0)
==11306==    by 0x4002: ???
==11306==    by 0x1058619F: ???
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x7ABC829: ??? (in /usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
==11306==    by 0x85AED3F: ???
==11306==    by 0x85AED3F: ???
==11306==    by 0x7FEFFFF1F: ???
==11306==    by 0x7FEFFFFDF: ???
==11306==    by 0x7FEFFFFE7: ???
==11306==    by 0xF2C04F7: ??? (in /dev/zero)
==11306==    by 0xF285FFF: ??? (in /usr/lib/libXfixes.so.3.1.0)
==11306==    by 0x4002: ???
==11306==    by 0x1058619F: ???
==11306==
==11306== Use of uninitialised value of size 8
==11306==    at 0x7ABC836: ??? (in /usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
==11306==    by 0x85AED3F: ???
==11306==    by 0x85AED3F: ???
==11306==    by 0x7FEFFFF1F: ???
==11306==    by 0x7FEFFFFDF: ???
==11306==    by 0x7FEFFFFE7: ???
==11306==    by 0xF2C04F7: ??? (in /dev/zero)
==11306==    by 0xF285FFF: ??? (in /usr/lib/libXfixes.so.3.1.0)
==11306==    by 0x4002: ???
==11306==    by 0x1058619F: ???
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x5B3D5E7: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B3DEAE: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B3E8CA: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B12F9F: SDL_PumpEvents (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B13368: SDL_WaitEvent (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x47ED83: mlsdlevent_wait_event (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0x498DC3: ??? (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC576ABF: ???
==11306==    by 0xC576B77: ???
==11306==    by 0xC55FCB7: ???
==11306==    by 0x4215D5: camlHum__entry (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC5767A7: ???
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x5B3D616: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B3DEAE: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B3E8CA: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B12F9F: SDL_PumpEvents (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B13368: SDL_WaitEvent (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x47ED83: mlsdlevent_wait_event (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0x498DC3: ??? (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC576ABF: ???
==11306==    by 0xC576B77: ???
==11306==    by 0xC55FCB7: ???
==11306==    by 0x4215D5: camlHum__entry (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC5767A7: ???
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x7695580: ??? (in /usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
==11306==    by 0x1000000000000: ???
==11306==
vex amd64->IR: unhandled instruction bytes: 0xFF 0xD8 0xC 0xF9 0xFF 0xD8
==11306== valgrind: Unrecognised instruction at address 0x4992cb.
==11306== Your program just tried to execute an instruction that Valgrind
==11306== did not recognise.  There are two possible reasons for this.
==11306== 1. Your program has a bug and erroneously jumped to a non-code
==11306==    location.  If you are running Memcheck and you just saw a
==11306==    warning about a bad jump, it's probably your program's fault.
==11306== 2. The instruction is legitimate but Valgrind doesn't handle it,
==11306==    i.e. it's Valgrind's fault.  If you think this is the case or
==11306==    you are not sure, please let us know and we'll try to fix it.
==11306== Either way, Valgrind will now raise a SIGILL signal which will
==11306== probably kill your program.
==11306==
==11306== Process terminating with default action of signal 4 (SIGILL)
==11306==  Illegal opcode at address 0x4992CB
==11306==    at 0x4992CB: ??? (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0x422388: camlGui__trace_504 (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0x1547AAB7: ???
==11306==    by 0xC54A70F: ???
==11306==    by 0x6B2857: ??? (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0x42254E: camlGui__click_508 (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0x7FF00036F: ???
==11306==    by 0x422504: camlGui__click_508 (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC54897F: ???
==11306==    by 0x104: ???
==11306==    by 0x2: ???
==11306==    by 0x15487DF7: ???
==11306==
==11306== HEAP SUMMARY:
==11306==     in use at exit: 142,919,371 bytes in 77,584 blocks
==11306==   total heap usage: 183,886 allocs, 106,302 frees, 294,404,168 bytes allocated
==11306==
==11306== LEAK SUMMARY:
==11306==    definitely lost: 38 bytes in 3 blocks
==11306==    indirectly lost: 176 bytes in 4 blocks
==11306==      possibly lost: 66,443,601 bytes in 292 blocks
==11306==    still reachable: 76,475,556 bytes in 77,285 blocks
==11306==         suppressed: 0 bytes in 0 blocks
==11306== Rerun with --leak-check=full to see details of leaked memory
==11306==
==11306== For counts of detected and suppressed errors, rerun with: -v
==11306== Use --track-origins=yes to see where uninitialised values come from
==11306== ERROR SUMMARY: 1170520 errors from 7 contexts (suppressed: 7 from 7)
Illegal instruction
