caml-list - the Caml user's mailing list
* Tips to find the cause of a seg fault
@ 2010-11-30 23:08 Philippe Veber
  2010-11-30 23:18 ` [Caml-list] " oliver
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Philippe Veber @ 2010-11-30 23:08 UTC (permalink / raw)
  To: caml users

Short story (details below): I'm currently writing a program relying on
react, lablgl and ocamlsdl. This program segfaults on my laptop under two
linux distributions (ubuntu and gentoo) but doesn't on a PC under ubuntu.
The seg fault occurs with both bytecode and native executables. I don't do
any marshaling or use any typing magic; a stack overflow is not likely. I
humbly ask this list for ways to improve the valgrind or gdb outputs, which
don't report informative function names, or more generally, for any tip that
could help me locate the origin of the problem.

More details
============

The seg fault occurs when using the mouse wheel in the application, and only
if it is rolled fast enough. By trial and error, I could track the problem
up to a function of mine handling mouse events: when a click occurs, the sdl
record describing the mouse event is passed to a callback function which
looks like this:

let picking_handler send = function
  | { mbe_button = BUTTON_WHEELDOWN ; mbe_state = RELEASED } ->
      send `zoom_out
  | { mbe_button = BUTTON_WHEELUP ; mbe_state = RELEASED } ->
      send `zoom_in
  | { mbe_button = BUTTON_LEFT ; mbe_state = RELEASED } ->
    assert false
  | _ -> ()

The seg fault occurs during the call to this function with the button event
retrieved by ocamlsdl. What's *really* weird is that if I comment out the third
case of the pattern matching, the seg fault does not occur. This is strange
since with the "assert false" expression, I make sure this case is never
reached (I don't press the left button). Also, in the various tests I made, I
obtained different errors, like a segmentation fault in caml_absf_mask or an
invalid instruction error.

Of course I am not asking for a solution to my problem, but maybe this will
remind you of something. That is, any suggestion (other than "resign !" ;o))
will be greatly appreciated !

Philippe.

PS Below, the output of valgrind

~/hum 23:10:20 $ valgrind ./hum.native
==11306== Memcheck, a memory error detector
==11306== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==11306== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for
copyright info
==11306== Command: ./hum.native
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x7ABC820: ??? (in
/usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
==11306==    by 0x85AED3F: ???
==11306==    by 0x85AED3F: ???
==11306==    by 0x7FEFFFF1F: ???
==11306==    by 0x7FEFFFFDF: ???
==11306==    by 0x7FEFFFFE7: ???
==11306==    by 0xF2C04F7: ??? (in /dev/zero)
==11306==    by 0xF285FFF: ??? (in /usr/lib/libXfixes.so.3.1.0)
==11306==    by 0x4002: ???
==11306==    by 0x1058619F: ???
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x7ABC829: ??? (in
/usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
==11306==    by 0x85AED3F: ???
==11306==    by 0x85AED3F: ???
==11306==    by 0x7FEFFFF1F: ???
==11306==    by 0x7FEFFFFDF: ???
==11306==    by 0x7FEFFFFE7: ???
==11306==    by 0xF2C04F7: ??? (in /dev/zero)
==11306==    by 0xF285FFF: ??? (in /usr/lib/libXfixes.so.3.1.0)
==11306==    by 0x4002: ???
==11306==    by 0x1058619F: ???
==11306==
==11306== Use of uninitialised value of size 8
==11306==    at 0x7ABC836: ??? (in
/usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
==11306==    by 0x85AED3F: ???
==11306==    by 0x85AED3F: ???
==11306==    by 0x7FEFFFF1F: ???
==11306==    by 0x7FEFFFFDF: ???
==11306==    by 0x7FEFFFFE7: ???
==11306==    by 0xF2C04F7: ??? (in /dev/zero)
==11306==    by 0xF285FFF: ??? (in /usr/lib/libXfixes.so.3.1.0)
==11306==    by 0x4002: ???
==11306==    by 0x1058619F: ???
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x5B3D5E7: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B3DEAE: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B3E8CA: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B12F9F: SDL_PumpEvents (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B13368: SDL_WaitEvent (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x47ED83: mlsdlevent_wait_event (in
/home/pveber/hum/_build/src/hum.native)
==11306==    by 0x498DC3: ??? (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC576ABF: ???
==11306==    by 0xC576B77: ???
==11306==    by 0xC55FCB7: ???
==11306==    by 0x4215D5: camlHum__entry (in
/home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC5767A7: ???
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x5B3D616: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B3DEAE: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B3E8CA: ??? (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B12F9F: SDL_PumpEvents (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x5B13368: SDL_WaitEvent (in /usr/lib/libSDL-1.2.so.0.11.3)
==11306==    by 0x47ED83: mlsdlevent_wait_event (in
/home/pveber/hum/_build/src/hum.native)
==11306==    by 0x498DC3: ??? (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC576ABF: ???
==11306==    by 0xC576B77: ???
==11306==    by 0xC55FCB7: ???
==11306==    by 0x4215D5: camlHum__entry (in
/home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC5767A7: ???
==11306==
==11306== Conditional jump or move depends on uninitialised value(s)
==11306==    at 0x7695580: ??? (in
/usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
==11306==    by 0x1000000000000: ???
==11306==
vex amd64->IR: unhandled instruction bytes: 0xFF 0xD8 0xC 0xF9 0xFF 0xD8
==11306== valgrind: Unrecognised instruction at address 0x4992cb.
==11306== Your program just tried to execute an instruction that Valgrind
==11306== did not recognise.  There are two possible reasons for this.
==11306== 1. Your program has a bug and erroneously jumped to a non-code
==11306==    location.  If you are running Memcheck and you just saw a
==11306==    warning about a bad jump, it's probably your program's fault.
==11306== 2. The instruction is legitimate but Valgrind doesn't handle it,
==11306==    i.e. it's Valgrind's fault.  If you think this is the case or
==11306==    you are not sure, please let us know and we'll try to fix it.
==11306== Either way, Valgrind will now raise a SIGILL signal which will
==11306== probably kill your program.
==11306==
==11306== Process terminating with default action of signal 4 (SIGILL)
==11306==  Illegal opcode at address 0x4992CB
==11306==    at 0x4992CB: ??? (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0x422388: camlGui__trace_504 (in
/home/pveber/hum/_build/src/hum.native)
==11306==    by 0x1547AAB7: ???
==11306==    by 0xC54A70F: ???
==11306==    by 0x6B2857: ??? (in /home/pveber/hum/_build/src/hum.native)
==11306==    by 0x42254E: camlGui__click_508 (in
/home/pveber/hum/_build/src/hum.native)
==11306==    by 0x7FF00036F: ???
==11306==    by 0x422504: camlGui__click_508 (in
/home/pveber/hum/_build/src/hum.native)
==11306==    by 0xC54897F: ???
==11306==    by 0x104: ???
==11306==    by 0x2: ???
==11306==    by 0x15487DF7: ???
==11306==
==11306== HEAP SUMMARY:
==11306==     in use at exit: 142,919,371 bytes in 77,584 blocks
==11306==   total heap usage: 183,886 allocs, 106,302 frees, 294,404,168
bytes allocated
==11306==
==11306== LEAK SUMMARY:
==11306==    definitely lost: 38 bytes in 3 blocks
==11306==    indirectly lost: 176 bytes in 4 blocks
==11306==      possibly lost: 66,443,601 bytes in 292 blocks
==11306==    still reachable: 76,475,556 bytes in 77,285 blocks
==11306==         suppressed: 0 bytes in 0 blocks
==11306== Rerun with --leak-check=full to see details of leaked memory
==11306==
==11306== For counts of detected and suppressed errors, rerun with: -v
==11306== Use --track-origins=yes to see where uninitialised values come
from
==11306== ERROR SUMMARY: 1170520 errors from 7 contexts (suppressed: 7 from
7)
Illegal instruction


* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-11-30 23:08 Tips to find the cause of a seg fault Philippe Veber
@ 2010-11-30 23:18 ` oliver
  2010-12-01  8:32   ` Philippe Veber
  2010-12-01  1:59 ` Romain Beauxis
  2010-12-01  5:51 ` Ilya Seleznev
  2 siblings, 1 reply; 13+ messages in thread
From: oliver @ 2010-11-30 23:18 UTC (permalink / raw)
  To: caml users

On Wed, Dec 01, 2010 at 12:08:12AM +0100, Philippe Veber wrote:
> Short story (details below): I'm currently writing a program relying on
> react, lablgl and ocamlsdl. This program segfaults on my laptop under two
> linux distributions (ubuntu and gentoo) but doesn't on a PC under ubuntu.
> The seg fault occurs with both bytecode and native executables. I don't do
[...]

A minimal program plus a Makefile would make helping easier.

Did you try the code with a different X-driver?

Maybe it's a problem there.

Or maybe something is not linked correctly against the X-libs?

Just a guess.


Ciao,
   Oliver


* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-11-30 23:08 Tips to find the cause of a seg fault Philippe Veber
  2010-11-30 23:18 ` [Caml-list] " oliver
@ 2010-12-01  1:59 ` Romain Beauxis
  2010-12-01 15:17   ` Philippe Veber
  2010-12-01  5:51 ` Ilya Seleznev
  2 siblings, 1 reply; 13+ messages in thread
From: Romain Beauxis @ 2010-12-01  1:59 UTC (permalink / raw)
  To: caml-list

Hi,

On Tuesday, 30 November 2010 17:08:12, Philippe Veber wrote:
> The seg fault occurs during the call to this function with the button event
> retrieved by ocamlsdl. What's really weird is that if I comment the third
> case of the pattern matching, the seg fault does not occur. This is strange
> since with the "assert false" expression, I make sure this case is useless
> (i don't press the left button). Also, in the various tests I made, I
> obtained different errors, like segmentation fault in caml_absf_mask or
> invalid instruction error.

The function in which the segfault shows up can be misleading, in particular in
case of memory corruption, which I suspect here.
The pattern matching can cause a crash because it is using a value that is
already corrupted and because the third case is one that, under some random
conditions, touches the part of memory that is corrupted.

In this case, I would try to unroll the code and see where the value that is
used in this function was instantiated.

The main sources of corruption when using C bindings are most often either
the Gc or code executed while the global lock has been released.

In the case of a segfault happening during a Gc call, the crash site can be
really unrelated: for instance, the instantiation of a new value triggers a Gc
collection to compact memory, which in turn triggers the collection of a
corrupted value, which causes a segfault.
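
One common way to narrow this down is to force a collection right after each suspicious step, so that a corrupted heap tends to fail close to where it was damaged. The following is only a sketch of that idea: `checked` is a hypothetical helper name, and nothing guarantees that the crash will move.

```ocaml
(* Hypothetical helper: run a handler, then force a full major collection.
   If the handler, or the C code that built its argument, left the heap in
   a bad state, the forced collection has a chance to fail here instead of
   at some unrelated later allocation. *)
let checked handler x =
  let result = handler x in
  Gc.full_major ();
  result

(* Toy usage with a pure function standing in for an event handler. *)
let () = Printf.printf "%d\n" (checked (fun n -> n + 1) 41)
```

In an event loop, the handler call would be wrapped the same way, for example `checked picking_handler mbe`.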

In the case of a segfault happening during a C call while the global lock has
been released, you may get more useful information through gdb, in particular
the trace of the C code running at the time of the segfault. You also need the
debugging symbols for the dynamic C libraries used.

We experienced a couple of segfaults with ocaml SDL too, but in unrelated parts
(video). I do not mean to criticize upstream's work on ocaml SDL because I
know for a fact that these types of bindings are really hard to code. However,
I would suspect an issue there.

Finally, the best approach could be to actually look closely at the binding's
code and try to spot anything fishy there related to your issue. This generally
worked better for me than trying to get information from gdb and the like.

Romain


* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-11-30 23:08 Tips to find the cause of a seg fault Philippe Veber
  2010-11-30 23:18 ` [Caml-list] " oliver
  2010-12-01  1:59 ` Romain Beauxis
@ 2010-12-01  5:51 ` Ilya Seleznev
  2010-12-01 15:21   ` Philippe Veber
  2 siblings, 1 reply; 13+ messages in thread
From: Ilya Seleznev @ 2010-12-01  5:51 UTC (permalink / raw)
  To: Philippe Veber; +Cc: caml users

On Wed, Dec 1, 2010 at 5:08 AM, Philippe Veber
<philippe.veber@googlemail.com> wrote:
> Short story (details below): I'm currently writing a program relying on
> react, lablgl and ocamlsdl. This program segfaults on my laptop under two
> linux distributions (ubuntu and gentoo) but doesn't on a PC under ubuntu.
> The seg fault occurs with both bytecode and native executables. I don't do
> any marshaling nor use any typing magic; stack overflow is not likely. I
> humbly ask this list about means to improve valgrind or gdb outputs, which
> don't report informative function names, or more generally, any tip that
> could help me to locate the origin of the problem.

I would log the mouse events that go into 'picking_handler' to ensure
that no unexpected input is sent by SDL.
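
A minimal sketch of such logging follows. It assumes stand-in types that only mirror the two fields used in this thread (the real ones come from ocamlsdl), and the helper names `raw_index`, `plausible_button`, `describe` and `log_event` are hypothetical. It prints the raw runtime representation through `Obj` instead of pattern matching, so that an out-of-range value can still be reported.

```ocaml
(* Stand-in types for illustration only: they mirror the two record fields
   used in this thread. The real definitions live in ocamlsdl's Sdlmouse
   and Sdlevent modules and may contain more constructors and fields. *)
type button =
  | BUTTON_LEFT | BUTTON_MIDDLE | BUTTON_RIGHT
  | BUTTON_WHEELUP | BUTTON_WHEELDOWN
type switch_state = RELEASED | PRESSED
type mouse_event = { mbe_button : button; mbe_state : switch_state }

(* Number of constant constructors of the stand-in [button] type. *)
let button_count = 5

(* Raw runtime representation of a value: the constructor index for a
   constant constructor, or None when the value is a heap block. *)
let raw_index v =
  let r = Obj.repr v in
  if Obj.is_int r then Some (Obj.obj r : int) else None

(* A button is plausible when its raw index is one the type declares. *)
let plausible_button b =
  match raw_index b with
  | Some i -> 0 <= i && i < button_count
  | None -> false

let describe e =
  let show v =
    match raw_index v with
    | Some i -> string_of_int i
    | None -> "<block>" in
  Printf.sprintf "button=%s state=%s plausible=%b"
    (show e.mbe_button) (show e.mbe_state) (plausible_button e.mbe_button)

(* Log on stderr before any pattern matching touches the event. *)
let log_event e = prerr_endline (describe e)

let () = log_event { mbe_button = BUTTON_WHEELUP; mbe_state = RELEASED }
```

Calling `log_event` first in the handler means the last line printed before a crash describes the event that was being processed.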


-- 
With best regards,
Ilya Seleznev


* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-11-30 23:18 ` [Caml-list] " oliver
@ 2010-12-01  8:32   ` Philippe Veber
  2010-12-01  9:15     ` oliver
  0 siblings, 1 reply; 13+ messages in thread
From: Philippe Veber @ 2010-12-01  8:32 UTC (permalink / raw)
  To: oliver; +Cc: caml users

Actually I was not confident I could extract a small program reproducing the
issue until ... you had me try ! I could get a very tiny example that
behaves exactly the same, which does not involve opengl at all, only sdl.
Here it is :

[main.ml]
let init () =
  Sdl.init [`VIDEO ];
  ignore (Sdlvideo.set_video_mode ~w:640 ~h:480 ~bpp:32 [])

open Sdlevent
open Sdlmouse

let picking_handler = function
  | { mbe_button = BUTTON_WHEELDOWN ; mbe_state = RELEASED } -> ()
  | { mbe_button = BUTTON_WHEELUP ; mbe_state = RELEASED } -> ()
  | { mbe_button = BUTTON_LEFT ; mbe_state = RELEASED } -> ()
  | _ -> ()

let rec handle_events () =
  match wait_event () with
    | QUIT -> ()
    | MOUSEBUTTONUP mbe ->
      picking_handler mbe ;
      handle_events () ;
    | _ -> handle_events ()

let _ = init () ; handle_events () ; Sdl.quit ()

which can be compiled with

ocamlfind ocamlopt -o main -linkpkg -package sdl main.ml

On my laptop, this one seg faults unless I remove the third case of the
pattern (but that may not be very meaningful, as Romain explained). Here is
the backtrace reported by gdb :

~/hum 09:22:41 $ gdb ./hum
GNU gdb (GDB) 7.2-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/pveber/hum/hum...done.
(gdb) run
Starting program: /home/pveber/hum/hum
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x000000000043ee13 in caml_absf_mask ()
(gdb) bt
#0  0x000000000043ee13 in caml_absf_mask ()
#1  0x000000000040d283 in camlHum__handle_events_1124 ()
#2  0x00007ffff7fce1d0 in ?? ()
#3  0x000000000040d2f1 in camlHum__entry ()
#4  0x00007ffff7f8c5a0 in ?? ()
#5  0x000000000040c2a9 in caml_program ()
#6  0x000000000008e1e4 in ?? ()
#7  0x000000000043eb56 in caml_start_program ()
#8  0x0000000000000000 in ?? ()

and valgrind output

~/hum 09:28:45 $ valgrind ./hum
==21231== Memcheck, a memory error detector
==21231== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==21231== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for
copyright info
==21231== Command: ./hum
==21231==
==21231== Invalid read of size 8
==21231==    at 0x43EE13: ??? (in /home/pveber/hum/hum)
==21231==    by 0x40D282: camlHum__handle_events_1124 (in
/home/pveber/hum/hum)
==21231==    by 0x91921DF: ???
==21231==    by 0x40D2F0: camlHum__entry (in /home/pveber/hum/hum)
==21231==    by 0x928B59F: ???
==21231==    by 0x40C2A8: caml_program (in /home/pveber/hum/hum)
==21231==    by 0x8E1E3: ???
==21231==    by 0x43EB55: ??? (in /home/pveber/hum/hum)
==21231==  Address 0xe4 is not stack'd, malloc'd or (recently) free'd
==21231==
==21231==
==21231== Process terminating with default action of signal 11 (SIGSEGV)
==21231==  Access not within mapped region at address 0xE4
==21231==    at 0x43EE13: ??? (in /home/pveber/hum/hum)
==21231==    by 0x40D282: camlHum__handle_events_1124 (in
/home/pveber/hum/hum)
==21231==    by 0x91921DF: ???
==21231==    by 0x40D2F0: camlHum__entry (in /home/pveber/hum/hum)
==21231==    by 0x928B59F: ???
==21231==    by 0x40C2A8: caml_program (in /home/pveber/hum/hum)
==21231==    by 0x8E1E3: ???
==21231==    by 0x43EB55: ??? (in /home/pveber/hum/hum)
==21231==  If you believe this happened as a result of a stack
==21231==  overflow in your program's main thread (unlikely but
==21231==  possible), you can try to increase the size of the
==21231==  main thread stack using the --main-stacksize= flag.
==21231==  The main thread stack size used in this run was 8388608.
==21231==
==21231== HEAP SUMMARY:
==21231==     in use at exit: 2,036,382 bytes in 1,608 blocks
==21231==   total heap usage: 13,084 allocs, 11,476 frees, 3,705,764 bytes
allocated
==21231==
==21231== LEAK SUMMARY:
==21231==    definitely lost: 16 bytes in 1 blocks
==21231==    indirectly lost: 176 bytes in 4 blocks
==21231==      possibly lost: 1,029,091 bytes in 15 blocks
==21231==    still reachable: 1,007,099 bytes in 1,588 blocks
==21231==         suppressed: 0 bytes in 0 blocks
==21231== Rerun with --leak-check=full to see details of leaked memory
==21231==
==21231== For counts of detected and suppressed errors, rerun with: -v
==21231== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 5 from 5)
Segmentation fault

I'm sorry I started with a long explanation instead of this. Thanks to your
advice, I have far better chances to find what's going on.

ph.



2010/12/1 <oliver@first.in-berlin.de>

> On Wed, Dec 01, 2010 at 12:08:12AM +0100, Philippe Veber wrote:
> > Short story (details below): I'm currently writing a program relying on
> > react, lablgl and ocamlsdl. This program segfaults on my laptop under two
> > linux distributions (ubuntu and gentoo) but doesn't on a PC under ubuntu.
> > The seg fault occurs with both bytecode and native executables. I don't
> do
> [...]
>
> A minimal program plus a Makefile would make helping easier.
>
> Did you tried the code with a different X-driver?
>
> Maybe it's a problem there.
>
> Or maybe something is not linked correctly against the X-libs?
>
> Just a guess.
>
>
> Ciao,
>   Oliver
>


* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-12-01  8:32   ` Philippe Veber
@ 2010-12-01  9:15     ` oliver
  2010-12-01 10:26       ` Philippe Veber
  0 siblings, 1 reply; 13+ messages in thread
From: oliver @ 2010-12-01  9:15 UTC (permalink / raw)
  To: caml users

Hi,


On Wed, Dec 01, 2010 at 09:32:16AM +0100, Philippe Veber wrote:
> Actually I was not confident I could extract a small program reproducing the
> issue until ... you had me try ! I could get a very tiny example that
> behaves exactly the same, which does not involve opengl at all, only sdl.
> Here it is :
[...]


After installing some sdl-related packages, I could compile the code.
So far it does not crash.

What actions trigger the segfault for you?


Ciao,
   Oliver



* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-12-01  9:15     ` oliver
@ 2010-12-01 10:26       ` Philippe Veber
  2010-12-01 10:51         ` oliver
  0 siblings, 1 reply; 13+ messages in thread
From: Philippe Veber @ 2010-12-01 10:26 UTC (permalink / raw)
  To: oliver; +Cc: caml users

2010/12/1 <oliver@first.in-berlin.de>

> Hi,
>
>
> On Wed, Dec 01, 2010 at 09:32:16AM +0100, Philippe Veber wrote:
> > Actually I was not confident I could extract a small program reproducing
> the
> > issue until ... you had me try ! I could get a very tiny example that
> > behaves exactly the same, which does not involve opengl at all, only sdl.
> > Here it is :
> [...]
>
>
> After installing some sdl-related packages, I could comopile the code.
> So far it does not crash.
>
> What actions do create the segfault for you?
>

Roll the mouse wheel up or down fast with the cursor on the window. However,
I know that this problem does not occur everywhere, so you might well
observe nothing ...



>
>
> Ciao,
>   Oliver
>
>


* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-12-01 10:26       ` Philippe Veber
@ 2010-12-01 10:51         ` oliver
  2010-12-01 15:08           ` Philippe Veber
  0 siblings, 1 reply; 13+ messages in thread
From: oliver @ 2010-12-01 10:51 UTC (permalink / raw)
  To: caml users

On Wed, Dec 01, 2010 at 11:26:19AM +0100, Philippe Veber wrote:
> 2010/12/1 <oliver@first.in-berlin.de>
> 
> > Hi,
> >
> >
> > On Wed, Dec 01, 2010 at 09:32:16AM +0100, Philippe Veber wrote:
> > > Actually I was not confident I could extract a small program reproducing
> > the
> > > issue until ... you had me try ! I could get a very tiny example that
> > > behaves exactly the same, which does not involve opengl at all, only sdl.
> > > Here it is :
> > [...]
> >
> >
> > After installing some sdl-related packages, I could comopile the code.
> > So far it does not crash.
> >
> > What actions do create the segfault for you?
> >
> 
> roll the mouse wheel up or down fast with the cursor on the window. However,
> I know that this problem does not occur everywhere, so you might well
> observe nothing ...
[...]

No crash happened.


Normally I'm very gifted at crashing software... just by looking at it.

If there is a bug, it will find me  ;)

Did you try another X-driver?

Your valgrind printout mentions "libnvidia",
and a crash seems to have occurred in that part:


> ==11306==    by 0x4215D5: camlHum__entry (in
> /home/pveber/hum/_build/src/hum.native)
> ==11306==    by 0xC5767A7: ???
> ==11306==
> ==11306== Conditional jump or move depends on uninitialised value(s)
> ==11306==    at 0x7695580: ??? (in
> /usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
> ==11306==    by 0x1000000000000: ???
> ==11306==
> vex amd64->IR: unhandled instruction bytes: 0xFF 0xD8 0xC 0xF9 0xFF 0xD8
> ==11306== valgrind: Unrecognised instruction at address 0x4992cb.
> ==11306== Your program just tried to execute an instruction that Valgrind


So, please check this. Maybe one of the free drivers works better,
or maybe an update could help.

If you update your kernel, then you might also need to update the
X-drivers, because the nvidia stuff is non-free binary stuff,
and maybe some bindings don't work correctly with a new kernel.


I once experienced problems the other way around: blender crashed
with the free drivers, just by zooming into a view more and more,
and did not crash with the non-free drivers.

This X11-driver part is really a desert...

...and even if you don't use OpenGL commands... the driver that you installed
and configured will be used nevertheless, and if there is something wrong, you
will get your crashes.

Check whether the X11-driver-bindings are all up to date for the driver you
use now, and also try another X11-driver...



Ciao,
   Oliver


* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-12-01 10:51         ` oliver
@ 2010-12-01 15:08           ` Philippe Veber
  0 siblings, 0 replies; 13+ messages in thread
From: Philippe Veber @ 2010-12-01 15:08 UTC (permalink / raw)
  To: oliver; +Cc: caml users

2010/12/1 <oliver@first.in-berlin.de>

> On Wed, Dec 01, 2010 at 11:26:19AM +0100, Philippe Veber wrote:
> > 2010/12/1 <oliver@first.in-berlin.de>
> >
> > > Hi,
> > >
> > >
> > > On Wed, Dec 01, 2010 at 09:32:16AM +0100, Philippe Veber wrote:
> > > > Actually I was not confident I could extract a small program
> reproducing
> > > the
> > > > issue until ... you had me try ! I could get a very tiny example that
> > > > behaves exactly the same, which does not involve opengl at all, only
> sdl.
> > > > Here it is :
> > > [...]
> > >
> > >
> > > After installing some sdl-related packages, I could comopile the code.
> > > So far it does not crash.
> > >
> > > What actions do create the segfault for you?
> > >
> >
> > roll the mouse wheel up or down fast with the cursor on the window.
> However,
> > I know that this problem does not occur everywhere, so you might well
> > observe nothing ...
> [...]
>
> No crash happened.
>
>
> Normally I'm very gifted to crash software.... by just looking at it.
>
> If there is a bug, it will find me  ;)
>
> Did you tried another X-driver?
>
> In your valgrind printout there was mentioned "libnvidia".
> And a crash seems to have been occured at that part:
>
>
> > ==11306==    by 0x4215D5: camlHum__entry (in
> > /home/pveber/hum/_build/src/hum.native)
> > ==11306==    by 0xC5767A7: ???
> > ==11306==
> > ==11306== Conditional jump or move depends on uninitialised value(s)
> > ==11306==    at 0x7695580: ??? (in
> > /usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06)
> > ==11306==    by 0x1000000000000: ???
> > ==11306==
> > vex amd64->IR: unhandled instruction bytes: 0xFF 0xD8 0xC 0xF9 0xFF 0xD8
> > ==11306== valgrind: Unrecognised instruction at address 0x4992cb.
> > ==11306== Your program just tried to execute an instruction that Valgrind
>
>
> So, please chack this. Maybe one of the free drivers does work better,
> or maybe an update could help.
>
> If you update your kernel, then you might also need to update the
> X-drivers, because the nvidia stuff is non-free binary stuff,
> and maybe some bindings don't work correctly with a new kernel.
>
>
> I once experienced problems the other way around: crashing blender
> with the free drivers, just by scaling into a view more and more,
> and no crash with the non-free drivers.
>
> This X11-driver part is really a desert...
>
> ...and even if you don't use OpenGL commands... the driver that you
> installed
> and configured will be used nevretheless, and if there is something wrong,
> you
> will get your crashes.
>
> Look if the X11-driver-bindings are all up to date for the driver you use
> now,
>
they are.


> and also try another X11-driver...
>
I sure will.

The thing is in my last example, there is no warning from the nvidia driver
anymore, so I'd think the bug must be in the sdl part. I should also check
if this doesn't come from the touchpad driver, using a real mouse on the
same system.
Anyway thanks for your comments !

ph.


* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-12-01  1:59 ` Romain Beauxis
@ 2010-12-01 15:17   ` Philippe Veber
  2010-12-01 15:27     ` oliver
  2010-12-01 16:15     ` Romain Beauxis
  0 siblings, 2 replies; 13+ messages in thread
From: Philippe Veber @ 2010-12-01 15:17 UTC (permalink / raw)
  To: Romain Beauxis; +Cc: caml-list

2010/12/1 Romain Beauxis <toots@rastageeks.org>

> Hi,
>
> On Tuesday, 30 November 2010 17:08:12, Philippe Veber wrote:
> > The seg fault occurs during the call to this function with the button
> event
> > retrieved by ocamlsdl. What's really weird is that if I comment the third
> > case of the pattern matching, the seg fault does not occur. This is
> strange
> > since with the "assert false" expression, I make sure this case is
> useless
> > (i don't press the left button). Also, in the various tests I made, I
> > obtained different errors, like segmentation fault in caml_absf_mask or
> > invalid instruction error.
>
> The function that triggers the segfault may be confusing, in particular in
> case of a memory corruption, which I suspect here.
> The pattern matching can cause a crash because it is using a value that is
> already corrupted and because the third case is one that, for some random
> conditions, touches the part in memory that is corrupted.
>
How is this possible if it is never reached (no left click) ?


>
> In this case, I would try to unroll the code and see where the value that
> is
> used in this function was instanciated.
>
What do you mean by "unrolling the code" ?



>
> Main source of corruption when using C bindings most often come from either
> the Gc or code executed while the global lock has been released.
>
> In the case of a segfault hapenning during a Gc call, this can be really
> unrelated, for instance the instanciation of a new value triggers a Gc
> collection to compact memory, which in turns triggers the recollection of a
> corrupted value, which causes a segfault.
>
> In the case of a segfault hapenning during a C call while the global lock
> has
> been released, you may get more useful informations through gdb, in
> particular
> the trace of the C code used at the time of the segfault. You need have the
> debugging symbols for the dynamic C libraries used as well.
>
> We experienced a couple of segfault with ocaml SDL too but in unrelated
> parts
> (video). I do not mean to criticize upstream's work on ocaml SDL because I
> know for a fact that these types of bindings are really hard to code.
> However,
> I would suspect an issue there.
>
> Finally, the best approach could be to actually look closely to the
> binding's
> code and try to spot anything fishy there related to your issue. This
> generaly
> worked better for me than trying to get information from gdb and the like..
>

Many thanks for the clarification. Maybe I could (partially) "unplug" the GC
by setting space_overhead to 100 ? That could give an indication on the
moment the problem occurs ?
ph.
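
For reference, a setting like the one mentioned above could be applied from within the program through the standard `Gc` module. This is only a sketch: `relax_major_gc` is a hypothetical helper name, and the value 100 is taken from the message. According to the `Gc` documentation, a larger `space_overhead` makes the major collector less eager; it does not switch the collector off.

```ocaml
(* Hypothetical helper: raise space_overhead so that the major GC collects
   less eagerly, and return the previous value so it can be restored. *)
let relax_major_gc percent =
  let old = Gc.get () in
  Gc.set { old with Gc.space_overhead = percent };
  old.Gc.space_overhead

let () =
  let previous = relax_major_gc 100 in
  Printf.printf "space_overhead: %d -> %d\n"
    previous (Gc.get ()).Gc.space_overhead
```

If the crash moves or becomes rarer when this value changes, that would hint at a link with collection timing rather than prove one.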


* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-12-01  5:51 ` Ilya Seleznev
@ 2010-12-01 15:21   ` Philippe Veber
  0 siblings, 0 replies; 13+ messages in thread
From: Philippe Veber @ 2010-12-01 15:21 UTC (permalink / raw)
  To: Ilya Seleznev; +Cc: caml users

2010/12/1 Ilya Seleznev <itsuart@gmail.com>

> On Wed, Dec 1, 2010 at 5:08 AM, Philippe Veber
> <philippe.veber@googlemail.com> wrote:
> > Short story (details below): I'm currently writing a program relying on
> > react, lablgl and ocamlsdl. This program segfaults on my laptop under two
> > linux distributions (ubuntu and gentoo) but doesn't on a PC under ubuntu.
> > The seg fault occurs with both bytecode and native executables. I don't do
> > any marshaling nor use any typing magic; stack overflow is not likely. I
> > humbly ask this list about means to improve valgrind or gdb outputs, which
> > don't report informative function names, or more generally, any tip that
> > could help me to locate the origin of the problem.
>
> I would log the mouse events that go into 'picking_handler' to make sure
> that no unexpected input is sent by SDL.
>

Thanks for your answer. Unfortunately, there is not much to learn from the
event type that could help here. Or at least I don't see what... Or maybe
you're referring to low-level logging, that is, logging values at the C
binding level?
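
Logging at the OCaml level is still cheap to try. Here is a rough sketch of a
wrapper that prints each event before handing it to the real handler. The
types are local stand-ins modeled on the constructors and fields used in
picking_handler; the mbe_x and mbe_y fields and the names describe and logged
are assumptions made for the example:

```ocaml
(* Stand-ins for the ocamlsdl types, so that the sketch compiles on its own.
   In the real program the binding's own types would be used instead. *)
type button =
  | BUTTON_LEFT | BUTTON_MIDDLE | BUTTON_RIGHT
  | BUTTON_WHEELUP | BUTTON_WHEELDOWN

type switch_state = RELEASED | PRESSED

type mousebutton_event = {
  mbe_button : button;
  mbe_state : switch_state;
  mbe_x : int; (* assumed field *)
  mbe_y : int; (* assumed field *)
}

let string_of_button = function
  | BUTTON_LEFT -> "LEFT"
  | BUTTON_MIDDLE -> "MIDDLE"
  | BUTTON_RIGHT -> "RIGHT"
  | BUTTON_WHEELUP -> "WHEELUP"
  | BUTTON_WHEELDOWN -> "WHEELDOWN"

let string_of_state = function
  | RELEASED -> "RELEASED"
  | PRESSED -> "PRESSED"

let describe e =
  Printf.sprintf "button=%s state=%s x=%d y=%d"
    (string_of_button e.mbe_button) (string_of_state e.mbe_state)
    e.mbe_x e.mbe_y

(* Hypothetical wrapper: log the event on stderr (prerr_endline flushes, so
   the line survives a crash), then call the real handler. *)
let logged handler e =
  prerr_endline (describe e);
  handler e
```

It would be used as `logged (picking_handler send)` in place of
`picking_handler send`. Since describe reads every field, a crash inside the
logging itself would also hint that the record was already damaged when it
left the binding.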
ph.



* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-12-01 15:17   ` Philippe Veber
@ 2010-12-01 15:27     ` oliver
  2010-12-01 16:15     ` Romain Beauxis
  1 sibling, 0 replies; 13+ messages in thread
From: oliver @ 2010-12-01 15:27 UTC (permalink / raw)
  To: caml-list

On Wed, Dec 01, 2010 at 04:17:15PM +0100, Philippe Veber wrote:
[...]
> Many thanks for the clarification. Maybe I could (partially) "unplug" the GC
> by setting space_overhead to 100? Could that give an indication of the
> moment the problem occurs?
> ph.
[...]


There are also verbosity options for the GC.
If you set the v option of OCAMLRUNPARAM as described in the
documentation of the runtime system, the GC will report
its actions.

You can use it to spot GC actions that happen too often
(in order to tune the settings for a speedup).

This might also be helpful for your bug hunting.
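
The same messages can also be switched on from inside the program through
the Gc module. A small sketch, assuming the flag values documented for the
verbose field of Gc.control (0x01 for the start and end of major cycles,
0x02 for minor collections and major slices); enable_gc_trace is a made-up
name:

```ocaml
(* Hypothetical helper: ask the runtime to print GC events on stderr.
   Roughly what OCAMLRUNPARAM='v=0x03' does, but set at run time. *)
let enable_gc_trace () =
  Gc.set { (Gc.get ()) with Gc.verbose = 0x01 lor 0x02 }

let () =
  enable_gc_trace ();
  (* Allocate enough to trigger a few collections. *)
  let l = List.init 100_000 string_of_int in
  Gc.full_major ();
  Printf.printf "%d\n" (List.length l)
```

Calling it just before the suspect part of the program keeps the output
short enough to see which GC event comes last before the crash.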


Ciao,
   Oliver



* Re: [Caml-list] Tips to find the cause of a seg fault
  2010-12-01 15:17   ` Philippe Veber
  2010-12-01 15:27     ` oliver
@ 2010-12-01 16:15     ` Romain Beauxis
  1 sibling, 0 replies; 13+ messages in thread
From: Romain Beauxis @ 2010-12-01 16:15 UTC (permalink / raw)
  To: caml-list

On Wednesday 1 December 2010 09:17:15, Philippe Veber wrote:
> > The function that triggers the segfault may be confusing, in particular
> > in case of a memory corruption, which I suspect here.
> > The pattern matching can cause a crash because it is using a value that
> > is already corrupted and because the third case is one that, for some
> > random conditions, touches the part in memory that is corrupted.
> 
> How is this possible if it is never reached (no left click) ?

Well, I was giving a general reply which may or may not apply here.
The fact that the problem goes away when you comment out the unused case could
be unrelated, though. It could also be that the issue is not related to this
exact function but that the compiled binary has a different execution flow when
you comment out the third case.

> > In this case, I would try to unroll the code and see where the value that
> > is used in this function was instantiated.
> 
> What do you mean by "unrolling the code" ?

Looking backward to where the value used in the function was instantiated.

> > Finally, the best approach could be to actually look closely at the
> > binding's code and try to spot anything fishy there related to your
> > issue. This generally worked better for me than trying to get information
> > from gdb and the like.
> 
> Many thanks for the clarification. Maybe I could (partially) "unplug" the
> GC by setting space_overhead to 100? Could that give an indication of the
> moment the problem occurs?

I've never tried this. What you can also try, for instance, is to comment out
the code that finalizes a value that you suspect causes the segfault.

However, I do not think your issue is related to the GC. The backtrace does
not seem to indicate that it occurs in C code with the global lock released,
so maybe what I said was irrelevant.

I have tried your minimal example and it does not segfault here either. As for
Oliver, maybe this means that this is also related to the driver you are
using. However, the segfault definitely seems to occur in the OCaml part of
the code, so it seems that the problem is entangled, at least.


Romain



Thread overview: 13+ messages
2010-11-30 23:08 Tips to find the cause of a seg fault Philippe Veber
2010-11-30 23:18 ` [Caml-list] " oliver
2010-12-01  8:32   ` Philippe Veber
2010-12-01  9:15     ` oliver
2010-12-01 10:26       ` Philippe Veber
2010-12-01 10:51         ` oliver
2010-12-01 15:08           ` Philippe Veber
2010-12-01  1:59 ` Romain Beauxis
2010-12-01 15:17   ` Philippe Veber
2010-12-01 15:27     ` oliver
2010-12-01 16:15     ` Romain Beauxis
2010-12-01  5:51 ` Ilya Seleznev
2010-12-01 15:21   ` Philippe Veber
