caml-list - the Caml user's mailing list
* [Caml-list] tips to debug ocaml programs segfaulting
@ 2011-03-03 17:56 Yoann Padioleau
  2011-03-03 18:10 ` Yoann Padioleau
  2011-03-03 18:24 ` Guillaume Yziquel
  0 siblings, 2 replies; 5+ messages in thread
From: Yoann Padioleau @ 2011-03-03 17:56 UTC (permalink / raw)
  To: Caml List

Hi,

I have a fairly large program that segfaults. I can reproduce the segfault deterministically but have no idea
how to fix it. The program is a server that, given a filename, looks up information about that file in a Berkeley DB database
and then returns some results. For certain files everything works, but for other files the program just segfaults.
When I attach gdb to the server, here is what I get:

[pad@unittest002 ~]$ gdb /home/engshare/tools/pfff_server 22436
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
...
Attaching to program: /home/engshare/tools/pfff_server, process 22436
...
Reading symbols from /lib64/libpcre.so.0...done.
Loaded symbols for /lib64/libpcre.so.0
Reading symbols from /lib64/libdb-4.3.so...done.
Loaded symbols for /lib64/libdb-4.3.so
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 46912496215408 (LWP 22436)]
[New Thread 1176140096 (LWP 23759)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libncurses.so.5...done.
Loaded symbols for /usr/lib64/libncurses.so.5
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000358ac0ceab in accept () from /lib64/libpthread.so.0
(gdb) bt
#0  0x000000358ac0ceab in accept () from /lib64/libpthread.so.0
#1  0x000000000040de8f in unix_accept ()
#2  0x0000000000425dd9 in caml_interprete ()
#3  0x000000000041317a in caml_main ()
#4  0x00000000004249cc in main ()
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) n
Program not restarted.
(gdb) 
(gdb) continue
Continuing.
[New Thread 1124940096 (LWP 24691)]
[Thread 1124940096 (LWP 24691) exited]
[New Thread 1124940096 (LWP 24723)]
[Thread 1124940096 (LWP 24723) exited]
[New Thread 1124940096 (LWP 24758)]
[Thread 1124940096 (LWP 24758) exited]
[New Thread 1124940096 (LWP 24796)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1124940096 (LWP 24796)]
0x00000000004258d0 in caml_interprete ()
(gdb) bt
#0  0x00000000004258d0 in caml_interprete ()
#1  0x0000000000421c32 in caml_callbackN_exn ()
#2  0x0000000000421d16 in caml_callback_exn ()
#3  0x00000000004095e9 in caml_thread_start ()
#4  0x000000358ac062f7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000358a0d1e3d in clone () from /lib64/libc.so.6
(gdb) 


At this point I don't know what to do. I have no idea how to get from this backtrace to the root cause of the segfault. Any tips?





* Re: [Caml-list] tips to debug ocaml programs segfaulting
  2011-03-03 17:56 [Caml-list] tips to debug ocaml programs segfaulting Yoann Padioleau
@ 2011-03-03 18:10 ` Yoann Padioleau
  2011-03-03 18:19   ` Yoann Padioleau
  2011-03-03 18:24 ` Guillaume Yziquel
  1 sibling, 1 reply; 5+ messages in thread
From: Yoann Padioleau @ 2011-03-03 18:10 UTC (permalink / raw)
  To: Caml List

And this is what I get with the native-code build:

[pad@unittest002 ~]$ gdb /home/engshare/tools/pfff_server.opt 28322
GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".../home/pad/.gdbinit:1: Error in sourced command file:
Undefined command: "python".  Try "help".
Using host libthread_db library "/lib64/libthread_db.so.1".

Attaching to program: /home/engshare/tools/pfff_server.opt, process 28322
Reading symbols from /lib64/libpcre.so.0...done.
Loaded symbols for /lib64/libpcre.so.0
Reading symbols from /lib64/libdb-4.3.so...done.
Loaded symbols for /lib64/libdb-4.3.so
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 46912496213936 (LWP 28322)]
[New Thread 1176140096 (LWP 28627)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000358ac0ceab in accept () from /lib64/libpthread.so.0
(gdb) continue
Continuing.
[New Thread 1124940096 (LWP 28767)]
[Thread 1124940096 (LWP 28767) exited]
[New Thread 1124940096 (LWP 28808)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1124940096 (LWP 28808)]
0x00000000004d06e6 in camlVisitor_php__v_paren_685 ()
(gdb) bt
#0  0x00000000004d06e6 in camlVisitor_php__v_paren_685 ()
#1  0x00002aaaaaad71e8 in ?? ()
#2  0x00002aaaaaad7500 in ?? ()
#3  0x0000000000000b00 in ?? ()
#4  0x00000000004cdde2 in camlVisitor_php__v_variablebis_779 ()
#5  0x00002aaaaaad7fb0 in ?? ()
#6  0x000000001769e888 in ?? ()
#7  0x00000000430d2a80 in ?? ()
#8  0x00000000004caca8 in camlVisitor_php__k_1608 ()
...

The Visitor_php.v_paren function is, as the name suggests, part of a set of functions
that help visit the AST of a PHP program. This AST is marshalled into Berkeley DB tables.
I guess one possible cause of this segfault is a bug in Berkeley DB that corrupts
the marshalled AST, so that unmarshalling it causes the segfault?






* Re: [Caml-list] tips to debug ocaml programs segfaulting
  2011-03-03 18:10 ` Yoann Padioleau
@ 2011-03-03 18:19   ` Yoann Padioleau
  2011-03-03 21:08     ` ygrek
  0 siblings, 1 reply; 5+ messages in thread
From: Yoann Padioleau @ 2011-03-03 18:19 UTC (permalink / raw)
  To: Caml List


On Mar 3, 2011, at 10:10 AM, Yoann Padioleau wrote:

> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 1124940096 (LWP 28808)]
> 0x00000000004d06e6 in camlVisitor_php__v_paren_685 ()
> (gdb) bt
> #0  0x00000000004d06e6 in camlVisitor_php__v_paren_685 ()
> #1  0x00002aaaaaad71e8 in ?? ()
> #2  0x00002aaaaaad7500 in ?? ()
> #3  0x0000000000000b00 in ?? ()
> #4  0x00000000004cdde2 in camlVisitor_php__v_variablebis_779 ()


I think I've found the bug... I recently changed the type definition of variablebis but
was running the server on a database of old ASTs, which were marshalled with the previous definition of variablebis.
Damn, those native-code backtraces are useful. Damn, I hate unsafe unmarshalling...

Sorry for the noise.
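
For readers hitting the same problem, here is a minimal sketch of why a changed type definition breaks old marshalled data. The Old and New modules and the color type are hypothetical and are not taken from pfff. Marshal stores no type information, so the annotation on the reading side is trusted blindly. With constant constructors the value is silently misread; with boxed values of a different shape, the reader can follow fields that do not exist, which is the kind of crash seen in the backtrace above.

```ocaml
(* Hypothetical "old" and "new" versions of a type: the same constructors,
   declared in a different order. *)
module Old = struct type color = Red | Green end
module New = struct type color = Green | Red end

(* Data written by the old version of the program. *)
let stored : string = Marshal.to_string Old.Red []

(* Data read by the new version. The type annotation is not checked
   against the bytes in any way. *)
let loaded : New.color = Marshal.from_string stored 0

let () =
  match loaded with
  | New.Green -> print_endline "written as Red, read back as Green"
  | New.Red -> print_endline "written as Red, read back as Red"
```

Constant constructors are represented by their position in the declaration, so the first constructor of the old type comes back as the first constructor of the new type.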


* Re: [Caml-list] tips to debug ocaml programs segfaulting
  2011-03-03 17:56 [Caml-list] tips to debug ocaml programs segfaulting Yoann Padioleau
  2011-03-03 18:10 ` Yoann Padioleau
@ 2011-03-03 18:24 ` Guillaume Yziquel
  1 sibling, 0 replies; 5+ messages in thread
From: Guillaume Yziquel @ 2011-03-03 18:24 UTC (permalink / raw)
  To: Yoann Padioleau; +Cc: Caml List

On Thursday 03 Mar 2011 at 09:56:55 (-0800), Yoann Padioleau wrote:
> Hi,
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 1124940096 (LWP 24796)]
> 0x00000000004258d0 in caml_interprete ()
> (gdb) bt
> #0  0x00000000004258d0 in caml_interprete ()
> #1  0x0000000000421c32 in caml_callbackN_exn ()
> #2  0x0000000000421d16 in caml_callback_exn ()
> #3  0x00000000004095e9 in caml_thread_start ()
> #4  0x000000358ac062f7 in start_thread () from /lib64/libpthread.so.0
> #5  0x000000358a0d1e3d in clone () from /lib64/libc.so.6
> (gdb) 
> 
> 
> At this point I don't know what to do. No idea how from this backtrace to go back to the root cause of the segfault. Any tips ?

Your program seems quite complex, so it's a bit hard to tell. But since
you are using bytecode callbacks, I'd be curious to know if you are
having the same issue when compiled to native code. If not, then it
reminds me of an issue I had.

You could disassemble the caml_interprete function and follow the
execution of the bytecode interpreter. One possible cause of a segfault
in caml_interprete is the following. There is an indirect jump (a
'jmp *%rax' instruction, at least on my box) through which the
interpreter dispatches each bytecode instruction to the appropriate
case. When using bytecode closures for callbacks, it did happen that
there was one indirection too many (though I do not remember doing
anything wrong), so garbage ended up in the rax register. The code
then jumped to a garbage address and segfaulted.

To me, that's one of the most likely segfault reasons in caml_interprete
in the presence of bytecode callbacks.

So disassemble caml_interprete, look at registers with info registers,
and check that you're not jumping to a random position.

-- 
     Guillaume Yziquel



* Re: [Caml-list] tips to debug ocaml programs segfaulting
  2011-03-03 18:19   ` Yoann Padioleau
@ 2011-03-03 21:08     ` ygrek
  0 siblings, 0 replies; 5+ messages in thread
From: ygrek @ 2011-03-03 21:08 UTC (permalink / raw)
  To: Caml List

On Thu, 3 Mar 2011 10:19:14 -0800
Yoann Padioleau <padator@wanadoo.fr> wrote:

> Damn I hate unsafe unmarshalling ...

This is easy to solve with a small wrapper over Marshal, while ensuring that
every change in the value type comes with a change of tag.

module type Value =
sig
  type value
  val tag : string  (* must change whenever the definition of [value] changes *)
end

(* Raised when the stored tag does not match [V.tag]. *)
exception Error

module Marshal(V : Value) =
struct

  type t = V.value

  (* Write the tag first, then the marshalled value. *)
  let to_channel ch ?(flags=[]) x =
    output_string ch V.tag;
    Marshal.to_channel ch (x:t) flags

  (* Read and check the tag before trusting the marshalled data. *)
  let from_channel ch =
    let s = really_input_string ch (String.length V.tag) in
    if s <> V.tag then raise Error;
    (Marshal.from_channel ch : t)

  let to_string ?(flags=[]) x = V.tag ^ Marshal.to_string (x:t) flags

  (* Reject input that is too short or that carries a different tag. *)
  let from_string s =
    let n = String.length V.tag in
    if String.length s < n || String.sub s 0 n <> V.tag then raise Error;
    (Marshal.from_string s n : t)

end
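
A self-contained usage sketch of the same idea, using only the string functions. The names Tagged, Tag_mismatch, V1 and V2, the tag strings and the tuple types are all made up for illustration; the point is only that data written under one tag is rejected by a reader expecting another, instead of being unmarshalled at the wrong type.

```ocaml
module type Value = sig
  type value
  val tag : string
end

exception Tag_mismatch

(* Compact string-only variant of the tagged wrapper. *)
module Tagged (V : Value) = struct
  let to_string (x : V.value) : string = V.tag ^ Marshal.to_string x []

  let from_string (s : string) : V.value =
    let n = String.length V.tag in
    if String.length s < n || String.sub s 0 n <> V.tag then raise Tag_mismatch;
    Marshal.from_string s n
end

(* Two hypothetical versions of a stored type; the tag is bumped by hand. *)
module V1 = Tagged (struct type value = string * int let tag = "expr-v1" end)
module V2 = Tagged (struct type value = string * int * bool let tag = "expr-v2" end)

(* Data written by the old version of the program. *)
let old_blob = V1.to_string ("x", 1)

(* Reading it back with the matching version works. *)
let round_trip_ok = (V1.from_string old_blob = ("x", 1))

(* Reading it with the new version fails cleanly with an exception. *)
let stale_rejected =
  match V2.from_string old_blob with
  | _ -> false
  | exception Tag_mismatch -> true

let () =
  Printf.printf "round trip: %b, stale data rejected: %b\n"
    round_trip_ok stale_rejected
```

The check only helps if the tag really is changed with every type change, and the tags should not be prefixes of one another, since only the first String.length V.tag bytes are compared.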

-- 
 ygrek
 http://ygrek.org.ua
