From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p23IJLD7028429 for ; Thu, 3 Mar 2011 19:19:21 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApEBAOtpb01QDPKCkWdsb2JhbACYLI5OAQEBAQkLCgcRAyK+XYVhBIUailQ X-IronPort-AV: E=Sophos;i="4.62,259,1297033200"; d="scan'208";a="89126209" Received: from smtp08.smtpout.orange.fr (HELO smtp.smtpout.orange.fr) ([80.12.242.130]) by mail4-smtp-sop.national.inria.fr with ESMTP; 03 Mar 2011 19:19:16 +0100 Received: from [172.24.131.25] ([66.220.144.74]) by mwinf5d16 with ME id EiKE1g0071cXi5u03iKFX0; Thu, 03 Mar 2011 19:19:16 +0100 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Apple Message framework v1082) From: Yoann Padioleau In-Reply-To: Date: Thu, 3 Mar 2011 10:19:14 -0800 Message-Id: <3CD59A75-1250-4CFD-9E72-AE85AE96477E@wanadoo.fr> References: To: Caml List X-Mailer: Apple Mail (2.1082) Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by walapai.inria.fr id p23IJLD7028429 Subject: Re: [Caml-list] tips to debug ocaml programs segfaulting On Mar 3, 2011, at 10:10 AM, Yoann Padioleau wrote: > And this is what I get when in native mode: > > [pad@unittest002 ~]$ gdb /home/engshare/tools/pfff_server.opt 28322 > GNU gdb Red Hat Linux (6.5-37.el5_2.2rh) > Copyright (C) 2006 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "x86_64-redhat-linux-gnu".../home/pad/.gdbinit:1: Error in sourced command file: > Undefined command: "python". Try "help". > Using host libthread_db library "/lib64/libthread_db.so.1". > > Attaching to program: /home/engshare/tools/pfff_server.opt, process 28322 > Reading symbols from /lib64/libpcre.so.0...done. > Loaded symbols for /lib64/libpcre.so.0 > Reading symbols from /lib64/libdb-4.3.so...done. > Loaded symbols for /lib64/libdb-4.3.so > Reading symbols from /lib64/libpthread.so.0...done. > [Thread debugging using libthread_db enabled] > [New Thread 46912496213936 (LWP 28322)] > [New Thread 1176140096 (LWP 28627)] > Loaded symbols for /lib64/libpthread.so.0 > Reading symbols from /lib64/libm.so.6...done. > Loaded symbols for /lib64/libm.so.6 > Reading symbols from /lib64/libdl.so.2...done. > Loaded symbols for /lib64/libdl.so.2 > Reading symbols from /lib64/libc.so.6...done. > Loaded symbols for /lib64/libc.so.6 > Reading symbols from /lib64/ld-linux-x86-64.so.2...done. > Loaded symbols for /lib64/ld-linux-x86-64.so.2 > 0x000000358ac0ceab in accept () from /lib64/libpthread.so.0 > (gdb) continue > Continuing. > [New Thread 1124940096 (LWP 28767)] > [Thread 1124940096 (LWP 28767) exited] > [New Thread 1124940096 (LWP 28808)] > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 1124940096 (LWP 28808)] > 0x00000000004d06e6 in camlVisitor_php__v_paren_685 () > (gdb) bt > #0 0x00000000004d06e6 in camlVisitor_php__v_paren_685 () > #1 0x00002aaaaaad71e8 in ?? () > #2 0x00002aaaaaad7500 in ?? () > #3 0x0000000000000b00 in ?? () > #4 0x00000000004cdde2 in camlVisitor_php__v_variablebis_779 () I think I've found the bug ... It's because I recently changed the type definition for variablebis but was running the server on a database of old AST, which do not have the same definition for variablebis. Damn those native code backtraces are useful. Damn I hate unsafe unmarshallng ... Sorry for the noise. > #5 0x00002aaaaaad7fb0 in ?? () > #6 0x000000001769e888 in ?? () > #7 0x00000000430d2a80 in ?? () > #8 0x00000000004caca8 in camlVisitor_php__k_1608 () > ... > > the Visitor_php.v_paren function is as the name suggest part of a set of functions > to help visit the AST of a PHP program. This AST is marshalled in berkeley DB tables. > I guess that's one possible cause for this segfault, a bug in berkeley DB that causes > an incorrect marshalling of the AST which when unmarshalled cause some segfault ? > > > > > > On Mar 3, 2011, at 9:56 AM, Yoann Padioleau wrote: > >> Hi, >> >> I have a quite large program that segfaults. I can reproduce the segfault deterministically but have no idea >> how to fix it. The program is a server that given a filename lookup information in a berkley DB database on this file >> and then returns some results. For certain files everything is right but for other files the program just segfault. >> When I attach with gdb on the server here is what I get: >> >> [pad@unittest002 ~]$ gdb /home/engshare/tools/pfff_server 22436 >> GNU gdb Red Hat Linux (6.5-37.el5_2.2rh) >> ... >> Attaching to program: /home/engshare/tools/pfff_server, process 22436 >> ... >> Reading symbols from /lib64/libpcre.so.0...done. >> Loaded symbols for /lib64/libpcre.so.0 >> Reading symbols from /lib64/libdb-4.3.so...done. >> Loaded symbols for /lib64/libdb-4.3.so >> Reading symbols from /lib64/libpthread.so.0...done. >> [Thread debugging using libthread_db enabled] >> [New Thread 46912496215408 (LWP 22436)] >> [New Thread 1176140096 (LWP 23759)] >> Loaded symbols for /lib64/libpthread.so.0 >> Reading symbols from /lib64/libm.so.6...done. >> Loaded symbols for /lib64/libm.so.6 >> Reading symbols from /lib64/libdl.so.2...done. >> Loaded symbols for /lib64/libdl.so.2 >> Reading symbols from /usr/lib64/libncurses.so.5...done. >> Loaded symbols for /usr/lib64/libncurses.so.5 >> Reading symbols from /lib64/libc.so.6...done. >> Loaded symbols for /lib64/libc.so.6 >> Reading symbols from /lib64/ld-linux-x86-64.so.2...done. >> Loaded symbols for /lib64/ld-linux-x86-64.so.2 >> 0x000000358ac0ceab in accept () from /lib64/libpthread.so.0 >> (gdb) bt >> #0 0x000000358ac0ceab in accept () from /lib64/libpthread.so.0 >> #1 0x000000000040de8f in unix_accept () >> #2 0x0000000000425dd9 in caml_interprete () >> #3 0x000000000041317a in caml_main () >> #4 0x00000000004249cc in main () >> (gdb) run >> The program being debugged has been started already. >> Start it from the beginning? (y or n) n >> Program not restarted. >> (gdb) >> (gdb) continue >> Continuing. >> [New Thread 1124940096 (LWP 24691)] >> [Thread 1124940096 (LWP 24691) exited] >> [New Thread 1124940096 (LWP 24723)] >> [Thread 1124940096 (LWP 24723) exited] >> [New Thread 1124940096 (LWP 24758)] >> [Thread 1124940096 (LWP 24758) exited] >> [New Thread 1124940096 (LWP 24796)] >> >> Program received signal SIGSEGV, Segmentation fault. >> [Switching to Thread 1124940096 (LWP 24796)] >> 0x00000000004258d0 in caml_interprete () >> (gdb) bt >> #0 0x00000000004258d0 in caml_interprete () >> #1 0x0000000000421c32 in caml_callbackN_exn () >> #2 0x0000000000421d16 in caml_callback_exn () >> #3 0x00000000004095e9 in caml_thread_start () >> #4 0x000000358ac062f7 in start_thread () from /lib64/libpthread.so.0 >> #5 0x000000358a0d1e3d in clone () from /lib64/libc.so.6 >> (gdb) >> >> >> At this point I don't know what to do. No idea how from this backtrace to go back to the root cause of the segfault. Any tips ? >> >> >> >> >> -- >> Caml-list mailing list. Subscription management and archives: >> https://sympa-roc.inria.fr/wws/info/caml-list >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners >> Bug reports: http://caml.inria.fr/bin/caml-bugs >> > > > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs >