From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <201011301959.04218.toots@rastageeks.org>
References: <201011301959.04218.toots@rastageeks.org>
From: Philippe Veber
Date: Wed, 1 Dec 2010 16:17:15 +0100
Subject: Re: [Caml-list] Tips to find the cause of a seg fault
To: Romain Beauxis
Cc: caml-list@yquem.inria.fr
Content-Type: multipart/alternative; boundary=001636d338971e0ad504965aceac

--001636d338971e0ad504965aceac
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

2010/12/1 Romain Beauxis <toots@rastageeks.org>

> Hi,
>
> On Tuesday 30 November 2010 17:08:12, Philippe Veber wrote:
> > The seg fault occurs during the call to this function with the button
> > event retrieved by ocamlsdl. What's really weird is that if I comment
> > out the third case of the pattern matching, the seg fault does not
> > occur. This is strange since with the "assert false" expression, I
> > make sure this case is useless (I don't press the left button). Also,
> > in the various tests I made, I obtained different errors, like a
> > segmentation fault in caml_absf_mask or an invalid instruction error.
>
> The function that triggers the segfault may be confusing, in particular
> in case of a memory corruption, which I suspect here.
> The pattern matching can cause a crash because it is using a value that
> is already corrupted and because the third case is one that, under some
> random conditions, touches the part in memory that is corrupted.

How is this possible if it is never reached (no left click)?

> In this case, I would try to unroll the code and see where the value
> that is used in this function was instantiated.

What do you mean by "unrolling the code"?

> The main sources of corruption when using C bindings most often come
> from either the Gc or code executed while the global lock has been
> released.
>
> In the case of a segfault happening during a Gc call, this can be
> really unrelated: for instance, the instantiation of a new value
> triggers a Gc collection to compact memory, which in turn triggers the
> recollection of a corrupted value, which causes a segfault.
>
> In the case of a segfault happening during a C call while the global
> lock has been released, you may get more useful information through
> gdb, in particular the trace of the C code used at the time of the
> segfault. You need to have the debugging symbols for the dynamic C
> libraries used as well.
>
> We experienced a couple of segfaults with ocaml SDL too, but in
> unrelated parts (video). I do not mean to criticize upstream's work on
> ocaml SDL because I know for a fact that these types of bindings are
> really hard to code. However, I would suspect an issue there.
>
> Finally, the best approach could be to actually look closely at the
> binding's code and try to spot anything fishy there related to your
> issue. This generally worked better for me than trying to get
> information from gdb and the like.

Many thanks for the clarification. Maybe I could (partially) "unplug"
the GC by setting space_overhead to 100? That could give an indication
of the moment the problem occurs?

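To make the idea concrete, here is a minimal sketch of what I have in
mind, assuming only the standard Gc module. The names relax_gc and
checkpoint are made up for illustration and are not part of ocamlsdl.
As far as I can tell from the Gc documentation, raising space_overhead
only makes the major collector less eager and does not stop minor
collections, so this would at best move the crash rather than remove it.
The opposite trick, forcing a full collection at known places, might
localize the corruption better:

```ocaml
(* Sketch only: two ways to move the point where the GC does its work.
   [relax_gc] and [checkpoint] are hypothetical helper names. *)

(* Make the major collector less eager by raising space_overhead.
   Minor collections still happen as usual. *)
let relax_gc () =
  Gc.set { (Gc.get ()) with Gc.space_overhead = 100 }

(* Opposite approach: force a full major collection and a compaction at
   a known place, so that an already corrupted heap is more likely to
   crash here than at some unrelated later allocation. *)
let checkpoint label =
  prerr_endline ("GC checkpoint: " ^ label);
  Gc.compact ()

let () =
  relax_gc ();
  checkpoint "before event handling";
  (* ... the ocamlsdl event handling code would go here ... *)
  checkpoint "after event handling"
```

If the crash moved to the first checkpoint after the SDL event call,
that would point at the binding rather than at my pattern matching.
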
ph.


--001636d338971e0ad504965aceac--