Subject: Re: [Caml-list] Re: Possible ocamlmpi finalization error?
From: Eray Ozkural <examachine@gmail.com>
To: Sylvain Le Gall <sylvain@le-gall.net>
Cc: caml-list@inria.fr
Date: Wed, 8 Sep 2010 13:31:29 +0300

On Wed, Sep 8, 2010 at 10:51 AM, Sylvain Le Gall wrote:
> On 08-09-2010, Eray Ozkural wrote:
> >
> > I have recently been getting errors after MPI_Finalize. Since both
> > init/final and communicator allocation are managed by ocamlmpi, is it
> > possible this is a bug in the library? Have you ever seen something
> > like this?
> >
> > I am using Open MPI on OS X. Here is the log message:
> >
> > *** An error occurred in MPI_Comm_free
> > *** after MPI was finalized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
> >
> > In the code I'm using both point-to-point and collective
> > communication, and as far as I know the code is correct. Could this
> > be due to memory corruption, or should this never happen?
>
> Maybe you can give some minimal code that reproduces this error?

Hmm, not really; it's a complex piece of code. But I just ran the debug
version in parallel with exactly the same parameters, and there is
absolutely no problem with it. All communication is synchronous, so
timing cannot be the issue, even though the debug build is naturally
slower.
AFAICT it's not a memory problem either, because no bounds errors are
reported in the debug build (was bounds checking on by default there?).
I think it's a lower-level problem than my code. This could happen, for
instance, if some of that resource allocation is done in different
threads.

Can you give me any ideas for tracing the source of this problem?

Best,

--
Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil
http://myspace.com/malfunct
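P.S. The closest I can come to a minimal example is the untested sketch
below. The Mpi.* names (comm_world, comm_rank, comm_split, color_of_int,
barrier) are assumptions about the ocamlmpi interface, not checked
against it; the point is only the pattern. Since ocamlmpi manages
init/final and communicator allocation itself, a sub-communicator that
is still reachable at exit would presumably only be released by a GC
finalizer that calls MPI_Comm_free, and if that finalizer runs after
MPI has already been finalized, it would produce exactly the error
above.

(* Untested sketch -- the Mpi.* names and signatures below are
   assumptions about the ocamlmpi interface, not checked against it. *)

(* Keep a global reference so the communicator is still reachable when
   the program exits. *)
let kept = ref None

let () =
  let rank = Mpi.comm_rank Mpi.comm_world in
  (* Derive a sub-communicator; ocamlmpi presumably wraps the MPI handle
     in a value whose finalizer calls MPI_Comm_free. *)
  let sub =
    Mpi.comm_split Mpi.comm_world (Mpi.color_of_int (rank mod 2)) rank in
  kept := Some sub;   (* never explicitly dropped before exit *)
  Mpi.barrier sub
  (* If [sub]'s finalizer only runs during the cleanup that happens
     after ocamlmpi has finalized MPI, MPI_Comm_free is called on a
     finalized MPI -- which matches the error I'm seeing. *)

If that is the mechanism, then in the real program dropping all
communicator values and forcing a Gc.full_major () before the point
where MPI is finalized should make the error move or disappear, which
would at least confirm where the stray communicator comes from.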