caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Mark Shinwell <mshinwell@janestreet.com>
To: Eray Ozkural <examachine@gmail.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Re: Possible ocamlmpi finalization error?
Date: Wed, 8 Sep 2010 11:58:39 +0100	[thread overview]
Message-ID: <20100908105839.GM25074@janestreet.com> (raw)
In-Reply-To: <AANLkTi=4qY8ES6GFMv0WQ0=r-w0fatr8Rh4Ev1NzdUYQ@mail.gmail.com>

On Wed, Sep 08, 2010 at 01:31:29PM +0300, Eray Ozkural wrote:
> On Wed, Sep 8, 2010 at 10:51 AM, Sylvain Le Gall <sylvain@le-gall.net>wrote:
> 
> > On 08-09-2010, Eray Ozkural <examachine@gmail.com> wrote:
> > > I'm recently getting errors that are past MPI_Finalize. Since both
> > > init/final and communicator allocation is managed by ocamlmpi, is it
> > > possible this is a bug with the library? Have you ever seen something
> > like this?
> > >
> > > Using openmpi on OS X. Here is the log message:
> > >
> > > *** An error occurred in MPI_Comm_free
> > > *** after MPI was finalized
> > > *** MPI_ERRORS_ARE_FATAL (goodbye)
> > >
> > > In the code I'm using both point-to-point and collective communication,
> > and
> > > as far as I know the code is correct. Could this be due to memory
> > > corruption, or should this never happen?
> > >
> >
> > Maybe, you can give a minimal code to reproduce this error?
> >
> Hmm, not really its a complex code but I just ran the debug version in
> parallel with exactly the same parameters and there is absolutely no problem
> with that. All communication is synchronous so timing cannot be an issue
> (since the debug build is naturally slower). AFAICT it's not a memory
> problem because no bound errors are reported in the debug build (was it on
> by default?). I think it's a lower-level problem than  my code. This could
> happen if some of that resource allocation is done in different threads, for
> instance.
> 
> Can you give me any ideas to trace the source of this problem?

I know nothing about MPI, but here are some general ideas:

- Read and/or instrument the MPI source to find out more information than
"An error"...

- Wrap the MPI_Comm_free function and use printf() to display the arguments
(and maybe thread IDs).  This might catch situations such as attempting to
free something twice.  If MPI_Comm_free isn't called often, gdb is another
thing to try.

- If you have a function that isn't re-entrant and you worry that it might be
being called in a re-entrant manner, maybe you could catch that by wrapping
it in another function.  In this other function, protect the function call to
the original function by a mutex, and lock it using pthread_mutex_trylock().
If that function tells you the mutex is already locked, then use signal() to
send SIGSTOP to your own pid.  If this triggers (which can be seen by looking
at the state of the process using "ps"), gdb can be attached to the program and
you should be able to see what tried to call the function before a previous
call had finished.  If you are lucky, you might even see where the previous
call had come from.

Mark


      reply	other threads:[~2010-09-08 10:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-08  5:21 Eray Ozkural
2010-09-08  7:51 ` Sylvain Le Gall
2010-09-08 10:31   ` [Caml-list] " Eray Ozkural
2010-09-08 10:58     ` Mark Shinwell [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100908105839.GM25074@janestreet.com \
    --to=mshinwell@janestreet.com \
    --cc=caml-list@inria.fr \
    --cc=examachine@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).