caml-list - the Caml user's mailing list
* [Caml-list] Segmentation fault when using OcamlMPI
From: Étienne André @ 2014-04-25 19:55 UTC
  To: caml-list

Dear all,

A colleague and I are trying to distribute a verification tool using
OcamlMPI.
Unfortunately, we encounter intermittent segmentation faults.
They are frequent enough that the tool almost always crashes at some
point.

We don't understand at all what is happening.
We thought that the MPI read function ("Mpi.receive source_rank") would
wait until there is something to read, but maybe we misunderstood that.
The precise command we use to receive info is as follows:

    let res =
      Mpi.receive source_rank (int_of_slave_tag Slave_result_tag) Mpi.comm_world

where int_of_slave_tag Slave_result_tag is our own function returning
some predefined integer.

Is there any risk of conflict when several nodes (each with a different
source_rank) send something to the same node?
For context, we use a master-workers scheme, with one master
centralizing the results computed by the workers.
Also, we first send (and receive) the size of the data, and then the
actual data (although, for some strange reason, we do not use the size
when receiving the data; maybe we should?!).
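
A minimal sketch of the master-workers exchange described above, assuming the `Mpi` module from OCamlMPI (`Mpi.send`, `Mpi.receive_status`, `Mpi.any_source`, `Mpi.comm_world`). The `slave_tag` constructors, the tag values, and the `int list` payload are hypothetical stand-ins, not taken from the attached source. Because the value returned by a receive has a type chosen by the caller, the sketch pins the expected type with an annotation on the master side.

```ocaml
(* Hypothetical tags; the real tool defines its own. *)
type slave_tag = Slave_result_tag | Slave_done_tag

let int_of_slave_tag = function
  | Slave_result_tag -> 1
  | Slave_done_tag -> 2

(* Worker: send one result (an int list here, by assumption) to rank 0. *)
let worker rank =
  let result = [rank; rank * rank] in
  Mpi.send result 0 (int_of_slave_tag Slave_result_tag) Mpi.comm_world

(* Master: block until every worker has reported, in arrival order.
   The type annotation must match what the workers actually send. *)
let master nb_workers =
  for _i = 1 to nb_workers do
    let (res : int list), source_rank, _tag =
      Mpi.receive_status Mpi.any_source
        (int_of_slave_tag Slave_result_tag) Mpi.comm_world in
    Printf.printf "worker %d sent %d values\n%!" source_rank (List.length res)
  done

let () =
  let rank = Mpi.comm_rank Mpi.comm_world in
  let size = Mpi.comm_size Mpi.comm_world in
  if rank = 0 then master (size - 1) else worker rank
```

Receiving from `Mpi.any_source` lets the master take results in whatever order they arrive, and `Mpi.receive_status` reports which rank each one came from.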

I have put a zip archive on my web page with a simplified source (a
single .ml file, with _oasis and the command to launch) that is enough
to reproduce the bug.
http://lipn.univ-paris13.fr/~andre/PaTATOR.zip

Thank you for your feedback!

-- 
Étienne André
Université Paris 13, Sorbonne Paris Cité
http://lipn.univ-paris13.fr/~andre 


* Re: [Caml-list] Segmentation fault when using OcamlMPI
From: Étienne André @ 2014-04-26 12:21 UTC
  To: caml-list

Dear all,

In the end, it seems we have found a bug (or, at least, a very strange
issue) in OcamlMPI.
And we (well, my colleague) found a way to work around it.

In short, each node needs to explicitly send its node number in its
first communication to the master.
(Details available on request.)
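
One possible reading of this workaround, sketched under assumptions: before sending any result, each worker sends its own rank (obtained from `Mpi.comm_rank`) to the master, which records the announced numbers. The tag value `hello_tag` and the function names `announce` and `collect_ranks` are hypothetical; the thread does not give the details of the actual fix.

```ocaml
(* Hypothetical tag reserved for the initial announcement. *)
let hello_tag = 0

(* Worker side: announce our own rank to the master (rank 0) first. *)
let announce () =
  let my_rank = Mpi.comm_rank Mpi.comm_world in
  Mpi.send_int my_rank 0 hello_tag Mpi.comm_world

(* Master side: collect one announcement per worker. *)
let collect_ranks nb_workers =
  List.init nb_workers (fun _ ->
    Mpi.receive_int Mpi.any_source hello_tag Mpi.comm_world)

let () =
  let size = Mpi.comm_size Mpi.comm_world in
  if Mpi.comm_rank Mpi.comm_world = 0 then begin
    let ranks = collect_ranks (size - 1) in
    List.iter (Printf.printf "worker %d registered\n%!") ranks
  end else
    announce ()
```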

We informed the developers.
So the case is closed (for us!).

Best,

-- 
Étienne André
Université Paris 13, Sorbonne Paris Cité
http://lipn.univ-paris13.fr/~andre 