From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 4BA457F726 for ; Sat, 26 Apr 2014 14:21:05 +0200 (CEST) Received-SPF: SoftFail (mail2-smtp-roc.national.inria.fr: domain of etienne.andre@univ-paris13.fr is inclined to not designate 194.254.163.15 as permitted sender) identity=pra; client-ip=194.254.163.15; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="etienne.andre@univ-paris13.fr"; x-sender="etienne.andre@univ-paris13.fr"; x-conformance=sidf_compatible; x-record-type="spf2.0" Received-SPF: SoftFail (mail2-smtp-roc.national.inria.fr: domain of etienne.andre@univ-paris13.fr is inclined to not designate 194.254.163.15 as permitted sender) identity=mailfrom; client-ip=194.254.163.15; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="etienne.andre@univ-paris13.fr"; x-sender="etienne.andre@univ-paris13.fr"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail.lipn.univ-paris13.fr) identity=helo; client-ip=194.254.163.15; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="etienne.andre@univ-paris13.fr"; x-sender="postmaster@mail.lipn.univ-paris13.fr"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ar8BAFCjW1PC/qMPl2dsb2JhbABZFoM/xS+BIQ4BAQEBAQgWBzyCJgEFMgFWCyEWDwkDAgECAUUTCAEBiEEJynUXjmAWhCMEmkaFI4YkiRA X-IPAS-Result: Ar8BAFCjW1PC/qMPl2dsb2JhbABZFoM/xS+BIQ4BAQEBAQgWBzyCJgEFMgFWCyEWDwkDAgECAUUTCAEBiEEJynUXjmAWhCMEmkaFI4YkiRA X-IronPort-AV: E=Sophos;i="4.97,933,1389740400"; d="scan'208";a="70450320" Received: from gw.lipn.univ-paris13.fr (HELO mail.lipn.univ-paris13.fr) ([194.254.163.15]) by mail2-smtp-roc.national.inria.fr with ESMTP; 26 Apr 2014 14:21:04 +0200 Received: from [192.168.1.95] (l9.lns-se1200-ld-01-t2-31-35-233-229.dsl.dyn.abo.bbox.fr [31.35.233.229]) (using TLSv1 with cipher ECDHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: andre@lipn.univ-paris13.fr) by mail.lipn.univ-paris13.fr (Postfix) with ESMTPSA id 2A80A26182E for ; Sat, 26 Apr 2014 14:21:04 +0200 (CEST) Message-ID: <535BA4AF.5090508@univ-paris13.fr> Date: Sat, 26 Apr 2014 14:21:03 +0200 From: =?ISO-8859-1?Q?=C9tienne_Andr=E9?= User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: caml-list@inria.fr References: <535ABDC2.4080501@univ-paris13.fr> In-Reply-To: <535ABDC2.4080501@univ-paris13.fr> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Subject: Re: [Caml-list] Segmentation fault when using OcamlMPI Dear all, In the end, it seems we found a bug (or, at least, a very strange issue) in OcamlMPI. And we (well, my colleague) found a way to go around the issue. In short, the node needs to explicitly send its node number in the first communication to the master. (Details available on demand.) We informed the developers. So the case is closed (for us!). Best, -- Étienne André Université Paris 13, Sorbonne Paris Cité http://lipn.univ-paris13.fr/~andre Le 25/04/2014 21:55, Étienne André a écrit : > Dear all, > > I'm trying with a colleague to distribute a verification tool using > OcamlMPI. > Unfortunately, we encounter segmentation faults "sometimes". > Sometimes means still often enough to have the tool crash almost always > at some point. > > We don't understand at all what is happening. > We thought that the MPI read function ("Mpi.receive source_rank") would > wait until there is something to read, but maybe we misunderstood that. > The precise command we use to receive info is as follows: > > let res = Mpi.receive source_rank (int_of_slave_tag Slave_result_tag) > Mpi.comm_world > > where int_of_slave_tag Slave_result_tag is our own function returning > some predefined integer. > > Are there any risks of conflicts when several nodes (with different > source_rank, though) send something to one node? > For information, we use a master-workers scheme, with one master > centralizing results computed by workers. > For information too, we first send (and receive) the size of the data, > and then the actual data (although, for some strange reason, we do not > use the size when receiving the data; maybe we should?!). > > I put a zip on my Web page with a simplified source (a single .ml file, > with _oasis and the command to launch) that is enough to show the bug. > http://lipn.univ-paris13.fr/~andre/PaTATOR.zip > > Thank you for your feedback! >