From: hugo rivera
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Date: Wed, 4 Mar 2009 10:58:31 +0100
Subject: Re: [9fans] threads vs forks

Thanks for the advice. Nevertheless, I am in no position to decide which
pieces of software the cluster will run; I just have to deal with what I
have. But I can still suggest other possibilities. (For concreteness, I
have sketched both suggestions in a postscript below.)

2009/3/4, Vincent Schut:
> John Barham wrote:
> >
> > On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera wrote:
> > >
> > > I have to launch many tasks running in parallel (~5000) on a
> > > cluster running Linux. Each task performs some astronomical
> > > calculations, and I am not sure that using fork is the best
> > > answer here. First of all, all the programming is done in Python
> > > and C...
> >
> > Take a look at the multiprocessing package
> > (http://docs.python.org/library/multiprocessing.html), newly
> > introduced with Python 2.6 and 3.0:
> >
> > "multiprocessing is a package that supports spawning processes using
> > an API similar to the threading module. The multiprocessing package
> > offers both local and remote concurrency, effectively side-stepping
> > the Global Interpreter Lock by using subprocesses instead of threads."
> >
> > It should be a quick and easy way to set up a cluster-wide job
> > processing system (provided all your jobs are driven by Python).
>
> Better: use parallelpython (www.parallelpython.org). AFAIK
> multiprocessing is geared towards multi-core systems (one machine),
> while pp is also suitable for real clusters with multiple PCs. No
> special cluster software is needed. It will start (here's your fork)
> one or more Python interpreters on each node, and then you can submit
> jobs to those 'workers'. The interpreters are kept alive between jobs,
> so the startup penalty becomes negligible when the number of jobs is
> large enough.
> We use it here to process massive amounts of satellite data; it works
> like a charm.
>
> Vincent.

> > It also looks like it's been (partially?) back-ported to Python 2.4
> > and 2.5: http://pypi.python.org/pypi/processing.
> >
> > John

--
Hugo
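
P.S. To make John's suggestion concrete, here is a minimal sketch of the
multiprocessing approach: fan the independent tasks out over a pool of
worker processes on one machine. The task function and its workload are
made up for illustration, not taken from my real code.

from multiprocessing import Pool

def calculate(task_id):
    # stand-in for the real astronomical calculation
    return task_id, sum(i * i for i in range(100000))

if __name__ == '__main__':
    pool = Pool()  # defaults to one worker process per core
    results = pool.map(calculate, range(5000))  # ~5000 independent tasks
    pool.close()
    pool.join()
    print('%d tasks finished' % len(results))

Because every worker is a separate process, the Global Interpreter Lock
never serializes the calculations.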
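
And a rough sketch of the pp workflow Vincent describes, assuming
ppserver.py has already been started on each node; the node names and
port below are hypothetical.

import pp

# hypothetical cluster nodes, each already running ppserver.py
ppservers = ("node01:60000", "node02:60000")
job_server = pp.Server(ppservers=ppservers)

def calculate(task_id):
    # stand-in for the real astronomical calculation
    return task_id, sum(i * i for i in range(100000))

# submit everything up front; pp spreads the jobs over the local cores
# and the remote workers
jobs = [job_server.submit(calculate, (n,)) for n in range(5000)]
results = [job() for job in jobs]  # calling a job blocks until it finishes
job_server.print_stats()

The worker interpreters stay alive between submissions, which is where
the amortized startup cost Vincent mentions comes from.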