From mboxrd@z Thu Jan 1 00:00:00 1970
To: 9fans@cse.psu.edu
From: Vincent Schut
Date: Wed, 4 Mar 2009 10:37:56 +0100
Message-ID:
References: <138575260903030352s623807d7p5a3075b1f7a591f6@mail.gmail.com> <4f34febc0903030847t9aedad9haf4355e74953e6a3@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Thunderbird 2.0.0.19 (X11/20090213)
In-Reply-To: <4f34febc0903030847t9aedad9haf4355e74953e6a3@mail.gmail.com>
Subject: Re: [9fans] threads vs forks
Topicbox-Message-UUID: af7cc680-ead4-11e9-9d60-3106f5b1d025

John Barham wrote:
> On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera wrote:
>
>> I have to launch many tasks running in parallel (~5000) in a
>> cluster running linux. Each of the task performs some astronomical
>> calculations and I am not pretty sure if using fork is the best answer
>> here.
>> First of all, all the programming is done in python and c...
>
> Take a look at the multiprocessing package
> (http://docs.python.org/library/multiprocessing.html), newly
> introduced with Python 2.6 and 3.0:
>
> "multiprocessing is a package that supports spawning processes using
> an API similar to the threading module. The multiprocessing package
> offers both local and remote concurrency, effectively side-stepping
> the Global Interpreter Lock by using subprocesses instead of threads."
>
> It should be a quick and easy way to set up a cluster-wide job
> processing system (provided all your jobs are driven by Python).

Better: use parallelpython (www.parallelpython.org). AFAIK
multiprocessing is geared towards multi-core systems (one machine),
while pp is also suitable for real clusters with several PCs. No
special cluster software is needed. It will start (here's your fork)
one or more Python interpreters on each node, and then you can submit
jobs to those 'workers'. The interpreters are kept alive between jobs,
so the startup penalty becomes negligible when the number of jobs is
large enough.
Using it here to process massive amounts of satellite data, works like
a charm.

Vincent.

>
> It also looks like it's been (partially?) back-ported to Python 2.4
> and 2.5: http://pypi.python.org/pypi/processing.
>
> John
>
>
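
A minimal sketch of the "workers kept alive between jobs" idea from the
message above. It uses the standard-library multiprocessing package named
in the quoted text, not parallelpython, so it covers a single machine
only and does not show pp's API. The task function simulate_task, the
helper run_jobs, the worker count and the chunk size are all hypothetical
names and values chosen for illustration; the "fork" start method is an
assumption that matches the Linux setting in the thread and is not
available on Windows.

```python
import multiprocessing as mp
import os


def simulate_task(task_id):
    """Hypothetical stand-in for one calculation.

    Returns (task id, result, pid of the worker that ran it).
    """
    result = sum(i * i for i in range(task_id % 100))
    return task_id, result, os.getpid()


def run_jobs(n_jobs, n_workers):
    """Run n_jobs tasks on a pool of n_workers long-lived processes."""
    # Assumption: a Unix system, where the "fork" start method exists.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=n_workers) as pool:
        # The workers are forked once and then reused for every task,
        # so process startup is paid n_workers times, not n_jobs times.
        # chunksize hands tasks out in batches to reduce messaging cost.
        return pool.map(simulate_task, range(n_jobs), chunksize=50)


if __name__ == "__main__":
    results = run_jobs(n_jobs=5000, n_workers=4)
    worker_pids = {pid for _, _, pid in results}
    print(len(results), "tasks ran on", len(worker_pids), "worker processes")
```

Counting the distinct worker pids in the results shows that the ~5000
tasks share a handful of processes instead of each getting its own fork.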