From mboxrd@z Thu Jan 1 00:00:00 1970
To: 9fans@cse.psu.edu
From: Vincent Schut
Date: Wed, 4 Mar 2009 12:15:50 +0100
Message-ID:
References: <138575260903030352s623807d7p5a3075b1f7a591f6@mail.gmail.com>
 <4f34febc0903030847t9aedad9haf4355e74953e6a3@mail.gmail.com>
 <138575260903040158r3ebc4e76haa5a328d2840bd5f@mail.gmail.com>
 <138575260903040245w3e8ede69t42d91f290ff82523@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Thunderbird 2.0.0.19 (X11/20090213)
In-Reply-To: <138575260903040245w3e8ede69t42d91f290ff82523@mail.gmail.com>
Subject: Re: [9fans] threads vs forks
Topicbox-Message-UUID: af9ad990-ead4-11e9-9d60-3106f5b1d025

hugo rivera wrote:
> The cluster has torque installed as the resource manager. I think it
> runs on top of PBS (an older project).
> As far as I know, I just have to call a qsub command to submit my
> jobs to a queue; the resource manager then allocates a processor in
> the cluster for my process to run until it is finished.

Well, I know neither torque nor PBS, but I'm guessing that when you
submit a job, that job is some program or script that gets run on the
allocated processor? If so, your initial question of forking vs.
threading is bogus: your cluster manager will run (exec) your job,
which, if it is a Python script, will start a Python interpreter for
each job. I guess that's the overhead you get when running a flexible
cluster system, flexible meaning that it can run any type of job
(shell script, binary executable, Python script, Perl, etc.). The
overhead of starting a new Python process each time may seem
significant in absolute terms, but if each job processes lots of data
and takes, as you said, 5 minutes to run on a decent processor, don't
you think the Python startup time becomes insignificant?
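That startup overhead is easy to measure for yourself. A minimal sketch (the helper name `startup_seconds` is illustrative, not from any library, and the exact figures will of course vary per machine):

```python
# Sketch: measure Python interpreter startup overhead, cold vs. warm.
import subprocess
import sys
import time

def startup_seconds():
    """Time one full start-and-exit of a fresh Python interpreter."""
    t0 = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - t0

cold = startup_seconds()                         # first start: nothing cached yet
warm = min(startup_seconds() for _ in range(5))  # later starts: binary and libs cached
print(f"cold: {cold:.3f}s  warm: {warm:.3f}s")
```

On the machine here this prints something on the order of the numbers quoted below, but treat any specific figure as machine-dependent.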
For example, on a decent machine here, the first Python start takes
0.224 secs to start and shut down immediately, and subsequent starts
take only about 0.009 secs because everything is still in memory.
Let's take the 0.224 secs as the worst-case scenario: that is approx
0.075 percent of your job's execution time. Now let's say you have 6
machines with 8 cores each and perfect scaling: all your jobs would
take 6000 / (6*8) * 5 min = 625 minutes (10 hours 25 mins) without
Python starting each time, and 625 minutes and 28 seconds with Python
starting anew for each job. Don't you think you could just live with
those 28 extra seconds? Just reading this message might already have
taken you more than those 28 seconds...

Vincent.

> And I am not really sure if I have access to all the nodes, so I can
> install pp on each one of them.
>
> 2009/3/4, Vincent Schut :
>> hugo rivera wrote:
>>
>>> Thanks for the advice.
>>> Nevertheless, I am in no position to decide what pieces of software
>>> the cluster will run; I just have to deal with what I have. But
>>> anyway, I can suggest other possibilities.
>>>
>> Well, that depends on how you define 'software the cluster will
>> run'. Do you mean cluster management software, or really any
>> program, script, or Python module that needs to be installed on each
>> node? Because for pp, you won't need any cluster software. pp is
>> just a Python module and some helper scripts. You *do* need to
>> install this (pure Python) module on each node, yes, but that's it;
>> nothing else is needed.
>> Btw, you said 'it's a small cluster, about 6 machines'. Now I'm not
>> an expert, but I don't think you can do threading/forking from one
>> machine to another (on Linux). So I suppose there already is some
>> cluster management software involved? And while you appear to be "in
>> no position to decide what pieces of software the cluster will run",
>> you might want to enlighten us on what this cluster /will/ run? Your
>> best solution might depend on that...
>>
>> Cheers,
>> Vincent.
>>
>>
>>
>
>
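The back-of-the-envelope arithmetic in the message above can be double-checked with a short script (values taken straight from the message; perfect scaling across 6 machines of 8 cores is assumed, as stated):

```python
# Reproducing the overhead estimate from the message above.
jobs = 6000            # total number of jobs
cores = 6 * 8          # 6 machines x 8 cores, perfect scaling assumed
job_min = 5.0          # minutes per job
startup_s = 0.224      # worst-case Python startup, seconds

total_minutes = jobs / cores * job_min           # wall-clock time without startup cost
extra_seconds = jobs / cores * startup_s         # added startup time per core's queue
overhead_pct = startup_s / (job_min * 60) * 100  # startup as a share of one job

print(total_minutes, extra_seconds, round(overhead_pct, 3))
# -> 625.0 28.0 0.075
```

That is, 625 minutes (10 h 25 min) of useful work versus 28 extra seconds of interpreter startup, or about 0.075 percent per job, matching the figures in the message.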