* Multi-core loops
@ 2009-06-28 10:31 Nadav Har'El
2009-06-28 10:45 ` Nadav Har'El
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Nadav Har'El @ 2009-06-28 10:31 UTC (permalink / raw)
To: zsh-users
Hi, I've been a very happy user of Zsh for the last 18 years (!).
Regretfully I haven't been on this list for many years, and now I have
resubscribed to propose a simple, but I think useful, feature for zsh.
Zsh, like all shells, lets you easily do something many times in a loop.
E.g.,
for i in ...
do
    dosomething $i
done
But when "dosomething" is CPU-intensive, this is *not* what you'd want to
do on a multi-core (multi-CPU) machine, which has more-or-less become
standard nowadays...
Such a loop would use only one of the CPUs and leave the other(s) idle.
Instead, you want to keep all CPUs busy all the time, running M (= number
of CPUs) processes at the same time.
This idea has been raised before on this list by others. One thread I found
dates back to 10 years ago,
http://www.zsh.org/mla/users/1999/msg00644.html
and another is from 7 years ago,
http://www.zsh.org/mla/users/2002/msg00117.html
But at the time, I guess the whole concept of multi-CPU machines sounded
esoteric. This is no longer the case: most people nowadays have multi-CPU
machines, and probably run into this issue often. I know I do. So I believe
zsh should make this useful case easy to handle.
The first thread I cited suggested adding new loop syntax, e.g.
for i in * PARALLEL N ; do job $i ; done
I think this is a very interesting idea (not necessarily with that syntax),
and among all the other options I'll mention below, this is probably
the best one. However, I fear that it may be harder for the developers to
accept than the other options because it involves new syntax and
possibly quite a bit of code (because of all the different types of loops
involved). I wonder what other people think - are we ready for new
syntax for this multi-process loop feature?
If there is a chance that this option will be accepted, I will be happy to
volunteer to write a patch.
The second option, suggested in both threads, requires the user to write more
code, along the lines of this pseudo-code:
for i in ...
do
    if (( number_of_jobs >= number_of_processors ))
    then
        wait any_job
    fi
    command &
done
The problem with this is that "wait" currently has no way to wait for just
one job: it can wait either for a specific job or for *all* jobs to finish.
I wonder if there is a reason not to add such a feature?
Because of the lack of such a "wait for any job" feature, Bart Schaefer
suggested in the first thread an elaborate technique involving a coprocess
to do something similar.
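In the meantime, the missing "wait for any job" can be approximated by
polling the job table. A minimal sketch (the helper name "waitslot" and the
one-second poll interval are made up here, a crude stand-in for a real
SIGCHLD-driven wait):

```shell
# Block while $1 or more background jobs are still running.
# Polls once per second; a real builtin could sleep until SIGCHLD instead.
waitslot() {
    while [ "$(jobs -p | wc -l)" -ge "$1" ]; do
        sleep 1
    done
}

for i in 1 2 3 4
do
    waitslot 2          # keep at most 2 jobs in flight
    dosomething "$i" &
done
wait                    # let the last jobs finish
```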
A somewhat similar option I'd like to propose is to add a builtin, or better
yet, a new option for the existing builtin "jobs". "jobs -w 4" would wait
until there are 4 or fewer jobs in the job-control list. Then the 4-CPU loop
is as easy as writing:
for i in ...
do
    jobs -w 4
    dosomething $i &
done
Another possibility I wanted to raise is adding a new parameter, say
MAXBACKGROUND. If that parameter is set to 4, then any time you run
"command &" when there are already 4 jobs in the job-control list,
instead of forking immediately zsh first waits for one of the previous jobs
to finish, and only then runs the command line. With this parameter set,
the multi-CPU loop becomes trivial:
for i in ...
do
    dosomething $i &
done
Any thoughts?
Thanks,
Nadav.
--
Nadav Har'El | Sunday, Jun 28 2009, 6 Tammuz 5769
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |In Fortran, God is real unless declared
http://nadav.harel.org.il |an integer.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Multi-core loops
2009-06-28 10:31 Multi-core loops Nadav Har'El
@ 2009-06-28 10:45 ` Nadav Har'El
2009-06-28 13:25 ` Nadav Har'El
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Nadav Har'El @ 2009-06-28 10:45 UTC (permalink / raw)
To: zsh-users
On Sun, Jun 28, 2009, Nadav Har'El wrote about "Multi-core loops":
>..
> The first thread I cited suggested adding a loop new syntax, e.g.
>
> for i in * PARALLEL N ; do job $i ; done
>..
I tried to look around to see whether other shells have implemented a similar
feature, and what syntax they used.
I found one implementation in a shell called psh, the parallel shell, also
known (amusingly :-)) as "the Pourne Shell". The syntax it uses is this:
# Do 4 jobs concurrently
set -j4
# same as a for loop, but pays attention to the j flag.
pfor i in 3 5 11 21 8 9 13 7
do
    ...
done
The only trace I can find of this shell on the Internet, however, is this
page in Google's cache:
http://209.85.129.132/search?q=cache:YhHuBOkhhK4J:bleu.west.spy.net/~dustin/projects/psh.xtp+%22pourne+shell%22&cd=3&hl=en&ct=clnk
Nadav.
--
Nadav Har'El | Sunday, Jun 28 2009, 6 Tammuz 5769
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Tact: The ability to describe others as
http://nadav.harel.org.il |they see themselves. - Abraham Lincoln
* Re: Multi-core loops
2009-06-28 10:31 Multi-core loops Nadav Har'El
2009-06-28 10:45 ` Nadav Har'El
@ 2009-06-28 13:25 ` Nadav Har'El
2009-06-28 17:58 ` Christopher Browne
2009-07-01 16:09 ` Wayne Davison
2009-11-22 21:05 ` Kazuo Teramoto
3 siblings, 1 reply; 12+ messages in thread
From: Nadav Har'El @ 2009-06-28 13:25 UTC (permalink / raw)
To: zsh-users
On Sun, Jun 28, 2009, Nadav Har'El wrote about "Multi-core loops":
>...
> for i in ...
> do
> dosomething $i
> done
>
> But when "dosomething" is CPU-intensive, this is *not* what you'd want to
> do on a multi-core (multi-CPU) machine, which has more-or-less become
> standard nowadays...
> Such a loop would only use one of the CPUs, and leave the other(s) unused.
> Instead, you'll want to keep all CPUs busy all the time, running M (=number
> of CPUs) processes at the same time.
>...
Sorry for replying to my own emails, but I just remembered another point.
The parallel loop I propose to add is not only relevant to SMP machines
and CPU-intensive tasks.
It is also important, sometimes even more so, for some types of
non-CPU-intensive tasks. For example, consider a loop like this for
fetching the contents of a list of URLs:
while read url
do
    wget $url
done < urllist
The computer will be idle most of the time, as wget mostly just waits for
responses from the network. If I could easily tell the loop to run 10 wgets
at a time, it would (in most cases) boost the performance of this loop
almost 10-fold! And this is true even if you have just one CPU.
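Incidentally, for simple one-command loop bodies like this, GNU xargs can
already do the throttling (a sketch; -P is a GNU/BSD extension to xargs,
not POSIX):

```shell
# Fetch up to 10 URLs concurrently, one wget invocation per URL:
# -n 1 passes one argument per command, -P 10 caps the process count.
xargs -n 1 -P 10 wget < urllist
```

Of course this only covers single-command bodies; the point of a
shell-level feature would be to throttle arbitrary do/done blocks.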
Searching the web, I found that Chapter 14, "Throttling Parallel Processes",
of Ron Peters' "Shell Script Pearls" is about a similar use case, and his
solution is a very elaborate one using a variant of the coprocess-based
solution from the thread I mentioned earlier, from the zsh list 10 years ago.
So I think there is definitely a need for such a parallel loop feature.
--
Nadav Har'El | Sunday, Jun 28 2009, 6 Tammuz 5769
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Creativity consists of coming up with
http://nadav.harel.org.il |many ideas, not just that one great idea.
* Re: Multi-core loops
2009-06-28 13:25 ` Nadav Har'El
@ 2009-06-28 17:58 ` Christopher Browne
0 siblings, 0 replies; 12+ messages in thread
From: Christopher Browne @ 2009-06-28 17:58 UTC (permalink / raw)
To: Nadav Har'El; +Cc: zsh-users
On Sun, Jun 28, 2009 at 9:25 AM, Nadav Har'El<nyh@math.technion.ac.il> wrote:
> So I think there is definitely a need for such a parallel loop feature.
I totally agree...
There are lots of cases where you don't want to "flood" a destination
with too many concurrent requests, where you might have hundreds or
thousands of items to request, and where
- serializing 1 at a time is too few
- spawning 1000 at a time is too many
- keeping 10 processes busy might be about right
CPU is not the only reason to parallelize; other reasons include:
- network traffic (3 web servers, hence 3 requests at a time)
- I/O (a disk array with 12 drives can support 6 copies at a time)
- a DBMS that supports 8 connections at a time for your favorite user
--
http://linuxfinances.info/info/linuxdistributions.html
Alfred Hitchcock - "Television has brought back murder into the home
- where it belongs." -
http://www.brainyquote.com/quotes/authors/a/alfred_hitchcock.html
* Re: Multi-core loops
2009-06-28 10:31 Multi-core loops Nadav Har'El
2009-06-28 10:45 ` Nadav Har'El
2009-06-28 13:25 ` Nadav Har'El
@ 2009-07-01 16:09 ` Wayne Davison
2009-11-22 21:05 ` Kazuo Teramoto
3 siblings, 0 replies; 12+ messages in thread
From: Wayne Davison @ 2009-07-01 16:09 UTC (permalink / raw)
To: Nadav Har'El; +Cc: zsh-users
On Sun, Jun 28, 2009 at 01:31:29PM +0300, Nadav Har'El wrote:
> Instead, you'll want to keep all CPUs busy all the time, running M
> (=number of CPUs) processes at the same time.
Perl has a nice module for this. Perhaps we can steal ideas from it:
use Parallel::ForkManager;

my $pm = new Parallel::ForkManager(5);  # limit to 5 parallel processes
foreach my $var (@array) {
    # can do non-parallelized work here, as needed
    ...
    $pm->start and next;  # do the fork; parent skips to next loop
    # use any perl commands you like here.
    ...
    $pm->finish;  # do the exit in the child process
}
$pm->wait_all_children;
If we had something similar for zsh (perhaps using a loadable module),
it would make it easy to parallelize a sequence of shell commands
without having to use '&'.
..wayne..
* Re: Multi-core loops
2009-06-28 10:31 Multi-core loops Nadav Har'El
` (2 preceding siblings ...)
2009-07-01 16:09 ` Wayne Davison
@ 2009-11-22 21:05 ` Kazuo Teramoto
2009-11-22 22:16 ` Nadav Har'El
` (2 more replies)
3 siblings, 3 replies; 12+ messages in thread
From: Kazuo Teramoto @ 2009-11-22 21:05 UTC (permalink / raw)
To: zsh-users
On Sun, Jun 28, 2009 at 8:31 AM, Nadav Har'El <nyh@math.technion.ac.il> wrote:
> Zsh, like all shells, lets you easily do something many times in a loop.
> E.g.,
>
> for i in ...
> do
> dosomething $i
> done
>
> But when "dosomething" is CPU-intensive, this is *not* what you'd want to
> do on a multi-core (multi-CPU) machine, which has more-or-less become
> standard nowadays...
> Such a loop would only use one of the CPUs, and leave the other(s) unused.
> Instead, you'll want to keep all CPUs busy all the time, running M (=number
> of CPUs) processes at the same time.
Any update on this?
I'm searching for a solution. Perhaps this can't be done as built-in
syntax, but what about a more complex solution? I'm a noob (and with
the number of features zsh has, I'm going to be a noob forever), and I can't
find a small, beautiful, zsh-is-so-cool-ish solution for it, and I don't
know how to implement it effectively, e.g. without using python; I'd like
a zsh-only solution.
How have people solved this problem? Any home-made solution, any tip?
Thanks!
--
«Dans la vie, rien n'est à craindre, tout est à comprendre»
Marie Sklodowska Curie.
* Re: Multi-core loops
2009-11-22 21:05 ` Kazuo Teramoto
@ 2009-11-22 22:16 ` Nadav Har'El
2009-11-22 22:40 ` Mikael Magnusson
2009-11-23 6:18 ` Rakotomandimby Mihamina
2009-11-24 18:57 ` Bart Schaefer
2 siblings, 1 reply; 12+ messages in thread
From: Nadav Har'El @ 2009-11-22 22:16 UTC (permalink / raw)
To: Kazuo Teramoto; +Cc: zsh-users
On Sun, Nov 22, 2009, Kazuo Teramoto wrote about "Re: Multi-core loops":
> On Sun, Jun 28, 2009 at 8:31 AM, Nadav Har'El <nyh@math.technion.ac.il> wrote:
> > Zsh, like all shells, lets you easily do something many times in a loop.
>...
> > But when "dosomething" is CPU-intensive, this is *not* what you'd want to
> > do on a multi-core (multi-CPU) machine, which has more-or-less become
> > standard nowadays...
> > Such a loop would only use one of the CPUs, and leave the other(s) unused.
> > Instead, you'll want to keep all CPUs busy all the time, running M (=number
> > of CPUs) processes at the same time.
>
> Any update on this?
>
> I'm searching for a solution. Perhaps this can't be done as built-in
> syntax, but what about a more complex solution? I'm a noob (and with
> the number of features zsh has, I'm going to be a noob forever), and I can't
> find a small, beautiful, zsh-is-so-cool-ish solution for it, and I don't
> know how to implement it effectively, e.g. without using python; I'd like
> a zsh-only solution.
Unfortunately, no.
I am running into this need very often - be it a small script to convert a
bunch of media files, or a script to download a bunch of web pages, and so
on, and always need to come up with some ugly half-working solution.
I am still really surprised that no shell (that I know) comes with a
convenient built-in syntax to do such loops.
At one point I decided to go ahead and modify zsh myself. After some
deliberation with myself, I came up with the following syntax:
for i in 1 2 3
parallel 2 do
    echo $i
    sleep 3
done
I.e., one adds the keyword "parallel" and the number of concurrent processes
just before the "do/done" block. This makes it easy to parallelize
all kinds of for loops (C-like, csh-like, Bourne-shell-like, etc.) with
one syntax.
What this should do is run the do/done block in the background
(as with &), and additionally block while 2 of these are already running,
waiting until one of them stops (we know when one stops because the shell
has a SIGCHLD interrupt handler).
Unfortunately, I found understanding the zsh parser much harder than I
had originally anticipated. I managed to add the "parallel" syntax, but
was not able (in the several hours I invested in trying to understand it)
to generate the correct "instructions" (of the zsh virtual machine)
to put the do/done block in the background, for example. All the tricks
I tried with WCB_LIST, WCB_SUBLIST, WCB_PIPE, set_list_code, WCB_END and
other strange things stopped short of actually working.
And even if I had managed to pull that off, I would still have had some
missing pieces, like keeping a list of the process ids of these backgrounded
processes, recognizing (in the interrupt handler) when they're gone, and
waiting until one of them is gone.
> How have people solved this problem? Any home-made solution, any tip?
By the way, in my original thread I mentioned a post from 10 years ago (!)
which suggested an elaborate trick to do what I was after:
http://www.zsh.org/mla/users/1999/msg00644.html
Have you tried this?
--
Nadav Har'El | Sunday, Nov 22 2009, 6 Kislev 5770
nyh@math.technion.ac.il |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Seen on the back of a dump truck:
http://nadav.harel.org.il |<---PASSING SIDE . . . . . SUICIDE--->
* Re: Multi-core loops
2009-11-22 22:16 ` Nadav Har'El
@ 2009-11-22 22:40 ` Mikael Magnusson
2009-11-23 0:34 ` Kazuo Teramoto
0 siblings, 1 reply; 12+ messages in thread
From: Mikael Magnusson @ 2009-11-22 22:40 UTC (permalink / raw)
To: Nadav Har'El; +Cc: Kazuo Teramoto, zsh-users
2009/11/22 Nadav Har'El <nyh@math.technion.ac.il>:
> On Sun, Nov 22, 2009, Kazuo Teramoto wrote about "Re: Multi-core loops":
>...
>> How have people solved this problem? Any home-made solution, any tip?
>...
> By the way, in my original thread I mentioned a post from 10 years ago (!)
> which suggested an elaborate trick to do what I was after:
>
> http://www.zsh.org/mla/users/1999/msg00644.html
>
> Have you tried this?
I came across this script a while ago; it's written in bash and I
haven't tried it, but it claims to run stuff in parallel :). Maybe it
can be zsh-ified to get rid of the warning about commands probably not
working with spaces.
http://sitaramc.googlepages.com/queue.sh
--
Mikael Magnusson
* Re: Multi-core loops
2009-11-22 22:40 ` Mikael Magnusson
@ 2009-11-23 0:34 ` Kazuo Teramoto
2009-11-23 0:48 ` Kazuo Teramoto
0 siblings, 1 reply; 12+ messages in thread
From: Kazuo Teramoto @ 2009-11-23 0:34 UTC (permalink / raw)
To: zsh-users
I'm using a workaround in python (this is the smallest "solution" I could get):
--------------------------------------------------------------------
from multiprocessing import Pool
from glob import glob
from subprocess import call

def f(x):
    call(['echo', 'asy', '-f', 'png', '-render=0', x])

pool = Pool(processes=4)
pool.map(f, glob('*.asy'))
--------------------------------------------------------------------
But of course this solution doesn't integrate with the zsh command line...
--
«Dans la vie, rien n'est à craindre, tout est à comprendre»
Marie Sklodowska Curie.
* Re: Multi-core loops
2009-11-23 0:34 ` Kazuo Teramoto
@ 2009-11-23 0:48 ` Kazuo Teramoto
0 siblings, 0 replies; 12+ messages in thread
From: Kazuo Teramoto @ 2009-11-23 0:48 UTC (permalink / raw)
To: zsh-users
I found a pretty small (but not optimized) solution:
http://stackoverflow.com/questions/38160/parallelize-bash-script
I'm quoting the solution (it can be used as-is in zsh too):
> Here an alternative solution that can be inserted into .bashrc and used for
> everyday one liner:
>
> function pwait() {
>     while [ $(jobs -p | wc -l) -ge $1 ]; do
>         sleep 1
>     done
> }
>
> To use it, all one has to do is put & after the jobs and a pwait call; the
> parameter gives the number of parallel processes:
>
> for i in *; do
>     do_something $i &
>     pwait 10
> done
>
> It would be nicer to use wait instead of busy-waiting on the output of
> jobs -p, but there doesn't seem to be an obvious way to wait till any one
> of the given jobs is finished instead of all of them.
>
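It is worth noting that later versions of bash (4.3 and up) added exactly
this missing primitive, `wait -n`, which returns as soon as any one
background job exits; with it the busy wait disappears (a sketch for bash,
not usable in zsh):

```shell
# bash 4.3+ only: wait -n blocks until any single background job exits.
for i in *; do
    while [ "$(jobs -p | wc -l)" -ge 10 ]; do
        wait -n          # reap one finished job, then re-check the count
    done
    do_something "$i" &
done
wait                     # wait for the remaining jobs
```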
--
«Dans la vie, rien n'est à craindre, tout est à comprendre»
Marie Sklodowska Curie.
* Re: Multi-core loops
2009-11-22 21:05 ` Kazuo Teramoto
2009-11-22 22:16 ` Nadav Har'El
@ 2009-11-23 6:18 ` Rakotomandimby Mihamina
2009-11-24 18:57 ` Bart Schaefer
2 siblings, 0 replies; 12+ messages in thread
From: Rakotomandimby Mihamina @ 2009-11-23 6:18 UTC (permalink / raw)
To: zsh-users
11/23/2009 12:05 AM, Kazuo Teramoto::
>> Zsh, like all shells, lets you easily do something many times in a loop.
>> E.g.,
>> for i in ...
>> do
>> dosomething $i
>> done
>> But when "dosomething" is CPU-intensive, this is *not* what you'd want to
>> do on a multi-core (multi-CPU) machine, which has more-or-less become
>> standard nowadays...
>> Such a loop would only use one of the CPUs, and leave the other(s) unused.
>> Instead, you'll want to keep all CPUs busy all the time, running M (=number
>> of CPUs) processes at the same time.
>
> Any update on this?
What if "dosomething" has hundreds of child processes?
It would use all the available cores.
--
Architecte Informatique chez Blueline/Gulfsat:
Administration Systeme, Recherche & Developpement
+261 33 11 207 36
* Re: Multi-core loops
2009-11-22 21:05 ` Kazuo Teramoto
2009-11-22 22:16 ` Nadav Har'El
2009-11-23 6:18 ` Rakotomandimby Mihamina
@ 2009-11-24 18:57 ` Bart Schaefer
2 siblings, 0 replies; 12+ messages in thread
From: Bart Schaefer @ 2009-11-24 18:57 UTC (permalink / raw)
To: Kazuo Teramoto; +Cc: zsh-users
On Sun, Nov 22, 2009 at 1:05 PM, Kazuo Teramoto <kaz.rag@gmail.com> wrote:
>
> Any update on this?
I don't recall why I didn't get involved in this thread back in July,
but has anyone looked at zargs? Specifically, the --max-procs option.
Even if you can't use zargs itself, it serves as an example of
managing multiple background jobs in the shell.
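For reference, a zargs invocation with a job limit looks roughly like this
(a sketch; the flac command is just an illustrative stand-in, and the option
spellings should be checked against the zargs documentation shipped with
your zsh version):

```shell
# zargs is an xargs-like function shipped with zsh; autoload it first.
autoload -Uz zargs
# Decode at most 4 files at a time, one file per invocation:
zargs -n 1 -P 4 -- *.flac -- flac -d
```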