9fans - fans of the OS Plan 9 from Bell Labs
From: Sam Watkins <sam@nipl.net>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] Barrelfish
Date: Thu, 22 Oct 2009 02:43:23 +1100
Message-ID: <20091021154323.GA10118@nipl.net>
In-Reply-To: <4ADD1D76.8050603@maht0x0r.net>

I wrote:
>I calculated roughly that encoding a 2-hour video could be parallelized by a
>factor of perhaps 20 trillion, using pipelining and divide-and-conquer

On Tue, Oct 20, 2009 at 03:16:22AM +0100, matt wrote:
> I know you are using video / audio encoding as an example and there are
> probably datasets that make sense but in this case, what use is it?

I was using it to work out the *maximum* extent to which a common task can be
parallelized.  20-trillion-fold is the answer I came up with.  Someone was
talking about Amdahl's Law and saying that having large numbers of processors
is not much use because Amdahl's Law limits their utilization.  I disagree.
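
For reference, Amdahl's Law for a fixed-size task can be sketched as below.
This is only an illustration: the function name and the serial fractions in
the demo loop are assumptions, not figures from this thread.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law for a fixed-size task.

    serial_fraction is the share of the work that cannot be parallelized.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Illustrative serial fractions only (assumed, not measured).
    for serial_fraction in (0.1, 0.01, 0.0001):
        for processors in (10, 720, 10_000):
            speedup = amdahl_speedup(serial_fraction, processors)
            print(f"serial={serial_fraction:<7} n={processors:<6} "
                  f"speedup={speedup:8.1f}")
```

The smaller the serial fraction, the closer the speedup stays to the processor
count, which is the case being argued for video encoding.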

In reality 10,000 processing units might be a more sensible number to have than
20 trillion.  If you have ever done H.264 video encoding on a PC you will know
that it is very slow; even ordinary MPEG encoding is barely faster than real
time on a 1 GHz PC.  Few people like having to wait 2 hours for a task to
complete.

This whole argument / discussion has come out of nowhere, since it appears
Ken's original comment was criticising the normal sort of multi-core systems,
and he is more in favor of other approaches like FPGAs.  I fully agree with
that.

> You can't watch 2 hours of video per second and you can't write it to disk
> fast enough to empty the pipeline.

If I had a computer with 20 trillion processing units, I would have superior
storage media and IO systems to go with it.  The system I described could
encode 2 BILLION hours of video per second, not 2 hours per second.

> You've got to feed in 2 hours of source material - 820Gb per stream, how?

I suppose some sort of parallel bus of wires or optical fibres.  If I have
massively parallel processing I would want massively parallel IO to go with it.
I.e. something like "read data starting from here" -> "here it is, streaming
one megabit in parallel down the bus at 1 GHz over 1 million channels".

> Once you have your uncompressed stream, MPEG-2 encoding requires seeking
> through the time dimension with keyframes every n frames and out of order
> macro blocks, so we have to wait for n frames to be composited.  For the best
> quality the datarate is unconstrained on the first processing run and then
> macro blocks best-fitted and re-ordered on the second to match the desired
> output datarate, but again, this is n frames at a time.
>
> Amdahl is punching you in the face every time you say "see, it's easy".

I'm no expert on video encoding but it seems to me you are assuming I would
approach it the conventional stupid serial way.  With massively parallel
processing one could "seek" through the time dimension simply by comparing data
from all time offsets at once in parallel.

Can you give one example of a slow task that you think cannot benefit much from
parallel processing?  Video is an extremely obvious example of one that
certainly does benefit from just about as much parallel processing as you can
throw at it, so I'm surprised you would argue about it.  Probably my "20
trillion" upset you or something; it seems you didn't get my point.

It might have been better to consider a simpler example, such as frequency
analysis of audio data to perform pitch correction (for out of tune singers).

I can write a simple shell script using ffmpeg to do H.264 video encoding which
would take advantage of perhaps 720 "cores" to encode a two-hour video in
10-second chunks with barely any Amdahl effects, running the encoding over a
LAN.  A server should be able to pipe the whole 800 MB input (I am assuming it
is already encoded in Xvid or something) over the network in about 10 seconds
on a gigabit (or faster) network.  Each participating computer will receive
the chunk of data it needs.  The encoding would take perhaps 30 seconds for the
10 seconds of video on each of 720 1 GHz computers, and another 10 seconds to
pipe the data back to the server.  Concatenating the video should take very
little time, although perhaps the MP4 format is not the best for that; I'm not
sure.
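
A rough sketch of the chunking step follows. It is not the shell script
mentioned above: it only builds one ffmpeg command per 10-second chunk and
leaves out shipping the commands to other machines and joining the results.
The input and output file names are hypothetical, and it assumes an ffmpeg
build that includes the libx264 encoder.

```python
import shlex

CHUNK_SECONDS = 10           # chunk length from the text
VIDEO_SECONDS = 2 * 60 * 60  # two-hour video


def chunk_commands(src, total_seconds=VIDEO_SECONDS,
                   chunk_seconds=CHUNK_SECONDS):
    """Return one ffmpeg argument list per chunk of the input file."""
    commands = []
    for index, start in enumerate(range(0, total_seconds, chunk_seconds)):
        length = min(chunk_seconds, total_seconds - start)
        commands.append([
            "ffmpeg",
            "-ss", str(start),         # seek to the start of this chunk
            "-t", str(length),         # read only this many seconds
            "-i", src,
            "-c:v", "libx264",         # H.264; assumes libx264 is available
            "-an",                     # drop audio to keep the sketch simple
            f"chunk_{index:04d}.mp4",  # hypothetical output name
        ])
    return commands


if __name__ == "__main__":
    commands = chunk_commands("input.avi")  # hypothetical input file
    print(len(commands), "chunks")
    print(shlex.join(commands[0]))
```

Each command is independent of the others, so the list can be handed out to
as many machines as there are chunks.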

The entire operation takes 50 seconds as opposed to 6 hours (21600 seconds).
With my 721 computers I achieve a 432 times speed up.  Amdahl is not sucking up
much there, only a little for transferring data around.  And each computer
could be doing something else while waiting for its chunk of data to arrive, so
the total actual utilization can be over 99%.  People do this stuff every day.
Have you heard of a render-farm?
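
The arithmetic behind those figures, written out as a small sketch. The
constant names are made up for illustration; the numbers are the ones given
above.

```python
CHUNKS = 720          # 10-second chunks in a two-hour video
SEND_SECONDS = 10     # pipe the input out to the workers
ENCODE_SECONDS = 30   # each worker encodes its own chunk
RETURN_SECONDS = 10   # pipe the results back to the server

# One machine encoding every chunk in turn.
serial_seconds = CHUNKS * ENCODE_SECONDS

# All chunks encoded at once, plus the two transfers.
parallel_seconds = SEND_SECONDS + ENCODE_SECONDS + RETURN_SECONDS

speedup = serial_seconds / parallel_seconds

# Share of the wall-clock time a worker spends encoding when this
# is the only job in the system.
busy_fraction = ENCODE_SECONDS / parallel_seconds

if __name__ == "__main__":
    print(serial_seconds, parallel_seconds, speedup, busy_fraction)
```

For a single job each worker is busy 60% of the time; the rest is the
transfer time that other work can fill.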

This applies to all Amdahl arguments - if part of the system is idle due to
serial constraints in the algorithm, it could likely be working on something
else.  Perhaps you have a couple of videos to recode?  Then you can achieve
close to 100% utilization.  The time taken for a single task may be limited by
the method or the hardware, but a batch of several tasks can be completed close
to N times faster if you have N processors/computers.
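
A minimal model of the batch case, under a loud assumption: the transfers for
the next video overlap completely with the encoding of the current one, so the
workers only sit idle during the first send and the last return. The function
name and that overlap assumption are illustrative and do not come from any
measurement.

```python
def batch_utilization(jobs, send=10.0, encode=30.0, ret=10.0):
    """Fraction of wall-clock time the workers spend encoding.

    Assumes transfers for job k+1 fully overlap with encoding of job k.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    busy = jobs * encode      # back-to-back encoding phases
    wall = send + busy + ret  # only the first send and last return are exposed
    return busy / wall


if __name__ == "__main__":
    for jobs in (1, 2, 10, 100):
        print(f"{jobs:4d} videos: {batch_utilization(jobs):.1%} busy")
```

Under this model utilization climbs towards 100% as the batch grows, and
passes 99% at around a hundred videos.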

I'm not sure why I'm wasting time writing about this, it's obvious anyway.

Sam



