The Unix Heritage Society mailing list
From: "Steve Johnson" <scj@yaccman.com>
To: "Nelson H. F. Beebe" <beebe@math.utah.edu>, tuhs@minnie.tuhs.org
Subject: Re: [TUHS] PDP-11 legacy, C, and modern architectures
Date: Wed, 27 Jun 2018 09:00:16 -0700
Message-ID: <af780f9fb5c14e37f12ce5c2a4e40376669c730f@webmail.yaccman.com>
In-Reply-To: <CMM.0.96.0.1530035664.beebe@gamma.math.utah.edu>


I agree that C is a bad language for parallelism, and, like it or not,
that's what today's hardware is giving us -- not speed, but many
independent processors.  But I'd argue that its problem isn't that it
is not low-level, but that it is not high-level enough.  A language
like MATLAB, whose basic data object is an N-dimensional tensor, can
make impressive use of parallel hardware.

Consider matrix multiplication.  Multiplying two NxN arrays to get
another NxN array is a classic data-parallel problem: each value in
the result matrix is completely independent of every other one.  In
theory, we could dedicate a processor to each output element and
would not need any cache coherency or locking mechanism.  Just let
them go at it; the trickiest part is deciding you are finished.
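
To make the independence concrete, here is a minimal C sketch.  The
names matmul_element and matmul are made up for illustration, and the
matrices are assumed to be stored row-major in flat arrays.  Each call
to matmul_element only reads its inputs, so the calls for different
(i, j) have nothing to coordinate.

```c
#include <stddef.h>

/* Hypothetical helper: one element of C = A * B for n x n row-major
   matrices.  It reads a and b and writes nothing, so calls for
   different (i, j) share no mutable state. */
double matmul_element(const double *a, const double *b,
                      size_t n, size_t i, size_t j)
{
    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += a[i * n + k] * b[k * n + j];
    return sum;
}

/* Sequential driver.  Each iteration writes only its own c[i*n + j],
   so the n*n iterations could run in any order, provided c does not
   overlap a or b. */
void matmul(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = matmul_element(a, b, n, i, j);
}
```

Nothing in the signature of matmul says that the iterations are
independent; that knowledge lives only in the comment.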

The reason we know we are data parallel is not because of any feature
of the language -- it's because of the mathematical structure of the
problem.  While it's easy to write a matrix multiply function in C
(as it is in most languages), just the fact that the arguments are
pointers is enough to make data parallelism invisible from within the
function.  You can bolt on additional features that, in effect, tell
the compiler it should treat the inputs as independent and
non-overlapping (C99's restrict qualifier is one), but this is just
the tip of the iceberg -- real parallel problems see this in spades.
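
As a rough illustration of the pointer problem, consider the two
sketches below.  The function names are hypothetical.  The first
makes no promise about overlap, so the compiler has to assume that a
store to c might change a value it is about to load from a or b.  The
second uses the C99 restrict qualifier, which is one of the bolt-on
features of this kind.

```c
#include <stddef.h>

/* No promise about overlap: c may alias a or b, so every store to c
   has to be assumed to affect later loads from a and b. */
void matmul_may_alias(const double *a, const double *b,
                      double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* With restrict the caller promises that the three arrays do not
   overlap.  The compiler is then free to reorder or vectorize the
   loops; passing overlapping arrays is undefined behavior. */
void matmul_restrict(const double *restrict a,
                     const double *restrict b,
                     double *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```

Calling matmul_may_alias(a, b, a, n) is legal C, but it overwrites
elements of a while later iterations still need the old values, so
the result is no longer the matrix product.  The mathematics says the
outputs are independent; the pointer signature cannot.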

The other hardware factor that comes into play is that hardware,
especially memory, has physical limits on what it can do.  So the
"ideal" matrix multiply with a processor for each output element
would suffer because many of the processors would be trying to read
the same memory at the same time.  Some would be bound to fail,
requiring the ability to stack requests and restart them, as well as
pause the processor until the data was available.  (Note that, in
this and many other cases, we don't need cache coherency because the
input data is not changing while we are using it.)  The obvious way
around this is to divide the memory into many small memories that are
close to the processors, so memory access is not the bottleneck.
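
On conventional hardware, the nearest software analogue of "many
small memories" is tiling.  The sketch below is only an illustration
under stated assumptions: the names matmul_tile and matmul_tiled and
the TILE constant are invented here, n is assumed to be a multiple of
TILE, and the small local arrays merely stand in for a memory that
sits next to one processor.

```c
#include <stddef.h>

#define TILE 2   /* hypothetical tile edge; n must be a multiple of it */

/* Compute the TILE x TILE block of C = A * B whose top-left corner is
   (bi, bj).  Input tiles are first copied into small local arrays, so
   the arithmetic itself touches only local data. */
void matmul_tile(const double *a, const double *b, double *c,
                 size_t n, size_t bi, size_t bj)
{
    double acc[TILE][TILE] = {{0.0}};

    for (size_t bk = 0; bk < n; bk += TILE) {
        double la[TILE][TILE], lb[TILE][TILE];

        /* Stage one tile of A and one tile of B in "local memory". */
        for (size_t i = 0; i < TILE; i++)
            for (size_t j = 0; j < TILE; j++) {
                la[i][j] = a[(bi + i) * n + (bk + j)];
                lb[i][j] = b[(bk + i) * n + (bj + j)];
            }

        /* Multiply the staged tiles and accumulate. */
        for (size_t i = 0; i < TILE; i++)
            for (size_t j = 0; j < TILE; j++)
                for (size_t k = 0; k < TILE; k++)
                    acc[i][j] += la[i][k] * lb[k][j];
    }

    /* Write the finished block back to the shared result. */
    for (size_t i = 0; i < TILE; i++)
        for (size_t j = 0; j < TILE; j++)
            c[(bi + i) * n + (bj + j)] = acc[i][j];
}

/* Sequential driver; each block is independent of the others as long
   as c does not overlap a or b. */
void matmul_tiled(const double *a, const double *b, double *c, size_t n)
{
    for (size_t bi = 0; bi < n; bi += TILE)
        for (size_t bj = 0; bj < n; bj += TILE)
            matmul_tile(a, b, c, n, bi, bj);
}
```

The copying into la and lb is the point: it is written out by hand
because C has no way to say where the data should live.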

And this is where C (and Python) fall shortest.  The idea that there
is one memory space of semi-infinite size, and all pointers point into
it and all variables live in it almost forces attempts at parallelism
to be expensive and performance-killing.  And yet, because of C's
limited, "low-level" approach to data, we are stuck.  Being able to
declare that something is a tensor that will be unchanging when used,
that can be distributed across many small memories to prevent data
bottlenecks when reading and writing, and that is changed only in
limited and controlled ways is the key to unlocking serious
performance.

Steve

PS: for some further thoughts, see
https://wavecomp.ai/blog/auto-hardware-and-ai


