The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
* Re: [TUHS] PDP-11 legacy, C, and modern architectures
@ 2018-06-29  1:06 Noel Chiappa
  0 siblings, 0 replies; 65+ messages in thread
From: Noel Chiappa @ 2018-06-29  1:06 UTC (permalink / raw)
  To: tuhs; +Cc: jnc

    > But don't forget Clarke's Third Law!

Ooops. Read my Web search results wrong. (Should have taken the time to click
through, sigh.) Meant 'Clarke's First Law'.

	 Noel

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-29 18:01                       ` Larry McVoy
@ 2018-06-29 19:07                         ` Perry E. Metzger
  0 siblings, 0 replies; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-29 19:07 UTC (permalink / raw)
  To: Larry McVoy; +Cc: TUHS main list

On Fri, 29 Jun 2018 11:01:09 -0700 Larry McVoy <lm@mcvoy.com> wrote:
> Well welcome to the old farts club, I'll cut you some slack :)
> I still think you are missing the point I was trying to make,
> it's amusing, a bit, that you are preaching what you are to me
> as a guy who moved Sun in that direction.  I'm not at all against
> your arguments, I was just making a different point.

I think we may be mostly talking past each other. To me, the
underlying question began at the start of this thread (see the
subject which we should have changed a long time ago): "is there any
benefit to new sorts of programming languages to deal with the modern
multiprocessor world".

I think we're now at the point where dealing with fleets of
processors is the norm, and on the languages side, I think Erlang was
a good early exemplar on that, and now that we have Rust I think the
answer is a definitive "yes". Go's CSP stuff is clearly also intended
to address this. Having language support so you don't have to handle
concurrent and parallel stuff all on your own is really nice.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-29 17:51                       ` Larry McVoy
  2018-06-29 18:27                         ` Tim Bradshaw
@ 2018-06-29 19:02                         ` Perry E. Metzger
  1 sibling, 0 replies; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-29 19:02 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs, tfb

On Fri, 29 Jun 2018 10:51:26 -0700 Larry McVoy <lm@mcvoy.com> wrote:
> > But I think my usage tells you something important: that the
> > performance of individual cores will, inevitably, become
> > increasingly tiny compared to the performance of the system they
> > are in ....  
> 
> Yeah, so what?  That wasn't the point being discussed though you
> and Perry keep pushing it.  

Perhaps we're not agreed on what was originally motivating the
discussion. I believe we started with whether modern parallel
computing environments might need something more than C. I think the
original hardware question this spawned was: is dealing with lots of
cores simultaneously here to stay, and do you get something out of
having language support to help with it? We were trying to address
that question, and given that, I think we've been on point.

Single programs that have to handle large numbers of threads and
cores are now common. Every interesting use of the GPU on your
machine is like this, including the classic ones like rendering
images to your screen, but also newer ones like being exploited
for all sorts of purposes by your browser that aren't related to
video as such. Your desktop browser is a fine example of other sorts
of parallelism, too: it uses loads of parallelism (not merely
concurrency) on a modern desktop to deal with rapid processing
of the modern hideous and bloated browsing stack.

As for whether there's an advantage to modern languages here, the
answer there is also "yes", Mozilla only really managed to get a
bunch of parallelism into Firefox because they used a language (Rust)
with a pretty advanced linear type system to assure that they didn't
have concurrency problems.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-29 12:58         ` Theodore Y. Ts'o
@ 2018-06-29 18:41           ` Perry E. Metzger
  0 siblings, 0 replies; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-29 18:41 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: tuhs

On Fri, 29 Jun 2018 08:58:48 -0400 "Theodore Y. Ts'o" <tytso@mit.edu>
wrote:
> All of this is to point out that talking about 2998 threads really
> doesn't mean much.  We shouldn't be talking about threads; we should
> be talking about how many CPU cores can we usefully keep busy at the
> same time.  Most of the time, for desktops and laptops, except for
> brief moments when you are running "make -j32" (and that's only for
> us weird programmer-types; we aren't actually the common case),
> most of the time, the user-facing CPU is twiddling its fingers.

On the other hand, when people are doing machine learning stuff on
GPUs or TPUs, or are playing a video game on a GPU, or are doing
weather prediction or magnetohydrodynamics simulations, or are
calculating the bonding energies of molecules, they very much _are_
going to use each and every computation unit they can get their hands
on. Yah, a programmer doesn't use very much CPU unless they're
compiling, but the world's big iron isn't there to make programmers
happy these days.

Even if "all" you're doing is handling instant message traffic for a
couple hundred million people, you're going to be doing a lot of stuff
that isn't easily classified as concurrent vs. parallel
vs. distributed any more, and the edges all get funny.

> > 4. The reason most people prefer to use one very high perf.
> >    CPU rather than a bunch of "wimpy" processors is *because*
> >    most of our tooling uses only sequential languages with
> >    very little concurrency.
>
> The problem is that I've been hearing this excuse for two decades.
> And there have been people who have been working on this problem.
> And at this point, there's been a bit of "parallelism winter" that
> is much like the "AI winter" in the 80's.

The parallelism winter already happened, in the late 1980s. Larry may
think I'm a youngster, but I worked over 30 years ago on a machine
called "DADO" built at Columbia with 1023 processors arranged in a
tree with MIMD and SIMD features depending on how it was
arranged. Later I worked at Bellcore on something called the Y machine
built for massive simulations.

All that parallel stuff died hard because general purpose CPUs kept
getting faster too quickly for it to be worthwhile. Now, however,
things are different.

> Lots of people have been promisng wonderful results for a long time;
> Sun bet their company (and lost) on it; and there haven't been much
> in the way of results.

I'm going to dispute that. Sun bet the company on the notion that
people wanted machines they could sell them at ridiculously high
margins with pretty low performance per dollar when they could go out
and buy Intel boxes running RHEL instead. As has been said elsewhere
in the thread, what failed wasn't Niagra as much as Sun's entire
attempt to sell people what were effectively mainframes at a point
where they no longer wanted giant boxes with low throughput per
dollar.

Frankly, Sun jumped the shark long before. I remember some people (Hi
Larry!) valiantly keeping adding minor variants on the SunOS release
number (what did it get to? SunOS 4.1.3_u1b or some such?) at a point
where the firm had committed to Solaris. They stopped shipping
compilers so they could charge you for them, stopped maintaining the
desktop, stopped updating the userland utilities so they got
ridiculously behind (Solaris still, when I last checked, had a /bin/sh
that wouldn't handle $( ) ), and put everything into selling bigger
and bigger iron, which paid off for a little while, until it didn't
any more.

> Sure, there are specialized cases where this has been useful ---
> making better nuclear bumbs with which to kill ourselves, predicting
> the weather, etc.  But for the most part, there haven't been much
> improvement for anything other than super-specialzied use cases.

Your laptop's GPUs beat it's CPUs pretty badly on power, and you're
using them every day. Ditto for the GPUs in your cellphone. You're
invoking big parallel back-ends at Amazon, Google, and Apple all the
time over the network, too, to do things like voice recognition and
finding the best route on a large map. They might be "specialized"
uses, but you're invoking them enough times an hour that maybe they're
not that specialized?

> > 6. The conventional wisdom is parallel languages are a failure
> >    and parallel programming is *hard*.  Hoare's CSP and
> >    Dijkstra's "elephants made out of mosquitos" papers are
> >    over 40 years old.
>
> It's a failure because there hasn't been *results*.  There are
> parallel languages that have been proposed by academics --- I just
> don't think they are any good, and they certainly haven't proven
> themselves to end-users.

Er, Erlang? Rust? Go has CSP and is in very wide deployment? There's
multicore stuff in a bunch of functional languages, too.

If you're using Firefox right now, a large chunk of the code is
running multicore using Rust to assure that parallelism is safe. Rust
does that using type theory work (on linear types) from the PL
community. It's excellent research that's paying off in the real
world.

(And on the original point, you're also using the GPU to render most
of what you're looking at, invoked from your browser, whether Firefox,
Chrome, or Safari. That's all parallel stuff and it's going on every
time you open a web page.)

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-29 17:51                       ` Larry McVoy
@ 2018-06-29 18:27                         ` Tim Bradshaw
  2018-06-29 19:02                         ` Perry E. Metzger
  1 sibling, 0 replies; 65+ messages in thread
From: Tim Bradshaw @ 2018-06-29 18:27 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

On 29 Jun 2018, at 18:51, Larry McVoy <lm@mcvoy.com> wrote:
> 
> Yeah, so what?  That wasn't the point being discussed though you and Perry
> keep pushing it.  

Fuck this.  Warren: can you unsubscribe me please?

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-29 15:41                     ` Perry E. Metzger
@ 2018-06-29 18:01                       ` Larry McVoy
  2018-06-29 19:07                         ` Perry E. Metzger
  0 siblings, 1 reply; 65+ messages in thread
From: Larry McVoy @ 2018-06-29 18:01 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: TUHS main list

> You don't remember that you've known me for thirty years or so?
> 
> Hell, you used to help me out. Viktor and I and a couple of other
> people built the predecessor of Jumpstart at Lehman Brothers like 25
> years ago, it was called PARIS, and you were the one who did stuff
> like telling us the secret IOCTL to to turn off sync FFS metadata
> writes in SunOS so we could saturate our disk controllers during
> installs. (I guess it wasn't that secret but it was a big help, we got
> the bottleneck down to being network bandwidth and could install a
> single workstation from "boot net" to ready for the trader in five
> minutes.)

Heh, check this out:

$ call Viktor
Viktor Dukhovni                 718-754-2126 (W/Morgan Stanley)

Sorry Perry, I had completely forgotten about all that stuff.  Now that
you mention it, I do remember your install stuff, it was very cool.
But I'm horrible with names, when I was teaching at Stanford I'd have
students come up to me from a couple of semesters ago and I'd have no
idea what their names were.  Sorry about that.

> Anyway, I've also been doing this for quite a while. Not as long as
> many people here, I'm way younger than the people who were hacking in
> the 1960s on IBM 360s (I was learning things like reading back then,
> not computer science), but my first machine was a PDP-8e with an
> ASR-33 teletype.

Well welcome to the old farts club, I'll cut you some slack :)
I still think you are missing the point I was trying to make,
it's amusing, a bit, that you are preaching what you are to me
as a guy who moved Sun in that direction.  I'm not at all against
your arguments, I was just making a different point.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-29 15:32                     ` tfb
  2018-06-29 16:09                       ` Perry E. Metzger
@ 2018-06-29 17:51                       ` Larry McVoy
  2018-06-29 18:27                         ` Tim Bradshaw
  2018-06-29 19:02                         ` Perry E. Metzger
  1 sibling, 2 replies; 65+ messages in thread
From: Larry McVoy @ 2018-06-29 17:51 UTC (permalink / raw)
  To: tfb; +Cc: tuhs

On Fri, Jun 29, 2018 at 04:32:59PM +0100, tfb@tfeb.org wrote:
> On 28 Jun 2018, at 18:09, Larry McVoy <lm@mcvoy.com> wrote:
> > 
> > I'm not sure how people keep missing the original point.  Which was:
> > the market won't choose a bunch of wimpy cpus when it can get faster
> > ones.  It wasn't about the physics (which I'm not arguing with), it 
> > was about a choice between lots of wimpy cpus and a smaller number of
> > fast cpus.  The market wants the latter, as Ted said, Sun bet heavily
> > on the former and is no more.
> 
> [I said I wouldn't reply more: I'm weak.]
> 
> I think we have been talking at cross-purposes, which is probably
> my fault.  I think you've been using 'wimpy' to mean 'intentionally
> slower than they could be' while I have been using it to mean 'of very
> tiny computational power compared to the power of the whole system'.
> Your usage is probably more correct in terms of the way the term has
> been used historically.

Not "intentionally" as "let me slow this down" but as in "it's faster 
and cheaper to make a slower cpu so I'll just give you more of them".

The market has shown, repeatedly, that more slow cpus are not as fun
as less faster cpus.  

It's not a hard concept and I struggle to understand why it's a point
to discuss.

> But I think my usage tells you something important: that the performance
> of individual cores will, inevitably, become increasingly tiny compared
> to the performance of the system they are in ....

Yeah, so what?  That wasn't the point being discussed though you and Perry
keep pushing it.  

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-29 15:32                     ` tfb
@ 2018-06-29 16:09                       ` Perry E. Metzger
  2018-06-29 17:51                       ` Larry McVoy
  1 sibling, 0 replies; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-29 16:09 UTC (permalink / raw)
  To: tfb; +Cc: tuhs

On Fri, 29 Jun 2018 16:32:59 +0100 tfb@tfeb.org wrote:
> On 28 Jun 2018, at 18:09, Larry McVoy <lm@mcvoy.com> wrote:
> > 
> > I'm not sure how people keep missing the original point.  Which
> > was: the market won't choose a bunch of wimpy cpus when it can
> > get faster ones.  It wasn't about the physics (which I'm not
> > arguing with), it was about a choice between lots of wimpy cpus
> > and a smaller number of fast cpus.  The market wants the latter,
> > as Ted said, Sun bet heavily on the former and is no more.
> 
> [I said I wouldn't reply more: I'm weak.]
> 
> I think we have been talking at cross-purposes, which is probably
> my fault.  I think you've been using 'wimpy' to mean 'intentionally
> slower than they could be' while I have been using it to mean 'of
> very tiny computational power compared to the power of the whole
> system'.  Your usage is probably more correct in terms of the way
> the term has been used historically.
>
> But I think my usage tells you something important: that the
> performance of individual cores will, inevitably, become
> increasingly tiny compared to the performance of the system they
> are in, and will almost certainly become asymptotically constant
> (ie however much money you spend on an individual core it will not
> be very much faster than the one you can buy off-the-shelf).

You've touched the point with a needle. This is exactly what I was
getting at. It's not a question of whether you want a better single
CPU or not, they don't exist, and one CPU is a tiny fraction of the
power in a modern distributed or parallel system.

> So, if you want to keep seeing performance improvements (especially
> if you want to keep seeing exponential improvements for any
> significant time), then you have no choice but to start thinking
> about parallelism.
>
> The place I work now is an example of this.  Our machines have the
> fastest cores we could get.  But we need nearly half a million of
> them to do the work we want to do (this is across three systems).

And that's been the case for a long time now. Supercomputers are all
giant arrays of CPUs, there hasn't been a world-beating single core
supercomputer in decades.

> I certainly don't want to argue that choosing intentionally slower
> cores than you can get is a good idea in general (although there
> are cases where it may be, including, perhaps, some HPC workloads).

And anything where power usage is key. If you're doing something that
computationally requires top-end single core performance you run the
4GHz core. Otherwise, you save a boatload of power and also
_cooling_ by going a touch slower. Dynamic power rises as the square
of the clock rate so you get quite a lot out of backing off just a
bit.

But it doesn't matter much even if you want to run flat out, because
no single processor can do what you want any more. There's a serious
limit to how much money you can throw at the vendors to give you a
faster core, and it tops out (per core) at a lot less than you're
paying your top engineers per day.

> However let me add something about the Sun T-series machines, which
> were 'wimpy cores' in the 'intentionally slower' sense.  When these
> started appearing I worked in a canonical Sun customer at the time:
> a big retail bank.  And the reason we did not buy lots of them was
> nothing to do with how fast they were (which was more than fast
> enough), it was because Sun's software was inadequate.

Your memory of this is the same as mine. I was also consulting to quite
similar customers at the time (most of my career has been consulting
to large investment banks. Haven't done much on the retail side though
that's changed in recent years.)

> (As an addentum: what eventually happened / is happening I think is
> that applications are getting recertified on Linux/x86 sitting on
> top of ESX, *which can move VMs live between hosts*, thus solving
> the problem.)

At the investment banks, the appetite for new Sun hardware when cheap
Linux on Intel was available was just not there. As you noted, too
many eggs in one basket was one problem, but another was the
combination of better control on an open source OS and just plain
cheaper and (by then sometimes even nicer) hardware.

Sun seemed to get fat in the .com bubble when people were throwing
money at their stuff like crazy, and never quite adjusted to the fact
that the world was changing and people wanted lots of cheap boxes more
than they wanted a few expensive ones.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 16:45       ` Paul Winalski
  2018-06-28 20:47         ` Perry E. Metzger
@ 2018-06-29 15:43         ` emanuel stiebler
  1 sibling, 0 replies; 65+ messages in thread
From: emanuel stiebler @ 2018-06-29 15:43 UTC (permalink / raw)
  Cc: tuhs

On 2018-06-28 12:45, Paul Winalski wrote:
> On 6/28/18, Theodore Y. Ts'o <tytso@mit.edu> wrote:

> Parallel programming *is* hard for humans.  Very few people can cope
> with it, or with the nasty bugs that crop up when you get it wrong.

I'm not sure about it. Look how many processes/threads(?) some hardware
guys program in VHDL/Verilog at a time, how many of those run in
different clocking domains, parallel and the stuff works.

I think it is just a matter to get used to thinking this way ...

>> The problem is that not all people are interested in solving problems
>> which are amenable to embarassingly parallel algorithms.
> 
> Most interesting problems in fact are not embarrassingly parallel.
> They tend to have data interdependencies.
> 
> There have been some advancements in software development tools to
> make parallel programming easier.  Modern compilers are getting pretty
> good at loop analysis to discover opportunities for parallel execution
> and vectorization in sequentially-written code.

I was actually just musing about this, but we have multi-threaded
architectures, we have languages which would support this.

Probably we are just missing the problems, we would like to solve with it?

;-)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-29  0:18                   ` Larry McVoy
@ 2018-06-29 15:41                     ` Perry E. Metzger
  2018-06-29 18:01                       ` Larry McVoy
  0 siblings, 1 reply; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-29 15:41 UTC (permalink / raw)
  To: Larry McVoy; +Cc: TUHS main list

On Thu, 28 Jun 2018 17:18:31 -0700 Larry McVoy <lm@mcvoy.com> wrote:
> On Thu, Jun 28, 2018 at 06:29:54PM -0400, Theodore Y. Ts'o wrote:
> > On Thu, Jun 28, 2018 at 05:03:17PM -0400, Perry E. Metzger
> > wrote:  
> > > 
> > > Tens of thousands of machines is a lot more than one. I think
> > > the point stands. This is the age of distributed and parallel
> > > systems.  
> > 
> > This is the age of distributed systems, yes.  I'm not so sure
> > about "parallel".  And the point remains that for many problems,
> > you need fewer strong cores, and a crapton of weak cores is not
> > as useful.  
> 
> As usual, Ted gets it.

My laptop's GPUs are a lot more powerful than the CPU and do much
more most of the time, and they're ridiculously parallel. Everything
from weather prediction to machine learning to Google's search
stuff runs _parallel_, not just distributed. All the simulations I do
of molecular systems run parallel, on lots and lots of machines.

> Perry, please take this in the spirit in which is intended, but
> you're arguing with people who have been around the block (there
> are people on this list that have 5 decades of going around the
> block - looking at you Ken).

You don't remember that you've known me for thirty years or so?

Hell, you used to help me out. Viktor and I and a couple of other
people built the predecessor of Jumpstart at Lehman Brothers like 25
years ago, it was called PARIS, and you were the one who did stuff
like telling us the secret IOCTL to to turn off sync FFS metadata
writes in SunOS so we could saturate our disk controllers during
installs. (I guess it wasn't that secret but it was a big help, we got
the bottleneck down to being network bandwidth and could install a
single workstation from "boot net" to ready for the trader in five
minutes.)

I guess I wasn't that memorable, but I'm sure at least Ted remembers
me, we've been paling around at conferences and IETF meetings for
decades.

Anyway, I've also been doing this for quite a while. Not as long as
many people here, I'm way younger than the people who were hacking in
the 1960s on IBM 360s (I was learning things like reading back then,
not computer science), but my first machine was a PDP-8e with an
ASR-33 teletype.

> This is a really poor place for a younger person

I wish I was young. People do still tell me that I look like I'm in my
30s but I think that's just that my hair isn't gray yet for some
unaccountable reason.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 17:09                   ` Larry McVoy
@ 2018-06-29 15:32                     ` tfb
  2018-06-29 16:09                       ` Perry E. Metzger
  2018-06-29 17:51                       ` Larry McVoy
  0 siblings, 2 replies; 65+ messages in thread
From: tfb @ 2018-06-29 15:32 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

On 28 Jun 2018, at 18:09, Larry McVoy <lm@mcvoy.com> wrote:
> 
> I'm not sure how people keep missing the original point.  Which was:
> the market won't choose a bunch of wimpy cpus when it can get faster
> ones.  It wasn't about the physics (which I'm not arguing with), it 
> was about a choice between lots of wimpy cpus and a smaller number of
> fast cpus.  The market wants the latter, as Ted said, Sun bet heavily
> on the former and is no more.

[I said I wouldn't reply more: I'm weak.]

I think we have been talking at cross-purposes, which is probably my fault.  I think you've been using 'wimpy' to mean 'intentionally slower than they could be' while I have been using it to mean 'of very tiny computational power compared to the power of the whole system'.  Your usage is probably more correct in terms of the way the term has been used historically.

But I think my usage tells you something important: that the performance of individual cores will, inevitably, become increasingly tiny compared to the performance of the system they are in, and will almost certainly become asymptotically constant (ie however much money you spend on an individual core it will not be very much faster than the one you can buy off-the-shelf). So, if you want to keep seeing performance improvements (especially if you want to keep seeing exponential improvements for any significant time), then you have no choice but to start thinking about parallelism.

The place I work now is an example of this.  Our machines have the fastest cores we could get.  But we need nearly half a million of them to do the work we want to do (this is across three systems).

I certainly don't want to argue that choosing intentionally slower cores than you can get is a good idea in general (although there are cases where it may be, including, perhaps, some HPC workloads).

---

However let me add something about the Sun T-series machines, which were 'wimpy cores' in the 'intentionally slower' sense.  When these started appearing I worked in a canonical Sun customer at the time: a big retail bank.  And the reason we did not buy lots of them was nothing to do with how fast they were (which was more than fast enough), it was because Sun's software was inadequate.

To see why, consider what retail banks IT looked like in the late 2000s.  We had a great mass of applications, the majority of which ran on individual Solaris instances (at least two, live and DR, per application).  A very high proportion (not all) of these applications had utterly negligible computational requirements.  But they had very strong requirements on availability, or at least the parts of the business which owned them said they did and we could not argue with that, especially given that this was 2008 and we knew that if we had a visible outage there was a fair chance that it would be misread as the bank failing, resulting in  a cascade failure of the banking system and the inevitable zombie apocalypse.  No one wanted that.

Some consolidation had already been done: we had a bunch of 25ks, many of which were split into lot of domains.  The smallest domain on a 25k was a single CPU board which was 4 sockets and therefore 8 or 16 (I forget how many cores there were per socket) cores.  I think you could not partition a 25k like that completely because you ran out of IO assemblies, so some domains had to be bigger.

This smallest domain was huge overkill for many of these applications, and 25ks were terribly expensive as well.

So, along came the first T-series boxes and they were just obviously ideal: we could consolidate lots and lots of these things onto a single T-series box, with a DR partner, and it would cost some tiny fraction of what a 25k cost, and use almost no power (DC power was and is a real problem).

But we didn't do that: we did some experiments, and some things moved I think, but on the whole we didn't move.  The reason we didn't move was nothing, at all, to do with performance, it was, as I said, software, and in particular virtualisation. Sun had two approaches to this, neither of which solved the problems that everyone had.

At the firmware level there were LDOMs (which I think did not work very well early on or may not have existed) which let you cut up a machine into lots of smaller ones with a hypervisor in the usual way.  But all of these smaller machines shared the same hardware of course.  So if you had a serious problem on the machine, then all of your LDOMs went away, and all of the services on that machine had an outage, at once.  This was not the case on a 25k: if a CPU or an IO board died it would affect the domain it was part of, but everything else would carry on.

At the OS level there were zones (containers).  Zones had the advantage that they could look like Solaris 8 (the machine itself, and therefore the LDOMs it got split into, could only run Solaris 10), which all the old applications were running, and they could be very fine-grained.  But they weren't really very isolated from each other (especially in hindsight), they didn't look *enough* like Solaris 8 for people to be willing to certify the applications on them, and they still had the all-your-eggs-in-one-basket problem if the hardware died.

The thing that really killed it was the eggs-in-one-basket problem.  We had previous experience with consolidating a lot of applications onto one OS & hardware instance, and no-one wanted to go anywhere near that.  If you needed to get an outage (say to install a critical security patch, or because of failing hardware) you had to negotiate this with *all* the application teams, all of whom had different requirements and all of whom regarded their application as the most important thing the bank ran (some of them might be right).  It could very easily take more than a year to get an outage on the big shared-services machines, and when the outage happened you would have at least 50 people involved to stop and restat everything.  It was just a scarring nightmare.

So, to move to the T-series machines what we would have needed was a way of partitioning the machine in such a way that the partitions ran Solaris 8 natively, and in such a way that the partitions could be moved, live, to other systems to deal with the eggs-in-one-basket problem.  Sun didn't have that, anywhere near (they knew this I think, and they got closer later on, but it was too late).

So, the machines failed for us.  But this failure was nothing, at all, to do with performance, let alone performance per core, which was generally more than adequate.  Lots of wimpy, low power, CPUs was what we needed, in fact: we just needed the right software on top of them, which was not there.

(As an addentum: what eventually happened / is happening I think is that applications are getting recertified on Linux/x86 sitting on top of ESX, *which can move VMs live between hosts*, thus solving the problem.)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-29  2:02       ` Bakul Shah
@ 2018-06-29 12:58         ` Theodore Y. Ts'o
  2018-06-29 18:41           ` Perry E. Metzger
  0 siblings, 1 reply; 65+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-29 12:58 UTC (permalink / raw)
  To: Bakul Shah; +Cc: tuhs

On Thu, Jun 28, 2018 at 07:02:11PM -0700, Bakul Shah wrote:
> 3. As Perry said, we are using parallel and distributed
>    computing more and more. Even the RaspberryPi Zero has a
>    several times more powerful GPU than its puny ARM "cpu"!
>    Most all cloud services use multiple cores & nodes.  We may
>    not set up our services but we certainly use quite a few of
>    them via the Internet. Even on my laptop at present there
>    are 555 processes and 2998 threads. Most of these are
>    indeed "embarrassingly" parallel -- most of them don't talk
>    to each other!

In order to think clearly about the problem, it's important to
distingish between parallel and distributed computing.  Parallel
computing to me means that you have a large number of CPU-bound
threads that are all working on the same problem.  What is mean by
"the same problem" is tricky, and we need to distinguish between
stupid ways of breaking up the work --- for example, in Service
Oriented Architectures, you might do an RPC call to multiple a dollar
value by 1.05 to calculate the sales tax --- sure, you can call that
"distributed" computing, or even "parallel" computing because it's a
different thread (even if it is I/O bound waiting for the next RPC
call, and most of the CPU power is spent marshalling and unmarshalling
the parmeter and return values).  But it's a __dumb__ way of breaking
up the problem.  At least, unless the problem is to sell lots of extra
IBM hardware and make IBM shareholders lots of money, in which case,
it's brilliant.  :-)

It's also important to distinguish between CPU-bound and I/O-bound
threads.  You may have 2998 threads, and but I bet they are mostly I/O
bound, and are there for programmer convenience.  Very often such
threads are not actually a terribly efficent way to break up the
problem.  In my career, where the number of threads are significantly
greater than the number of CPU's, you can actually make a tradeoff
between programmer convenience and CPU efficiency by taking those
hundreds of threads, and transforming PC and a small state structure
into something that is much more of a continuation-based
implementation which uses significantly fewer threads.  That
particular architecture still had cores that were mostly I/O bound,
but it meant we could use significantly cheaper CPU's, and it saved
millions and millions of dollars.

All of this is to point out that talking about 2998 threads really
doesn't mean much.  We shouldn't be talking about threads; we should
be talking about how many CPU cores can we usefully keep busy at the
same time.  Most of the time, for desktops and laptops, except for
brief moments when you are running "make -j32" (and that's only for us
weird programmer-types; we aren't actually the common case), most of
the time, the user-facing CPU is twiddling its fingers.

> 4. The reason most people prefer to use one very high perf.
>    CPU rather than a bunch of "wimpy" processors is *because*
>    most of our tooling uses only sequential languages with
>    very little concurrency.

The problem is that I've been hearing this excuse for two decades.
And there have been people who have been working on this problem.  And
at this point, there's been a bit of "parallelism winter" that is much
like the "AI winter" in the 80's.  Lots of people have been promisng
wonderful results for a long time; Sun bet their company (and lost) on
it; and there haven't been much in the way of results.

Sure, there are specialized cases where this has been useful ---
making better nuclear bumbs with which to kill ourselves, predicting
the weather, etc.  But for the most part, there haven't been much
improvement for anything other than super-specialzied use cases.
Machine learning might be another area, but that's one where we're
seeing specialied chips that are doing one thing and exactly one
thing.  Whether it's running a nueral network, or doing AES encryption
in-line, this is not an example of better parllel programming
languages or better software tooling.

> 5. You may well be right that most people don't need faster
>    machines. Or that machines optimized for parallel languages
>    and codes may never succeed commercially.
> 
>    But as a techie I am more interested in what can be built
>    (as opposed to what will sell). It is not a question of
>    whether problems amenable to parallel solutions are the
>    *only problems that matter*.

If we can build something which is useful, the money will take care of
itself.  That means generally useful.  The market of weather
prediction or people interested in building better nuclear bombs is
fairly small compared to the entire computing market.

As a techie, what I am interested in is building something that is
useful.  But part of being useful is that it has to make economic
sense.  That problably makes me a lousy acdemic, but I'm a cynical
industry engineer, not an academic.

> 6. The conventional wisdom is parallel languages are a failure
>    and parallel programming is *hard*.  Hoare's CSP and
>    Dijkstra's "elephants made out of mosquitos" papers are
>    over 40 years old.

It's a failure because there hasn't been *results*.  There are
parallel languages that have been proposed by academics --- I just
don't think they are any good, and they certainly haven't proven
themselves to end-users.

>    We are doing adhoc distributed systems but we
>    don't have a theory as to how they behave under stress.
>    But see also [1]

Actually, there are plenty of people at the hyper-scaler cloud
companies (e.g., Amazon, Facebook, Google, etc.) who understand very
well how they behave under stress.  Many of these companies regularly
experiment with putting their systems under stress to see how they
behave.  More importantly, they will concoct full-blown scenarios
(sometimes with amusing back-stories such as extra-dimensional aliens
attacking Moffet Field) to test how *humans* and their *processes*
manging these large-scale distributed systems react under stress.

> Here is a somewhat tenuous justification for why this topic does
> make sense on this list: Unix provides *composable* tools.

How many of these cases were these composable tools actually ones
where it allowed CPU resources to be used more efficiently?  A
pipeline that involves sort, awk, sed, etc. certainly is better
because you didn't have to write an ad-hoc program.  And i've written
lots of Unix pipelines in my time.  But in how many cases were these
pipelines actually CPU bound.  I think if you were to examine the
picture closely, they all tended to be I/O bound, not CPU bound.

So while Unix tools' composability is very good thing, I would
question whether they have proven to be a useful tool in terms of
being able to use computational resources more efficiently, and how
much they really leveraged computational parllelism.

						- Ted

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 22:29                 ` Theodore Y. Ts'o
  2018-06-29  0:18                   ` Larry McVoy
@ 2018-06-29  5:58                   ` Michael Kjörling
  1 sibling, 0 replies; 65+ messages in thread
From: Michael Kjörling @ 2018-06-29  5:58 UTC (permalink / raw)
  To: tuhs

On 28 Jun 2018 18:29 -0400, from tytso@mit.edu (Theodore Y. Ts'o):
> And if you are really worried about potential
> problems with Spectre and Meltdown, what that means is that sharing
> caches is perilous.  So if you have 128 wimpy cores, you need 128
> separate I and D cacaches.  If you have 32 stronger cores, you need 32
> separate I and D caches.

What's more, I suspect that in order to get good performance out of
those wimpy cores, you'd rather need _more_ cache per core, than less
or the same, simply because there's less of an advantage in raw clock.
One doesn't have to look hard to find examples of where adding or
increasing cache in a CPU (these days on-die) have, at least for
workloads that are able to use such cache effectively, led to huge
improvements in overall performance, even at similar clock rates.

Of course, I can't help but find it interesting that we're having this
discussion at all about a language that is approaching 50 years old by
now (Wikipedia puts the earliest design in 1969, which sounds about
right, and even K&R C is 40 years old by now). Sure, C has evolved --
for example, C11 added language constructs for multithreaded
programming, including the _Thread_local storage class specifier) --
but it's still in active use and it's still recognizably an evolved
version of the language specified in K&R. I can pull out the manual
for a pre-ANSI C compiler and look at the code samples, and sure there
are things about that code that a modern compiler barfs at, but it's
quite easy to just move a few things around a little and end up with
pretty close to modern C (albeit code that doesn't take advantage of
new features, obviously). I wonder how many of today's programming
languages we'll be able to say the same thing about in 2040-2050-ish.

-- 
Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se
  “The most dangerous thought that you can have as a creative person
              is to think you know what you’re doing.” (Bret Victor)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:15     ` Theodore Y. Ts'o
                         ` (2 preceding siblings ...)
  2018-06-28 16:45       ` Paul Winalski
@ 2018-06-29  2:02       ` Bakul Shah
  2018-06-29 12:58         ` Theodore Y. Ts'o
  3 siblings, 1 reply; 65+ messages in thread
From: Bakul Shah @ 2018-06-29  2:02 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: tuhs

On Thu, 28 Jun 2018 10:15:38 -0400 "Theodore Y. Ts'o" <tytso@mit.edu> wrote:
> Bakul, I think you and Steve have a very particular set of programming
> use cases in mind, and are then over-generalizing this to assume that
> these are the only problems that matter.  It's the same mistake
> Chisnall made when he assrted the parallel programming a myth that
> humans writing parallel programs was "hard", and "all you needed" was
> the right language.

Let me try to explain my thinking on this topic.

1. Cobol/C/C++/Fortran/Go/Java/Perl/Python/Ruby/... will be
   around for a long time.  Lots of computing platforms &
   existing codes using them so them languages will continue
   to be used. This is not particularly interesting or worth
   debating.

2. A lot of processor architecture evolution in the last few
   decades has been to make such programs run as fast as
   possible but this emulation of flat shared/virtual address
   spaces and squeezing out parallelism at run time from
   sequential code is falling further and further behind.

   The original Z80 had 8.5E3 transistors while an 8 core
   Ryzen has 4.8E9 transistors (about 560K Z80s worth of
   transistor).  And yet, it is only 600K times faster in
   Dhrystone MIPS even though it has > thousand times faster
   clock rate, its data bus is 8 times wider and there is no
   data/address multiplexing.

   I make this crude comparison to point out how much less
   efficient these resources are used due to the needs of
   above languages.

3. As Perry said, we are using parallel and distributed
   computing more and more. Even the RaspberryPi Zero has a
   several times more powerful GPU than its puny ARM "cpu"!
   Most all cloud services use multiple cores & nodes.  We may
   not set up our services but we certainly use quite a few of
   them via the Internet. Even on my laptop at present there
   are 555 processes and 2998 threads. Most of these are
   indeed "embarrassingly" parallel -- most of them don't talk
   to each other!

   Even local servers in any organization run a large number
   of processes.

   Things like OpenCL is being used more and more to benefit
   from whatever parallelism we can squeeze out of a GPU for
   specialized applications.

4. The reason most people prefer to use one very high perf.
   CPU rather than a bunch of "wimpy" processors is *because*
   most of our tooling uses only sequential languages with
   very little concurrency. And just as in the case of
   processors, most of our OSes also allow use of very little
   parallelism. And most performance metrics focus on single
   CPU performance. This is what is optimized so given these
   assumptions using faster and faster CPUs makes the most
   sense but we are running out that trick.

5. You may well be right that most people don't need faster
   machines. Or that machines optimized for parallel languages
   and codes may never succeed commercially.

   But as a techie I am more interested in what can be built
   (as opposed to what will sell). It is not a question of
   whether problems amenable to parallel solutions are the
   *only problems that matter*.  

   I think about these issues because
   a) I find them interesting (one among many).
   b) Currently we are using resources rather inefficiently
      and I'm interested in what can be done about it.
   c) This is the only direction in future that may yield
      faster and faster solutions for large set of problems.

   And in *this context* our current languages do fall short.

6. The conventional wisdom is parallel languages are a failure
   and parallel programming is *hard*.  Hoare's CSP and
   Dijkstra's "elephants made out of mosquitos" papers are
   over 40 years old. But I don't think we have a lot of
   experience with parallel languages to know one way or
   another. We are doing adhoc distributed systems but we
   don't have a theory as to how they behave under stress.
   But see also [1]

7. As to distributed vs parallel systems, my point was that
   even if they different, there are a number of subproblems
   common to them both. These should be investigated further
   (beyond using adhoc solutions like Kubernetes). It may even
   be possible to separate a reliability layer to simplify a
   more abstract layer that is more common to the both. Not
   unlike the way disks handle reliability so that at a higher
   level we can treat them like a sequence of blocks (or how
   IP & TCP handle reliability).

Here is a somewhat tenuous justification for why this topic does
make sense on this list: Unix provides *composable* tools. It
isn'just that one unix program did one thing well but that it
was easy to make them worked together to achieve a goal + a
few abstractions were useful from most programs. You hid
device peculiarities in device drivers and filesystem
peculiarities in filesystem code and so on. I think these same
design principles can help with distributed/parallel systems.

Even if we get an easy to use parallel language, we may still
have to worry about placement and load balancing and latencies
and so forth. And for that we may need a glue language.

[1] What I have discovered is that it takes some experience
and experimenting to think in a particular way that is natural
to the language at hand.  As an example, when programming in k
I often start out with a sequential, loopy program .  But if I
can think about in terms of "array" operations, I can iterate
fast and come up with a better solution. Not have to think
about locations and pointers make this iterativer process very
fast.

Similarly it takes a while to switch to writing forth (or
postscript) code idiomatically. Writind idiomatic code in any
language is not just a matter learning the language but being
comfortable with it, knowing what works well and in what
situation.

I suspect the same is true with parallel programming as well.

Also note that Unix hackers routinely write simple parallel
programms (shell pipelines) but these may seem quite foreign
to people who grew up using just GUI.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
@ 2018-06-29  1:02 Noel Chiappa
  0 siblings, 0 replies; 65+ messages in thread
From: Noel Chiappa @ 2018-06-29  1:02 UTC (permalink / raw)
  To: tuhs; +Cc: jnc

    > From: Larry McVoy

    > This is a really poor place for a younger person to come in and make
    > loud points, that is frowned upon.  It's a fantastic place for a younger
    > person to come in and learn.

But don't forget Clarke's Third Law! And maybe you can remember what it's like
when you're young... :-)

But Ted does have a point. 'Distributed' != 'parallel'.

    Noel

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 22:29                 ` Theodore Y. Ts'o
@ 2018-06-29  0:18                   ` Larry McVoy
  2018-06-29 15:41                     ` Perry E. Metzger
  2018-06-29  5:58                   ` Michael Kjörling
  1 sibling, 1 reply; 65+ messages in thread
From: Larry McVoy @ 2018-06-29  0:18 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: TUHS main list

On Thu, Jun 28, 2018 at 06:29:54PM -0400, Theodore Y. Ts'o wrote:
> On Thu, Jun 28, 2018 at 05:03:17PM -0400, Perry E. Metzger wrote:
> > 
> > Tens of thousands of machines is a lot more than one. I think the
> > point stands. This is the age of distributed and parallel systems.
> 
> This is the age of distributed systems, yes.  I'm not so sure about
> "parallel".  And the point remains that for many problems, you need
> fewer strong cores, and a crapton of weak cores is not as useful.

As usual, Ted gets it.

> You're conflating "distributed" and "parllel" computing, and they are
> really quite different.

Precisely (and well put Ted!)

Perry, please take this in the spirit in which is intended, but you're
arguing with people who have been around the block (there are people
on this list that have 5 decades of going around the block - looking at
you Ken).  I designed the first clustered product at Sun, I was the 4th
guy at google (working on clusters there), Ted is a Linux old timer,
Clem goes back in Unix farther than I do, Ken did much of Unix, etc.
There are a ton of people on this list that make me look like a nobody,
you want to be careful in that crowd.

This is a really poor place for a younger person to come in and make
loud points, that is frowned upon.  It's a fantastic place for a younger
person to come in and learn.  All of us old farts want to pass on what
we know and will gladly do so.  But some of us old farts, like me, are
really tired of arguing with people that don't see the whole picture.
This is not the place to bring the whole picture into focus for you,
sorry.  If you want to argue about stuff I'll eventually go away and so
will other old farts.

I kinda think you don't want to chase me away or other old farts away,
this is a place where we (mostly) talk about Unix history and if the
old farts go away, so does the history.

I'm not saying you can't voice your opinion and argue all you want, just
saying this might not be the list for that.  But that's just my view,
Warren will step in if he needs to.

Cheers and welcome,

--lm

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 21:03               ` Perry E. Metzger
@ 2018-06-28 22:29                 ` Theodore Y. Ts'o
  2018-06-29  0:18                   ` Larry McVoy
  2018-06-29  5:58                   ` Michael Kjörling
  0 siblings, 2 replies; 65+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-28 22:29 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: TUHS main list

On Thu, Jun 28, 2018 at 05:03:17PM -0400, Perry E. Metzger wrote:
> 
> Tens of thousands of machines is a lot more than one. I think the
> point stands. This is the age of distributed and parallel systems.

This is the age of distributed systems, yes.  I'm not so sure about
"parallel".  And the point remains that for many problems, you need
fewer strong cores, and a crapton of weak cores is not as useful.

Of course we should parllelize work where we can.  The point is that
very often, we can't.  And if you are really worried about potential
problems with Spectre and Meltdown, what that means is that sharing
caches is perilous.  So if you have 128 wimpy cores, you need 128
separate I and D cacaches.  If you have 32 stronger cores, you need 32
separate I and D caches.

And the fact remains that humans really suck at parallel programming.
Use a separate core for each HTTP request, with a load balancer to
split the incoming request to tens of hundreds servers?  Sure!  But
using a several dozen cores for each HTTP request?  That's a much
bigger lift.

You're conflating "distributed" and "parllel" computing, and they are
really quite different.

      	    	     	  	    	   - Ted

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 20:52             ` Lawrence Stewart
@ 2018-06-28 21:07               ` Perry E. Metzger
  0 siblings, 0 replies; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-28 21:07 UTC (permalink / raw)
  To: Lawrence Stewart; +Cc: tuhs

On Thu, 28 Jun 2018 16:52:26 -0400 Lawrence Stewart
<stewart@serissa.com> wrote: 
> Some weird stuff gets built for CDNs!  We had a real-time video
> transcoding project at Quanta using Tilera chips to do transcoding
> on demand for retrofitting systems in China with millions of old
> cable boxes.  Not I/O limited at all!  There was a <lot> of I/O but
> still more computing.

Of course, Tilera is a many core architecture, so again, I think
this supports the original point, which is we're now in the parallel
and distributed age, and that demands software design techniques to
match. (Indeed, If I Recall Correctly, the guys at MIT that designed
the original Tilera stuff are now looking at building 1000 core
devices.)

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 20:42             ` Warner Losh
@ 2018-06-28 21:03               ` Perry E. Metzger
  2018-06-28 22:29                 ` Theodore Y. Ts'o
  0 siblings, 1 reply; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-28 21:03 UTC (permalink / raw)
  To: Warner Losh; +Cc: TUHS main list

On Thu, 28 Jun 2018 14:42:47 -0600 Warner Losh <imp@bsdimp.com> wrote:
> > > Got a source that backs up that claim?  I was recently dancing
> > > with Netflix and they don't match your claim, nor do the other
> > > content delivery networks, they want every cycle they can get.  
> >
> > Netflix has how many machines?  
> 
> We generally say we have tens of thousands of machines deployed
> worldwide in our CDN. We don't give out specific numbers though.

Tens of thousands of machines is a lot more than one. I think the
point stands. This is the age of distributed and parallel systems.

> > Taking the other way of looking at it, from what I understand,
> > CDN boxes are about I/O and not CPU, though I could be wrong. I
> > can ask some of the Netflix people, a former report of mine is
> > one of the people behind their front end cache boxes and we keep
> > in touch.  
> 
> I can tell you it's about both. We recently started encrypting all
> traffic, which requires a crapton of CPU. Plus, we're doing
> sophisticated network flow modeling to reduce congestion, which
> takes CPU. On our 100G boxes, which we get in the low 90's
> encrypted, we have some spare CPU, but almost no space memory
> bandwidth and our PCI lanes are full of either 100G network traffic
> or 4-6 NVMe drives delivering content up at about 85-90Gbps.
> 
> Most of our other boxes are the same, with the exception of the
> 'storage' tier boxes. Those we're definitely hard disk I/O bound.

I believe all of this, but I think it is consistent with the point.
You're not trying to buy $100,000 CPUs that are faster than the
several-hundred-per-core things you can get, because no one sells
them. You're building systems that scale out by adding more CPUs
and more boxes. You might want very high end CPUs even, but the high
end isn't vastly better than the low, and there's a limit to what you
can spend per CPU because there just aren't better ones on the market.

So, all of this means that, architecturally, we're no longer in an
age where things get designed to run on one processor. Systems
have to be built to be parallel and distributed. Our kernels are
no longer one fast core and need to handle multiprocessing and all
it entails. Our software needs to run multicore if it's going to
take advantage of the expensive processors and motherboards we've
bought. Thread pools, locking, IPC, and all the rest are now a way of
life. We've got ways to avoid some of those things by using share
nothing and message passing, but even so, the fact that we've
structured our software to deal with parallelism is unavoidable.

Why am I belaboring this? Because the original point, that language
support for building distributed and parallel systems does help,
isn't wrong. There are a lot of projects out there using things like
Erlang and managing nearly miraculous feats of uptime because of it.
There are people replacing C++ with Rust because they can't reason
about concurrency well enough without language support and Rust's
linear types mean you can't write code that accidentally shares
memory between two writers by accident. The stuff does matter.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 19:42           ` Perry E. Metzger
  2018-06-28 19:55             ` Paul Winalski
  2018-06-28 20:42             ` Warner Losh
@ 2018-06-28 20:52             ` Lawrence Stewart
  2018-06-28 21:07               ` Perry E. Metzger
  2 siblings, 1 reply; 65+ messages in thread
From: Lawrence Stewart @ 2018-06-28 20:52 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: tuhs


> On 2018, Jun 28, at 3:42 PM, Perry E. Metzger <perry@piermont.com> wrote:
> 
> On Thu, 28 Jun 2018 07:56:09 -0700 Larry McVoy <lm@mcvoy.com> wrote:
>>> Huge numbers of wimpy cores is the model already dominating the
>>> world.   
>> 
>> Got a source that backs up that claim?  I was recently dancing with
>> Netflix and they don't match your claim, nor do the other content
>> delivery networks, they want every cycle they can get.
> 
> Netflix has how many machines? I'd say in general that principle
> holds: this is the age of huge distributed computation systems, the
> most you can pay for a single core before it tops out is in the
> hundreds of dollars, not in the millions like it used to be. The high
> end isn't very high up, and we scale by adding boxes and cores, not
> by getting single CPUs that are unusually fast.
> 
> Taking the other way of looking at it, from what I understand,
> CDN boxes are about I/O and not CPU, though I could be wrong. I can
> ask some of the Netflix people, a former report of mine is one of the
> people behind their front end cache boxes and we keep in touch.
> 
> Perry
> -- 
> Perry E. Metzger		perry@piermont.com

Some weird stuff gets built for CDNs!  We had a real-time video transcoding project at Quanta using Tilera chips to do transcoding on demand for retrofitting systems in China with millions of old cable boxes.  Not I/O limited at all!  There was a <lot> of I/O but still more computing.
-L


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 16:45       ` Paul Winalski
@ 2018-06-28 20:47         ` Perry E. Metzger
  2018-06-29 15:43         ` emanuel stiebler
  1 sibling, 0 replies; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-28 20:47 UTC (permalink / raw)
  To: Paul Winalski; +Cc: tuhs

On Thu, 28 Jun 2018 12:45:39 -0400 Paul Winalski
<paul.winalski@gmail.com> wrote:
> On 6/28/18, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> >
> > It's the same mistake
> > Chisnall made when he assrted the parallel programming a myth that
> > humans writing parallel programs was "hard", and "all you needed"
> > was the right language.  
> 
> I''ve heard the "all you need is the right language" solution to the
> parallel processing development problem since I joined DEC in 1980.
> Here we are in 2018 and nobody's found that "right language" yet.

Dunno. Rust does some amazing things because it has a linear type
system, which means both that it can be a fully safe language even
though it doesn't have a garbage collector, and that it can allow
sharing of memory without any fear of multiple writers touching the
same block of memory.

I used to think that there hadn't been much progress in computer
science in decades and then I fell down the rabbit hole of modern
type theory. The evolution of type systems over the last few decades
has changed the game in a lot of ways. Most people aren't aware of
the progress that has been made, which is a shame.

> There have been some advancements in software development tools to
> make parallel programming easier.  Modern compilers are getting
> pretty good at loop analysis to discover opportunities for parallel
> execution and vectorization in sequentially-written code.

You're not mentioning things like linear types, effect systems, etc.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 19:42           ` Perry E. Metzger
  2018-06-28 19:55             ` Paul Winalski
@ 2018-06-28 20:42             ` Warner Losh
  2018-06-28 21:03               ` Perry E. Metzger
  2018-06-28 20:52             ` Lawrence Stewart
  2 siblings, 1 reply; 65+ messages in thread
From: Warner Losh @ 2018-06-28 20:42 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]

On Thu, Jun 28, 2018 at 1:42 PM, Perry E. Metzger <perry@piermont.com>
wrote:

> On Thu, 28 Jun 2018 07:56:09 -0700 Larry McVoy <lm@mcvoy.com> wrote:
> > > Huge numbers of wimpy cores is the model already dominating the
> > > world.
> >
> > Got a source that backs up that claim?  I was recently dancing with
> > Netflix and they don't match your claim, nor do the other content
> > delivery networks, they want every cycle they can get.
>
> Netflix has how many machines?


We generally say we have tens of thousands of machines deployed worldwide
in our CDN. We don't give out specific numbers though.


> I'd say in general that principle
> holds: this is the age of huge distributed computation systems, the
> most you can pay for a single core before it tops out is in the
> hundreds of dollars, not in the millions like it used to be. The high
> end isn't very high up, and we scale by adding boxes and cores, not
> by getting single CPUs that are unusually fast.
>
> Taking the other way of looking at it, from what I understand,
> CDN boxes are about I/O and not CPU, though I could be wrong. I can
> ask some of the Netflix people, a former report of mine is one of the
> people behind their front end cache boxes and we keep in touch.


I can tell you it's about both. We recently started encrypting all traffic,
which requires a crapton of CPU. Plus, we're doing sophisticated network
flow modeling to reduce congestion, which takes CPU. On our 100G boxes,
which we get in the low 90's encrypted, we have some spare CPU, but almost
no space memory bandwidth and our PCI lanes are full of either 100G network
traffic or 4-6 NVMe drives delivering content up at about 85-90Gbps.

Most of our other boxes are the same, with the exception of the 'storage'
tier boxes. Those we're definitely hard disk I/O bound.

Warner

[-- Attachment #2: Type: text/html, Size: 2526 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 15:37         ` Clem Cole
@ 2018-06-28 20:37           ` Lawrence Stewart
  0 siblings, 0 replies; 65+ messages in thread
From: Lawrence Stewart @ 2018-06-28 20:37 UTC (permalink / raw)
  To: Clem Cole; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 6708 bytes --]

Thanks for the promotion to CTO Clem!  I was merely the software architect at Sicortex.

The SC systems had 6-core MIPS-64 cpus at 700 MHz, two channel DDR-2, and a really fast interconnect.  (Seriously fast for its day 800 ns PUT, 1.6 uS GET, north of 2 GB/sec end-to-end, this was in 2008.)  The achilles heel was low memory bandwidth due to a core limitation of a single outstanding miss.  The new chip would have fixed that (and about 8x performance) but we ran out of money in 2009, which was not a good time to look for more.

We had delighted customers who appreciated the reliabiity and the network.  For latency limited codes we did extremely well (GUPS) and still did well on the rest from a flops/watt perspective.  However, lots of commercial prospects didn’t have codes that needed the network and did need single stream performance.  We talked to Urs Holtze at Google and he was very clear - they needed fast single threads.  The low power was very nice, … but we were welcome to try and parallelize their benchmarks.

Which brings me back to the original issue - does C constrain our architectural thinking? 

I’ve spent a fair amount of time recently digging into Nyx, which is an adaptive mesh refinement cosmological hydrodynamics code.  The framework is in C++ because the inheritance stuff makes it straightforward to adapt the AMR machinery to different problems.  This isn’t the kind of horrible C++ that you can’t tell what is going to happen, but pretty close to C style in which you can visualize what the compiler will do.  The “solvers” tend to be Fortran modules, because, I think, Fortran is just sensible about multidimensional arrays and indexing in a way you have to use weird macros to replicate in C.  It isn’t I think that C or C++ compilers cannot generate good code - it is about the syntax for arrays.

For anyone interested in architectural arm wrestling, memory IS the main issue.  It is worth reading the papers on BLIS, an analytical model for writing Basic Linear Algebra libraries.  Once you figure out the flops per byte, you are nearly done - the rest is complicated but straightforward code tuning.  Matrix multiply has O(n^3) computation for O(n^2) memory and that immediately says you can get close to 100% of the ALUs running if you have a clue about blocking in the caches.  This is just as easy or hard to do in C as in Fortran.  The kernels tend to wind up in asm(“”) no matter what you wish for just in order to get the prefetch instructions placed just so.  As far as I can tell, compilers still do not have very good models for cache hierarchies although there isn’t really any reason why they shouldn’t.  Similarly, if your code is mainly doing inner products, you are doomed to run at memory speeds rather than ALU speeds.  Multithreading usually doesn’t help, because often other cores are farther away than main memory.

My summary of the language question comes down to: if you knew what code would run fast, you could code it in C.  Thinking that a new language will explain how to make it run fast is just wishful thinking.  It just pushes the problem onto the compiler writers, and they don’t know how to code it to run fast either.  The only argument I like for new languages is that at least they might be able to let you describe the problem in a way that others will recognize.  I’m sure everyone here has has the sad experience of trying to figure out what is the idea behind a chunk of code.  Comments are usually useless.  I wind up trying to match the physics papers with the math against the code and it makes my brain hurt.  It sure would be nice if there were a series of representations between math and hardware transitioning from why to what to how.  I think that is was Steele was trying to do with Fortress.

I do think the current environment is the best for architectural innovation since the ‘90s.  We have The Machine, we have Dover Micro trying to add security, we have Microsoft’s EDGE stuff, and the multiway battle between Intel/AMD/ARM and the GPU guys and the FPGA guys.  It is a lot more interesting than 2005!  

> On 2018, Jun 28, at 11:37 AM, Clem Cole <clemc@ccc.com> wrote:
> 
> 
> 
> On Thu, Jun 28, 2018 at 10:40 AM, Larry McVoy <lm@mcvoy.com <mailto:lm@mcvoy.com>> wrote:
> Yep.  Lots of cpus are nice when doing a parallel make but there is 
> always some task that just uses one cpu.  And then you want the fastest
> one you can get.  Lots of wimpy cpus is just, um, wimpy.
> 
> ​Larry Stewart would be better to reply as SiCortec's CTO - but that was the basic logic behind their system -- lots of cheap MIPS chips. Truth is they made a pretty neat system and it scaled pretty well.   My observation is that they, like most of the attempts I have been a part, in the end architecture does not matter nearly as much as economics.
> 
> In my career I have build 4 or 5 specially architecture systems.  You can basically live through one or two generations using some technology argument and 'win'.   But in the end, people buy computers to do a job and the really don't give a s*t about how the job gets done, as long as it get done cheaply.​   Whoever wins the economic war has the 'winning' architecture.   Look x66/Intel*64 would never win awards as a 'Computer Science Architecture'  or in SW side; Fortran vs. Algol etc...; Windows beat UNIX Workstations for the same reasons... as well know.
> 
> Hey, I used to race sailboats ...  there is a term called a 'sea lawyer' - where you are screaming you have been fouled but you drowning as your boating is sinking.   I keep thinking about it here.   You can scream all you want about goodness or badness of architecture or language, but in the end, users really don't care.   They buy computers to do a job.   You really can not forget that is the purpose.
> 
> As Larry says: Lots of wimpy cpus is just wimpy.    Hey, Intel, nVidia and AMD's job is sell expensive hot rocks.   They are going to do what they can to make those rocks useful for people.  They want to help people get there jobs done -- period. That is what they do.   Amtel and RPi folks take the 'jelly bean' approach - which is one of selling enough it make it worth it for the chip manufacture and if the simple machine can do the customer job, very cool.  In those cases simple is good (hey the PDP-11 is pretty complex compared to say the 6502).
> 
> So, I think the author of the paper trashing as too high level C misses the point, and arguing about architecture is silly.  In the end it is about what it costs to get the job done.   People will use what it is the most economically for them.
> 
> Clem
> 


[-- Attachment #2: Type: text/html, Size: 10178 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 16:02               ` Larry McVoy
  2018-06-28 16:41                 ` Tim Bradshaw
@ 2018-06-28 20:37                 ` Perry E. Metzger
  1 sibling, 0 replies; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-28 20:37 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

So lets not forget: the original question was, do people need to do a
lot of parallel and concurrent computing these days. Keep that in
mind while thinking about this.

On Thu, 28 Jun 2018 09:02:02 -0700 Larry McVoy <lm@mcvoy.com> wrote:
> But for people who care about performance, and there are a lot of
> them, more but slower is less desirable than less but faster.

Sure, but where can you get fewer but faster any more? Yes, you can
get it up to a point, but that point tops out fast. Yes, you can get a
4GHz processor from Intel instead of a bunch of fairly low end
Broadcom ARM chips. However, that top end Intel chip isn't much faster
than it was a few years ago, and it's a pretty cheap and slow chip
compared to what people need, so one of them still isn't going to do
it for you, and so you still need to go parallel, and that means you
still need to write parallel code.

Thirty years ago, you could pay as much money as you wanted, up to
tens of millions, to get a faster single CPU for your work. The
difference between the processors on an IBM PC class machine and on a
Cray was Really Really Different. These days, Intel won't sell you
anything that costs more than a couple thousand dollars, and that
couple thousand dollar thing has many CPUs. The most you can pay per
core, for the highest end, is in the low hundreds depending on the
moment. Taking inflation into account, that's a silly low amount of
money. IBM will sell you some slightly more expensive high end POWER
stuff, but very few people buy that and besides that, there's pretty
much nothing.

So it doesn't matter even if you'd rather spend 100x to get a core
that's 10x faster than the top of what is offered, the 10x faster
thing doesn't exist. You're stuck. You've got top of the line 64 bit
x86 and maybe POWER and there's nothing else. 

So, yes, I agree, all things being equal, people will prefer to buy
the faster stuff, but at the moment, no one can get it, so instead,
we're in an age of loads of parallel machines and cores. Your maximal
fast core is in the hundreds of dollars, but you've got millions of
dollars of computing to do, so you buy tons of processors instead.

> People still care about performance and always will.  Yeah, for your
> laptop or whatever you could probably use what you have for the next
> 10 years and be fine.   But when you are doing real work, sorting
> the genome, machine learning, whatever, performance is a thing and
> lots of wimpy cpus are not.

For all those pieces of work, people use hundreds, thousands, or
hundreds of thousands of cores, depending on the job. Machine
learning, shotgun sequencing, etc., all depend on parallelism these
days. Sometimes people need to buy top end processors for that, but
even then, they have to buy a _ton_ of top end processors because any
given one is too small to do a significant fraction of the work.

So, circling back to the original discussion, languages that don't
let you express such algorithms well are now a problem.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 19:42           ` Perry E. Metzger
@ 2018-06-28 19:55             ` Paul Winalski
  2018-06-28 20:42             ` Warner Losh
  2018-06-28 20:52             ` Lawrence Stewart
  2 siblings, 0 replies; 65+ messages in thread
From: Paul Winalski @ 2018-06-28 19:55 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: tuhs

On 6/28/18, Perry E. Metzger <perry@piermont.com> wrote:
>
> Taking the other way of looking at it, from what I understand,
> CDN boxes are about I/O and not CPU, though I could be wrong.

That, and power consumption/heat dissipation.

-Paul W.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:56         ` Larry McVoy
  2018-06-28 15:07           ` Warner Losh
@ 2018-06-28 19:42           ` Perry E. Metzger
  2018-06-28 19:55             ` Paul Winalski
                               ` (2 more replies)
  1 sibling, 3 replies; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-28 19:42 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

On Thu, 28 Jun 2018 07:56:09 -0700 Larry McVoy <lm@mcvoy.com> wrote:
> > Huge numbers of wimpy cores is the model already dominating the
> > world.   
> 
> Got a source that backs up that claim?  I was recently dancing with
> Netflix and they don't match your claim, nor do the other content
> delivery networks, they want every cycle they can get.

Netflix has how many machines? I'd say in general that principle
holds: this is the age of huge distributed computation systems, the
most you can pay for a single core before it tops out is in the
hundreds of dollars, not in the millions like it used to be. The high
end isn't very high up, and we scale by adding boxes and cores, not
by getting single CPUs that are unusually fast.

Taking the other way of looking at it, from what I understand,
CDN boxes are about I/O and not CPU, though I could be wrong. I can
ask some of the Netflix people, a former report of mine is one of the
people behind their front end cache boxes and we keep in touch.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 16:41                 ` Tim Bradshaw
  2018-06-28 16:59                   ` Paul Winalski
@ 2018-06-28 17:09                   ` Larry McVoy
  2018-06-29 15:32                     ` tfb
  1 sibling, 1 reply; 65+ messages in thread
From: Larry McVoy @ 2018-06-28 17:09 UTC (permalink / raw)
  To: Tim Bradshaw; +Cc: tuhs

On Thu, Jun 28, 2018 at 05:41:24PM +0100, Tim Bradshaw wrote:
> On 28 Jun 2018, at 17:02, Larry McVoy <lm@mcvoy.com> wrote:
> > But when you are doing real work, sorting
> > the genome, machine learning, whatever, performance is a thing and
> > lots of wimpy cpus are not.
> 
> But lots of (relatively) wimpy CPUs is what physics says you will have
> and you really can't argue with physics.

I'm not sure how people keep missing the original point.  Which was:
the market won't choose a bunch of wimpy cpus when it can get faster
ones.  It wasn't about the physics (which I'm not arguing with), it 
was about a choice between lots of wimpy cpus and a smaller number of
fast cpus.  The market wants the latter, as Ted said, Sun bet heavily
on the former and is no more.

If you want to bet on what Sun did, feel free, but do so knowing that
people have tried to tell you that is a failed approach.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 16:41                 ` Tim Bradshaw
@ 2018-06-28 16:59                   ` Paul Winalski
  2018-06-28 17:09                   ` Larry McVoy
  1 sibling, 0 replies; 65+ messages in thread
From: Paul Winalski @ 2018-06-28 16:59 UTC (permalink / raw)
  To: tuhs

n re modern architectures, admiral Grace Hopper gave a talk at DEC's
Nashua software development plant ca. 1982 on the future of computer
architecture.  She predicted the modern multi-core situation.  As she
put it, if you were a carter and had to haul double the load in your
cart, you wouldn't breed bigger a horse twice as big--you'd hook up a
team of horses.

Another problem with solving problems fast is I/O bandwidth.  Much of
digital audio file, for example, is an embarrassingly parallel
problem.  But no matter how many cores you throw at the problem, and
no matter how fast they are, the time it takes to process the audio
file is limited by how fast you can get the original off the disk and
the modified file back onto the disk.  Or main memory, for that
matter.  Relative to the processor speed, a cache miss takes an
eternity to resolve.  Get your cache management wrong and your program
ends up running an order of magnitude slower.  It's a throwback to the
situation in the 1960s, where compute speeds were comparable to main
memory speeds, but vastly higher than the I/O transmission rates to
disk and tape.  Only now first-level cache is the new "main memory",
and for practical purposes main memory is a slow storage medium.

-Paul W.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:15     ` Theodore Y. Ts'o
  2018-06-28 14:40       ` Larry McVoy
  2018-06-28 14:43       ` Perry E. Metzger
@ 2018-06-28 16:45       ` Paul Winalski
  2018-06-28 20:47         ` Perry E. Metzger
  2018-06-29 15:43         ` emanuel stiebler
  2018-06-29  2:02       ` Bakul Shah
  3 siblings, 2 replies; 65+ messages in thread
From: Paul Winalski @ 2018-06-28 16:45 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: tuhs

On 6/28/18, Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> It's the same mistake
> Chisnall made when he assrted the parallel programming a myth that
> humans writing parallel programs was "hard", and "all you needed" was
> the right language.

I''ve heard the "all you need is the right language" solution to the
parallel processing development problem since I joined DEC in 1980.
Here we are in 2018 and nobody's found that "right language" yet.

Parallel programming *is* hard for humans.  Very few people can cope
with it, or with the nasty bugs that crop up when you get it wrong.

> The problem is that not all people are interested in solving problems
> which are amenable to embarassingly parallel algorithms.

Most interesting problems in fact are not embarrassingly parallel.
They tend to have data interdependencies.

There have been some advancements in software development tools to
make parallel programming easier.  Modern compilers are getting pretty
good at loop analysis to discover opportunities for parallel execution
and vectorization in sequentially-written code.

-Paul W.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 16:02               ` Larry McVoy
@ 2018-06-28 16:41                 ` Tim Bradshaw
  2018-06-28 16:59                   ` Paul Winalski
  2018-06-28 17:09                   ` Larry McVoy
  2018-06-28 20:37                 ` Perry E. Metzger
  1 sibling, 2 replies; 65+ messages in thread
From: Tim Bradshaw @ 2018-06-28 16:41 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

On 28 Jun 2018, at 17:02, Larry McVoy <lm@mcvoy.com> wrote:
> 
> But when you are doing real work, sorting
> the genome, machine learning, whatever, performance is a thing and
> lots of wimpy cpus are not.

But lots of (relatively) wimpy CPUs is what physics says you will have and you really can't argue with physics.

I think this is definitely off-topic now so I won't reply further: feel free to mail me privately.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 15:39             ` Tim Bradshaw
@ 2018-06-28 16:02               ` Larry McVoy
  2018-06-28 16:41                 ` Tim Bradshaw
  2018-06-28 20:37                 ` Perry E. Metzger
  0 siblings, 2 replies; 65+ messages in thread
From: Larry McVoy @ 2018-06-28 16:02 UTC (permalink / raw)
  To: Tim Bradshaw; +Cc: tuhs

On Thu, Jun 28, 2018 at 04:39:58PM +0100, Tim Bradshaw wrote:
> > On 28 Jun 2018, at 15:58, Larry McVoy <lm@mcvoy.com> wrote:
> > 
> > You completely missed my point, I never said I was in favor of single
> > cpu systems, I said I the speed of a single cpu to be fast no matter
> > how many of them I get.  The opposite of wimpy.
>
> And this also misses the point, I think.  Defining a core as 'wimpy'
> or not is dependent on when you make the definition: the Cray-1 was not
> wimpy when it was built, but it is now.

That's not what I, or Ted, or the Market was saying.  We were not comparing
yesterday's cpu against todays.  We were saying that at any given moment,
a faster processor is better than more processors that are slower.

That's not an absolute, obviously.  If I am running AWS and I get 10x
the total CPU processing speed at the same power budget, yeah, that's
interesting.

But for people who care about performance, and there are a lot of them,
more but slower is less desirable than less but faster.  There too much
stuff that hasn't been (or can't be) parallelized (and I'll note here
that I'm the guy that built the first clustered server at Sun, I can
argue the parallel case just fine).

People still care about performance and always will.  Yeah, for your
laptop or whatever you could probably use what you have for the next
10 years and be fine.   But when you are doing real work, sorting
the genome, machine learning, whatever, performance is a thing and
lots of wimpy cpus are not.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:58           ` Larry McVoy
@ 2018-06-28 15:39             ` Tim Bradshaw
  2018-06-28 16:02               ` Larry McVoy
  0 siblings, 1 reply; 65+ messages in thread
From: Tim Bradshaw @ 2018-06-28 15:39 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

> On 28 Jun 2018, at 15:58, Larry McVoy <lm@mcvoy.com> wrote:
> 
> You completely missed my point, I never said I was in favor of single
> cpu systems, I said I the speed of a single cpu to be fast no matter
> how many of them I get.  The opposite of wimpy.

And this also misses the point, I think.  Defining a core as 'wimpy' or not is dependent on when you make the definition: the Cray-1 was not wimpy when it was built, but it is now.  The interesting question is what happens to the performance of serial code on a core over time.  For a long time it has increased, famously, approximately exponentially.  There is good evidence that this is no longer the case and that per-core performance will fall off (or has fallen off in fact) that curve and may even become asymptotically constant.  If that's true, then in due course *all cores will become 'wimpy'*, and to exploit the performance available from systems we will *have* to deal with parallelism.

(Note I've said 'core' not 'CPU' for clarity even when it's anachronistic: I never know what the right terminology is now.)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:40       ` Larry McVoy
  2018-06-28 14:55         ` Perry E. Metzger
@ 2018-06-28 15:37         ` Clem Cole
  2018-06-28 20:37           ` Lawrence Stewart
  1 sibling, 1 reply; 65+ messages in thread
From: Clem Cole @ 2018-06-28 15:37 UTC (permalink / raw)
  To: Larry McVoy; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 2457 bytes --]

On Thu, Jun 28, 2018 at 10:40 AM, Larry McVoy <lm@mcvoy.com> wrote:

> Yep.  Lots of cpus are nice when doing a parallel make but there is
> always some task that just uses one cpu.  And then you want the fastest
> one you can get.  Lots of wimpy cpus is just, um, wimpy.
>

​Larry Stewart would be better to reply as SiCortec's CTO - but that was
the basic logic behind their system -- lots of cheap MIPS chips. Truth is
they made a pretty neat system and it scaled pretty well.   My observation
is that they, like most of the attempts I have been a part, *in the end
architecture does not matter nearly as much as economics*.

In my career I have build 4 or 5 specially architecture systems.  You can
basically live through one or two generations using some technology
argument and 'win'.   But in the end, people buy computers to do a job and
the really don't give a s*t about how the job gets done, as long as it get
done cheaply.​   Whoever wins the economic war has the 'winning'
architecture.   Look x66/Intel*64 would never win awards as a 'Computer
Science Architecture'  or in SW side; Fortran *vs*. Algol *etc*...; Windows
beat UNIX Workstations for the same reasons... as well know.

Hey, I used to race sailboats ...  there is a term called a 'sea lawyer' -
where you are screaming you have been fouled but you drowning as your
boating is sinking.   I keep thinking about it here.   You can scream all
you want about goodness or badness of architecture or language, but in the
end, users really don't care.   They buy computers to do a job.   You
really can not forget that is the purpose.

As Larry says: Lots of wimpy cpus is just wimpy.    Hey, Intel, nVidia and
AMD's job is sell expensive hot rocks.   They are going to do what they can
to make those rocks useful for people.  They want to help people get there
jobs done -- period. That is what they do.   Amtel and RPi folks take the
'jelly bean' approach - which is one of selling enough it make it worth it
for the chip manufacture and if the simple machine can do the customer job,
very cool.  In those cases simple is good (hey the PDP-11 is pretty complex
compared to say the 6502).

So, I think the author of the paper trashing as too high level C misses the
point, and arguing about architecture is silly.  In the end it is about
what it costs to get the job done.   People will use what it is the most
economically for them.

Clem

[-- Attachment #2: Type: text/html, Size: 4730 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:56         ` Larry McVoy
@ 2018-06-28 15:07           ` Warner Losh
  2018-06-28 19:42           ` Perry E. Metzger
  1 sibling, 0 replies; 65+ messages in thread
From: Warner Losh @ 2018-06-28 15:07 UTC (permalink / raw)
  To: Larry McVoy; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 1984 bytes --]

On Thu, Jun 28, 2018 at 8:56 AM, Larry McVoy <lm@mcvoy.com> wrote:

> On Thu, Jun 28, 2018 at 10:43:29AM -0400, Perry E. Metzger wrote:
> > On Thu, 28 Jun 2018 10:15:38 -0400 "Theodore Y. Ts'o" <tytso@mit.edu>
> > wrote:
> > > I'll note that Sun made a big bet (one of its last failed bets) on
> > > this architecture in the form of the Niagra architecture, with a
> > > large number of super "wimpy" cores.  It was the same basic idea
> > > --- we can't make big fast cores (since that would lead to high
> > > ILP's, complex register rewriting, and lead to cache-oriented
> > > security vulnerabilities like Spectre and Meltdown) --- so instead,
> > > let's make lots of tiny wimpy cores, and let programmers write
> > > highly threaded programs!  They essentially made a bet on the
> > > web-based microservice model which you are promoting.
> > >
> > > And the Market spoke.  And shortly thereafter, Java fell under the
> > > control of Oracle....  And Intel would proceed to further dominate
> > > the landscape.
> >
> > I'll be contrary for a moment.
> >
> > Huge numbers of wimpy cores is the model already dominating the
> > world.
>
> Got a source that backs up that claim?  I was recently dancing with
> Netflix and they don't match your claim, nor do the other content
> delivery networks, they want every cycle they can get.
>

Well, we want to be able to manage 100G or more of encrypted traffic sanely.

We currently get this by lots (well 20) of not-so-wimpy cores to do all the
work since none of the offload solutions can scale.

The problem is that there's no systems with lots (100's) of wimpy cores
that we can do the offload with that also have enough bandwidth to keep up.
And even if there were, things like NUMA and slow interprocessor connects
make the usefulness of the boatloads of cores a lot trickier to utilize
than it should....

Then again, a lot of what we do is rather special case, even if we do use
off the shelf technology to get there...

Warner

[-- Attachment #2: Type: text/html, Size: 2862 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:55         ` Perry E. Metzger
@ 2018-06-28 14:58           ` Larry McVoy
  2018-06-28 15:39             ` Tim Bradshaw
  0 siblings, 1 reply; 65+ messages in thread
From: Larry McVoy @ 2018-06-28 14:58 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: tuhs

On Thu, Jun 28, 2018 at 10:55:38AM -0400, Perry E. Metzger wrote:
> On Thu, 28 Jun 2018 07:40:17 -0700 Larry McVoy <lm@mcvoy.com> wrote:
> > On Thu, Jun 28, 2018 at 10:15:38AM -0400, Theodore Y. Ts'o wrote:
> > > And the Market spoke.  And shortly thereafter, Java fell under the
> > > control of Oracle....  And Intel would proceed to further
> > > dominate the landscape.  
> > 
> > Yep.  Lots of cpus are nice when doing a parallel make but there is 
> > always some task that just uses one cpu.  And then you want the
> > fastest one you can get.  Lots of wimpy cpus is just, um, wimpy.
> 
> And yet, there are few single core devices I can buy any more other
> than embedded processors. Even the $35 Raspberry Pis are now four core
> machines, and I'm sure they'll be eight core devices soon.
> 
> If you want a single core Unix machine, you need to buy the $5
> Raspberry Pi Zero, which is the only single core Unix box I still can
> think of on the market.

You completely missed my point, I never said I was in favor of single
cpu systems, I said I the speed of a single cpu to be fast no matter
how many of them I get.  The opposite of wimpy.

Which was, I think, Ted's point as well when he said the market rejected
the idea of lots of wimpy cpus.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:43       ` Perry E. Metzger
@ 2018-06-28 14:56         ` Larry McVoy
  2018-06-28 15:07           ` Warner Losh
  2018-06-28 19:42           ` Perry E. Metzger
  0 siblings, 2 replies; 65+ messages in thread
From: Larry McVoy @ 2018-06-28 14:56 UTC (permalink / raw)
  To: Perry E. Metzger; +Cc: tuhs

On Thu, Jun 28, 2018 at 10:43:29AM -0400, Perry E. Metzger wrote:
> On Thu, 28 Jun 2018 10:15:38 -0400 "Theodore Y. Ts'o" <tytso@mit.edu>
> wrote:
> > I'll note that Sun made a big bet (one of its last failed bets) on
> > this architecture in the form of the Niagra architecture, with a
> > large number of super "wimpy" cores.  It was the same basic idea
> > --- we can't make big fast cores (since that would lead to high
> > ILP's, complex register rewriting, and lead to cache-oriented
> > security vulnerabilities like Spectre and Meltdown) --- so instead,
> > let's make lots of tiny wimpy cores, and let programmers write
> > highly threaded programs!  They essentially made a bet on the
> > web-based microservice model which you are promoting.
> >
> > And the Market spoke.  And shortly thereafter, Java fell under the
> > control of Oracle....  And Intel would proceed to further dominate
> > the landscape.
> 
> I'll be contrary for a moment.
> 
> Huge numbers of wimpy cores is the model already dominating the
> world. 

Got a source that backs up that claim?  I was recently dancing with
Netflix and they don't match your claim, nor do the other content
delivery networks, they want every cycle they can get.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:40       ` Larry McVoy
@ 2018-06-28 14:55         ` Perry E. Metzger
  2018-06-28 14:58           ` Larry McVoy
  2018-06-28 15:37         ` Clem Cole
  1 sibling, 1 reply; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-28 14:55 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

On Thu, 28 Jun 2018 07:40:17 -0700 Larry McVoy <lm@mcvoy.com> wrote:
> On Thu, Jun 28, 2018 at 10:15:38AM -0400, Theodore Y. Ts'o wrote:
> > And the Market spoke.  And shortly thereafter, Java fell under the
> > control of Oracle....  And Intel would proceed to further
> > dominate the landscape.  
> 
> Yep.  Lots of cpus are nice when doing a parallel make but there is 
> always some task that just uses one cpu.  And then you want the
> fastest one you can get.  Lots of wimpy cpus is just, um, wimpy.

And yet, there are few single core devices I can buy any more other
than embedded processors. Even the $35 Raspberry Pis are now four core
machines, and I'm sure they'll be eight core devices soon.

If you want a single core Unix machine, you need to buy the $5
Raspberry Pi Zero, which is the only single core Unix box I still can
think of on the market.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:15     ` Theodore Y. Ts'o
  2018-06-28 14:40       ` Larry McVoy
@ 2018-06-28 14:43       ` Perry E. Metzger
  2018-06-28 14:56         ` Larry McVoy
  2018-06-28 16:45       ` Paul Winalski
  2018-06-29  2:02       ` Bakul Shah
  3 siblings, 1 reply; 65+ messages in thread
From: Perry E. Metzger @ 2018-06-28 14:43 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: tuhs

On Thu, 28 Jun 2018 10:15:38 -0400 "Theodore Y. Ts'o" <tytso@mit.edu>
wrote:
> I'll note that Sun made a big bet (one of its last failed bets) on
> this architecture in the form of the Niagra architecture, with a
> large number of super "wimpy" cores.  It was the same basic idea
> --- we can't make big fast cores (since that would lead to high
> ILP's, complex register rewriting, and lead to cache-oriented
> security vulnerabilities like Spectre and Meltdown) --- so instead,
> let's make lots of tiny wimpy cores, and let programmers write
> highly threaded programs!  They essentially made a bet on the
> web-based microservice model which you are promoting.
>
> And the Market spoke.  And shortly thereafter, Java fell under the
> control of Oracle....  And Intel would proceed to further dominate
> the landscape.

I'll be contrary for a moment.

Huge numbers of wimpy cores is the model already dominating the
world. Clock rates aren't rising any longer, but (in spite of claims
to the contrary) Moore's law continues, very slightly with shrinkage
of feature size (which is about to end) and more dominantly with
increasing the number of transistors per square mil by going into
3D. Dynamic power usage also scales as the square of clock rate, so
larger numbers of lower clocked cores save a boatload of heat, and at
some point you have too many transistors in too small an area to take
heat out if you're generating too much.

Some data points:

1. All the largest compute platforms out there (Google, Amazon, etc.)
   are based on vast numbers of processors integrated into a giant
   distributed system. You might not see this as evidence for the
   trend, but it is. No one can make a single processor that's much
   faster than what you get for a few hundred bucks from Intel or AMD,
   so the only way to get more compute is to scale out, and this is
   now so common that no one even thinks of it as odd.

2. The most powerful compute engines out there within in a single box
   aren't Intel microprocessors, they're GPUs, and anyone doing really
   serious computing now uses GPUs to do it. Machine learning,
   scientific computing, etc. has become dependent on the things, and
   they're basically giant bunches of tiny processors. Ways to program
   these things have become very important.

   Oh, and your iPhone or Android device is now pretty lopsided. By
   far most of the compute power in it comes from its GPUs, though
   there are a ridiculous number of general purpose CPUs in these
   things too.

3. Even "normal" hacking on "normal" CPUs on a singe box now runs on
   lots of fairly wimpy processors. I do lots of compiler hacking
   these days, and my normal lab machine has 64 cores, 128
   hyperthreads, and a half T of RAM. It rebuilds one system I need to
   recompile a lot, which takes like 45 minutes to build on my laptop,
   in two minutes. Note that this box is both on the older side, the
   better ones in the lab have a lot more RAM, newer and better
   processors and more of them, etc.

   This box also costs a ridiculously small fraction of what I cost, a
   serious inversion of the old days when I started out and a machine
   cost a whole lot more compared to a human.

   Sadly my laptop is stalled out and hasn't gotten any better in
   forever, but the machines in the lab still keep getting
   better. However, the only way to take advantage of that is
   parallelism. Luckily parallel builds work pretty well.

Perry
-- 
Perry E. Metzger		perry@piermont.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28 14:15     ` Theodore Y. Ts'o
@ 2018-06-28 14:40       ` Larry McVoy
  2018-06-28 14:55         ` Perry E. Metzger
  2018-06-28 15:37         ` Clem Cole
  2018-06-28 14:43       ` Perry E. Metzger
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 65+ messages in thread
From: Larry McVoy @ 2018-06-28 14:40 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: tuhs

On Thu, Jun 28, 2018 at 10:15:38AM -0400, Theodore Y. Ts'o wrote:
> And the Market spoke.  And shortly thereafter, Java fell under the
> control of Oracle....  And Intel would proceed to further dominate the
> landscape.

Yep.  Lots of cpus are nice when doing a parallel make but there is 
always some task that just uses one cpu.  And then you want the fastest
one you can get.  Lots of wimpy cpus is just, um, wimpy.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-28  4:12   ` Bakul Shah
@ 2018-06-28 14:15     ` Theodore Y. Ts'o
  2018-06-28 14:40       ` Larry McVoy
                         ` (3 more replies)
  0 siblings, 4 replies; 65+ messages in thread
From: Theodore Y. Ts'o @ 2018-06-28 14:15 UTC (permalink / raw)
  To: Bakul Shah; +Cc: tuhs

Bakul, I think you and Steve have a very particular set of programming
use cases in mind, and are then over-generalizing this to assume that
these are the only problems that matter.  It's the same mistake
Chisnall made when he assrted the parallel programming a myth that
humans writing parallel programs was "hard", and "all you needed" was
the right language.

The problem is that not all people are interested in solving problems
which are amenable to embarassingly parallel algorithms.  Not all
programmers are interested in doing matrix multiply, or writing using
the latest hyped archiecture (whether you call by the new name,
"microservices", or the older hyped name which IBM tried to promote
"Service Oriented Architecture", or SOA).

I'll note that Sun made a big bet (one of its last failed bets) on
this architecture in the form of the Niagra architecture, with a large
number of super "wimpy" cores.  It was the same basic idea --- we
can't make big fast cores (since that would lead to high ILP's,
complex register rewriting, and lead to cache-oriented security
vulnerabilities like Spectre and Meltdown) --- so instead, let's make
lots of tiny wimpy cores, and let programmers write highly threaded
programs!  They essentially made a bet on the web-based microservice
model which you are promoting.

And the Market spoke.  And shortly thereafter, Java fell under the
control of Oracle....  And Intel would proceed to further dominate the
landscape.

					- Ted

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-27 16:00 ` Steve Johnson
@ 2018-06-28  4:12   ` Bakul Shah
  2018-06-28 14:15     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 65+ messages in thread
From: Bakul Shah @ 2018-06-28  4:12 UTC (permalink / raw)
  To: Steve Johnson; +Cc: tuhs

On Jun 27, 2018, at 9:00 AM, Steve Johnson <scj@yaccman.com> wrote:
> 
> I agree that C is a bad language for parallelism, and, like it or not, that's what today's hardware is giving us -- not speed, but many independent processors.  But I'd argue that its problem isn't that it is not low-level, but that it is not high-level enough.  A language like MATLAB, whose basic data object is an N-diemsional tensor, can make impressive use of parallel hardware.
> 
> Consider matrix multiplication.   Multiplying two NxN arrays to get another NxN array is a classic data-parallel problem -- each value in the result matrix is completely independent of every other one -- in theory, we could dedicate a processor to each output element, and would not need any cache coherency or locking mechanism -- just let them go at it -- the trickiest part is deciding you are finished.
> 
> The reason we know we are data parallel is not because of any feature of the language -- it's because of the mathematical structure of the problem.  While it's easy to write a matrix multiply function in C (as it is in most languages), just the fact that the arguments are pointers is enough to make data parallelism invisible from within the function.  You can bolt on additional features that, in effect, tell the compiler it should treat the inputs as independent and non-overlapping, but this is just the tip of the iceberg -- real parallel problems see this in spaces.  
> 
> The other hardware factor that comes into play is that hardware, especially memories, have physical limits in what they can do.  So the "ideal" matrix multiply with a processor for each output element would suffer because many of the processors would be trying to read the same memory at the same time.  Some would be bound to fail, requiring the ability to stack requests and restart them, as well as pause the processor until the data was available.   (note that, in this and many other cases, we don't need cache coherency because the input data is not changing while we are using it).  The obvious way around this is to divide the memory in to many small memories that are close to the processors, so memory access is not the bottleneck.
> 
> And this is where C (and Python) fall shortest.  The idea that there is one memory space of semi-infinite size, and all pointers point into it and all variables live in it almost forces attempts at parallelism to be expensive and performance-killing.  And yet, because of C's limited, "low-level" approach to data, we are stuck.  Being able to declare that something is a tensor that will be unchanging when used, can be distributed across many small memories to prevent data bottlenecks when reading and writing, and changed only in limited and controlled ways is the key to unlocking serious performance.
> 
> Steve
> 
> PS: for some further thoughts, see https://wavecomp.ai/blog/auto-hardware-and-ai

Very well put. The whole concept of address-spaces is rather
low level.

There is in fact a close parallel to this model that is in 
current use. Cloud computing is essentially a collection of
"micro-services", orchestrated to provide some higher level
service. External to some micro-service X, all other services
care about is how to reach X and what comm. protocol to use to
talk to it but not about any details of how it is implemented.
Here concerns are more about reliability, uptime, restarts,
updates, monitoring, load balancing, error handling, DoS,
security, access-control, latency, network address space &
traffic management, dynamic scaling, etc. A subset of these
concerns would apply to parallel computers as well.

Current cloud computing solutions to these problems are quite
messy, complex and heavyweight. There is a lot of scope here
for simplification.... 






^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-27 15:30             ` Paul Winalski
@ 2018-06-27 16:55               ` Tim Bradshaw
  0 siblings, 0 replies; 65+ messages in thread
From: Tim Bradshaw @ 2018-06-27 16:55 UTC (permalink / raw)
  To: Paul Winalski; +Cc: TUHS main list

On 27 Jun 2018, at 16:30, Paul Winalski <paul.winalski@gmail.com> wrote:
> 
> What Clem said.  Chisnall is right about C having been designed for a
> sequential-programming world.  That's why Fortran (with array and
> other parallel/vector operations built in) rules in the HPTC parallel
> programming space.  But I don't buy most of his arguments.  Making
> parallel programming easy and natural has been an unsolved problem
> during my entire 30+ year career in designing software development
> tools.  It's still an unsolved problem.  [...]

I think that's right.  The missing bit is that while once the only people who had to worry about processors with a lot of parallelism were the HPC people, who fortunately often had algorithms which parallelised rather well.  Now you have to worry about it if you want to write programs for the processor in your laptop and probably the processor in your watch.  Or you would, if the designers of those processors had not gone to heroic lengths to make them look like giant PDP-11s.  Unfortunately those heroic lengths haven't been heroic enough as has become apparent, and will presumably fall apart increasingly rapidly from now on.

So he's right: the giant PDP-11 thing is a disaster, but he's wrong about its cause: it's not caused by C, but by the fact that writing programs for what systems really need to look like is just an unsolved problem.  It might have helped if we had not spent forty years sweeping it busily under the carpet.

A thing that is also coming of course, which he does not talk about, is that big parallel machines are also going to start getting increasingly constrained by physics which means that a lot of the tricks that HPC people use will start to fall apart as well.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 17:54 Nelson H. F. Beebe
                   ` (2 preceding siblings ...)
  2018-06-26 19:01 ` Ronald Natalie
@ 2018-06-27 16:00 ` Steve Johnson
  2018-06-28  4:12   ` Bakul Shah
  3 siblings, 1 reply; 65+ messages in thread
From: Steve Johnson @ 2018-06-27 16:00 UTC (permalink / raw)
  To: Nelson H. F. Beebe, tuhs

[-- Attachment #1: Type: text/plain, Size: 2880 bytes --]


I agree that C is a bad language for parallelism, and, like it or not,
that's what today's hardware is giving us -- not speed, but many
independent processors.  But I'd argue that its problem isn't that it
is not low-level, but that it is not high-level enough.  A language
like MATLAB, whose basic data object is an N-diemsional tensor, can
make impressive use of parallel hardware.

Consider matrix multiplication.   Multiplying two NxN arrays to get
another NxN array is a classic data-parallel problem -- each value in
the result matrix is completely independent of every other one -- in
theory, we could dedicate a processor to each output element, and
would not need any cache coherency or locking mechanism -- just let
them go at it -- the trickiest part is deciding you are finished.

The reason we know we are data parallel is not because of any feature
of the language -- it's because of the mathematical structure of the
problem.  While it's easy to write a matrix multiply function in C
(as it is in most languages), just the fact that the arguments are
pointers is enough to make data parallelism invisible from within the
function.  You can bolt on additional features that, in effect, tell
the compiler it should treat the inputs as independent and
non-overlapping, but this is just the tip of the iceberg -- real
parallel problems see this in spaces.  

The other hardware factor that comes into play is that hardware,
especially memories, have physical limits in what they can do.  So
the "ideal" matrix multiply with a processor for each output element
would suffer because many of the processors would be trying to read
the same memory at the same time.  Some would be bound to fail,
requiring the ability to stack requests and restart them, as well as
pause the processor until the data was available.   (note that, in
this and many other cases, we don't need cache coherency because the
input data is not changing while we are using it).  The obvious way
around this is to divide the memory in to many small memories that are
close to the processors, so memory access is not the bottleneck.

And this is where C (and Python) fall shortest.  The idea that there
is one memory space of semi-infinite size, and all pointers point into
it and all variables live in it almost forces attempts at parallelism
to be expensive and performance-killing.  And yet, because of C's
limited, "low-level" approach to data, we are stuck.  Being able to
declare that something is a tensor that will be unchanging when used,
can be distributed across many small memories to prevent data
bottlenecks when reading and writing, and changed only in limited and
controlled ways is the key to unlocking serious performance.

Steve

PS: for some further thoughts, see
https://wavecomp.ai/blog/auto-hardware-and-ai



[-- Attachment #2: Type: text/html, Size: 3018 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-27 14:33           ` Clem Cole
  2018-06-27 14:38             ` Clem Cole
@ 2018-06-27 15:30             ` Paul Winalski
  2018-06-27 16:55               ` Tim Bradshaw
  1 sibling, 1 reply; 65+ messages in thread
From: Paul Winalski @ 2018-06-27 15:30 UTC (permalink / raw)
  To: TUHS main list

What Clem said.  Chisnall is right about C having been designed for a
sequential-programming world.  That's why Fortran (with array and
other parallel/vector operations built in) rules in the HPTC parallel
programming space.  But I don't buy most of his arguments.  Making
parallel programming easy and natural has been an unsolved problem
during my entire 30+ year career in designing software development
tools.  It's still an unsolved problem.  Modern compiler technology
helps to find the hidden parallelism in algorithms expressed
sequentially, but I think the fundamental problem is that most human
beings have great difficulty conceptualizing parallel algorithms.
It's also always been true that to get maximum performance you have to
somehow get close to the specific hardware you're using--either by
explicitly programming for it, or by having a compiler do that for
you.

Note also that there have been extensions to C/C++ to support
parallelism.  Cilk, for example.

-Paul W.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-27 14:33           ` Clem Cole
@ 2018-06-27 14:38             ` Clem Cole
  2018-06-27 15:30             ` Paul Winalski
  1 sibling, 0 replies; 65+ messages in thread
From: Clem Cole @ 2018-06-27 14:38 UTC (permalink / raw)
  To: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 1766 bytes --]

I need to get a keyboard who keys don't stick.... sigh....  Clem
ᐧ

On Wed, Jun 27, 2018 at 10:33 AM, Clem Cole <clemc@ccc.com> wrote:

> I guess my take on it is mixed.   I see some of his points but over all I
> disagree with most of them.  I firmly believe if you look at anything long
> enough you will find flaws.  There is no perfect.   I think Fortran, C,
> even Algol are credits for more what people were able to think about at the
> time and how well they lasted.   As I have said in other places, Fortran is
> not going away. Clem Cole's answer is the Future of Fortran Programming
> Dead
> <https://www.quora.com/Is-the-future-of-Fortran-programming-dead/answer/Clem-Cole> also
> applies to C.   It's just not broken and he's wrong.   Go, Rust *et al*
> is not going to magically overtake C, just as Fortran as not been displaced
> in my lifetime (BTW, I >>like<< both Go and Rust and think they are
> interesting new languages).   He thinks C is not long a low level language
> because when Ken abstracted the PDP-7 into B and then Dennis abstracted the
> PDP-11 into C, the systems were simple.  The HW designers are in a giant
> fake out at this point, so things that used to work like 'register' no
> longer make sense.  Now its the compiler that binds to the primitives
> available to the functions under the covers and there is more to use than
> the PDP-11 and PDP-7 offered.    But wait, that is not always true.   So I
> think he's wrong.   I think you leave the language alone and if the HW
> moves on great.   But if we have a simple system like you have on the Amtel
> chips that most Arduino's and lots of other embedded C programs use, C is
> very low level and most his arguments go away.
>
> Cken
> ᐧ
>

[-- Attachment #2: Type: text/html, Size: 3009 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-27 11:26         ` Tony Finch
@ 2018-06-27 14:33           ` Clem Cole
  2018-06-27 14:38             ` Clem Cole
  2018-06-27 15:30             ` Paul Winalski
  0 siblings, 2 replies; 65+ messages in thread
From: Clem Cole @ 2018-06-27 14:33 UTC (permalink / raw)
  To: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 1571 bytes --]

I guess my take on it is mixed.   I see some of his points but over all I
disagree with most of them.  I firmly believe if you look at anything long
enough you will find flaws.  There is no perfect.   I think Fortran, C,
even Algol are credits for more what people were able to think about at the
time and how well they lasted.   As I have said in other places, Fortran is
not going away. Clem Cole's answer is the Future of Fortran Programming Dead
<https://www.quora.com/Is-the-future-of-Fortran-programming-dead/answer/Clem-Cole>
also
applies to C.   It's just not broken and he's wrong.   Go, Rust *et al* is
not going to magically overtake C, just as Fortran as not been displaced in
my lifetime (BTW, I >>like<< both Go and Rust and think they are
interesting new languages).   He thinks C is not long a low level language
because when Ken abstracted the PDP-7 into B and then Dennis abstracted the
PDP-11 into C, the systems were simple.  The HW designers are in a giant
fake out at this point, so things that used to work like 'register' no
longer make sense.  Now its the compiler that binds to the primitives
available to the functions under the covers and there is more to use than
the PDP-11 and PDP-7 offered.    But wait, that is not always true.   So I
think he's wrong.   I think you leave the language alone and if the HW
moves on great.   But if we have a simple system like you have on the Amtel
chips that most Arduino's and lots of other embedded C programs use, C is
very low level and most his arguments go away.

Cken
ᐧ

[-- Attachment #2: Type: text/html, Size: 2187 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 21:54       ` Ronald Natalie
  2018-06-26 21:59         ` Larry McVoy
@ 2018-06-27 11:26         ` Tony Finch
  2018-06-27 14:33           ` Clem Cole
  1 sibling, 1 reply; 65+ messages in thread
From: Tony Finch @ 2018-06-27 11:26 UTC (permalink / raw)
  To: Ronald Natalie; +Cc: tuhs

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1142 bytes --]

Ronald Natalie <ron@ronnatalie.com> wrote:

> C is often a “You asked for it, you got it” type paradigm/

Sadly these days it's more like, you asked for a VAX, you got a
Deathstation 9000. (Sadly the classic DS9000 web page has disappeared
and it was never saved by archive.org.)

http://wikibin.org/articles/deathstation-9000.html

It's worth reading Chisnall's other paper (cited by the CACM article) on
formalizing de-facto C. The background for all this is that Robert
Watson's team in Cambridge's Computer Lab has been working on a
capability-secure RISC processor for a number of years, with the goal of
being able to retro-fit hardware accelerated memory security to existing
software. Which means running C on hardware that doesn't look much like a
VAX. So it's helpful to get a better idea of exactly how far you can
deviate from the gcc/clang model of DS9000.

https://dl.acm.org/citation.cfm?id=2908081

Tony.
-- 
f.anthony.n.finch  <dot@dotat.at>  http://dotat.at/
Southeast Iceland: Variable 3 or 4. Slight or moderate. Fog patches,
occasional rain at first. Moderate or good, occasionally very poor.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 22:20           ` Bakul Shah
  2018-06-26 22:33             ` Arthur Krewat
@ 2018-06-27  8:30             ` Tim Bradshaw
  1 sibling, 0 replies; 65+ messages in thread
From: Tim Bradshaw @ 2018-06-27  8:30 UTC (permalink / raw)
  To: Bakul Shah; +Cc: tuhs

[-- Attachment #1: Type: text/plain, Size: 781 bytes --]

On 26 Jun 2018, at 23:20, Bakul Shah <bakul@bitblocks.com> wrote:
> 
> With new attacks like TLBleed etc. it is becoming increasingly clear that
> caching (hidden memory to continue with the illusion of a simple memory
> model) itself is a potential security issue. I didn't think anything the
> author said was particularly controversial any more. A lot of processor
> evolution seems to have been to accommodate C's simple memory model.

That's the strangest thing to see: *why do people think the point he's making is in any way controversial*, because it's so obvious.

(But then I'm also annoyed by the paper because I've been talking about 'giant PDP-11s' for a long time and now he's stolen (obviously not stolen: independently come up with) my term, pretty much.)

[-- Attachment #2: Type: text/html, Size: 4083 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 21:16   ` Arthur Krewat
  2018-06-26 21:50     ` Larry McVoy
@ 2018-06-27  6:27     ` arnold
  1 sibling, 0 replies; 65+ messages in thread
From: arnold @ 2018-06-27  6:27 UTC (permalink / raw)
  To: tuhs, krewat

Arthur Krewat <krewat@kilonet.net> wrote:

> Sometimes, I wonder... Programmers are supposed to be smarter than the 
> language. Not the other way around.
>
> art k.

After many years working in a variety of places, I have come to the
conclusion that Sturgeon's Law ("90% of everything is crud") applies to
working programmers as well.

:-(

W.R.T. the statement as given, the point is good; when a language is
huge and complex (yes C++, I'm looking at you) it becomes really hard
to be effective in it unless you are a super-star genius.

Arnold

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-27  0:11             ` Bakul Shah
@ 2018-06-27  6:10               ` arnold
  0 siblings, 0 replies; 65+ messages in thread
From: arnold @ 2018-06-27  6:10 UTC (permalink / raw)
  To: bakul, akosela; +Cc: tuhs

Bakul Shah <bakul@bitblocks.com> wrote:

> I primarily write code in Go these days and like it a lot (as
> a "better" C) but I am not sure it will have C's longevity.
> It still uses a flat shared memory model.

Digital Mars's D flips it around. Everything is thread-local storage
unless you explicitly mark something as shared. This makes a ton
of sense to me.

Arnold

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 22:33           ` Andy Kosela
@ 2018-06-27  0:11             ` Bakul Shah
  2018-06-27  6:10               ` arnold
  0 siblings, 1 reply; 65+ messages in thread
From: Bakul Shah @ 2018-06-27  0:11 UTC (permalink / raw)
  To: Andy Kosela; +Cc: tuhs

On Jun 26, 2018, at 3:33 PM, Andy Kosela <akosela@andykosela.com> wrote:
>  
> David Chisnall is known for pushing Go as a next generation C.  He even wrote a book about it.  I think he has a point in saying that Go was created as direct remedy to many things in C.  Most of it features come from decades of experience working with C, and seeing ways in which it can be improved.

I primarily write code in Go these days and like it a lot (as
a "better" C) but I am not sure it will have C's longevity.
It still uses a flat shared memory model. This is harder and
harder for hardware to emulate efficiently (and comes with
more complexity) at smaller and smaller minimum feature sizes
and higher & higher CPU clock rates & on-chip comm speeds. We
need something other than a better C to squeeze maximum
performance out of a CPU built out of 100s to 1000s of cores.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 22:33             ` Arthur Krewat
@ 2018-06-26 23:53               ` Bakul Shah
  0 siblings, 0 replies; 65+ messages in thread
From: Bakul Shah @ 2018-06-26 23:53 UTC (permalink / raw)
  To: Arthur Krewat; +Cc: tuhs

On Jun 26, 2018, at 3:33 PM, Arthur Krewat <krewat@kilonet.net> wrote:
> 
> On 6/26/2018 6:20 PM, Bakul Shah wrote:
>> it is becoming increasingly clear that
>> caching (hidden memory to continue with the illusion of a simple memory
>> model) itself is a potential security issue.
> 
> Then let's discuss why caching is the problem. If thread X reads memory location A, why is thread Y able to access that cached value? Shouldn't that cached value be associated with memory location A which I would assume would be in a protected space that thread Y shouldn't be able to access?
> 
> I know the nuts and bolts of how this cache exploit works, that's not what I'm asking.
> 
> What I'm asking is, why is cache accessible in the first place? Any cache offset should have the same memory protection as the value it represents. Isn't this the CPU manufacturer's fault?

As I understand it, the difference in cache access vs
other caches/memory access times allows for timing attacks.
By its nature a cache is much smaller than the next level
cache or memory so there will have to be a way to evict
stale data from it and there will be (false) sharing and
consequent access time difference. Knowledge of specific
attacks can help devise specific fixes but I don't think
we can say unequivocally we have seen the worst of it. 


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 22:20           ` Bakul Shah
@ 2018-06-26 22:33             ` Arthur Krewat
  2018-06-26 23:53               ` Bakul Shah
  2018-06-27  8:30             ` Tim Bradshaw
  1 sibling, 1 reply; 65+ messages in thread
From: Arthur Krewat @ 2018-06-26 22:33 UTC (permalink / raw)
  To: tuhs

On 6/26/2018 6:20 PM, Bakul Shah wrote:
> it is becoming increasingly clear that
> caching (hidden memory to continue with the illusion of a simple memory
> model) itself is a potential security issue.

Then let's discuss why caching is the problem. If thread X reads memory 
location A, why is thread Y able to access that cached value? Shouldn't 
that cached value be associated with memory location A which I would 
assume would be in a protected space that thread Y shouldn't be able to 
access?

I know the nuts and bolts of how this cache exploit works, that's not 
what I'm asking.

What I'm asking is, why is cache accessible in the first place? Any 
cache offset should have the same memory protection as the value it 
represents. Isn't this the CPU manufacturer's fault?


art k.






^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 21:59         ` Larry McVoy
  2018-06-26 22:20           ` Bakul Shah
@ 2018-06-26 22:33           ` Andy Kosela
  2018-06-27  0:11             ` Bakul Shah
  1 sibling, 1 reply; 65+ messages in thread
From: Andy Kosela @ 2018-06-26 22:33 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

[-- Attachment #1: Type: text/plain, Size: 1329 bytes --]

On Tuesday, June 26, 2018, Larry McVoy <lm@mcvoy.com> wrote:

> On Tue, Jun 26, 2018 at 05:54:32PM -0400, Ronald Natalie wrote:
> > >
> > > So I agree, had the same initial reaction.  But I read the paper a
> > > second time and the point about Fortran, all these years later, still
> > > being a thing resonated.  The hardware guys stand on their heads to
> > > give us coherent caches.
> >
> > Fortran is a higher level language.    It gives he compiler more
> flexibility in deciding what the programmer intended and how to
> automatically optimize for the platform.
> > C is often a ???You asked for it, you got it??? type paradigm/
>
> I think you are more or less agreeing with the author.  (I also think, as
> Unix die hards, we all bridle a little when anyone dares to say anything
> negative about C.  We should resist that if it gets in the way of making
> things better.)
>
> The author at least has me thinking about how you could make a C like
> language that didn't ask as much from the hardware.
> --
>
>
David Chisnall is known for pushing Go as a next generation C.  He even
wrote a book about it.  I think he has a point in saying that Go was
created as direct remedy to many things in C.  Most of it features come
from decades of experience working with C, and seeing ways in which it can
be improved.

--Andy

[-- Attachment #2: Type: text/html, Size: 1668 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 21:59         ` Larry McVoy
@ 2018-06-26 22:20           ` Bakul Shah
  2018-06-26 22:33             ` Arthur Krewat
  2018-06-27  8:30             ` Tim Bradshaw
  2018-06-26 22:33           ` Andy Kosela
  1 sibling, 2 replies; 65+ messages in thread
From: Bakul Shah @ 2018-06-26 22:20 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

On Jun 26, 2018, at 2:59 PM, Larry McVoy <lm@mcvoy.com> wrote:
> 
> On Tue, Jun 26, 2018 at 05:54:32PM -0400, Ronald Natalie wrote:
>>> 
>>> So I agree, had the same initial reaction.  But I read the paper a 
>>> second time and the point about Fortran, all these years later, still
>>> being a thing resonated.  The hardware guys stand on their heads to
>>> give us coherent caches.  
>> 
>> Fortran is a higher level language.    It gives he compiler more flexibility in deciding what the programmer intended and how to automatically optimize for the platform.
>> C is often a ???You asked for it, you got it??? type paradigm/
> 
> I think you are more or less agreeing with the author.  (I also think, as
> Unix die hards, we all bridle a little when anyone dares to say anything
> negative about C.  We should resist that if it gets in the way of making
> things better.)

With new attacks like TLBleed etc. it is becoming increasingly clear that
caching (hidden memory to continue with the illusion of a simple memory
model) itself is a potential security issue. I didn't think anything the
author said was particularly controversial any more. A lot of processor
evolution seems to have been to accommodate C's simple memory model.

What is remarkable is how long this illusion has been maintained and
how far we have gotten with it.

> The author at least has me thinking about how you could make a C like 
> language that didn't ask as much from the hardware.

Erlang. Actor, vector & dataflow languages. Actually even C itself
can be used if it is used only on individual simple cores and instead
of caches any accessible memory is made explicit. Not sure if there
is a glue language for mapping & scheduling computation to a set of
simple cores with local memory and high speed links to their neighbors.



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 21:54       ` Ronald Natalie
@ 2018-06-26 21:59         ` Larry McVoy
  2018-06-26 22:20           ` Bakul Shah
  2018-06-26 22:33           ` Andy Kosela
  2018-06-27 11:26         ` Tony Finch
  1 sibling, 2 replies; 65+ messages in thread
From: Larry McVoy @ 2018-06-26 21:59 UTC (permalink / raw)
  To: Ronald Natalie; +Cc: tuhs

On Tue, Jun 26, 2018 at 05:54:32PM -0400, Ronald Natalie wrote:
> > 
> > So I agree, had the same initial reaction.  But I read the paper a 
> > second time and the point about Fortran, all these years later, still
> > being a thing resonated.  The hardware guys stand on their heads to
> > give us coherent caches.  
> 
> Fortran is a higher level language.    It gives he compiler more flexibility in deciding what the programmer intended and how to automatically optimize for the platform.
> C is often a ???You asked for it, you got it??? type paradigm/

I think you are more or less agreeing with the author.  (I also think, as
Unix die hards, we all bridle a little when anyone dares to say anything
negative about C.  We should resist that if it gets in the way of making
things better.)

The author at least has me thinking about how you could make a C like 
language that didn't ask as much from the hardware.
-- 
---
Larry McVoy            	     lm at mcvoy.com             http://www.mcvoy.com/lm 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 18:03 ` Cornelius Keck
  2018-06-26 21:21   ` Nelson H. F. Beebe
@ 2018-06-26 21:56   ` Kurt H Maier
  1 sibling, 0 replies; 65+ messages in thread
From: Kurt H Maier @ 2018-06-26 21:56 UTC (permalink / raw)
  To: Cornelius Keck; +Cc: tuhs

On Tue, Jun 26, 2018 at 01:03:24PM -0500, Cornelius Keck wrote:
> Now, that sounds interesting.. only hiccup is that I'm getting a "DOI 
> Not Found" for 10.1145/3209212. Could it be that the write-up is going 
> to take some time for the general public to see it?

ACM Communications is behind a paywall.  If you have a subscription it's
available here:

https://cacm.acm.org/magazines/2018/7/229036-c-is-not-a-low-level-language/abstract


khm

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 21:50     ` Larry McVoy
@ 2018-06-26 21:54       ` Ronald Natalie
  2018-06-26 21:59         ` Larry McVoy
  2018-06-27 11:26         ` Tony Finch
  0 siblings, 2 replies; 65+ messages in thread
From: Ronald Natalie @ 2018-06-26 21:54 UTC (permalink / raw)
  To: Larry McVoy; +Cc: tuhs

> 
> So I agree, had the same initial reaction.  But I read the paper a 
> second time and the point about Fortran, all these years later, still
> being a thing resonated.  The hardware guys stand on their heads to
> give us coherent caches.  

Fortran is a higher level language.    It gives he compiler more flexibility in deciding what the programmer intended and how to automatically optimize for the platform.
C is often a “You asked for it, you got it” type paradigm/



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 21:16   ` Arthur Krewat
@ 2018-06-26 21:50     ` Larry McVoy
  2018-06-26 21:54       ` Ronald Natalie
  2018-06-27  6:27     ` arnold
  1 sibling, 1 reply; 65+ messages in thread
From: Larry McVoy @ 2018-06-26 21:50 UTC (permalink / raw)
  To: Arthur Krewat; +Cc: tuhs

On Tue, Jun 26, 2018 at 05:16:56PM -0400, Arthur Krewat wrote:
> 
> 
> On 6/26/2018 3:01 PM, Ronald Natalie wrote:
> >I???m not sure I buy his arguments.
> I was going to say it was total and complete BS, at least based on the
> quoted statement in the initial email. But I decided not to send it. ;)
> 
> I wrote a POSIX thread based queuing system a few years back that could
> handle thousands of threads on a dual processor SPARC-10 before it just
> completely locked up Solaris (I think 9). It was targeted at larger systems,
> and it could easily scale as far as I wanted it to.
> 
> While you could argue that pthreads are not "C", the language was quite
> happy doing what I asked of it.

So I agree, had the same initial reaction.  But I read the paper a 
second time and the point about Fortran, all these years later, still
being a thing resonated.  The hardware guys stand on their heads to
give us coherent caches.  

> Sometimes, I wonder... Programmers are supposed to be smarter than the
> language. Not the other way around.

That's a great quote.  But I do sort of grudgingly see the author's 
point of view, at least somewhat.

--lm

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 18:03 ` Cornelius Keck
@ 2018-06-26 21:21   ` Nelson H. F. Beebe
  2018-06-26 21:56   ` Kurt H Maier
  1 sibling, 0 replies; 65+ messages in thread
From: Nelson H. F. Beebe @ 2018-06-26 21:21 UTC (permalink / raw)
  To: Cornelius Keck; +Cc: tuhs

>> DOI not found ...
>> Could it be that the write-up is going to take some time for the 
>> general public to see it?

Yes, that is a common problem with ACM journal publication
announcements: the URL

	https://dl.acm.org/citation.cfm?id=3209212

takes you today to a page that offers both HTML and PDF views of David
Chisnall's new article ``C is not a low-level language''.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 19:01 ` Ronald Natalie
@ 2018-06-26 21:16   ` Arthur Krewat
  2018-06-26 21:50     ` Larry McVoy
  2018-06-27  6:27     ` arnold
  0 siblings, 2 replies; 65+ messages in thread
From: Arthur Krewat @ 2018-06-26 21:16 UTC (permalink / raw)
  To: tuhs



On 6/26/2018 3:01 PM, Ronald Natalie wrote:
> I’m not sure I buy his arguments.
I was going to say it was total and complete BS, at least based on the 
quoted statement in the initial email. But I decided not to send it. ;)

I wrote a POSIX thread based queuing system a few years back that could 
handle thousands of threads on a dual processor SPARC-10 before it just 
completely locked up Solaris (I think 9). It was targeted at larger 
systems, and it could easily scale as far as I wanted it to.

While you could argue that pthreads are not "C", the language was quite 
happy doing what I asked of it.

Sometimes, I wonder... Programmers are supposed to be smarter than the 
language. Not the other way around.

art k.




^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 17:54 Nelson H. F. Beebe
  2018-06-26 18:03 ` Cornelius Keck
  2018-06-26 18:52 ` Ronald Natalie
@ 2018-06-26 19:01 ` Ronald Natalie
  2018-06-26 21:16   ` Arthur Krewat
  2018-06-27 16:00 ` Steve Johnson
  3 siblings, 1 reply; 65+ messages in thread
From: Ronald Natalie @ 2018-06-26 19:01 UTC (permalink / raw)
  To: Nelson H. F. Beebe; +Cc: tuhs

I’m not sure I buy his arguments.    First off, he argues that a true low level language requires knowledge of the “irrelevant” and then he goes and argues that with C you need such knowledge on other than a PDP-11.
His arguemnt that whowever he surveyed is ignorant of how C handles padding is equally pointless.     Further, his architecture world seems to be roughly limited to PDP-11’s and Intel x86 chips.

I’ve ported UNIX to a number of machines from the Denelcor HEP MIMD supercomputer to various micros from x86 to i860 etc…  In addition, I’ve ported high performance computer graphics applications to just about every UNIX platform aailable (my app is pretty much an OS into itself) including the x86 of various flavors, MIPS, 68000, Sparc, Stellar, Ardent, Apollo DN1000, HP9000, iTanium, Alpha, PA RISC, i860 (of various configurations), etc…   All done in C.   All done with exacting detail.   Yes, you do have to understand the underlying code being generated, and such is not as bad as he thinks.    In fact, all his arguments argue that C does fit his definition of the low level language.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 17:54 Nelson H. F. Beebe
  2018-06-26 18:03 ` Cornelius Keck
@ 2018-06-26 18:52 ` Ronald Natalie
  2018-06-26 19:01 ` Ronald Natalie
  2018-06-27 16:00 ` Steve Johnson
  3 siblings, 0 replies; 65+ messages in thread
From: Ronald Natalie @ 2018-06-26 18:52 UTC (permalink / raw)
  To: Nelson H. F. Beebe; +Cc: tuhs

[-- Attachment #1: Type: text/plain, Size: 432 bytes --]

Try this ilnk:  https://queue.acm.org/detail.cfm?id=3212479 <https://queue.acm.org/detail.cfm?id=3212479>

> On Jun 26, 2018, at 1:54 PM, Nelson H. F. Beebe <beebe@math.utah.edu> wrote:
> 
> There is a provocative article published today in the lastest issue of
> Communications of the ACM:
> 
> 	David Chisnall
> 	C is not a low-level language
> 	Comm ACM 61(7) 44--48 July 2018
> 	https://doi.org/10.1145/3209212
> 


[-- Attachment #2: Type: text/html, Size: 1267 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [TUHS] PDP-11 legacy, C, and modern architectures
  2018-06-26 17:54 Nelson H. F. Beebe
@ 2018-06-26 18:03 ` Cornelius Keck
  2018-06-26 21:21   ` Nelson H. F. Beebe
  2018-06-26 21:56   ` Kurt H Maier
  2018-06-26 18:52 ` Ronald Natalie
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 65+ messages in thread
From: Cornelius Keck @ 2018-06-26 18:03 UTC (permalink / raw)
  To: Nelson H. F. Beebe, tuhs

Now, that sounds interesting.. only hiccup is that I'm getting a "DOI 
Not Found" for 10.1145/3209212. Could it be that the write-up is going 
to take some time for the general public to see it?

Nelson H. F. Beebe wrote:
> There is a provocative article published today in the lastest issue of
> Communications of the ACM:
>
> 	David Chisnall
> 	C is not a low-level language
> 	Comm ACM 61(7) 44--48 July 2018
> 	https://doi.org/10.1145/3209212
>
> Because C is the implementation language of choice for a substantial
> part of the UNIX world, it seems useful to announce the new article to
> TUHS list members.
>
> David Chisnall discusses the PDP-11 legacy, the design of C, and the
> massive parallelism available in modern processors that is not so easy
> to exploit in C, particularly, portable C.  He also observes:
>
>>> ...
>>> A processor designed purely for speed, not for a compromise between
>>> speed and C support, would likely support large numbers of threads,
>>> have wide vector units, and have a much simpler memory model. Running
>>> C code on such a system would be problematic, so, given the large
>>> amount of legacy C code in the world, it would not likely be a
>>> commercial success.
>>> ...
>
> -------------------------------------------------------------------------------
> - Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
> - University of Utah                    FAX: +1 801 581 4148                  -
> - Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
> - 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
> - Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
> -------------------------------------------------------------------------------
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [TUHS]  PDP-11 legacy, C, and modern architectures
@ 2018-06-26 17:54 Nelson H. F. Beebe
  2018-06-26 18:03 ` Cornelius Keck
                   ` (3 more replies)
  0 siblings, 4 replies; 65+ messages in thread
From: Nelson H. F. Beebe @ 2018-06-26 17:54 UTC (permalink / raw)
  To: tuhs

There is a provocative article published today in the lastest issue of
Communications of the ACM:

	David Chisnall
	C is not a low-level language
	Comm ACM 61(7) 44--48 July 2018
	https://doi.org/10.1145/3209212

Because C is the implementation language of choice for a substantial
part of the UNIX world, it seems useful to announce the new article to
TUHS list members.

David Chisnall discusses the PDP-11 legacy, the design of C, and the
massive parallelism available in modern processors that is not so easy
to exploit in C, particularly, portable C.  He also observes:

>> ...
>> A processor designed purely for speed, not for a compromise between
>> speed and C support, would likely support large numbers of threads,
>> have wide vector units, and have a much simpler memory model. Running
>> C code on such a system would be problematic, so, given the large
>> amount of legacy C code in the world, it would not likely be a
>> commercial success.
>> ...

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2018-06-29 19:07 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-29  1:06 [TUHS] PDP-11 legacy, C, and modern architectures Noel Chiappa
  -- strict thread matches above, loose matches on Subject: below --
2018-06-29  1:02 Noel Chiappa
2018-06-26 17:54 Nelson H. F. Beebe
2018-06-26 18:03 ` Cornelius Keck
2018-06-26 21:21   ` Nelson H. F. Beebe
2018-06-26 21:56   ` Kurt H Maier
2018-06-26 18:52 ` Ronald Natalie
2018-06-26 19:01 ` Ronald Natalie
2018-06-26 21:16   ` Arthur Krewat
2018-06-26 21:50     ` Larry McVoy
2018-06-26 21:54       ` Ronald Natalie
2018-06-26 21:59         ` Larry McVoy
2018-06-26 22:20           ` Bakul Shah
2018-06-26 22:33             ` Arthur Krewat
2018-06-26 23:53               ` Bakul Shah
2018-06-27  8:30             ` Tim Bradshaw
2018-06-26 22:33           ` Andy Kosela
2018-06-27  0:11             ` Bakul Shah
2018-06-27  6:10               ` arnold
2018-06-27 11:26         ` Tony Finch
2018-06-27 14:33           ` Clem Cole
2018-06-27 14:38             ` Clem Cole
2018-06-27 15:30             ` Paul Winalski
2018-06-27 16:55               ` Tim Bradshaw
2018-06-27  6:27     ` arnold
2018-06-27 16:00 ` Steve Johnson
2018-06-28  4:12   ` Bakul Shah
2018-06-28 14:15     ` Theodore Y. Ts'o
2018-06-28 14:40       ` Larry McVoy
2018-06-28 14:55         ` Perry E. Metzger
2018-06-28 14:58           ` Larry McVoy
2018-06-28 15:39             ` Tim Bradshaw
2018-06-28 16:02               ` Larry McVoy
2018-06-28 16:41                 ` Tim Bradshaw
2018-06-28 16:59                   ` Paul Winalski
2018-06-28 17:09                   ` Larry McVoy
2018-06-29 15:32                     ` tfb
2018-06-29 16:09                       ` Perry E. Metzger
2018-06-29 17:51                       ` Larry McVoy
2018-06-29 18:27                         ` Tim Bradshaw
2018-06-29 19:02                         ` Perry E. Metzger
2018-06-28 20:37                 ` Perry E. Metzger
2018-06-28 15:37         ` Clem Cole
2018-06-28 20:37           ` Lawrence Stewart
2018-06-28 14:43       ` Perry E. Metzger
2018-06-28 14:56         ` Larry McVoy
2018-06-28 15:07           ` Warner Losh
2018-06-28 19:42           ` Perry E. Metzger
2018-06-28 19:55             ` Paul Winalski
2018-06-28 20:42             ` Warner Losh
2018-06-28 21:03               ` Perry E. Metzger
2018-06-28 22:29                 ` Theodore Y. Ts'o
2018-06-29  0:18                   ` Larry McVoy
2018-06-29 15:41                     ` Perry E. Metzger
2018-06-29 18:01                       ` Larry McVoy
2018-06-29 19:07                         ` Perry E. Metzger
2018-06-29  5:58                   ` Michael Kjörling
2018-06-28 20:52             ` Lawrence Stewart
2018-06-28 21:07               ` Perry E. Metzger
2018-06-28 16:45       ` Paul Winalski
2018-06-28 20:47         ` Perry E. Metzger
2018-06-29 15:43         ` emanuel stiebler
2018-06-29  2:02       ` Bakul Shah
2018-06-29 12:58         ` Theodore Y. Ts'o
2018-06-29 18:41           ` Perry E. Metzger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).