The Unix Heritage Society mailing list
* [TUHS] Early Unix function calls: expensive?
@ 2016-01-04  1:59 Norman Wilson
  2016-01-04 15:09 ` John Cowan
  0 siblings, 1 reply; 33+ messages in thread
From: Norman Wilson @ 2016-01-04  1:59 UTC (permalink / raw)


As late as 1990, every UNIX I knew of still used the
expensive calls/ret instructions for subroutine calls.
I vaguely remember a consensus (and I certainly shared
the feeling) that in hindsight it would have been better
to use jsb/rsb, but changing everything would have been
so much work that nobody wanted to do it.

1990 was already past peak VAX in the UNIX world, so
I can't imagine anyone bothering to make such a change
to an existing system after then.  Especially a system
that already had many existing installations who would
have to deal with the resulting compatibility problem.

During the latter part of the 1990s, I was actively
supporting a private UNIX system just for myself on
a few MicroVAXes at home.  One of the things I did
was to write a VAX code generator for the then-current
version of lcc (the one around which the book was
written), so as to have an ISO-compatible compiler
and convert all of /usr/src (not so big even in those
days) to ISO.  It was an interesting exercise and I
learned a lot, but even then, I wasn't brave enough to
adopt an incompatible subroutine-calling convention.

Another big time waste in the original VAX UNIX was
the system-call interface: arguments were left on the
stack (where they had been put before calling the
syscall stub routine in libc); the kernel then had
to do a full-fledged copyin to get them.  It occurred
to me more than once to change the convention and have
the syscall stubs copy the arguments into registers
before executing the chmk (syscall) instruction.
That instruction didn't touch the registers; the
kernel saved them early in the chmk trap routine,
in its own address space, so no copying or access
checking would have been required to fetch their
call-time contents.

That would still have been a messy change to make,
because I'd have to be sure every program had been
relinked with the new-style libc before changing the
kernel.  (This was a system without shared libraries.)
But on a personal system it would have been doable.
I never did.

It's possible that current UNIX-descended/cloned systems
that have VAX ports, like Linux or Open/Free/NetBSD,
have had a chance to start over and do better on
subroutine calls and system calls.  Does anyone know?

Norman Wilson
Toronto ON


^ permalink raw reply	[flat|nested] 33+ messages in thread
* [TUHS] Early Unix function calls: expensive?
@ 2016-01-04 12:53 Noel Chiappa
  0 siblings, 0 replies; 33+ messages in thread
From: Noel Chiappa @ 2016-01-04 12:53 UTC (permalink / raw)


    > that's 28+13 = 41 memory cycles.
    > ...
    > purely in overhead (counting putting the args on the stack as overhead).

Oh, I missed an instruction for de-stacking the arguments, which was typically
something like 'add #N, sp', so another two instruction word fetches, or 43
cycles.

Ironically, if N=4, the compiler used to emit a 'cmp (sp)+, (sp)+', which is
more efficient space-wise (one word instead of two), but less time-wise
(3 cycles instead of 2).

   Noel


* [TUHS] Early Unix function calls: expensive?
@ 2016-01-04  2:21 Clem Cole
  0 siblings, 0 replies; 33+ messages in thread
From: Clem Cole @ 2016-01-04  2:21 UTC (permalink / raw)


Folks remember, VAX was not designed with UNIX in mind.  It had two primary
influences: assembly programmers (Cutler et al) and FORTRAN compiler
writers.  The truth is, the Vax was incredibly successful at both UNIX and
its intended-OS (VMS) sites, even if a number of the instructions it had
were ignored by the C compiler writers.  The fact that C did not map to it
as well as it would to later architectures is not surprising given the
design constraints - C and UNIX were not part of the design.  But it was
good enough (actually pretty darned good for the time) and was very, very
successful - I certainly stopped running a PDP11 when Vaxen were generally
available, and would not stop doing that until the 68000-based workstations
came along.

From my own experience, when Dave (Patterson) was writing the RISC papers
in the early 1980s, a number of us ex-industry grad-student types at UCB
were taking his architecture course, having just come off some very
successful systems such as the Vax, DG Eagle, Pr1me 750, etc.  [I'll leave
the names of said parties off to protect the innocent.]  But what I will
say is that the four of us used to sit in the back of his classes and
chuckle.  We used to remind Dave that a lot of the choices made on those
machines were not for "CS"-style reasons.  IMO: Dave really did not "get
it" -- all of those system designers did make architectural choices, but
the drivers were the code bases at the customer sites, not how well spell
or grep worked.  Those commercial systems generally did map well to what
their designers considered important, and to >>why<< those engineers
considered what they did [years later HBS professor Clay Christensen's
book explained why].

I've said this in other forums, but I contend that when we used pure CS to
design the world's greatest pure computer architecture (Alpha), we
ultimately failed in the market.  The computer architecture was extremely
successful, and many of us miss it.  Hey, I now work for a company with one
of the worst instruction sets/ISAs from a Computer Science standpoint -
INTEL*64 - and like the Vax, it's easy to throw darts at the architecture
from a purity standpoint.  Alpha was great, C and other languages map to it
well, and the designers followed all of the CS knowledge of the time.  But
as a >>system<< it could not compete with the disruption caused by the 386
and later its child, INTEL*64.  And like Vaxen, INTEL*64 is ugly, but it
continues to win because of the economics.

At Intel we look at very specific codes and how they map; the choices of
what new things to add, and how the system morphs, are directly driven by
what we see from customers - and, in the case of scientific codes, by how
well the FORTRAN compiler can exploit it, because it is the same places
(the national labs and very large end users like weather, automotive,
oil/gas or life sciences) that have the same Fortran code that still needs
to run ;-)  This is just what DEC did years ago with the VAX (and Alpha).


As an interesting footnote, the DNA from the old DEC Fortran compiler lives
on in "ifort" (and icc).   Some of the same folks are still working on the
code generator, although they are leaving us fairly rapidly as they
approach and pass their 70s.  But that's a different story ;-)

So the question is not whether a particular calling sequence or set of
instructions is good; you need to look at the entire economics of the
system - which to me raises the question of whether the smartphone/tablet
and ARM will be the disruptor to INTEL*64.  Time will tell.

Clem


On Sun, Jan 3, 2016 at 7:42 PM, <scj at yaccman.com> wrote:

> Well, I certainly said this on several occasions, and the fact that it is
> recorded more or less exactly as I remember saying it suggests that I may
> have even written it somewhere, but if so, I don't recall where...
>
> As part of the PCC work, I wrote a technical report on how to design a C
> calling sequence, but that was before the VAX.  Early calling sequences
> had both a stack pointer and a frame pointer, but for most machines it
> was possible to get by with just one, so calling sequences got better as
> time went on.  Also, RISC machines with many more registers than the
> PDP-11 also led to more efficient calls by putting some arguments in
> registers.  Later standardizations like varargs were painful on some
> architectures (especially those which had different registers for pointers
> and integers).
>
> The CALLS instruction was indeed a pig -- a space-time tradeoff in the
> wrong direction!  For languages like FORTRAN it might have been justified,
> but for C it was awful.  It is my memory too that CALLS was abandoned,
> perhaps first at UCB.  But I actually had little hands-on experience with
> the VAX C compiler...
>
> Steve
>
>
>
>
> > I just re-found a quote about Unix processes that I'd "lost". It's by
> > Steve Johnson:
> >
> >     Dennis Ritchie encouraged modularity by telling all and sundry that
> >     function calls were really, really cheap in C. Everybody started
> >     writing small functions and modularizing. Years later we found out
> >     that function calls were still expensive on the PDP-11, and VAX code
> >     was often spending 50% of its time in the CALLS instruction. Dennis
> >     had lied to us! But it was too late; we were all hooked...
> >     http://www.catb.org/esr/writings/taoup/html/modularitychapter.html
> >
> > Steve, can you recollect when you said this?  Was it just a quote for
> > Eric's book, or did it come from elsewhere?
> >
> > Does anybody have a measure of the expense of function calls under Unix
> > on either platform?
> >
> > Cheers, Warren
> >
>
>
>


* [TUHS] Early Unix function calls: expensive?
@ 2016-01-04  1:31 Noel Chiappa
  2016-01-04  2:24 ` scj
  0 siblings, 1 reply; 33+ messages in thread
From: Noel Chiappa @ 2016-01-04  1:31 UTC (permalink / raw)


    > From: Warren Toomey

    > I just re-found a quote about Unix processes
    > ..
    >  Years later we found out that function calls were still expensive
    >  on the PDP-11
    > ..
    > Does anybody have a measure of the expense of function calls under Unix
    > on either platform?

Procedure calls were not cheap on the PDP-11 with the V6/V7 C compiler (which
admittedly was not the most efficient with small routines, since it always
saved all three non-temporary registers, no matter whether the called routine
used them or not).

This was especially true when compared to the rest of the code the compiler
produced with the optimizer turned on, which - if the programmer was careful
about allocating 'register' variables - was pretty good.

On most PDP-11's, the speed was basically linearly related to the number of
memory references (both instruction fetch, and data), since most -11 CPU's
were memory-bound for most instructions. So for that compiler, a subroutine
call had a fair amount of overhead:

	inst	data

call	4	1
	2	0	if any automatic variables
	1	1	minimum per single-word argument

csv	9	5

cret	9	5

(In V7, someone managed to bum one cycle out of csv, taking it down to 8+5.)

So assume a pair of arguments which were not register variables (i.e.
automatics, or in structures pointed to by register variables), and some
automatics in the called routine, and that's 4+2 for the arguments, plus 6+1,
a subtotal of 10+3; add in csv and cret, that's 28+13 = 41 memory cycles.

On a typical machine like an 11/40 or 11/23, which had roughly 1 megacycle
memory throughput, that meant 40 usec (on a 1 MIP machine) to do a procedure
call, purely in overhead (counting putting the args on the stack as overhead).

We found that, even with the limited memory on the -11, it made sense to run
the time/space tradeoff the other way for short things like queue
insertion/removal, and do them as macros.

A routine had to be pretty lengthy before it was worth paying the overhead, in
order to amortize the calling cost across a fair amount of work (although of
course, getting access to another three register variables could make the
compiled output for the routine somewhat shorter).

	Noel


[parent not found: <mailman.3.1451865187.15972.tuhs@minnie.tuhs.org>]
* [TUHS] Early Unix function calls: expensive?
@ 2016-01-03 23:35 Warren Toomey
  2016-01-03 23:53 ` Tim Bradshaw
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Warren Toomey @ 2016-01-03 23:35 UTC (permalink / raw)


I just re-found a quote about Unix processes that I'd "lost". It's by
Steve Johnson:

    Dennis Ritchie encouraged modularity by telling all and sundry that
    function calls were really, really cheap in C. Everybody started
    writing small functions and modularizing. Years later we found out
    that function calls were still expensive on the PDP-11, and VAX code
    was often spending 50% of its time in the CALLS instruction. Dennis
    had lied to us! But it was too late; we were all hooked...
    http://www.catb.org/esr/writings/taoup/html/modularitychapter.html

Steve, can you recollect when you said this?  Was it just a quote for
Eric's book, or did it come from elsewhere?

Does anybody have a measure of the expense of function calls under Unix
on either platform?

Cheers, Warren



end of thread, other threads:[~2016-01-05 23:55 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-04  1:59 [TUHS] Early Unix function calls: expensive? Norman Wilson
2016-01-04 15:09 ` John Cowan
  -- strict thread matches above, loose matches on Subject: below --
2016-01-04 12:53 Noel Chiappa
2016-01-04  2:21 Clem Cole
2016-01-04  1:31 Noel Chiappa
2016-01-04  2:24 ` scj
2016-01-04  4:24   ` Larry McVoy
     [not found] <mailman.3.1451865187.15972.tuhs@minnie.tuhs.org>
2016-01-04  1:08 ` Johnny Billquist
2016-01-04  1:29   ` Larry McVoy
2016-01-03 23:35 Warren Toomey
2016-01-03 23:53 ` Tim Bradshaw
2016-01-04  0:01   ` John Cowan
2016-01-04  4:40     ` Armando Stettner
2016-01-04  8:52       ` Tim Bradshaw
2016-01-04 17:29         ` Larry McVoy
2016-01-04 13:50       ` Clem Cole
2016-01-05  2:00       ` Ronald Natalie
2016-01-05 15:13         ` Clem Cole
2016-01-05 16:46           ` John Cowan
2016-01-05 17:33             ` Diomidis Spinellis
2016-01-05 17:42             ` Clem Cole
2016-01-05 17:28           ` Ronald Natalie
2016-01-05 17:43             ` Clem Cole
2016-01-05 17:46               ` Ronald Natalie
2016-01-05 18:03                 ` Warner Losh
2016-01-05 18:24                   ` Ronald Natalie
2016-01-05 20:26                     ` scj
2016-01-05 20:49                     ` John Cowan
2016-01-05 23:24         ` Dave Horsfall
2016-01-05 23:55           ` Ronald Natalie
2016-01-04  0:00 ` John Cowan
2016-01-04  0:42 ` scj
2016-01-04 11:35   ` Tony Finch
