From mboxrd@z Thu Jan  1 00:00:00 1970
From: Rick Hohensee <humbubba@smarty.smart.net>
Message-Id: <200107300346.XAA25539@smarty.smart.net>
To: 9fans@cse.psu.edu
In-Reply-To: <E15QBgQ-0003bz-00@starfruit.iw3d.co.uk> from "Theo Honohan" at Jul 27, 2001 06:52:55 PM
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: [9fans] Forth-like Simplifications for Native One-Stack Code
Date: Sun, 29 Jul 2001 23:46:48 -0400
Topicbox-Message-UUID: d77806dc-eac9-11e9-9e20-41e7f4b1d025

Forth-like Simplifications for Native One-Stack Code

Rick Hohensee                   July 2001
www.clienux.com

key phrases:
 osimpa, copy-on-write parameter passing, familial variables, subroutine
arrays, compembler, caller-hikes, uniform frames

Commodity microcomputers are one-stack machines, the return stack. This
one stack supports subroutine calls and returns. It is typically further
employed for "local variables" and so on to instantiate the copies of data
routines use so that routines can be reentrant, names can be scoped, and
so on. Data is usually affiliated with a routine by being pushed onto the
return stack before the routine is called. This is usually the province of
a compiler, and the details vary. The nature of reentrance and similar is
such that there isn't a much better way to do these things on
one-stackers, and the issues are independant of implementation language,
if any. The same issues pertain to assembly language, so a bottom-up
approach is to provide these facilities in assembly and see what you get.

There is some mention in the documentation for the BCPL programming
language for the idea of the caller just making the space allocation for
the callee's frame, and leaving the actual data movement to the callee
perhaps. It only takes one instruction to make the space allocation, since
commodity CPUs have thier stacks in memory, and can be indexed, and thus
thier stacks are really arrays. This is the internals of the BCPL
compiler. The same idea can be applied to a "compembler", an assembler
with certain vestigial features of a compiler, which is perhaps the
Forth-like approach. Mechanisms just beyond preprocessor macros in
complexity can be trivially arranged to perform this allocation within a
defined routine. This leaves the populating of the callee's frame to the
assembly programmer. Another simple set of glorified macros can provide a
simple interface to the items in a stack frame. These are normally called
local variables. We just give them stock names. The locals of the current
routine can have lowercase single-letter names. They often do anyway, e.g.
"i", "j" and so on. We can't populate our own stack frame from just our
own stack frame, so we provide access to and names for the items in the
caller's frame. We'll consider the caller to be the parent, and call it's
locals pa, pb, pc... et cetera. We can do the same for routines we have
just called that have returned to the current routine, our children, so we
have ca, cb, cc... We now have enough features that we need to refer to
this system or method by a name. I am calling it "osimpa" at this point.

Osimpa has explicit names for some adequate number of locals in 3 frames,
parent, self and child. Half an alphabet each should suffice, giving about
45 "familial variables". The programmer now may implement copy-on-write
parameter-passing, and given the facilities mentioned so far, that is what
they are most likely to do. Osimpa as explained so far is still just a
compembler though, and any method one prefers to use to shoot one's self
in the foot is available at no crossing of semantic or syntactic borders.

The code I have at the moment for osimpa routine definitions allows the
programmer to specify the size of the hike to be made for it's stack
frame. That code will be going away. While attempting to implement
execution arrays I noticed another advantage of caller-hikes. There is no
time cost related to the size of a stack frame hike. It's one add to the
stack pointer. Subroutine arrays in particular, where each indexed code
section is a normally defined pre-existing routine, make for a compelling
reason to always allocate the same size frame. Using uniform frames
might be a previously unused advantage of caller-hikes.

Give every routine 15 data cells on the stack. This is free, and what a
routine actually uses of that is up to it. If you need more, rethink. This
means each parent generation's data is 0x10 cells above it's child's.
GreatGrandparent's "a" is self's "a" offset by 0x30. More to the point,
having all frames in a system be the same size allows an execution array
implementation that is simple, fast, and meshes well with pre-existing
code.

Consider an array of various osimpa-compembled routine call addresses.
That is, addresses of routines that assume the caller has allocated them a
frame. If they all drop the same size frame on return, choosing which one
to run can be very flexible.

There's two ways to do the choosing. self can jump to an index, and then
the indexed subroutine will return to self's caller, rather than to self.
If self does an indexed call rather than a jump, self or the compembler
has to get the table itself physically out of the way of the program
counter. either way is one instruction on x86.

   2 0003 FF24C3        jmp * (%ebx,%eax,4)
   3 0006 FF14C3        call * (%ebx,%eax,4)


Gone Coding