From mboxrd@z Thu Jan 1 00:00:00 1970 From: Rick Hohensee Message-Id: <200107300346.XAA25539@smarty.smart.net> To: 9fans@cse.psu.edu In-Reply-To: from "Theo Honohan" at Jul 27, 2001 06:52:55 PM MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Subject: [9fans] Forth-like Simplifications for Native One-Stack Code Date: Sun, 29 Jul 2001 23:46:48 -0400 Topicbox-Message-UUID: d77806dc-eac9-11e9-9e20-41e7f4b1d025 Forth-like Simplifications for Native One-Stack Code Rick Hohensee July 2001 www.clienux.com key phrases: osimpa, copy-on-write parameter passing, familial variables, subroutine arrays, compembler, caller-hikes, uniform frames Commodity microcomputers are one-stack machines, the return stack. This one stack supports subroutine calls and returns. It is typically further employed for "local variables" and so on to instantiate the copies of data routines use so that routines can be reentrant, names can be scoped, and so on. Data is usually affiliated with a routine by being pushed onto the return stack before the routine is called. This is usually the province of a compiler, and the details vary. The nature of reentrance and similar is such that there isn't a much better way to do these things on one-stackers, and the issues are independant of implementation language, if any. The same issues pertain to assembly language, so a bottom-up approach is to provide these facilities in assembly and see what you get. There is some mention in the documentation for the BCPL programming language for the idea of the caller just making the space allocation for the callee's frame, and leaving the actual data movement to the callee perhaps. It only takes one instruction to make the space allocation, since commodity CPUs have thier stacks in memory, and can be indexed, and thus thier stacks are really arrays. This is the internals of the BCPL compiler. The same idea can be applied to a "compembler", an assembler with certain vestigial features of a compiler, which is perhaps the Forth-like approach. Mechanisms just beyond preprocessor macros in complexity can be trivially arranged to perform this allocation within a defined routine. This leaves the populating of the callee's frame to the assembly programmer. Another simple set of glorified macros can provide a simple interface to the items in a stack frame. These are normally called local variables. We just give them stock names. The locals of the current routine can have lowercase single-letter names. They often do anyway, e.g. "i", "j" and so on. We can't populate our own stack frame from just our own stack frame, so we provide access to and names for the items in the caller's frame. We'll consider the caller to be the parent, and call it's locals pa, pb, pc... et cetera. We can do the same for routines we have just called that have returned to the current routine, our children, so we have ca, cb, cc... We now have enough features that we need to refer to this system or method by a name. I am calling it "osimpa" at this point. Osimpa has explicit names for some adequate number of locals in 3 frames, parent, self and child. Half an alphabet each should suffice, giving about 45 "familial variables". The programmer now may implement copy-on-write parameter-passing, and given the facilities mentioned so far, that is what they are most likely to do. Osimpa as explained so far is still just a compembler though, and any method one prefers to use to shoot one's self in the foot is available at no crossing of semantic or syntactic borders. The code I have at the moment for osimpa routine definitions allows the programmer to specify the size of the hike to be made for it's stack frame. That code will be going away. While attempting to implement execution arrays I noticed another advantage of caller-hikes. There is no time cost related to the size of a stack frame hike. It's one add to the stack pointer. Subroutine arrays in particular, where each indexed code section is a normally defined pre-existing routine, make for a compelling reason to always allocate the same size frame. Using uniform frames might be a previously unused advantage of caller-hikes. Give every routine 15 data cells on the stack. This is free, and what a routine actually uses of that is up to it. If you need more, rethink. This means each parent generation's data is 0x10 cells above it's child's. GreatGrandparent's "a" is self's "a" offset by 0x30. More to the point, having all frames in a system be the same size allows an execution array implementation that is simple, fast, and meshes well with pre-existing code. Consider an array of various osimpa-compembled routine call addresses. That is, addresses of routines that assume the caller has allocated them a frame. If they all drop the same size frame on return, choosing which one to run can be very flexible. There's two ways to do the choosing. self can jump to an index, and then the indexed subroutine will return to self's caller, rather than to self. If self does an indexed call rather than a jump, self or the compembler has to get the table itself physically out of the way of the program counter. either way is one instruction on x86. 2 0003 FF24C3 jmp * (%ebx,%eax,4) 3 0006 FF14C3 call * (%ebx,%eax,4) Gone Coding