From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 16 Nov 1995 12:39:43 -0500
From: rob@plan9.att.com
Subject: Graphics issues
Topicbox-Message-UUID: 34dba1be-eac8-11e9-9e20-41e7f4b1d025
Message-ID: <19951116173943.AvwHBF7Z_w4dlD2YiqV6Jj3xam9zdAB-ZqUws6qD74k@z>

Carmack's suggestions fall into two classes: the efficient display of dynamic pictures, and the easy programming of multiple inputs. Plan 9 was designed for software development, not interactive games (OK, relax, we screwed up, we admit it) but it can be adapted to serve both needs.

The efficient display of pictures can be done very differently. In fact, Carmack's suggestions cover about half of the complete redesign of that interface done in Brazil. It has become clear to us that the 'right' answer is a hybrid: a (much simpler than in Plan 9) procedural device to handle most programs' needs, and a much more direct route to the display for those few programs that want the system out of the way. Both can be supported in one place, and we now have some evidence to back us up.

On our 100MHz R4400 machines we can write to a clipped window at pretty close to memmove() speeds, about 20 megabytes per second. This is done by a very different interface to the system, in which programs write images directly to a device that copies them to the window. In the local case, when all is favorable (as it would be running, say, Doom), the only overhead is one copy to the display directly from the local memory of the program, where the real display is maintained. (On a PC, the VGA business gets in the way, so this becomes two copies. With some non-portability (see below), this could be reduced to one.) Adding a Plan 9-style secondary access to the display has been tried several ways but the design is not finalized. It is necessary for efficient remote graphics. In Brazil, we have demos of software doing 30-frame-per-second video in a 640x480 window; actually, several such windows simultaneously.
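The one-copy path described above can be sketched roughly as follows. This is not the Brazil interface; the Frame type and winblit() are hypothetical names, and the point is only that the program maintains the real image in its own memory and the per-frame cost is one memmove per scan line into the clipped window:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of a one-copy window update. A program draws a
 * whole frame into private memory; updating the window is then a
 * single copy, running at close to memmove() speed. */
typedef struct {
	unsigned char *base;	/* start of pixel storage */
	int bwidth;		/* bytes per scan line */
	int height;		/* number of scan lines */
} Frame;

/* Copy the overlapping rows and bytes of src into dst. This is the
 * only copy made per frame in the favorable local case. */
void
winblit(Frame *dst, Frame *src)
{
	int y;
	int w = src->bwidth < dst->bwidth ? src->bwidth : dst->bwidth;
	int h = src->height < dst->height ? src->height : dst->height;

	for (y = 0; y < h; y++)
		memmove(dst->base + y*dst->bwidth, src->base + y*src->bwidth, w);
}
```

On a PC, as noted, the VGA hardware forces a second copy from this destination buffer onto the screen.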
We reject methods that map the display because they cannot be implemented in a way that is portable to the application. Display addressing, byte order issues, pixel packing and so on are just too messy. You need one indirection to hide the details. Remember, we have many machines that are not PCs.

Somewhat closer to home, Carmack's observation that the bitmap read/write protocol should use rectangles rather than whole scan lines is right, and already part of Brazil. Our solution was to require the scan lines of the rectangle to start and end at byte boundaries, not pixels, so that a general bitblt is not called for. Using memmove and a little care, the performance can be good. On many machines, but not SPARC, unaligned memmove can be as fast as aligned because of special instructions or silicon, so the quantization to bytes is good enough. His further suggestions on this topic lead pretty directly to the current design in Brazil.

As usual, Carmack is right on in his comments about compiling on the fly on RISC machines. Caches cause trouble: their management is expensive, non-portable even between different computers with the same CPU, and probably costs more than is won back by the compilation. We are using other techniques now. (It is, of course, still sensible on x86 and 680x0 machines, but we are after one design.)

As for events, I side with dhog. The Plan 9 model is general and easy to get right. If you need special support - and there is no doubt that games do - I would suggest providing a connection to the raw devices underneath, and synthesizing whatever else is needed at user level. That is, rather than build special devices that fold together multiple devices and demand non-universal hardware features (e.g. key up), it is better to build devices in the kernel that export the hardware interface as directly as is practical, and encapsulate the special properties and desires in adaptable, user-level code.
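The byte-quantization rule for rectangles can be made concrete. The function names below are illustrative, not the Brazil protocol; they just show the arithmetic: round the rectangle's left edge down and its right edge up to byte boundaries, so that each scan line of the transfer is a plain memmove rather than a bit-level bitblt:

```c
#include <assert.h>

/* Hypothetical sketch of scan-line quantization to byte boundaries,
 * for pixel depths of at most 8 bits per pixel. */

/* byte offset of the first byte covering pixel x0 (round down) */
int
spanstart(int x0, int depth)
{
	return (x0 * depth) / 8;
}

/* number of bytes covering the pixel range [x0, x1) at 'depth' bits
 * per pixel: left edge rounded down, right edge rounded up */
int
spanbytes(int x0, int x1, int depth)
{
	int b0 = (x0 * depth) / 8;
	int b1 = (x1 * depth + 7) / 8;
	return b1 - b0;
}
```

Transferring a few extra pixels at each edge is the price of avoiding a general bitblt, and with fast unaligned memmove it is cheap.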
With some care, even the window system could pass such devices through cleanly. Performance is not critical here: with human-driven input, the extra context switch and system call required would be insignificant on modern machines, especially when compared to the generation of a 30Hz image.

The language betrays the bias: it "gets rid of the need to fork processes just to watch blocking files". What's wrong with forking processes? That's what they're for: watching and waiting. They offer a generality no supposedly general event mechanism ever can. If you really want events, they're easy to build at user level with the tools Plan 9 provides: processes, shared memory, synchronization, and file system interfaces.

	-rob
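The fork-a-process-to-watch pattern is a few lines of code. This is a hedged sketch in portable POSIX terms, not Plan 9 library code; watch() is a made-up name. One process blocks reading a raw device file and funnels whatever arrives into a single pipe, from which the main loop can synthesize whatever event abstraction it likes, entirely at user level:

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical sketch: fork a watcher per blocking device file.
 * The child blocks in read() on devfd and forwards the bytes to
 * eventfd; the parent returns immediately and reads events from
 * the other end of that pipe at its leisure. */
int
watch(int devfd, int eventfd)
{
	switch (fork()) {
	case -1:
		return -1;		/* fork failed */
	case 0: {			/* child: watch and wait */
		char buf[128];
		long n;
		while ((n = read(devfd, buf, sizeof buf)) > 0)
			if (write(eventfd, buf, n) != n)
				break;
		_exit(0);
	}
	default:
		return 0;		/* parent keeps running */
	}
}
```

One watcher per device folds any number of inputs (keyboard, mouse, joystick) into one stream, without demanding special kernel devices or non-universal hardware features.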