From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 16 Nov 1995 12:39:43 -0500
From: rob@plan9.att.com
Subject: Graphics issues
Topicbox-Message-UUID: 34dba1be-eac8-11e9-9e20-41e7f4b1d025
Message-ID: <19951116173943.AvwHBF7Z_w4dlD2YiqV6Jj3xam9zdAB-ZqUws6qD74k@z>

Carmack's suggestions fall into two classes: the efficient display of dynamic pictures, and the easy programming of multiple inputs. Plan 9 was designed for software development, not interactive games (OK, relax, we screwed up, we admit it) but it can be adapted to serve both needs.

The efficient display of pictures can be done very differently. In fact, Carmack's suggestions cover about half of the complete redesign of that interface done in Brazil. It has become clear to us that the 'right' answer is a hybrid: a (much simpler than in Plan 9) procedural device to handle most programs' needs, and a much more direct route to the display for those few programs that want the system out of the way. Both can be supported in one place, and we now have some evidence to back us up.

On our 100MHz R4400 machines we can write to a clipped window at pretty close to memmove() speeds, about 20 megabytes per second. This is done by a very different interface to the system, in which programs write images directly to a device that copies them to the window. In the local case, when all is favorable (as it would be running, say, Doom), the only overhead is one copy to the display directly from the local memory of the program, where the real display is maintained. (On a PC, the VGA business gets in the way, so this becomes two copies. With some non-portability (see below), this could be reduced to one.) Adding a Plan 9-style secondary access to the display has been tried several ways but the design is not finalized. It is necessary for efficient remote graphics. In Brazil, we have demos of software doing 30-frame-per-second video in a 640x480 window; actually, several such windows simultaneously.
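The one-copy path described above can be sketched roughly as follows. This is not the Brazil interface; the Frame type and winblit() are hypothetical names, and the point is only that the program maintains the real image in its own memory and the per-frame cost is one memmove per scan line into the clipped window:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of a one-copy window update. A program draws a
 * whole frame into private memory; updating the window is then a
 * single copy, running at close to memmove() speed. */
typedef struct {
	unsigned char *base;	/* start of pixel storage */
	int bwidth;		/* bytes per scan line */
	int height;		/* number of scan lines */
} Frame;

/* Copy the overlapping rows and bytes of src into dst. This is the
 * only copy made per frame in the favorable local case. */
void
winblit(Frame *dst, Frame *src)
{
	int y;
	int w = src->bwidth < dst->bwidth ? src->bwidth : dst->bwidth;
	int h = src->height < dst->height ? src->height : dst->height;

	for (y = 0; y < h; y++)
		memmove(dst->base + y*dst->bwidth, src->base + y*src->bwidth, w);
}
```

On a PC, as noted, the VGA hardware forces a second copy from this destination buffer onto the screen.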
We reject methods that map the display because they cannot be implemented in a way that is portable to the application. Display addressing, byte order issues, pixel packing and so on are just too messy. You need one indirection to hide the details. Remember, we have many machines that are not PCs.

Somewhat closer to home, Carmack's observation that the bitmap read/write protocol should use rectangles rather than whole scan lines is right, and already part of Brazil. Our solution was to require the scan lines of the rectangle to start and end at byte boundaries, not pixels, so that a general bitblt is not called for. Using memmove and a little care, the performance can be good. On many machines, but not SPARC, unaligned memmove can be as fast as aligned because of special instructions or silicon, so the quantization to bytes is good enough. His further suggestions on this topic lead pretty directly to the current design in Brazil.

As usual, Carmack is right on in his comments about compiling on the fly on RISC machines. Caches cause trouble: their management is expensive, non-portable even between different computers with the same CPU, and probably costs more than is won back by the compilation. We are using other techniques now. (It is, of course, still sensible on x86 and 680x0 machines, but we are after one design.)

As for events, I side with dhog. The Plan 9 model is general and easy to get right. If you need special support - and there is no doubt that games do - I would suggest providing a connection to the raw devices underneath, and synthesizing whatever else is needed at user level. That is, rather than build special devices that fold together multiple devices and demand non-universal hardware features (e.g. key up), it is better to build devices in the kernel that export the hardware interface as directly as is practical, and encapsulate the special properties and desires in adaptable, user-level code.
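The byte-quantization rule for rectangles can be made concrete. The function names below are illustrative, not the Brazil protocol; they just show the arithmetic: round the rectangle's left edge down and its right edge up to byte boundaries, so that each scan line of the transfer is a plain memmove rather than a bit-level bitblt:

```c
#include <assert.h>

/* Hypothetical sketch of scan-line quantization to byte boundaries,
 * for pixel depths of at most 8 bits per pixel. */

/* byte offset of the first byte covering pixel x0 (round down) */
int
spanstart(int x0, int depth)
{
	return (x0 * depth) / 8;
}

/* number of bytes covering the pixel range [x0, x1) at 'depth' bits
 * per pixel: left edge rounded down, right edge rounded up */
int
spanbytes(int x0, int x1, int depth)
{
	int b0 = (x0 * depth) / 8;
	int b1 = (x1 * depth + 7) / 8;
	return b1 - b0;
}
```

Transferring a few extra pixels at each edge is the price of avoiding a general bitblt, and with fast unaligned memmove it is cheap.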
With some care, even the window system could pass such devices through cleanly. Performance is not critical here: with human-driven input, the extra context switch and system call required would be insignificant on modern machines, especially when compared to the generation of a 30Hz image.

The language betrays the bias: it "gets rid of the need to fork processes just to watch blocking files". What's wrong with forking processes? That's what they're for: watching and waiting. They offer a generality no supposedly general event mechanism ever can. If you really want events, they're easy to build at user level with the tools Plan 9 provides: processes, shared memory, synchronization, and file system interfaces.

	-rob
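The fork-a-process-to-watch pattern is a few lines of code. This is a hedged sketch in portable POSIX terms, not Plan 9 library code; watch() is a made-up name. One process blocks reading a raw device file and funnels whatever arrives into a single pipe, from which the main loop can synthesize whatever event abstraction it likes, entirely at user level:

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical sketch: fork a watcher per blocking device file.
 * The child blocks in read() on devfd and forwards the bytes to
 * eventfd; the parent returns immediately and reads events from
 * the other end of that pipe at its leisure. */
int
watch(int devfd, int eventfd)
{
	switch (fork()) {
	case -1:
		return -1;		/* fork failed */
	case 0: {			/* child: watch and wait */
		char buf[128];
		long n;
		while ((n = read(devfd, buf, sizeof buf)) > 0)
			if (write(eventfd, buf, n) != n)
				break;
		_exit(0);
	}
	default:
		return 0;		/* parent keeps running */
	}
}
```

One watcher per device folds any number of inputs (keyboard, mouse, joystick) into one stream, without demanding special kernel devices or non-universal hardware features.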