From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sun, 19 Nov 1995 10:31:35 -0500
From: Dan Hildebrand <danh@qnx.com>
Subject: Graphics issues
Topicbox-Message-UUID: 351badea-eac8-11e9-9e20-41e7f4b1d025
Message-ID: <19951119153135.EUFZ-ZyAajux20o9bRaBC1cRlMNZrh2nuW0fv693_x8@z>

In article <95Nov16.124635est.78461@colossus.cse.psu.edu>, <9fans@cse.psu.edu> wrote:

>In Brazil, we have demos of software doing 30 frame video in a 640x480
>window; actually, several such windows simultaneously.
>
>We reject methods that map the display because they cannot be
>implemented in a way that is portable to the application. Display
>addressing, byte order issues, pixel packing and so on are just too
>messy. You need one indirection to hide the details. Remember, we
>have many machines that are not PC's.

Exactly - in our implementation of the graphics drivers for Photon (a
microkernel GUI for QNX), the graphics drivers export a 24-bit color
model into which applications draw, relying on the graphics driver to
map the requested colors (i.e., color select, dither, etc.) onto the
capabilities of the display hardware. This is especially important as
windows are dragged from desktop to desktop across a network, since the
color capabilities of each machine could be very different.

In the case of Doom, each frame submitted to the driver contains a
palette, and rather than hitting the hardware palette, the driver does a
closest match for all the colors in the frame. It turns out that this
calculation can be done more quickly than the hardware palette can be
reprogrammed, and as a result we actually get frame rates under Photon
that rival the raw-VGA frame rates.

>Somewhat closer to this home, Carmack's observation that the bitmap
>read/write protocol should use rectangles rather than whole scan lines
>is right, and already part of Brazil. Our solution was to require the
>scan lines of the rectangle to start and end at byte boundaries, not
>pixels, so that a general bitblt is not called for.
>Using memmove and a little care, the performance can be good. On many
>machines, but not SPARC, unaligned memmove can be as fast as aligned
>because of special instructions or silicon, so the quantization to
>bytes is good enough.

Exactly. In addition, with display hardware appearing that allows one or
more clip rectangles to be "pushed" into the hardware before rendering
the image, we can maximize the use of the display hardware's
capabilities. It's up to the graphics driver either to use the hardware
or, if the hardware is lacking, to make up the difference in software.

Another capability we want graphics drivers to be able to take advantage
of is color space conversion. It becomes possible to place the
application's representation of the frame directly into off-screen
memory on the video card, and then let the video card do the "stretch
blit" as necessary to convert the off-screen representation into one
that agrees with the current display mode. As a result, it becomes
unnecessary to export the hardware representation of the video memory.

>As for events, I side with dhog. The Plan 9 model is general and easy
>to get right. If you need special support - and there is no doubt
>that games do - I would suggest providing a connection to the raw
>devices underneath, and synthesizing whatever else is needed at user
>level. That is, rather than build special devices that fold together
>multiple devices and demand non-universal hardware features (e.g. key
>up), it is better to build devices in the kernel that export the
>hardware interface as directly as is practical, and encapsulate the
>special properties and desires in adaptable, user-level code. With
>some care, even the window system could pass such devices through
>cleanly.

My impression is that Plan 9's approach has always been that rather than
implement new facilities, it is better to fix the ones that exist so
that new services aren't necessary.
With process-to-process IPC sufficiently fast, for example, threads
aren't needed to solve the slow-IPC problem that some OSes demonstrate.
Threads may be useful for other reasons, but achieving fast context
switches doesn't need to be one of them.

>Performance is not critical here: with human-driven input,
>the extra context switch and system call required would be
>insignificant on modern machines, especially when compared to the
>generation of a 30Hz image.

We still have to be careful here, because "human-driven" input will
increase in bandwidth as user-interface expectations increase. For
example, handwriting recognition and voice input are certainly
high-bandwidth. If we want to be able to deal with these input devices
within the GUI, the GUI must provide fast and efficient event mechanisms
to pass this data between the services processing it.

--
Dan Hildebrand (danh@qnx.com)        QNX Software Systems, Ltd.
http://www.qnx.com/~danh             175 Terence Matthews
phone: (613) 591-0931 (voice)        Kanata, Ontario, Canada
       (613) 591-3579 (fax)          K2M 1W8