9fans - fans of the OS Plan 9 from Bell Labs
* Graphics issues
From: John @ 1995-11-14  0:48 UTC


John Carmack <9fans@cse.psu.edu> wrote:
>Events:
>
>Plan 9 should have a /dev/events device that combines the mouse,
>cons, and time devices.  There would be sizable efficiency,
>development ease, and user interface benefits from this.

Since the can of worms of event handling has been opened, I'll toss a
few more ideas out.  It would be nice to have an elegant structure for
adding extension event streams: e.g. spaceballs, touch screens, EM
field sensing devices (see CHI '95 proceedings, p. 280), and so forth.

Consider the following:

Create a special /dev/events/evstream that gives the proper event
interleaving for all event drivers in the current view of a
/dev/events directory tree.  This could include /dev/events/mouse,
/dev/events/cons, /dev/events/time, and possibly novel input event
drivers.  An application might then mask its event stream by modifying
its view of the /dev/events tree accordingly.

Assuming that the above is both feasible and sensible, it could also
provide a nice mechanism for integrating novel input devices into the
unified event stream.

As opposed to the idea of using a control file for event selection, I
envision the evstream device as a sort of multiplexer that manages
streams from a set of input devices.  Event selection then gets left
to the file system layer, where evstream is only called on to manage
event streams from other drivers visible under /dev/events.  A
significant question remains: is an elegant implementation possible
that meets the reliability (no misses) and temporal ordering
constraints raised by John Carmack?
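
To make that concrete, here is a minimal sketch of what a client of
such an evstream might look like.  Everything below is invented for
illustration -- the record layout, the tag values, and the device
itself are assumptions, not anything Plan 9 provides:

#include <u.h>
#include <libc.h>

/* hypothetical fixed-size record delivered by /dev/events/evstream */
typedef struct Event Event;
struct Event {
	ulong	msec;		/* timestamp, so temporal order survives the merge */
	char	tag;		/* originating driver: 'm' mouse, 'k' cons, 't' time, ... */
	uchar	len;		/* valid bytes in buf */
	uchar	buf[42];	/* driver-specific payload */
};

void
evloop(int fd)
{
	Event e;

	while(read(fd, &e, sizeof e) == sizeof e){
		switch(e.tag){
		case 'm':	/* decode a mouse transition */
			break;
		case 'k':	/* decode a key press */
			break;
		default:	/* a novel extension driver */
			break;
		}
	}
}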

>I see this as a no-drawbacks, just-plain-right thing to do.

I have to agree with the issues that John raises.  Multimedia and
highly interactive applications are going to happen.  It is paramount
that efficient and well-thought-out support for these be integrated
into Plan 9 if it is to survive.  Wouldn't a VRML browser for Plan 9
be nice?

There are all sorts of applications that could be hindered by a lack
of proper multimedia support.

Now to turn the coin over: consider building a /dev/multimedia that
handles some of the ugly aspects of displaying synchronized video and
audio streams.  There is some question in my mind as to the format in
which to provide the raw streams to /dev/multimedia in such a manner
that it can synchronize them.  It might make a nice distributed
solution, too: audio and video streams from separate sources could
arrive at an instance of /dev/multimedia to be synchronized and
displayed at the local machine.

-- John Whitley








* Graphics issues
From: Al @ 1995-11-30 16:03 UTC


In article <9511171105.AA16712@idnewt.idsoftware.com>,
John Carmack <9fans@cse.psu.edu> wrote:

>All of my event issues would be resolved with two changes to the
>current interface:
>
>The mouse device must buffer state transitions, so clicks are
>never missed.  This could be done transparently to current code.
>
>A raw keyboard device would need to be created that includes key
>ups if available and time stamps the actions so they can be
>accurately interleaved with mouse events.

   If you can find some old documentation on the Commodore Amiga,
take a look at its "food chain" that passes keyboard (raw up/down)
and mouse events via messages to an ordered list of processes that
ask for them -- with time stamps.  Looks a lot like the Mac mechanism,
but more "process" oriented.  If the chain is ordered by timestamp,
you have a reasonably integrated mouse/keyboard input mechanism.
And all this could be treated as a raw device (/dev/"raw food")?



>I think that plan9 would be an excellent environment to write a
>video rate aware graphics/window system.
>
>Digression: In some extreme programming forms (demo coding),
>drawing is sometimes performed in a controlled enough fashion that
>it can be direct to screen and manage to never produce an
>inconsistent image by being totally aware of the relationship
>between the location of the drawing and the current position of
>the raster, but that isn't generally useful.
>
>....  If PCs had scan line interrupts, that would even be a
>practical thing to do...

   Amiga....  While per-scan-line changes in color translation tables
and display content might be OK for demos, they probably aren't useful
when multiple processes are each changing parts of the display
simultaneously.  Unless you could have hardware hide the effects.
Maybe buffer vram writes that are "ahead" of the raster in a "shadow"
vram, then write through any changed areas after the raster passes
---- ughh, nope, forget it...

>The answer is to keep the window bitmaps in offscreen vram and
>have the accelerator do the pixel pushing.

   Seriously, why not have the vram be part of the host ram memory space???
(Just don't cache that area.)

>Digression 2:  the next generation of PCI video cards are going to
>support bus mastering, with the ability to pull pixels directly
>out of host memory at speeds of up to nearly 100 megs a second.  I
>doubt the main memory systems will be able to feed them that fast,
>though.  It will change a lot of design decisions.

   Again, the Amiga model (sort of) has video take pixels from host
memory (and a DMA-based hardware Blitter that can synchronize with
raster position).  You might want to look at this "old" windowing
mechanism that supports a hierarchy of independent Screens
with Windows, although the windowing part seems to borrow a lot from the
Xerox/Mac model.

   Maybe the ultimate answer is non-raster displays.  But how would you "hide"
update details from a screen that instantly shows every pixel change?

Al Varney








* Graphics issues
From: Andrew @ 1995-11-19 19:23 UTC



John Carmack (johnc@idnewt.idsoftware.COM) wrote:

>Interactive priority scheduling would be an interesting thing to

 [ deletia ]

>compute bound.  A compile running in the background should only
>get cycles when all interactive applications are blocked on user
>devices.  Perhaps processes could be classified "compute bound"
>if they last blocked on a non-user IO device, and "interactive" if
>they last blocked on mouse/keyboard.  If an interactive process
>goes its full (generous) timeslice without blocking again,
>reclassify it until it again hits a user device.


The danger here is forgetting that the processes associated with the
terminal may not be the only ones of importance on the machine.

Much of what is good in plan9 is its ability to handle processes
dealing with either a remote terminal, or no terminal at all, in a
'fair' way.

While game handling is important (!), we still want to be able
to handle large computational 'compiler type' jobs in the
background, and have them actually get a fair chunk of the CPU.
 


--
    andrew@snowhite.cis.uoguelph.ca          andrewhw@uoguelph.ca







* Graphics issues
From: Dan @ 1995-11-19 15:31 UTC


In article <95Nov16.124635est.78461@colossus.cse.psu.edu>,
 <9fans@cse.psu.edu> wrote:

>In Brazil, we have demos of software doing 30 frame video in a 640x480
>window; actually, several such windows simultaneously.
>
>We reject methods that map the display because they cannot be
>implemented in a way that is portable to the application.  Display
>addressing, byte order issues, pixel packing and so on are just too
>messy.  You need one indirection to hide the details.  Remember, we
>have many machines that are not PC's.

Exactly -- in our implementation of the graphics drivers for Photon (a
microkernel GUI for QNX), the graphics drivers export a 24-bit color model
into which applications draw, relying on the graphics driver to map the
requested colors (i.e. color select, dither, etc.) onto the capabilities of
the display hardware.  This is especially important as windows are dragged
from desktop to desktop across a network, since the color capabilities of
each machine could be very different.  In the case of Doom, each frame
submitted to the driver contains a palette, and rather than hitting the
hardware palette, the driver does a closest match for all the colors in the
frame.  It turns out that this calculation can be done more quickly than
the hardware palette can be reprogrammed, and as a result, we actually get
frame rates under Photon that rival the raw-VGA frame rates.
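
The closest-match step can be as simple as a minimum-squared-distance
search over the palette.  This is only a sketch of the idea, not
Photon's actual driver code:

/* map a requested 24-bit color onto an n-entry hardware palette */
typedef struct Rgb Rgb;
struct Rgb {
	unsigned char r, g, b;
};

int
closest(Rgb want, Rgb *pal, int npal)
{
	int i, best;
	long d, dr, dg, db, bestd;

	best = 0;
	bestd = 3L*255*255 + 1;		/* larger than any possible distance */
	for(i = 0; i < npal; i++){
		dr = want.r - pal[i].r;
		dg = want.g - pal[i].g;
		db = want.b - pal[i].b;
		d = dr*dr + dg*dg + db*db;
		if(d < bestd){
			bestd = d;
			best = i;
		}
	}
	return best;	/* index of the nearest available color */
}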

>Somewhat closer to this home, Carmack's observation that the bitmap
>read/write protocol should use rectangles rather than whole scan lines
>is right, and already part of Brazil.  Our solution was to require the
>scan lines of the rectangle to start and end at byte boundaries, not
>pixels, so that a general bitblt is not called for.  Using memmove and
>a little care, the performance can be good.  On many machines, but not
>SPARC, unaligned memmove can be as fast as aligned because of special
>instructions or silicon, so the quantization to bytes is good enough.

Exactly.  In addition, with display hardware appearing that allows one or
more clip rectangles to be "pushed" into the hardware before rendering the
image, we can maximize the use of the display hardware's capabilities.
It's up to the graphics driver to either use the hardware or, if the
hardware is lacking, to make up the difference in software.

Another capability we want graphics drivers to be able to take advantage of 
is color space conversion.  It becomes possible to place the application's 
representation of the frame directly into off-screen memory in the video 
card, and then let the video card do the "stretch blit" as necessary to 
convert the off-screen representation into one which agrees with the 
current display mode.  As a result, it becomes unnecessary to export the 
hardware representation of the video memory.

>As for events, I side with dhog.  The Plan 9 model is general and easy
>to get right.  If you need special support - and there is no doubt
>that games do - I would suggest providing a connection to the raw
>devices underneath, and synthesizing whatever else is needed at user
>level.  That is, rather than build special devices that fold together
>multiple devices and demand non-universal hardware features (e.g.  key
>up), it is better to build devices in the kernel that export the
>hardware interface as directly as is practical, and encapsulate the
>special properties and desires in adaptable, user-level code.  With
>some care, even the window system could pass such devices through
>cleanly.

My impression is that Plan 9's approach has always been that rather than
implementing new facilities, you fix the ones that exist so that new
services aren't necessary.  For example, with process-to-process IPC
sufficiently fast, threads aren't needed to solve the slow-IPC problem that
some OSes demonstrate.  Threads may be useful for other reasons, but
achieving fast context switches doesn't need to be one of them.

>Performance is not critical here:  with human-driven input,
>the extra context switch and system call required would be
>insignificant on modern machines, especially when compared to the
>generation of a 30Hz image.

We still have to be careful here, because "human driven" input will be 
increasing in bandwidth as user-interface expectations increase.  For 
example, handwriting recognition and voice input are certainly 
high-bandwidth.  If we want to be able to deal with these input devices 
within the GUI, the GUI must provide fast and efficient event mechanisms to 
pass this data between the services processing that data.
-- 
Dan Hildebrand (danh@qnx.com)               QNX Software Systems, Ltd.
http://www.qnx.com/~danh                    175 Terence Matthews
phone: (613) 591-0931 (voice)               Kanata, Ontario, Canada
       (613) 591-3579 (fax)                 K2M 1W8








* Graphics issues
From: John @ 1995-11-17 11:05 UTC



This message sat in a compose window for three days, and it grew,
and grew, and grew...


>I much prefer the way that Plan 9 handles "events" already, that is,
>by using concurrency to handle multiple inputs rather than adding
>some mechanism for lumping them all together
>
>Consider where the mouse and keyboard events originate from.  They
>are two separate hardware devices, using two separate (usually)
>serial inputs to pass information to the terminal.  Any "proper
>event interleaving" is an illusion.

That keyboard and mouse are different input devices to the
computer hardware is an artifact, not an excuse.  There is
something to be said for running them over the same bus, like ADB.

Together (along with tablets, gloves, whatever), they constitute
"the user's wishes" -- a single, sequenced stream of commands.
They are NOT independent streams where concurrency is appropriate.
Allowing them to slip relative to each other is fairly analogous
to allowing file seeks to slip relative to file writes.  Bad
Thing.  I'm not saying that the plan9 event system is unusable,
just that it is non-optimal enough to care.

> rob@plan9.att.com:
>Performance is not critical here:  with human-driven input,
>the extra context switch and system call required would be
>insignificant on modern machines, especially when compared to the
>generation of a 30Hz image.

On an unloaded system, I would agree, but the kernel to 8.5 slave
process to 8.5 scheduler to your slave process to your main loop
chain (separately for mouse and keyboard) is plenty of opportunity
for the scheduler to decide to run something else.

Interactive priority scheduling would be an interesting thing to
follow up on.  A process that is blocking on the user's input
should have a temporarily boosted priority so that when the input
is available, it automatically preempts any process that is
compute bound.  A compile running in the background should only
get cycles when all interactive applications are blocked on user
devices.  Perhaps processes could be classified "compute bound"
if they last blocked on a non-user IO device, and "interactive" if
they last blocked on mouse/keyboard.  If an interactive process
goes its full (generous) timeslice without blocking again,
reclassify it until it again hits a user device.
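
A sketch of that classification rule, in invented pseudo-kernel C
(none of these names exist in Plan 9's scheduler):

enum { Interactive, Computebound };

typedef struct Proc Proc;
struct Proc {
	int	class;	/* Interactive or Computebound */
};

/* called when a process blocks, or when its timeslice expires */
void
reclassify(Proc *p, int blockedonuser, int sliceexpired)
{
	if(blockedonuser)		/* last blocked on mouse/keyboard */
		p->class = Interactive;
	else if(sliceexpired)		/* ran a full (generous) slice without blocking */
		p->class = Computebound;	/* demoted until it hits a user device again */
}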

>The language betrays the bias:
>        it gets rid of the need to fork processes just to watch
>        blocking files

Ok, sure, I admit a little bias.  I'm used to treating the OS as
an enemy that, given a chance, does everything wrong :-)

All of my event issues would be resolved with two changes to the
current interface:

The mouse device must buffer state transitions, so clicks are
never missed.  This could be done transparently to current code.

A raw keyboard device would need to be created that includes key
ups if available and time stamps the actions so they can be
accurately interleaved with mouse events.

I might still make some weak protests about the flow of control
through the system, but I wouldn't have much of a leg to stand on
because functionally identical results could be obtained.

A raw mouse device (movement deltas only, no screen clamping)
would be cool for games, but that's so esoteric that I wouldn't
push for it.
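
For concreteness, one guess at what records from those two devices
might look like.  The layouts are assumptions for illustration, not
anything Plan 9 defines (types as in u.h):

typedef struct Mouseevent Mouseevent;
struct Mouseevent {
	ulong	msec;		/* timestamp, for interleaving with key events */
	short	dx, dy;		/* raw deltas -- no screen clamping */
	uchar	buttons;	/* buffered state transition, so clicks can't be missed */
};

typedef struct Keyevent Keyevent;
struct Keyevent {
	ulong	msec;		/* same clock as the mouse device */
	ushort	key;		/* key code */
	uchar	down;		/* 1 = press, 0 = release, where the hardware reports it */
};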


(segueing into high performance user interface systems)

Most of the comments I am making are not specifically targeted at
games, but at user interfaces in general.  I am in the middle of
a major revamping of our map editor at the moment, so app
interactivity is much on my mind.  Any app can benefit from a more
responsive interface.  Apps just don't eat you if you are slow :-)

There is a nice constant in user interface speed: if a user's
action shows feedback on the video frame following the input, it
is fast enough.  If not, there is room for improvement.

Computers should feel instant whenever possible.  This involves
the event path, whatever processing is done, the speed of drawing,
and the way the drawing is displayed.

I consider it a general truth that you shouldn't see the computer
performing drawing operations, because visible drawing is an
artifact of serialized rasterization.  Abstractly, a program
describes a final view with drawing primitives, not a sequence of
frames that varies based on the speed of the target computer and
the position of the CRT raster, which is what you get when you
draw or flush directly to visible display memory without proper
synchronization.  (The one exception to the don't-show-the-drawing
rule is when the drawing takes long enough that the user is
feedback starved.)

On SGI machines, the graphics hardware is very fast, with many UI
tasks performed at video frame rates, but the drawing is usually
visible to the user as bad flicker.  It looks messy.

On NEXTSTEP machines, the drawing is hidden by buffered windows,
but the flush to screen is bus-bandwidth limited, so large windows
have a sluggish feel to them and dragging a window can often
result in multiple tear lines.  Display PostScript prevents NS
from utilizing hardware acceleration in most cases.

(finally getting to the plan9-relevant part)

I think that plan9 would be an excellent environment to write a
video-rate-aware graphics/window system.

Seeing direct manipulation UI events (full window drag, live
scrolling, etc.) take place at synchronized video frame rates
would be a very cool experience.

Plan9 has already bitten the bullet and allocated backing store
for all of the window layers, which is usually a hard fight.  The
memory cost is worth it to avoid expose events and to enable all
drawing operations to be performed in an undisplayed area (which
plan9 does not currently do).

The plan9 drawing primitives map almost directly to common
accelerator functions.

And finally, the scope of the graphics code is manageable and easy
to deal with.

Sounds like a good little project for me.

There are two ways to get a totally seamless display update: back
buffering with a raster-synchronized flush, and page flipping.
Digression: in some extreme programming forms (demo coding),
drawing is sometimes performed in a controlled enough fashion that
it can go direct to screen and manage to never produce an
inconsistent image, by being totally aware of the relationship
between the location of the drawing and the current position of
the raster, but that isn't generally useful.

Some versions of plan9 already completely double buffer the
screen in system memory.  Unfortunately, a large window can take
more than an entire frame's time to push over the PCI bus, so even
if you synced with the raster, you would still get a partial
update (not to mention spending all of your cpu time moving
bytes).  Digression: it is possible to get perfect updates even if
you are blitting at roughly half the speed of the raster by
"chasing the raster" -- starting just behind it and letting it
run away from you; as long as it doesn't lap you, the image comes
out consistent.  If PCs had scan line interrupts, that would even
be a practical thing to do...
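
A sketch of the raster-chasing trick, with invented primitives:
rasterline() polls the scan line the CRT is currently drawing, and
blitband() copies rows of the back buffer to the screen.  (A real PC
of the era would have to poll, lacking a scan line interrupt.)

extern int rasterline(void);		/* current scan line being displayed */
extern void blitband(int y0, int y1);	/* copy back-buffer rows [y0, y1) to screen */

enum { BAND = 8 };

void
chase(int height)
{
	int y;

	for(y = 0; y < height; y += BAND){
		/* never write a band while the raster is inside it; since the
		 * raster moves faster than the copy, it pulls ahead, and as
		 * long as it never laps the copy point every displayed line
		 * is wholly old or wholly new -- no tearing */
		while(rasterline() >= y && rasterline() < y+BAND)
			;
		blitband(y, y+BAND);
	}
}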

The answer is to keep the window bitmaps in offscreen vram and
have the accelerator do the pixel pushing.  All of the modern
video cards support linear frame buffer mode, where you can look
at the entire 2-4-8-whatever megs of memory in a single block.  No
more god-awful banking schemes.  The drawback, of course, is that
you need twice as much memory on your video card, at a minimum.
For a lot of people that's too big a price to pay (and you are SOL
if you want 1600*1280*32 bit), but instant video operations often
make a bigger user-perceptible difference than faster processors.

The current generation of windows accelerators has vram-to-vram
blits at speeds in excess of 100 megs/second, which is
conveniently fast enough to copy an entire screen full of data at
1280*1024*8 bit*76Hz in a single video field.  Properly utilized,
you should be able to drag a window of ANY size around the screen
and have it updated rock solid every single frame.  That would be
COOL.
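
A quick check of that arithmetic (at 8 bits, one byte per pixel):

	1280 * 1024 pixels * 1 byte * 76 fields/sec = 99,614,720 bytes/sec

which is indeed just under the 100 megs/second blit rate.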

An interesting PC fact: good video cards have significantly
higher write bandwidth than most main memory systems (40 megs/sec
vs 25 megs/sec is typical).  It's sad but true -- most graphics
operations can be performed faster going over the PCI bus to an
optimized memory system than staying on the local processor bus
and going to the rather lame motherboard memory system.  If you
can also avoid the flush to screen by page flipping, you are
winning even bigger.  Read/modify/write operations to video card
memory often fall over dead, though.

Digression 2: the next generation of PCI video cards is going to
support bus mastering, with the ability to pull pixels directly
out of host memory at speeds of up to nearly 100 megs a second.  I
doubt the main memory systems will be able to feed them that fast,
though.  It will change a lot of design decisions.

There are two options for implementing this: use two pages of
video memory and have the accelerator move the visible parts of
the window while the host flushes the exposed areas, or try to
keep all active bitmaps in video memory and work on them in place
so the update can also be done by the accelerator.

There are 8 meg video cards that could statically provide as much
bitmap memory as plan9 currently allocates in the kernel, but I'm
pretty sure you would want a proper caching scheme in place to
spill to system memory.

If the bitmaps-in-vram route were taken, you could use either the
host cpu or the accelerator for any drawing.


I have actually started working towards this goal, but given the
small number of hours I allow myself for playing on plan9, I
wouldn't hold my breath for it.  After we ship quake...

I started out just wanting to add full window drag to 8.5, but it
turns out that the layers library just is not friendly to that,
because the bitmaps keep their coordinates in global screen space
instead of having a local origin (the only window system I know of
like that), so they can't really be moved.

To correct that, the virtualization of devbit will need to perform
fixups to every coordinate that it gets, and layers needs to be
replaced.  If anything, the structure is getting simpler, because
nothing needs to worry about whether it is visible or not; it all
just draws to the cache, and a final stage looks at the set of all
visible windows to see what needs to go to the screen.


John Carmack
Id Software









* Graphics issues
From: rob @ 1995-11-16 17:39 UTC


Carmack's suggestions fall into two classes:  the efficient display of
dynamic pictures, and the easy programming of multiple inputs.  Plan 9
was designed for software development, not interactive games (OK,
relax, we screwed up, we admit it) but it can be adapted to serve both
needs.

The efficient display of pictures can be done very differently.  In
fact, Carmack's suggestions cover about half of the complete redesign
of that interface done in Brazil.  It has become clear to us that the
'right' answer is a hybrid:  a (much simpler than in Plan 9)
procedural device to handle most programs' needs, and a much more
direct route to the display for those few programs that want the
system out of the way.  Both can be supported in one place, and we now
have some evidence to back us up.  On our 100MHz R4400 machines we can
write to a clipped window at pretty close to memmove() speeds, about
20 megabytes per second.  This is done by a very different interface
to the system, in which programs write images directly to a device
that copies them to the window.  In the local case, when all is
favorable (as it would be running, say, Doom), the only overhead is
one copy to the display directly from the local memory of the program,
where the real display is maintained.  (On a PC, the VGA business gets
in the way, so this becomes two copies .  With some non-portability
(see below), this could be reduced to one.)  Adding a Plan 9-style
secondary access to the display has been tried several ways but the
design is not finalized.  It is necessary for efficient remote
graphics.

In Brazil, we have demos of software doing 30 frame video in a 640x480
window; actually, several such windows simultaneously.

We reject methods that map the display because they cannot be
implemented in a way that is portable to the application.  Display
addressing, byte order issues, pixel packing and so on are just too
messy.  You need one indirection to hide the details.  Remember, we
have many machines that are not PC's.

Somewhat closer to this home, Carmack's observation that the bitmap
read/write protocol should use rectangles rather than whole scan lines
is right, and already part of Brazil.  Our solution was to require the
scan lines of the rectangle to start and end at byte boundaries, not
pixels, so that a general bitblt is not called for.  Using memmove and
a little care, the performance can be good.  On many machines, but not
SPARC, unaligned memmove can be as fast as aligned because of special
instructions or silicon, so the quantization to bytes is good enough.

His further suggestions on this topic lead pretty well to the current
design in Brazil.

As usual, Carmack is right on in his comments about compiling on the
fly on RISC machines.  Caches cause trouble:  their management is
expensive, non-portable even between different computers with the same
CPU, and probably costs more than is won back by the compilation.  We
are using other techniques now.  (It is, of course, still sensible on
x86 and 680x0 machines, but we are after one design.)

As for events, I side with dhog.  The Plan 9 model is general and easy
to get right.  If you need special support - and there is no doubt
that games do - I would suggest providing a connection to the raw
devices underneath, and synthesizing whatever else is needed at user
level.  That is, rather than build special devices that fold together
multiple devices and demand non-universal hardware features (e.g.  key
up), it is better to build devices in the kernel that export the
hardware interface as directly as is practical, and encapsulate the
special properties and desires in adaptable, user-level code.  With
some care, even the window system could pass such devices through
cleanly.  Performance is not critical here:  with human-driven input,
the extra context switch and system call required would be
insignificant on modern machines, especially when compared to the
generation of a 30Hz image.

The language betrays the bias:
        it gets rid of the need to fork processes just to watch
        blocking files
What's wrong with forking processes?  That's what they're for:
watching and waiting.  They offer a generality no supposedly general
event mechanism ever can.  If you really want events, they're easy to
build at user level with the tools Plan 9 provides:  processes, shared
memory, synchronization, and file system interfaces.
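
A minimal sketch of that user-level approach -- one process per
blocking device, each relaying tagged, timestamped records into a
shared pipe that the application reads as a single stream.  The record
format here is invented, and payload decoding is elided:

#include <u.h>
#include <libc.h>

static void
relay(int devfd, int evfd, char tag)
{
	char buf[64], rec[96];
	long n;
	int m;

	while((n = read(devfd, buf, sizeof buf)) > 0){
		/* tag and timestamp each record so the reader can interleave */
		m = snprint(rec, sizeof rec, "%c %ld %ld ", tag, time(0), n);
		memmove(rec+m, buf, n);
		write(evfd, rec, m+n);
	}
}

void
main(void)
{
	int p[2];

	pipe(p);
	if(rfork(RFPROC) == 0){
		relay(open("/dev/mouse", OREAD), p[1], 'm');
		exits(nil);
	}
	if(rfork(RFPROC) == 0){
		relay(open("/dev/cons", OREAD), p[1], 'k');
		exits(nil);
	}
	/* the application now reads merged, tagged events from p[0] */
}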

-rob







* Graphics issues
From: dhog @ 1995-11-14 12:24 UTC


John Whitley <whitley@cs.buffalo.edu> writes:
>Since the worm can of event handling has been opened, I'll toss a few
>more ideas out.  It would be nice to have an elegant structure for
>adding extension event streams.  E.g. spaceballs, touch screens, EM
>field sensing devices (see CHI '95 proceedings, p. 280), and so forth.
>
>Consider the following:
>
>Create a special /dev/events/evstream that gives the proper event
>interleaving for all event drivers in the current view of a
>/dev/events directory tree.  This could include /dev/events/mouse,
>/dev/events/cons, /dev/events/time, and possibly novel input event
>drivers.  An application might then mask its event stream by modifying
>its view of the /dev/events tree accordingly.

This is beginning to sound unpleasantly like X  :-)  I much prefer the
way that Plan 9 handles "events" already, that is, by using concurrency
to handle multiple inputs rather than adding some mechanism for
lumping them all together.  I consider the problem seen by John Carmack
with the mouse to be mainly a scheduling problem, rather than a failure
of the design.

Consider where the mouse and keyboard events originate from.  They are
two separate hardware devices, using two separate (usually) serial inputs
to pass information to the terminal.  Any "proper event interleaving" is an
illusion.  This illusion can either be created in the kernel (as you suggest)
or in user mode (e.g. by libevent, or explicit use of concurrency).  The only
advantage in doing it in the kernel is that the interleaving occurs at interrupt
time, and hence with greater accuracy.  The user mode solution has the
advantage of generality.  (Ok, so you have to rfork() some extra processes,
but they're cheap on Plan 9.)  If you get the wrong interleaving this way,
then it's the scheduler's fault.  Note that, as Phil pointed out, Plan 9 makes
no provision for real-time scheduling.  You can, however, give some processes
a higher priority, which might help.  Try writing a message of the form "pri n"
to /proc/<pid>/ctl.  "n" should be between 0 and 19 inclusive.
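
From the shell that looks like this (rc syntax; substitute a real pid):

	% echo pri 19 > /proc/$pid/ctl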

It's probably not helping that /dev/mouse is state driven, rather than
event driven.  Ideally, from the point of view of game playing, you want
the raw data straight from the mouse (but in some standard format)
rather than what /dev/mouse gives you now.  Similarly, you want all
the key presses and releases.  This has already been discussed on
9fans.  These raw devices should be added to Plan 9 as a matter of
urgency, to increase its utility as a games OS  :-)







* Graphics issues
From: John @ 1995-11-12 12:28 UTC



First, some investigative results:

Plan 9 provides no means of directly putting a dynamically
generated bitmap onto a rectangle of the screen.  The model is to
upload raw data to an offscreen bitmap, then bitblt from there to
your destination position on screen.
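
In the graphics library of the day, that two-step model looks roughly
like this -- a sketch from memory of the graphics(2) interface,
assuming the usual binit() setup has been done; check the manual for
exact signatures:

#include <u.h>
#include <libc.h>
#include <libg.h>

void
show(uchar *data)	/* 640x480 worth of raw pixel rows */
{
	Bitmap *b;
	Rectangle r;

	r = Rect(0, 0, 640, 480);
	b = balloc(r, screen.ldepth);		/* offscreen bitmap */
	wrbitmap(b, r.min.y, r.max.y, data);	/* step 1: upload the raw data */
	bitblt(&screen, screen.r.min, b, r, S);	/* step 2: blit to the visible window */
	bfree(b);
}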

It is possible to write bitmap data directly to your virtualized
screen bitmap, theoretically avoiding the bitblt phase, but it
turns out that the data path from app to 8.5 to kernel for this
operation is not particularly optimized.  For small blits
(320*200) there is a small savings because there are fewer context
switches, but for large (640*480) blits it gets slower than doing
the two steps separately.  A side effect of doing this (bug?) is
that the upload goes over the entire window layer, including the
8.5 border, which is usually outside your cliprect.

The timing results I got (in ms, on a 25 MHz 2-bit NeXTstation,
640*480 2-bit blits) are:

94	direct write to virtualized screen bitmap
91	write to offscreen bitmap, then bitblt to screen
75	time for the write
16	time for the bitblt

Running the test program without 8.5 dramatically helped the
numbers.  Side note: what is the proper way to exit 8.5?  Kill
it?  I added an exit menu option, because I am hacking around
inside it for some other things.

41	direct write to virtualized screen bitmap
33	write to offscreen bitmap, then bitblt to screen
22	time for the write
10	time for the bitblt

To see where this stands in absolute terms, I directly mapped the
framebuffer into memory and timed the copy myself.  To grab the
NeXT framebuffer, you need to:

Add in 9/next/segment.h:

/*JDC*/	{ SG_PHYSICAL,	"fb",		DISPLAYRAM, 262144,	0,	0 },

Add in your program:

	fb = (uchar *)segattach(0, "fb", 0, 262144);

After I got the vram mapped in, a quick looping copy returned 10
ms for the copy.  Unwinding the loop only improved it to 9 ms, so
the good news is that bitblt operates at basically full memory
speed.

Some comments:

Timing numbers are very consistent under plan9, a welcome change
from most unix.

The virtualized screen/bitblt/etc devices are a damn fine thing
(debugging a window system inside a window is just plain First
Order Cool), but they definitely do get in the way of extracting
good performance from multimedia / game type applications.

The generated blit code works very well.  My first reaction to the
dynamically compiled libgnot code was "this is no longer
appropriate on modern highly cached architectures" (I'm not
positive about that, but my experience leads me to believe it),
but for an old 68040, it seems to be pretty spot on.  I was
impressed at how well it handled misaligned and varying-bit-depth
blits.


Some potential suggestions:

The wrbitmap() call and the underlying bitblt protocol could be
extended to accept a full rectangle for its destination.  It
might be necessary to 32-bit align the transferred rows, but I see
little point in allowing row specifications and not column.  That
would be the conceptual architecture for the action of "put these
pixels there", without the baggage of the offscreen bitmap that
never gets referenced with the same data twice.  With some
attention paid to the efficiency of the data path through 8.5 and
the devbit device, I'm sure it could be 2x the speed of the
current operation.  Allowing arbitrary pixel alignment would
complicate the write a lot, because it would basically become a
bitblt instead of a copy.


If bitmap memory could be shared between user programs and the
devbit driver, the only action required would be the bitblt,
which is 4x to 6x faster than the current operation.  The current
fixed limit of bitmaps allocated inside the kernel is a fairly big
problem by itself, so it sounds reasonable to kill two birds with
one stone by creating a new bitmap memory segment for each
process, and letting it be shared by the kernel and the user
process.  Wrapper functions could be created to allow automatic
virtualization over a network.


For some operations, there is just no substitute for having the
framebuffer memory mapped in.  Yes, you have to deal with all
format conversions yourself, but you can often combine a final
operation on your data (like dithering / color space conversion)
with the transfer to screen, saving at least two main memory
operations per pixel.  An extreme example is the magnification of
a rendered scene, where you want to do:

read some pixels
write them four times or nine times to the screen

Instead of:

read some pixels
write them multiple times into a large memory buffer
upload the magnified buffer to a bitmap
bitblt the magnified buffer to the screen.

Even without a multiplexer in the way of the last two steps, there
is over a factor of 20x difference there.

Framebuffer access can be virtualized even over a network with a
scheme like:

WriteFramebuffer(Rectangle r, uchar **start, uint *rowwidth);
<do stuff>
FinishFramebuffer();

If the display is local and the rectangle is completely exposed,
the start/rowwidth values returned are actually the framebuffer.
Otherwise, they are just a memory buffer that will be transferred
to screen in a more conventional manner upon the call to
FinishFramebuffer().
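
A usage sketch for that hypothetical pair; dither() stands in for
whatever final per-pixel operation you fold into the copy:

extern uchar dither(int x, int y);	/* hypothetical final operation */

void
update(Rectangle r)
{
	uchar *start;
	uint rowwidth;
	int x, y;

	WriteFramebuffer(r, &start, &rowwidth);
	for(y = r.min.y; y < r.max.y; y++)
		for(x = r.min.x; x < r.max.x; x++)
			start[(y - r.min.y)*rowwidth + (x - r.min.x)] = dither(x, y);
	FinishFramebuffer();	/* no-op locally; triggers the transfer otherwise */
}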


Events:

Plan 9 should have a /dev/events device that combines the mouse,
cons, and time devices.  There would be sizable efficiency,
development ease, and user interface benefits from this.

Currently, 8.5 can miss mouse clicks, and they aren't properly
interleaved with keyboard input.  To do a game, I need to get key
up events, and that totally doesn't fit the cons setup.

Instead of doing:

while (read a key)
	deal with it
read the mouse
	deal with it
read the time


You could do:

read all pending events into a buffer.
while (event)
	if we care about it
		deal with it
	record current time

A control file could allow you to mask events you don't care
about, like mouse movements or key up events.  You should be able
to enable a time event that is always returned last, which would
mean the read would never block -- which is what you want in a sim
anyway -- and it gets rid of the need to fork processes just to
watch blocking files.

I see this as a no-drawbacks, just-plain-right thing to do.


A side issue:

Occasionally I see a black flash when scrolling text.  It looks
like what should happen when the devbit device is bit-inverting as
it goes to the screen, but the nextstations should have the
correct native pixel format.  Any ideas?



John Carmack
Id Software







