From: John Carmack <johnc@idnewt.idsoftware.com>
Date: Fri, 17 Nov 1995 06:05:23 -0500
Subject: Graphics issues

This message sat in a compose window for three days, and it grew, and grew, and grew...

>I much prefer the way that Plan 9 handles "events" already, that is,
>by using concurrency to handle multiple inputs rather than adding
>some mechanism for lumping them all together

>Consider where the mouse and keyboard events originate from. They
>are two separate hardware devices, using two separate (usually)
>serial inputs to pass information to the terminal. Any "proper
>event interleaving" is an illusion.

That keyboard and mouse are different input devices to the computer hardware is an artifact, not an excuse. There is something to be said for running them over the same bus, like ADB.

Together (along with tablets, gloves, whatever), they constitute "the user's wishes" -- a single, sequenced stream of commands. They are NOT independent streams where concurrency is appropriate. Allowing them to slip relative to each other is fairly analogous to allowing file seeks to slip relative to file writes. Bad Thing. I'm not saying that the plan9 event system is unusable, just that it is nonoptimal enough to care.

rob@plan9.att.com:
>Performance is not critical here: with human-driven input,
>the extra context switch and system call required would be
>insignificant on modern machines, especially when compared to the
>generation of a 30Hz image.
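A minimal sketch of what "a single, sequenced stream" could look like, assuming hypothetical timestamped per-device queues -- the struct and function names are made up for illustration, not any real Plan 9 interface:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: a unified, timestamped input stream. */
enum evtype { EV_MOUSE, EV_KEY };

struct event {
	unsigned long	msec;	/* timestamp from the driver */
	enum evtype	type;
	int		data;	/* button/delta or key code */
};

/* Merge two per-device queues (each already in time order) into one
   stream ordered by timestamp, so neither can slip past the other. */
size_t
mergeevents(const struct event *m, size_t nm,
            const struct event *k, size_t nk, struct event *out)
{
	size_t i = 0, j = 0, n = 0;

	while(i < nm && j < nk)
		out[n++] = m[i].msec <= k[j].msec ? m[i++] : k[j++];
	while(i < nm)
		out[n++] = m[i++];
	while(j < nk)
		out[n++] = k[j++];
	return n;
}
```

The point is that the interleaving happens once, by timestamp, at the driver boundary -- not per-application by racing two readers against the scheduler.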
On an unloaded system, I would agree, but the kernel to 8.5 slave process to 8.5 scheduler to your slave process to your main loop chain (separately for mouse and keyboard) is plenty of opportunity for the scheduler to decide to run something else.

Interactive priority scheduling would be an interesting thing to follow up on. A process that is blocking on the user's input should have a temporarily boosted priority so that when the input is available, it automatically preempts any process that is compute bound. A compile running in the background should only get cycles when all interactive applications are blocked on user devices. Perhaps processes could be classified "compute bound" if they last blocked on a non-user IO device, and "interactive" if they last blocked on mouse/keyboard. If an interactive process goes its full (generous) timeslice without blocking again, reclassify it until it again hits a user device.

>The language betrays the bias:
> it gets rid of the need to fork processes just to watch
> blocking files

Ok, sure, I admit a little bias. I'm used to treating the OS as an enemy that, given a chance, does everything wrong :-)

All of my event issues would be resolved with two changes to the current interface:

The mouse device must buffer state transitions, so clicks are never missed. This could be done transparently to current code.

A raw keyboard device would need to be created that includes key ups if available and time stamps the actions so they can be accurately interleaved with mouse events.

I might still make some weak protests about the flow of control through the system, but I wouldn't have much of a leg to stand on because functionally identical results could be obtained.

A raw mouse device (movement deltas only, no screen clamping) would be cool for games, but that's so esoteric that I wouldn't push for it.
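The classification heuristic above can be sketched in a few lines; all names here are hypothetical, and a real scheduler would hang this off its block/unblock paths:

```c
#include <assert.h>

/* Sketch of the heuristic: a process is "interactive" if it last
   blocked on a user device and hasn't since burned a full timeslice. */
enum { COMPUTE, INTERACTIVE };
enum { USERDEV, OTHERDEV };

struct proc {
	int	kind;	/* COMPUTE or INTERACTIVE */
	int	ticks;	/* ticks run since last block */
};

void
blocked(struct proc *p, int dev)
{
	p->kind = dev == USERDEV ? INTERACTIVE : COMPUTE;
	p->ticks = 0;
}

void
ran(struct proc *p, int slice)
{
	if(++p->ticks >= slice)
		p->kind = COMPUTE;	/* used its full (generous) timeslice: demote */
}

int
priority(struct proc *p)
{
	/* boosted so pending input preempts compute-bound work */
	return p->kind == INTERACTIVE ? 10 : 1;
}
```

The background compile never blocks on a user device, so it only runs when every interactive process is waiting.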
(segueing into high performance user interface systems)

Most of the comments I am making are not specifically targeted at games, but at user interfaces in general. I am in the middle of a major revamping of our map editor at the moment, so app interactivity is much on my mind. Any app can benefit from a more responsive interface. Apps just don't eat you if you are slow :-)

There is a nice constant in user interface speed: if a user's action shows feedback on the video frame following the input, it is fast enough. If not, there is room for improvement.

Computers should feel instant whenever possible. This involves the event path, whatever processing is done, the speed of drawing, and the way the drawing is displayed.

I consider it a general truth that you shouldn't see the computer performing drawing operations, because visible drawing is an artifact of serialized rasterization. Abstractly, a program describes a final view with drawing primitives, not a sequence of frames that varies based on the speed of the target computer and the position of the CRT raster, which is what you get when you draw or flush directly to visible display memory without proper synchronization. (The one exception to the don't-show-the-drawing rule is when the drawing takes a long enough time that the user is feedback starved.)

On SGI machines, the graphics hardware is very fast, with many UI tasks performed at video frame rates, but the drawing is usually visible to the user as bad flicker. It looks messy.

On NEXTSTEP machines, the drawing is hidden by buffered windows, but the flush to screen is bus bandwidth limited, so large windows have a sluggish feel to them and dragging a window can often result in multiple tear lines. Display PostScript prevents NS from utilizing hardware acceleration in most cases.

(finally getting to the plan9 relevant part)

I think that plan9 would be an excellent environment in which to write a video rate aware graphics/window system.
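The "next video frame" constant can be made concrete (a trivial sketch, not from the original): everything between the input and the flush must fit inside one refresh interval, which at 76Hz is about 13.2ms.

```c
#include <assert.h>

/* Per-frame time budget at a given refresh rate, in milliseconds. */
double
framems(double hz)
{
	return 1000.0 / hz;
}

/* Event path + processing + drawing + flush must fit in one interval. */
int
fastenough(double workms, double hz)
{
	return workms <= framems(hz);
}
```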
Seeing direct manipulation UI events (full window drag, live scrolling, etc.) take place at synchronized video frame rates would be a very cool experience.

Plan9 has already bitten the bullet and allocated backing store for all of the window layers, which is usually a hard fight. The memory cost is worth it for avoiding expose events and for enabling all drawing operations to be performed in an undisplayed area (which plan9 does not currently do).

The plan9 drawing primitives map almost directly to common accelerator functions.

And finally, the scope of the graphics code is manageable and easy to deal with. Sounds like a good little project for me.

There are two ways to get a totally seamless display update: back buffering with a raster synchronized flush, and page flipping. Digression: in some extreme programming forms (demo coding), drawing is sometimes performed in a controlled enough fashion that it can go direct to screen and never produce an inconsistent image, by being totally aware of the relationship between the location of the drawing and the current position of the raster, but that isn't generally useful.

Some versions of plan9 already completely double buffer the screen in system memory. Unfortunately, a large window can take more than an entire frame's time to push over the PCI bus, so even if you synced with the raster, you would still get a partial update (not to mention spending all of your cpu time moving bytes). Digression: it is possible to get perfect updates even if you are blitting at roughly half the speed of the raster by "chasing the raster" -- starting just behind it and letting it run away from you; as long as it doesn't lap you before the blit finishes, the image comes out consistent. If PCs had scan line interrupts, that would even be a practical thing to do...

The answer is to keep the window bitmaps in offscreen vram and have the accelerator do the pixel pushing.
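The raster-chasing claim can be checked line by line with a small model (my own sketch, not driver code): the blit starts just behind the beam, line i must be written by time i/blitspeed, and the beam re-reads line i on its next pass at time (lines+i)/rasterspeed.

```c
#include <assert.h>

/* Model of chasing the raster.  Speeds are in scan lines per unit
   time.  Returns 1 if every line is written before the beam displays
   it again on the following pass. */
int
tearfree(int lines, double rasterspeed, double blitspeed)
{
	int i;

	for(i = 0; i <= lines; i++)
		if(i/blitspeed > (lines + i)/rasterspeed + 1e-9)
			return 0;	/* beam laps the blitter mid-frame: tear */
	return 1;
}
```

Working the inequality out: i/b <= (H+i)/r for all i <= H reduces to b >= r/2, and at exactly half speed the beam catches the blitter precisely at the bottom of the screen -- which is why "roughly half the speed of the raster" is the break-even point.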
All of the modern = video cards support linear frame buffer mode, where you can look = at the entire 2-4-8-whatever megs of memory in a single block. No = more godaweful banking schemes. The drawback, of cource, is that = you need twice as much memory on your video, at a minimum. For a = lot of people that's too big of a price to pay (and you are SOL if = you want 1600*1280*32 bit), but instant video operations often = make a bigger user-perceptible difference than faster processors. The current generation of windows accelerators have vram to vram = blits at speeds in excess of 100 megs / second, which is = conveniently fast enough to copy an entire screen full of data at = 1280*1024*8 bit*76hz in a single video field. Properly utilized, = you should be able to drag a window around on the screen of ANY = size, and have it updated rock solid every single single frame. = That would be COOL. An interesting PC fact: good video cards have significantly = higher write bandwidth than most main memory systems (40 megs / = sec vs 25 megs / sec is typical). Its sad but true -- most = graphics operations can be performed faster going over the PCI bus = to an optimized memory system than staying on the local processor = bus and going to the rather lame motherboard memory system. If = you can also avoid the flush to screen by page flipping, you are = winning even bigger. Read / modify / write operations to video = card memory often fall over dead, though. Digression 2: the next generation of PCI video cards are going to = support bus mastering, with the ability to pull pixels directly = out of host memory at speeds of up to nearly 100 megs a second. I = doubt the main memory systems will be able to feed them that fast, = though. It will change a lot of design decisions. 
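The arithmetic behind the "single video field" claim checks out: a 1280*1024 screen at 8 bit is 1.25 megs, and at 76Hz that is exactly 95 megs / second, just under the 100 meg / second blit rate.

```c
#include <assert.h>

/* Bandwidth needed to repaint an entire w*h screen every refresh,
   in megs (2^20 bytes) per second. */
double
screenmbs(int w, int h, int bytespp, double hz)
{
	return (double)w * h * bytespp * hz / (1024.0*1024.0);
}
```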
There are two options for implementing this: use two pages of video memory and have the accelerator move the visible parts of the window while the host flushes the exposed areas, or try to keep all active bitmaps in video memory and work on them in place so the update can also be done by the accelerator.

There are 8 meg video cards that could statically provide as much bitmap memory as plan9 currently allocates in kernel, but I'm pretty sure you would want to have a proper caching scheme in place to spill to system memory.

If the bitmaps-in-vram route were taken, you could use either the host cpu or the accelerator for any drawing.

I have actually started working towards this goal, but given the small number of hours I allow myself for playing on plan9, I wouldn't hold my breath for it. After we ship quake...

I started out just wanting to add full window drag to 8.5, but it turns out that the layers library just is not friendly to that, because the bitmaps keep their coordinates in global screen space instead of having a local origin (the only window system I know of like that), so they can't really be moved. To correct that, the virtualization of devbit will need to perform fixups to every coordinate that it gets, and layers needs to be replaced. If anything, the structure is getting simpler, because nothing needs to worry about whether it is visible or not; it all just draws to the cache, and a final stage looks at the set of all visible windows to see what needs to go to the screen.

John Carmack
Id Software