From: John Carmack <johnc@idnewt.idsoftware.com>
Date: Fri, 17 Nov 1995 06:05:23 -0500
Subject: Graphics issues

This message sat in a compose window for three days, and it grew, and grew, and grew...

>I much prefer the way that Plan 9 handles "events" already, that is,
>by using concurrency to handle multiple inputs rather than adding
>some mechanism for lumping them all together

>Consider where the mouse and keyboard events originate from. They
>are two separate hardware devices, using two separate (usually)
>serial inputs to pass information to the terminal. Any "proper
>event interleaving" is an illusion.

That keyboard and mouse are different input devices to the computer hardware is an artifact, not an excuse. There is something to be said for running them over the same bus, like ADB.

Together (along with tablets, gloves, whatever), they constitute "the user's wishes" -- a single, sequenced stream of commands. They are NOT independent streams where concurrency is appropriate. Allowing them to slip relative to each other is fairly analogous to allowing file seeks to slip relative to file writes. Bad Thing. I'm not saying that the plan9 event system is unusable, just that it is nonoptimal enough to care.

rob@plan9.att.com:
>Performance is not critical here: with human-driven input,
>the extra context switch and system call required would be
>insignificant on modern machines, especially when compared to the
>generation of a 30Hz image.
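A minimal sketch of what "a single, sequenced stream" could look like, assuming hypothetical timestamped per-device queues -- the struct and function names are made up for illustration, not any real Plan 9 interface:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: a unified, timestamped input stream. */
enum evtype { EV_MOUSE, EV_KEY };

struct event {
	unsigned long	msec;	/* timestamp from the driver */
	enum evtype	type;
	int		data;	/* button/delta or key code */
};

/* Merge two per-device queues (each already in time order) into one
   stream ordered by timestamp, so neither can slip past the other. */
size_t
mergeevents(const struct event *m, size_t nm,
            const struct event *k, size_t nk, struct event *out)
{
	size_t i = 0, j = 0, n = 0;

	while(i < nm && j < nk)
		out[n++] = m[i].msec <= k[j].msec ? m[i++] : k[j++];
	while(i < nm)
		out[n++] = m[i++];
	while(j < nk)
		out[n++] = k[j++];
	return n;
}
```

The point is that the interleaving happens once, by timestamp, at the driver boundary -- not per-application by racing two readers against the scheduler.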
On an unloaded system, I would agree, but the kernel to 8.5 slave process to 8.5 scheduler to your slave process to your main loop chain (separately for mouse and keyboard) is plenty of opportunity for the scheduler to decide to run something else.

Interactive priority scheduling would be an interesting thing to follow up on. A process that is blocking on the user's input should have a temporarily boosted priority so that when the input is available, it automatically preempts any process that is compute bound. A compile running in the background should only get cycles when all interactive applications are blocked on user devices. Perhaps processes could be classified "compute bound" if they last blocked on a non-user IO device, and "interactive" if they last blocked on mouse/keyboard. If an interactive process goes its full (generous) timeslice without blocking again, reclassify it until it again hits a user device.

>The language betrays the bias:
> it gets rid of the need to fork processes just to watch
> blocking files

Ok, sure, I admit a little bias. I'm used to treating the OS as an enemy that, given a chance, does everything wrong :-)

All of my event issues would be resolved with two changes to the current interface:

The mouse device must buffer state transitions, so clicks are never missed. This could be done transparently to current code.

A raw keyboard device would need to be created that includes key ups if available and time stamps the actions so they can be accurately interleaved with mouse events.

I might still make some weak protests about the flow of control through the system, but I wouldn't have much of a leg to stand on because functionally identical results could be obtained.

A raw mouse device (movement deltas only, no screen clamping) would be cool for games, but that's so esoteric that I wouldn't push for it.
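The classification heuristic above can be sketched in a few lines; all names here are hypothetical, and a real scheduler would hang this off its block/unblock paths:

```c
#include <assert.h>

/* Sketch of the heuristic: a process is "interactive" if it last
   blocked on a user device and hasn't since burned a full timeslice. */
enum { COMPUTE, INTERACTIVE };
enum { USERDEV, OTHERDEV };

struct proc {
	int	kind;	/* COMPUTE or INTERACTIVE */
	int	ticks;	/* ticks run since last block */
};

void
blocked(struct proc *p, int dev)
{
	p->kind = dev == USERDEV ? INTERACTIVE : COMPUTE;
	p->ticks = 0;
}

void
ran(struct proc *p, int slice)
{
	if(++p->ticks >= slice)
		p->kind = COMPUTE;	/* used its full (generous) timeslice: demote */
}

int
priority(struct proc *p)
{
	/* boosted so pending input preempts compute-bound work */
	return p->kind == INTERACTIVE ? 10 : 1;
}
```

The background compile never blocks on a user device, so it only runs when every interactive process is waiting.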
(segueing into high performance user interface systems)

Most of the comments I am making are not specifically targeted at games, but at user interfaces in general. I am in the middle of a major revamping of our map editor at the moment, so app interactivity is much on my mind. Any app can benefit from a more responsive interface. Apps just don't eat you if you are slow :-)

There is a nice constant in user interface speed: if a user's action shows feedback on the video frame following the input, it is fast enough. If not, there is room for improvement.

Computers should feel instant whenever possible. This involves the event path, whatever processing is done, the speed of drawing, and the way the drawing is displayed.

I consider it a general truth that you shouldn't see the computer performing drawing operations, because visible drawing is an artifact of serialized rasterization. Abstractly, a program describes a final view with drawing primitives, not a sequence of frames that varies based on the speed of the target computer and the position of the CRT raster, which is what you get when you draw or flush directly to visible display memory without proper synchronization. (The one exception to the don't-show-the-drawing rule is when the drawing takes a long enough time that the user is feedback starved.)

On SGI machines, the graphics hardware is very fast, with many UI tasks performed at video frame rates, but the drawing is usually visible to the user as bad flicker. It looks messy.

On NEXTSTEP machines, the drawing is hidden by buffered windows, but the flush to screen is bus bandwidth limited, so large windows have a sluggish feel to them and dragging a window can often result in multiple tear lines. Display PostScript prevents NS from utilizing hardware acceleration in most cases.

(finally getting to the plan9 relevant part)

I think that plan9 would be an excellent environment in which to write a video rate aware graphics/window system.
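The "next video frame" constant can be made concrete (a trivial sketch, not from the original): everything between the input and the flush must fit inside one refresh interval, which at 76Hz is about 13.2ms.

```c
#include <assert.h>

/* Per-frame time budget at a given refresh rate, in milliseconds. */
double
framems(double hz)
{
	return 1000.0 / hz;
}

/* Event path + processing + drawing + flush must fit in one interval. */
int
fastenough(double workms, double hz)
{
	return workms <= framems(hz);
}
```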
Seeing direct manipulation UI events (full window drag, live scrolling, etc.) take place at synchronized video frame rates would be a very cool experience.

Plan9 has already bitten the bullet and allocated backing store for all of the window layers, which is usually a hard fight. The memory cost is worth it for avoiding expose events and for enabling all drawing operations to be performed in an undisplayed area (which plan9 does not currently do).

The plan9 drawing primitives map almost directly to common accelerator functions.

And finally, the scope of the graphics code is manageable and easy to deal with. Sounds like a good little project for me.

There are two ways to get a totally seamless display update: back buffering with a raster synchronized flush, and page flipping. Digression: in some extreme programming forms (demo coding), drawing is sometimes performed in a controlled enough fashion that it can go direct to screen and never produce an inconsistent image, by being totally aware of the relationship between the location of the drawing and the current position of the raster, but that isn't generally useful.

Some versions of plan9 already completely double buffer the screen in system memory. Unfortunately, a large window can take more than an entire frame's time to push over the PCI bus, so even if you synced with the raster, you would still get a partial update (not to mention spending all of your cpu time moving bytes). Digression: it is possible to get perfect updates even if you are blitting at roughly half the speed of the raster by "chasing the raster" -- starting just behind it and letting it run away from you; as long as it doesn't lap you before the blit finishes, the image comes out consistent. If PCs had scan line interrupts, that would even be a practical thing to do...

The answer is to keep the window bitmaps in offscreen vram and have the accelerator do the pixel pushing.
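The raster-chasing claim can be checked line by line with a small model (my own sketch, not driver code): the blit starts just behind the beam, line i must be written by time i/blitspeed, and the beam re-reads line i on its next pass at time (lines+i)/rasterspeed.

```c
#include <assert.h>

/* Model of chasing the raster.  Speeds are in scan lines per unit
   time.  Returns 1 if every line is written before the beam displays
   it again on the following pass. */
int
tearfree(int lines, double rasterspeed, double blitspeed)
{
	int i;

	for(i = 0; i <= lines; i++)
		if(i/blitspeed > (lines + i)/rasterspeed + 1e-9)
			return 0;	/* beam laps the blitter mid-frame: tear */
	return 1;
}
```

Working the inequality out: i/b <= (H+i)/r for all i <= H reduces to b >= r/2, and at exactly half speed the beam catches the blitter precisely at the bottom of the screen -- which is why "roughly half the speed of the raster" is the break-even point.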
All of the modern = video cards support linear frame buffer mode, where you can look = at the entire 2-4-8-whatever megs of memory in a single block. No = more godaweful banking schemes. The drawback, of cource, is that = you need twice as much memory on your video, at a minimum. For a = lot of people that's too big of a price to pay (and you are SOL if = you want 1600*1280*32 bit), but instant video operations often = make a bigger user-perceptible difference than faster processors. The current generation of windows accelerators have vram to vram = blits at speeds in excess of 100 megs / second, which is = conveniently fast enough to copy an entire screen full of data at = 1280*1024*8 bit*76hz in a single video field. Properly utilized, = you should be able to drag a window around on the screen of ANY = size, and have it updated rock solid every single single frame. = That would be COOL. An interesting PC fact: good video cards have significantly = higher write bandwidth than most main memory systems (40 megs / = sec vs 25 megs / sec is typical). Its sad but true -- most = graphics operations can be performed faster going over the PCI bus = to an optimized memory system than staying on the local processor = bus and going to the rather lame motherboard memory system. If = you can also avoid the flush to screen by page flipping, you are = winning even bigger. Read / modify / write operations to video = card memory often fall over dead, though. Digression 2: the next generation of PCI video cards are going to = support bus mastering, with the ability to pull pixels directly = out of host memory at speeds of up to nearly 100 megs a second. I = doubt the main memory systems will be able to feed them that fast, = though. It will change a lot of design decisions. 
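The arithmetic behind the "single video field" claim checks out: a 1280*1024 screen at 8 bit is 1.25 megs, and at 76Hz that is exactly 95 megs / second, just under the 100 meg / second blit rate.

```c
#include <assert.h>

/* Bandwidth needed to repaint an entire w*h screen every refresh,
   in megs (2^20 bytes) per second. */
double
screenmbs(int w, int h, int bytespp, double hz)
{
	return (double)w * h * bytespp * hz / (1024.0*1024.0);
}
```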
There are two options for implementing this: use two pages of video memory and have the accelerator move the visible parts of the window while the host flushes the exposed areas, or try to keep all active bitmaps in video memory and work on them in place so the update can also be done by the accelerator.

There are 8 meg video cards that could statically provide as much bitmap memory as plan9 currently allocates in kernel, but I'm pretty sure you would want to have a proper caching scheme in place to spill to system memory.

If the bitmaps-in-vram route were taken, you could use either the host cpu or the accelerator for any drawing.

I have actually started working towards this goal, but given the small number of hours I allow myself for playing on plan9, I wouldn't hold my breath for it. After we ship quake...

I started out just wanting to add full window drag to 8.5, but it turns out that the layers library just is not friendly to that, because the bitmaps keep their coordinates in global screen space instead of having a local origin (the only window system I know of like that), so they can't really be moved. To correct that, the virtualization of devbit will need to perform fixups to every coordinate that it gets, and layers needs to be replaced. If anything, the structure is getting simpler, because nothing needs to worry about whether it is visible or not; it all just draws to the cache, and a final stage looks at the set of all visible windows to see what needs to go to the screen.

John Carmack
Id Software