i was tracking down a locking issue that was showing up as alarm() returning too late by up to 1/2 a second. this is not an issue in stock plan 9, because alarm processing happens on all processors. that gets around the locking problem if you have more than 1 cpu, at the expense of hammering the cachelines related to alarm. i'd be interested in timings done on 24-core amd machines.

anyway, the problem is due to the surprisingly slow cga console. these timings are based on rdtsc() subtractions around the named areas for the simple test of cat'ing /lib/pci to the console. they are huge:

			cycles
	printing chars	482510340
	scrolling	135746656968
	total		137112900000

by introducing a frame buffer to avoid reading from the cga console for scrolling (a guess based on problems with graphics performance), we get about a 10x improvement:

	printing normal chars	1080381568
	scrolling		12046340760
	total			13610262120

by guessing that any string >40 bytes is likely to induce scrolling, we can skip the per-character updates and redraw the whole screen once we're done. this gives us a 100/1000x improvement on our hot spots, but just 7x in run time.

	printing chars	33186956
	scrolling	24111480
	total		1854594800

this looks like about all we can do.

by the way, is there a reason not to use the cycle counter on architectures that support it, or is there a reason to still maintain sys->ticks / MACHP(0)->ticks by using the clock interrupt in the portable code?

the alarm test program is attached. the results should be interesting. it does assume that HZ=1000.

- erik
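
a minimal sketch of the two console changes described above, not the actual kernel patch. it keeps a shadow copy of the screen in ordinary memory so that scrolling never reads from cga memory, and it defers all cga writes for strings longer than 40 bytes, redrawing the screen once at the end. every name here (fakecga, shadow, flush, putc1, screenputs, Big) is hypothetical, and fakecga is an ordinary array standing in for the real video window so the code can run anywhere.

```c
#include <string.h>

enum {
	Width	= 80*2,		/* bytes per row: character + attribute */
	Height	= 25,
	Big	= 40,		/* assumed threshold: longer strings defer the redraw */
};

static unsigned char fakecga[Width*Height];	/* stand-in for the real cga window */
static unsigned char *cga = fakecga;		/* only ever written, never read */
static unsigned char shadow[Width*Height];	/* copy of the screen in ordinary memory */
static int pos;					/* byte offset of the cursor */
static int deferred;				/* nonzero: skip cga writes, flush later */

/* redraw the whole screen from the shadow copy */
static void
flush(void)
{
	memmove(cga, shadow, sizeof shadow);
}

/* scroll one row inside the shadow copy; cga memory is not read */
static void
scroll(void)
{
	int i;

	memmove(shadow, shadow+Width, Width*(Height-1));
	for(i = Width*(Height-1); i < Width*Height; i += 2){
		shadow[i] = ' ';
		shadow[i+1] = 0x07;
	}
	pos -= Width;
	if(!deferred)
		flush();
}

static void
putc1(int c)
{
	if(c == '\n')
		pos += Width - pos%Width;
	else{
		shadow[pos] = c;
		shadow[pos+1] = 0x07;
		if(!deferred){
			cga[pos] = c;
			cga[pos+1] = 0x07;
		}
		pos += 2;
	}
	if(pos >= Width*Height)
		scroll();
}

/* long strings touch only the shadow copy, then redraw once */
void
screenputs(char *s, int n)
{
	deferred = n > Big;
	while(n-- > 0)
		putc1(*s++);
	if(deferred){
		deferred = 0;
		flush();
	}
}
```

the tradeoff in this sketch is that a short string which happens to scroll still pays for a full-screen copy, while a long string pays for exactly one copy no matter how many rows it scrolls.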