i tried it myself:

    % for(i in 1 2 3 4){
        time fcp sun.tgz /dev/null
        time cp sun.tgz /dev/null
        time hget http://plan9.bell-labs.com/magic/9down4e/compressed/1108754619.nm555mqv7uc7rvvyye52p4zcaeeziq2d/sun.tgz > /dev/null
    }
    0.00u 0.01s 12.09r   fcp sun.tgz /dev/null
    0.00u 0.03s 30.37r   cp sun.tgz /dev/null
    0.03u 0.11s 11.93r   hget http://plan9.bell-labs.com/magic/9down4e/compressed/1108754619.nm555mqv7uc7rvvyye52p4zcaeeziq2d/sun.tgz
    0.00u 0.04s 12.16r   fcp sun.tgz /dev/null
    0.00u 0.00s 30.32r   cp sun.tgz /dev/null
    0.01u 0.06s 10.16r   hget http://plan9.bell-labs.com/magic/9down4e/compressed/1108754619.nm555mqv7uc7rvvyye52p4zcaeeziq2d/sun.tgz
    0.00u 0.04s 12.46r   fcp sun.tgz /dev/null
    0.00u 0.01s 30.24r   cp sun.tgz /dev/null
    0.08u 0.02s 9.71r    hget http://plan9.bell-labs.com/magic/9down4e/compressed/1108754619.nm555mqv7uc7rvvyye52p4zcaeeziq2d/sun.tgz
    0.00u 0.01s 11.86r   fcp sun.tgz /dev/null
    0.00u 0.03s 30.10r   cp sun.tgz /dev/null
    0.05u 0.07s 9.93r    hget http://plan9.bell-labs.com/magic/9down4e/compressed/1108754619.nm555mqv7uc7rvvyye52p4zcaeeziq2d/sun.tgz

overhead (fcp vs. hget) was averaging about 15% there. it isn't nearly as bad as i remember, which is good!

BTW, there's a bug in fcp: you need to malloc the buffer separately inside each thread, otherwise you get data corruption.
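
to illustrate the buffer point, here is a rough sketch of the per-thread allocation. this is not the fcp source: it uses POSIX threads with pread/pwrite rather than Plan 9 procs, and Worker, worker, copy_parallel, CHUNK and MAXWORKERS are all names made up for the example. it assumes a regular source file whose size fstat can report, and may need -pthread to link on older systems.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum { CHUNK = 8192, MAXWORKERS = 16 };

/* Hypothetical per-worker state; none of these names come from fcp. */
typedef struct Worker {
	int   in, out;   /* descriptors shared by all workers */
	off_t start;     /* offset of this worker's first chunk */
	off_t stride;    /* distance between one worker's successive chunks */
	off_t size;      /* total size of the source file */
	int   err;       /* set to 1 on any failure */
} Worker;

static void *
worker(void *arg)
{
	Worker *w = arg;
	char *buf;
	off_t off;
	size_t want, got, put;
	ssize_t n;

	assert(w->stride >= CHUNK);   /* chunks of different workers must not overlap */

	/* The point of the sketch: the buffer is allocated inside the thread,
	 * so no two workers ever read into the same memory. */
	buf = malloc(CHUNK);
	if(buf == NULL){
		w->err = 1;
		return NULL;
	}
	for(off = w->start; off < w->size; off += w->stride){
		want = w->size - off < CHUNK ? (size_t)(w->size - off) : CHUNK;
		/* pread/pwrite take explicit offsets, so the shared fds need no locking */
		for(got = 0; got < want; got += n){
			n = pread(w->in, buf+got, want-got, off+got);
			if(n <= 0){
				w->err = 1;
				goto out;
			}
		}
		for(put = 0; put < got; put += n){
			n = pwrite(w->out, buf+put, got-put, off+put);
			if(n <= 0){
				w->err = 1;
				goto out;
			}
		}
	}
out:
	free(buf);
	return NULL;
}

/* Copy src to dst with nworkers threads; returns 0 on success, -1 on error. */
int
copy_parallel(const char *src, const char *dst, int nworkers)
{
	Worker w[MAXWORKERS];
	pthread_t tid[MAXWORKERS];
	struct stat st;
	int in, out, i, started, err;

	if(nworkers < 1 || nworkers > MAXWORKERS)
		return -1;
	in = open(src, O_RDONLY);
	if(in < 0)
		return -1;
	out = open(dst, O_WRONLY|O_CREAT|O_TRUNC, 0644);
	if(out < 0 || fstat(in, &st) < 0){
		close(in);
		if(out >= 0)
			close(out);
		return -1;
	}
	err = 0;
	for(started = 0; started < nworkers; started++){
		/* worker k handles chunks k, k+nworkers, k+2*nworkers, ... */
		w[started] = (Worker){in, out, (off_t)started*CHUNK,
			(off_t)nworkers*CHUNK, st.st_size, 0};
		if(pthread_create(&tid[started], NULL, worker, &w[started]) != 0){
			err = 1;
			break;
		}
	}
	for(i = 0; i < started; i++){
		pthread_join(tid[i], NULL);
		err |= w[i].err;
	}
	close(in);
	if(close(out) < 0)
		err = 1;
	return err ? -1 : 0;
}
```

something like copy_parallel("sun.tgz", "/dev/null", 4) would mirror the fcp runs above. if buf were instead a single global or a pointer handed to every worker, one thread's pread could overwrite data another thread had not yet written out, which is the kind of corruption described.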