From: erik quanstrom
Date: Sun, 9 Jan 2011 12:06:21 -0500
To: 9fans@9fans.net
Message-ID: <16094d5a594bfa72dd0e9ac6f3f8b31c@plug.quanstro.net>
Subject: [9fans] fs performance

the new auth server, which uses the fs as its root rather than a
stand-alone fs, happens to be faster than our now-old cpu server,
so i did a quick build test with a kernel including the massive-fw
myricom driver.  suspecting that latency kills even on 10gbe, i
tried a second build with NPROC=24.

a table comparing ken's fs, fossil+venti, and ramfs follows.
unfortunately, i was not able to use the same machine for the
fossil+venti tests, so there's a ramfs run on each machine to put
the numbers in perspective, given the large differences in
processor generation, network, &c.

here's an example test:

	tyty; echo $NPROC
	4
	tyty; time mk >/dev/null && mk clean >/dev/null
	2.93u 1.30s 3.36r	mk
	tyty; NPROC=24 time mk >/dev/null && mk clean >/dev/null
	1.32u 0.22s 2.29r	mk

and here are the compiled results:

a	Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
	4 active cores (8 threads; 4 enabled); http://ark.intel.com/Product.aspx?id=35365
	intel 82598 10gbe nic; fs has myricom 10gbe nic; 54µs latency

b	Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
	4 active cores (4 threads; 4 enabled); http://www.intel.com/p/en_US/products/server/processor/xeon5000/specifications
	intel 82563-style gbe nic; 70µs latency

mach	fs	nproc	time
a	ken	4	2.93u 1.30s 3.36r	mk
		24	1.32u 0.22s 2.29r	mk
	ramfs	4	3.10u 1.67s 3.01r	mk
		24	2.98u 1.23s 2.42r	mk
b	venti	4	2.65u 3.44s 21.46r	mk
		24	2.98u 3.56s 21.58r	mk
	ramfs	4	3.55u 2.22s 9.08r	mk
		24	3.50u 2.67s 9.41r	mk

it's interesting that neither venti nor ramfs gets any faster on
machine b with NPROC set to 24, but both get faster on machine a,
and the fastest time of all is not ramfs but ken's fs with NPROC=24.
so i suppose the 64-bit question is: is that because moving data
in and out of user space is slower than 10gbe, or because ramfs is
single-threaded and slow?

in any event, it's clear that even if the fs is good, latency can
kill on a 10gbe lan.  (some back-of-envelope arithmetic in the
p.s. below.)

it would naively seem to me that using the Tstream model would be
too expensive, requiring thousands of new streams, and would
require modifying at least 8c, 8l, mk, rc, and awk (what am i
forgetting?).  but it would be worth a test.

- erik
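
p.s. for scale, a crude model of a sequential 9p read in which each
Tread waits out one full round trip before the next is sent.  the 8k
message size and the 1mb file are made-up round numbers, not
measurements; only the 54µs rtt is the number quoted for machine a
above:

	#include <stdio.h>

	/* crude model: sequential 9p reads, one full round trip
	   per Tread.  msize and file size are made-up round
	   numbers; the 54µs rtt is machine a's measured latency. */
	int
	main(void)
	{
		double rtt = 54e-6;		/* round trip, seconds */
		double msize = 8192;		/* bytes per Tread/Rread */
		double wire = 10e9/8;		/* 10gbe payload rate, bytes/sec */
		double file = 1024*1024;	/* one 1mb file */
		double trips = file/msize;

		printf("round trips:  %.0f\n", trips);
		printf("latency time: %.2f ms\n", 1e3*trips*rtt);
		printf("wire time:    %.2f ms\n", 1e3*file/wire);
		return 0;
	}

that's ~6.9ms of waiting against ~0.84ms of wire time, so the link
sits idle roughly 90% of the read.  the model ignores that each 8k
reply's transfer overlaps the round trip, but it shows the shape of
the problem: bandwidth isn't what we're short of.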
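
p.p.s. the same arithmetic hints at what streaming (or just keeping
several reads in flight) would buy: with k requests outstanding, the
round-trip term divides by k while the wire time stays fixed.  k
here is an arbitrary illustration, not anything 9p offers today:

	#include <stdio.h>

	/* same model, but with k Treads kept in flight: the
	   round-trip term divides by k, the wire time doesn't.
	   k is an arbitrary illustration, not a 9p feature. */
	int
	main(void)
	{
		double rtt = 54e-6, msize = 8192, wire = 10e9/8;
		double file = 1024*1024;
		double trips = file/msize;
		int k;

		for(k = 1; k <= 16; k *= 2)
			printf("k=%2d: %.2f ms\n",
				k, 1e3*(trips*rtt/k + file/wire));
		return 0;
	}

by k=8 the round-trip term is down to about the wire time, which is
roughly the win a Tstream-style change would be chasing.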