From: Nemo
Date: Fri, 16 Sep 2011 09:02:00 +0200
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] gar nix!

i'm working on smp.

On Sep 16, 2011, at 8:46 AM, erik quanstrom wrote:

> hey, ron.
>
> On Fri Sep 16 01:57:04 EDT 2011, rminnich@gmail.com wrote:
>> for the 2M pages -- I'm willing to see some measurement but let's get
>> the #s -- I've done some simple measurements and it's not the hit one
>> would expect. These new machines have about 10 GB/s bandwidth (well,
>> the ones we are targeting do) and that translates to sub-millisecond
>> times to zero a 2M page. Further, the text page is in the image cache.
>> So after first exec of a program, the only text issue is locating the
>> page. It's not simply a case of having to write 6M each time you exec.
>
> however, neither the stack nor the heap is. that's 4MB that needs to be
> cleared. that sounds like an operation that could take on the order of
> a millisecond, and is well worth measuring.
>
> it might make sense to use 4k pages for the stack, and sometimes for
> the heap.
>
>> I note that starting a proc, allocating and zeroing 2 GiB, takes
>> *less* time with 2M pages than 4K pages -- this was measured in May
>> when we were still supporting 4K pages -- the page faults are far more
>
> i'll try to get some numbers soon, but i think a higher priority is to
> get an smp setup. is anyone testing with smp right now?
>
> are you sure this isn't the difference between throughput and latency?
> did you try a small-executable test like
> 	for(i in `{seq 1 1000000}) dd -if /dev/zero -of /dev/null -quiet 1
> ? now that the code has been removed, it's a little harder to replicate
> your numbers.
>
>> expensive than the time to write the memory. Again, YMMV, esp. on an
>> Atom, but the cost of taking (say) 6 page faults for a 24k text
>> segment that's already in memory may not be what you want.
>>
>> There are plenty of games to be played to reduce the cost of
>> zero-filled pages, but at least from what I measured the 2M pages are
>> not a real hit.
>
> i'm okay with the atom suffering a little bit (odd how far down the food
> chain one can get 64 bits!), but i'm actually more concerned about
> being punished severely for forking on a beefy but busy machine.
> the atom is just my test mule.
>
> (can't one just preemptively map the whole text on first fault?)
>
> - erik
>
> p.s. i guess the claim i thought i saw that you need 1gb pages
> isn't correct. that's good, but i need to track down why i don't see 'em
> on the atom.
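
To put numbers on the zeroing cost erik calls well worth measuring: a
minimal sketch in Plan 9 C, using libc's nsec() timer. The buffer size
matches one 2MB page; the iteration count is an arbitrary choice for
illustration.

	#include <u.h>
	#include <libc.h>

	/* time repeatedly zeroing one 2MB page's worth of memory */
	void
	main(void)
	{
		enum { Sz = 2*1024*1024, N = 100 };
		uchar *p;
		vlong t0;
		int i;

		p = malloc(Sz);
		if(p == nil)
			sysfatal("malloc: %r");
		t0 = nsec();
		for(i = 0; i < N; i++)
			memset(p, 0, Sz);
		print("%lld ns per 2MB zero\n", (nsec()-t0)/N);
		exits(nil);
	}

At the 10 GB/s ron cites, one pass should land near 0.2 ms, and erik's
4MB of stack plus heap roughly doubles that, consistent with both
"sub-millisecond" and "on the order of a millisecond".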
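
erik's small-executable test can also be recast so it reports per-exec
latency directly, addressing his throughput-versus-latency question. A
sketch in Plan 9 C; /bin/echo stands in for a small binary and the
iteration count is arbitrary.

	#include <u.h>
	#include <libc.h>

	/* measure average fork+exec+exit latency for a small binary */
	void
	main(void)
	{
		enum { N = 1000 };
		vlong t0;
		int i;

		t0 = nsec();
		for(i = 0; i < N; i++){
			switch(fork()){
			case -1:
				sysfatal("fork: %r");
			case 0:
				execl("/bin/echo", "echo", "-n", nil);
				sysfatal("exec: %r");
			default:
				free(wait());	/* Waitmsg is malloced; free it */
			}
		}
		print("%lld ns per fork+exec\n", (nsec()-t0)/N);
		exits(nil);
	}

Running the same loop under kernels using 2M and 4K pages would expose
the per-exec page-clearing cost directly.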