From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Mon, 26 May 2014 13:14:24 -0400
To: 9fans@9fans.net
Message-ID: <2f40e8ca50e83137d89718948b1b2c4b@brasstown.quanstro.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: [9fans] nix scheduler changes
Topicbox-Message-UUID: f14316a6-ead8-11e9-9d60-3106f5b1d025

so, i've done a little bit more work characterizing the performance
of the scheduler correctness changes, and i now have some understanding
of why e.g. ping times are a bit slower.

the old code essentially let processor 0 spin in runproc, other processors called
halt.  the new code uses monmwait to wait for a change on all processors.
this has some significant impacts on performance and power use.  for example,
on my test box with 4c/8t:

	spin/halt	monmwait	spin/monmwait
ping	8µs		14µs		8µs		# ip/ping -n10 $sysname
mk	6.26s		3.98s		3.80s		# make nix kernel
fans	audible		silent		audible
δpower	-		-24w		0		# resolution = .1A = 12w @ 120v

this seems to indicate that the latency is all in runproc(), and that not
waiting for things to be ready, but assuming they will be, gives a big
performance boost.

(the third column, testing spin on mach 0 plus monmwait on the others, was done
to tell if monmwait has high latency or not.)

i'd really be interested to see what this does on 24c/48t machines.  something
tells me the performance impacts would be huge, and different.

- erik

---
ps. hzsched in the distribution is 10% off for HZ=100, since
schedticks = m->ticks + HZ/10, and delaysched tests
for > not the expected >=.