From: erik quanstrom
Date: Sun, 4 Sep 2011 14:34:56 -0400
To: steve@quintile.net, 9fans@9fans.net
Subject: Re: [9fans] high precision timings
Message-ID: <984d0269db1dd72f1ba77c875b1961f6@ladd.quanstro.net>

On Sun Sep 4 13:48:31 EDT 2011, steve@quintile.net wrote:
> after the recent discussions on nsec()...
>
> does anyone already have the snippet of code to do fine grained
> timings on the x86 platform using the hardware performance counters?
>
> I would use nsec() but I'm timing system calls so I expect my results
> would be swamped by nsec()'s performance.

i wrote up a little demo using a variant of nsec and using the
x86 cycle counter, RDTSC.  the source is in
/n/sources/contrib/quanstro/highprec.

i'd recommend doing timings on your particular hardware.
here are my results:

	; aux/cpuid -i
	AMD Phenom(tm) II X4 965 Processor
	; 8.out
	nsec latency	25729ns
	nsec latency	24554ns
	cycle hz = 3393000000
	cycles latency	88 cycles; 25 ns
	cycles latency	78 cycles; 22 ns

	ladd; aux/cpuid -i
	Intel(R) Atom(TM) CPU  330   @ 1.60GHz
	ladd; 8.out
	nsec latency	39501ns
	nsec latency	38901ns
	cycle hz = 1604000000
	cycles latency	60 cycles; 37 ns
	cycles latency	48 cycles; 29 ns

	new; aux/cpuid -i
	Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
	new; 8.out
	nsec latency	8591ns
	nsec latency	9155ns
	cycle hz = 3105000000
	cycles latency	28 cycles; 9 ns
	cycles latency	28 cycles; 9 ns

	chula; aux/cpuid -i
	Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
	chula; 8.out
	nsec latency	14319ns
	nsec latency	14451ns
	cycle hz = 2660000000
	cycles latency	40 cycles; 15 ns
	cycles latency	32 cycles; 12 ns

it seems like you can get ±10ns at a few 10s of ns latency with _cycles
and ±10µs at a few 10s of µs latency with /dev/bintime.

- erik
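
for readers without access to /n/sources, here is a minimal sketch of the idea. it is not the code in contrib/quanstro/highprec. it is portable C rather than plan 9 C, and it assumes gcc or clang, where __rdtsc() comes from <x86intrin.h>. the names read_cycles, timer_latency and cycles_to_ns are made up for illustration. the latency estimate reads the counter twice back to back and keeps the smallest difference, which is only one plausible way to get numbers like those above. the conversion truncates, which is consistent with the reported figures (88 cycles at 3393000000 hz is 25.9ns, printed as 25).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* read the x86 time stamp counter (RDTSC); gcc/clang intrinsic */
uint64_t
read_cycles(void)
{
	return __rdtsc();
}
#else
#include <time.h>

/* assumption: on non-x86 hosts fall back to a monotonic ns clock */
uint64_t
read_cycles(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

/*
 * estimate the cost of one counter read: take two back-to-back
 * reads n times and keep the smallest difference seen.
 */
uint64_t
timer_latency(int n)
{
	uint64_t t0, t1, d, min;
	int i;

	min = UINT64_MAX;
	for(i = 0; i < n; i++){
		t0 = read_cycles();
		t1 = read_cycles();
		d = t1 - t0;
		if(d < min)
			min = d;
	}
	return min;
}

/*
 * convert a cycle count to whole nanoseconds, truncating.
 * split into quotient and remainder so the multiply by 1e9
 * does not overflow for long intervals (assumes hz < 1.8e10).
 */
uint64_t
cycles_to_ns(uint64_t cycles, uint64_t hz)
{
	assert(hz != 0);
	return (cycles / hz) * 1000000000ULL
		+ (cycles % hz) * 1000000000ULL / hz;
}
```

to time a system call you would bracket it with two read_cycles() calls, subtract timer_latency() from the difference, and convert with cycles_to_ns() using the cycle hz measured for your machine.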