9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] bochs still no go
@ 2001-12-10 22:19 Matt
  2001-12-10 23:35 ` Mike Haertel
  0 siblings, 1 reply; 4+ messages in thread
From: Matt @ 2001-12-10 22:19 UTC (permalink / raw)
  To: 9fans

I tried bochs again with the latest update

these days it fails with

00272198060 [CPU ] RDMSR: not implemented yet
00272198060 [CPU ] UndefinedOpcode: 132 causes exception 6

oh well


matt



* Re: [9fans] bochs still no go
  2001-12-10 22:19 [9fans] bochs still no go Matt
@ 2001-12-10 23:35 ` Mike Haertel
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Haertel @ 2001-12-10 23:35 UTC (permalink / raw)
  To: 9fans

If "RDMSR" is being used to read the time stamp counter,
it should be replaced with RDTSC (0x0F 0x31).  RDMSR is
a much slower instruction.



* Re: [9fans] bochs still no go
  2001-12-11  3:25 Russ Cox
@ 2001-12-11  8:01 ` Mike Haertel
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Haertel @ 2001-12-11  8:01 UTC (permalink / raw)
  To: 9fans

>> If "RDMSR" is being used to read the time stamp counter,
>> it should be replaced with RDTSC (0x0F 0x31).  RDMSR is
>> a much slower instruction.
>
>That's not at all clear.  I bet they're approximately
>the same on real hardware.  RDMSR is much slower under
>VMware because it requires trapping into the VMware
>runtime, while RDTSC, an unprivileged instruction, does not.

Ok, I'll admit to a bit of an unfair advantage on this issue: I
can't speak for AMD processors, but I used to work at Intel, as an
architect on the team that did the Pentium Pro and Pentium 4
processors.  I've seen the microcode, and I can assure you that on
Intel processors RDMSR is indeed substantially slower.

The reason is that many of the so-called "machine-specific registers"
that you can read by RDMSR don't really exist as registers in
the hardware at all; instead they are just magic numbers specifying
particular values that the processor microcode can put together
for you by poking around at bits and pieces of internal state
that are often widely distributed throughout the hardware.

So the processor's microcode for the RDMSR instruction is roughly
equivalent to the following C fragment:

	RDMSR:
		if (not in kernel mode)
			fault;
		switch (ecx) {
		...
		case 0x10:
			copy the time stamp counter to (eax:edx);
			break;
		...
		}

whereas the microcode for RDTSC is just:

	RDTSC:
		copy the time stamp counter to (eax:edx);

On Intel processors, an indirect jump in the microcode (the switch)
is guaranteed to be mispredicted, since the usual branch prediction
mechanisms for macroinstruction branches do not apply to microcode
branches (and especially not microcode indirect jumps), so RDMSR
causes the pipeline to get flushed at least one extra time.
In addition RDMSR is specified to be a "serializing instruction",
which means that the pipeline is drained of older instructions
before the first microinstruction of RDMSR even starts executing.

On x86 processors with RDTSC, you can get pretty high precision
timing for even very fast operations with the following approach:
	x = rdtsc();
	y = rdtsc();
	thing_you_want_to_measure();
	z = rdtsc();
	cycles = (z - y) - (y - x);
(The idea is the "y - x" subtracts out the time required by RDTSC itself.)
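
For readers who want to try the approach above, here is a minimal sketch in
C. It assumes GCC or Clang on an x86 machine, where the __rdtsc() intrinsic
from <x86intrin.h> emits RDTSC. The names measure_cycles, spin and sink are
hypothetical and are not from the original message. Note that RDTSC is not a
serializing instruction, so on out-of-order processors very short measurements
may be skewed by reordering; treat the result as an estimate.

```c
#include <stdint.h>
#include <x86intrin.h>	/* __rdtsc(); GCC and Clang on x86 only */

/*
 * Hypothetical helper: estimate the cycles spent in fn(),
 * with the cost of one RDTSC subtracted out.
 */
static int64_t
measure_cycles(void (*fn)(void))
{
	uint64_t x, y, z;

	x = __rdtsc();
	y = __rdtsc();
	fn();
	z = __rdtsc();
	/* (z - y) covers fn plus one RDTSC; (y - x) is one RDTSC alone. */
	return (int64_t)(z - y) - (int64_t)(y - x);
}

/* Volatile so the compiler cannot optimize the loop away. */
static volatile uint32_t sink;

/* Hypothetical workload standing in for thing_you_want_to_measure(). */
static void
spin(void)
{
	uint32_t i;

	for (i = 0; i < 1000; i++)
		sink += i;
}
```

Calling measure_cycles(spin) returns the estimate. Repeating the call several
times and taking the smallest value is a common way to reduce noise from
interrupts and cache misses.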

Using this method on a Pentium III, I measured RDMSR with ecx == 0x10
to require ~90 cycles, and RDTSC to require "only" ~30 cycles.  The timing
will be similar or identical on the rest of the P6 family (Pentium Pro,
Pentium II, Celeron).

I don't have a Pentium 4 handy to try this on, but I expect the performance
difference between RDMSR and RDTSC would be even more pronounced due
to the deeper pipeline, among other things.



* Re: [9fans] bochs still no go
@ 2001-12-11  3:25 Russ Cox
  2001-12-11  8:01 ` Mike Haertel
  0 siblings, 1 reply; 4+ messages in thread
From: Russ Cox @ 2001-12-11  3:25 UTC (permalink / raw)
  To: 9fans

> If "RDMSR" is being used to read the time stamp counter,
> it should be replaced with RDTSC (0x0F 0x31).  RDMSR is
> a much slower instruction.

That's not at all clear.  I bet they're approximately
the same on real hardware.  RDMSR is much slower under
VMware because it requires trapping into the VMware
runtime, while RDTSC, an unprivileged instruction, does not.

Since Bochs isn't using the underlying machine to
execute _any_ instructions natively, I would be
hesitant to draw speed comparisons.  Of course,
it appears that RDMSR is unsupported, but that's
different.  We use RDMSR for things other than
reading the time stamp counter, though.

Russ


