9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] cache lines, and 60000 cycles of doom
@ 2014-06-19 16:01 erik quanstrom
  2014-06-20  5:03 ` Devon H. O'Dell
  2014-06-20  5:09 ` Bakul Shah
  0 siblings, 2 replies; 7+ messages in thread
From: erik quanstrom @ 2014-06-19 16:01 UTC (permalink / raw)
  To: 9fans

i'm seeing some mighty interesting timing on my intel ivy bridge.
i found a bug in the file server aoe implementation (can't happen
if you're using the uniprocessor x86 version) that happens because
the Srb is freed before wakeup completes.  to solve this there is
some code that sets the state (this is from ken's ancient scheduler,
by way of sape)

	wakeup(&srb);
	srb->state = Free;

code that receives it is like this

	sleep(&srb, srbdone, srb);
	cycles(&t0);
	for(n = 0; srb->state != Free; n++){
		if(srb->wmach == m->machno)
			sched();
		else
			monmwait(&srb->state, Alloc);
	}
	cycles(&t1);
	free(srb);

the astounding thing is that t1-t0 is often ~ 60,000 cycles.
it only hits a small fraction of the time, and the average is
much lower.  but that just blows the mind.  60000 cycles!

(other versions with sched were much worse.)
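
for what it's worth, a hypothetical sketch (not the file server's actual
code) of how the worst case and the mean could be tallied around that
loop, assuming the kernel's cycles() and print():

	/* hypothetical: tally the spin cost measured around the loop */
	static uvlong	spinmax, spinsum;
	static ulong	spincnt;

	static void
	spintally(uvlong t0, uvlong t1)
	{
		uvlong d;

		d = t1 - t0;
		if(d > spinmax)
			spinmax = d;
		spinsum += d;
		if(++spincnt%10000 == 0)
			print("srb spin: max %llud avg %llud cycles\n",
				spinmax, spinsum/spincnt);
	}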

as far as i can tell, there are no funny bits in the scheduler that
can cause this, and no weird scheduling is going on.

i'm baffled.

- erik



* Re: [9fans] cache lines, and 60000 cycles of doom
  2014-06-19 16:01 [9fans] cache lines, and 60000 cycles of doom erik quanstrom
@ 2014-06-20  5:03 ` Devon H. O'Dell
  2014-06-20 11:50   ` erik quanstrom
  2014-06-20  5:09 ` Bakul Shah
  1 sibling, 1 reply; 7+ messages in thread
From: Devon H. O'Dell @ 2014-06-20  5:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Weird. I assume cycles is using rdtsc or rdtscp. Perhaps some of it is due
to a combination of contention and rdtsc(p) being serializing instructions?
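
For reference, a minimal sketch of what an rdtsc-based cycles() might
look like -- GCC-style inline asm for illustration only, not the actual
Plan 9 implementation:

	#include <stdint.h>

	/* illustration: read the 64-bit timestamp counter (EDX:EAX) */
	static void
	cycles(uint64_t *t)
	{
		uint32_t lo, hi;

		__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
		*t = (uint64_t)hi<<32 | lo;
	}
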
On Jun 19, 2014 12:04 PM, "erik quanstrom" <quanstro@quanstro.net> wrote:

> i'm seeing some mighty interesting timing on my intel ivy bridge.
> i found a bug in the file server aoe implementation (can't happen
> if you're using the uniprocessor x86 version) that happens because
> the Srb is freed before wakeup completes.  to solve this there is
> some code that sets the state (this is from ken's ancient scheduler,
> by way of sape)
>
>         wakeup(&srb);
>         srb->state = Free;
>
> code that receives it is like this
>
>         sleep(&srb, srbdone, srb);
>         cycles(&t0);
>         for(n = 0; srb->state != Free; n++){
>                 if(srb->wmach == m->machno)
>                         sched();
>                 else
>                         monmwait(&srb->state, Alloc);
>         }
>         cycles(&t1);
>         free(srb);
>
> the astounding thing is that t1-t0 is often ~ 60,000 cycles.
> it only hits a small fraction of the time, and the average is
> much lower.  but that just blows the mind.  60000 cycles!
>
> (other versions with sched were much worse.)
>
> as far as i can tell, there are no funny bits in the scheduler that
> can cause this, and no weird scheduling is going on.
>
> i'm baffled.
>
> - erik
>
>

* Re: [9fans] cache lines, and 60000 cycles of doom
  2014-06-19 16:01 [9fans] cache lines, and 60000 cycles of doom erik quanstrom
  2014-06-20  5:03 ` Devon H. O'Dell
@ 2014-06-20  5:09 ` Bakul Shah
  2014-06-20 11:45   ` erik quanstrom
  1 sibling, 1 reply; 7+ messages in thread
From: Bakul Shah @ 2014-06-20  5:09 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, 19 Jun 2014 12:01:10 EDT erik quanstrom <quanstro@quanstro.net> wrote:
> i'm seeing some mighty interesting timing on my intel ivy bridge.
> i found a bug in the file server aoe implementation (can't happen
> if you're using the uniprocessor x86 version) that happens because
> the Srb is freed before wakeup completes.  to solve this there is
> some code that sets the state (this is from ken's ancient scheduler,
> by way of sape)
>
> 	wakeup(&srb);
> 	srb->state = Free;
>
> code that receives it is like this
>
> 	sleep(&srb, srbdone, srb);
> 	cycles(&t0);
> 	for(n = 0; srb->state != Free; n++){
> 		if(srb->wmach == m->machno)
> 			sched();
> 		else
> 			monmwait(&srb->state, Alloc);
> 	}
> 	cycles(&t1);
> 	free(srb);
>
> the astounding thing is that t1-t0 is often ~ 60,000 cycles.
> it only hits a small fraction of the time, and the average is
> much lower.  but that just blows the mind.  60000 cycles!
>
> (other versions with sched were much worse.)
>
> as far as i can tell, there are no funny bits in the scheduler that
> can cause this, and no weird scheduling is going on.
>
> i'm baffled.

Could there've been a context switch?



* Re: [9fans] cache lines, and 60000 cycles of doom
  2014-06-20  5:09 ` Bakul Shah
@ 2014-06-20 11:45   ` erik quanstrom
  0 siblings, 0 replies; 7+ messages in thread
From: erik quanstrom @ 2014-06-20 11:45 UTC (permalink / raw)
  To: 9fans

> > the astounding thing is that t1-t0 is often ~ 60,000 cycles.
> > it only hits a small fraction of the time, and the average is
> > much lower.  but that just blows the mind.  60000 cycles!
> >
> > (other versions with sched were much worse.)
> >
> > as far as i can tell, there are no funny bits in the scheduler that
> > can cause this, and no weird scheduling is going on.
> >
> > i'm baffled.
>
> Could there've been a context switch?

the file server does not have context switches.

- erik



* Re: [9fans] cache lines, and 60000 cycles of doom
  2014-06-20  5:03 ` Devon H. O'Dell
@ 2014-06-20 11:50   ` erik quanstrom
  2014-06-20 12:47     ` Devon H. O'Dell
  0 siblings, 1 reply; 7+ messages in thread
From: erik quanstrom @ 2014-06-20 11:50 UTC (permalink / raw)
  To: 9fans

On Fri Jun 20 01:04:20 EDT 2014, devon.odell@gmail.com wrote:

> Weird. I assume cycles is using rdtsc or rdtscp. Perhaps some of it is due
> to a combination of contention and rdtsc(p) being serializing instructions?
> On Jun 19, 2014 12:04 PM, "erik quanstrom" <quanstro@quanstro.net> wrote:

other than the code i posted, nobody else is touching the Srb,
and it's bigger than a cacheline.

why would serialization cause a big issue?

- erik



* Re: [9fans] cache lines, and 60000 cycles of doom
  2014-06-20 11:50   ` erik quanstrom
@ 2014-06-20 12:47     ` Devon H. O'Dell
  2014-06-20 13:07       ` erik quanstrom
  0 siblings, 1 reply; 7+ messages in thread
From: Devon H. O'Dell @ 2014-06-20 12:47 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2014-06-20 7:50 GMT-04:00 erik quanstrom <quanstro@quanstro.net>:
> On Fri Jun 20 01:04:20 EDT 2014, devon.odell@gmail.com wrote:
>
>> Weird. I assume cycles is using rdtsc or rdtscp. Perhaps some of it is due
>> to a combination of contention and rdtsc(p) being serializing instructions?

I forget that rdtsc isn't serializing, and one uses cpuid to get that behavior.
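
The usual idiom, for the archive -- a serializing CPUID immediately
before RDTSC, so earlier instructions retire before the counter is
sampled (GCC-style inline asm, illustration only):

	#include <stdint.h>

	/* illustration: CPUID drains the pipeline, then RDTSC samples the TSC */
	static uint64_t
	tsc_serialized(void)
	{
		uint32_t lo, hi;

		__asm__ __volatile__(
			"cpuid\n\t"
			"rdtsc"
			: "=a"(lo), "=d"(hi) :: "rbx", "rcx");
		return (uint64_t)hi<<32 | lo;
	}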

>> On Jun 19, 2014 12:04 PM, "erik quanstrom" <quanstro@quanstro.net> wrote:
>
> other than the code i posted, nobody else touching the Srb,
> and it's bigger than a cacheline.
>
> why would serialization cause a big issue?

It disables out-of-order execution by the processor, so there's a
pipeline stall.

There's overhead to calling the tsc instructions, but not that much.

Does `srb->wmach != m->machno` imply that t0 and t1 could be taken on
different CPUs? TSC is synchronized between cores (unless someone does
wrmsr), but if you bounce to another processor, there's no guarantee.
Perhaps the difference between when the CPUs came online was on the
order of 60k cycles. No clue how cheap sched() is these days.
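
If migration is a worry, one hedge (illustration only, and assuming the
OS programs IA32_TSC_AUX with the cpu number) is RDTSCP, which returns
that aux value along with the TSC, so samples whose two reads landed on
different cpus can be thrown away:

	#include <stdint.h>

	/* illustration: RDTSCP returns the TSC in EDX:EAX and IA32_TSC_AUX in ECX */
	static uint64_t
	rdtscp(uint32_t *aux)
	{
		uint32_t lo, hi;

		__asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi), "=c"(*aux));
		return (uint64_t)hi<<32 | lo;
	}

	/* time one call of region(); return 0 if the thread migrated between
	   the two reads, in which case the delta is not meaningful */
	static int
	sample(uint64_t *delta, void (*region)(void))
	{
		uint32_t c0, c1;
		uint64_t t0, t1;

		t0 = rdtscp(&c0);
		region();
		t1 = rdtscp(&c1);
		if(c0 != c1)
			return 0;
		*delta = t1 - t0;
		return 1;
	}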

I should probably start reading the code again before I reply to these
things. Sorry.

--dho

> - erik
>



* Re: [9fans] cache lines, and 60000 cycles of doom
  2014-06-20 12:47     ` Devon H. O'Dell
@ 2014-06-20 13:07       ` erik quanstrom
  0 siblings, 0 replies; 7+ messages in thread
From: erik quanstrom @ 2014-06-20 13:07 UTC (permalink / raw)
  To: 9fans

> It disables out-of-order execution by the processor, so there's a
> pipeline stall.

we know there's going to be a stall already, since we can't get
the cacheline we're looking for.

> There's overhead to calling the tsc instructions, but not that much.
>
> Does `srb->wmach != m->machno` imply that t0 and t1 could be run on
> different CPUs? TSC is synchronized between cores (unless someone does
> wrmsr), but if you bounce to another processor, there's no guarantee.
> Perhaps the difference between when the CPUs came online was on the
> order of 60k cycles. No clue how cheap sched() is these days.

srb->wmach is the wakeup mach, m->machno is us.  and i just realized
that that deadlock prevention is not necessary.  if srb->wmach == m->machno,
then wakeup has cleared the hazard.

i've rewritten the function like so:
	static void
	srbfree(Srb *srb)
	{
		while(monmwait(&srb->state, Alloc) == Alloc)
			{}
		mbfree(srb->msgbuf);
	}
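
for anyone not running this kernel: a conceptual stand-in for monmwait(),
inferred from its use above and not the kernel's actual implementation
(the real routine presumably uses monitor/mwait, and apparently may
return before the value changes, hence the enclosing loop):

	/* hypothetical stand-in: spin until *addr no longer holds val,
	   then return the value observed */
	static int
	monmwaitsketch(volatile int *addr, int val)
	{
		while(*addr == val)
			;	/* a cpu-relax or mwait would go here */
		return *addr;
	}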

by the way, to head off the next speculation, i tried wrapping the
wakeup in splhi(), but that made no difference.  the waker is not being
scheduled.

by the way, this is ken's kernel on amd64: /n/atom/plan9/sys/src/fs
the development version has drifted a bit, but i can give you access
if you're interested.

- erik


