9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar things this morning.
@ 2008-07-02 12:40 ron minnich
  2008-07-02 13:16 ` Russ Cox
  0 siblings, 1 reply; 10+ messages in thread
From: ron minnich @ 2008-07-02 12:40 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Thread 9 (Thread -1758905456 (LWP 2695)):
#0  0xb7f78424 in __kernel_vsyscall ()
#1  0x4e6f7a43 in poll () from /lib/libc.so.6
#2  0x4117fa99 in ?? () from /usr/lib/libX11.so.6
#3  0x4117fe7f in _XRead () from /usr/lib/libX11.so.6
#4  0x411816bb in _XReadEvents () from /usr/lib/libX11.so.6
#5  0x4116a7ab in XNextEvent () from /usr/lib/libX11.so.6
#6  0x080a19e9 in _xproc (v=0x0) at 9vx/x11/x11-kernel.c:141
#7  0x08056f7b in linkproc () at 9vx/trap.c:484
#8  0x00000000 in ?? ()

Thread 8 (Thread -1768612976 (LWP 2696)):
#0  0x4e701723 in __call_pselect6 () from /lib/libc.so.6
#1  0x4e6fa723 in pselect () from /lib/libc.so.6
#2  0x0805647a in timerkproc (v=0x0) at 9vx/time.c:187
#3  0x08056f7b in linkproc () at 9vx/trap.c:484
#4  0x00000000 in ?? ()

Thread 7 (Thread -1778320496 (LWP 2697)):
#0  0xb7f78424 in __kernel_vsyscall ()
#1  0x4e7d7206 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x08053686 in runproc () at 9vx/sched.c:259
#3  0x0808e2bb in sched () at 9vx/a/proc.c:165
#4  0x08052444 in squidboy (v=0x828f128) at 9vx/main.c:732
#5  0x4e7d344b in start_thread () from /lib/libpthread.so.0
#6  0x4e70180e in clone () from /lib/libc.so.6

Thread 6 (Thread -1797231728 (LWP 2698)):
#0  0xb7f78424 in __kernel_vsyscall ()
#1  0x4e7da008 in accept () from /lib/libpthread.so.0
#2  0x0804f277 in so_accept (fd=38, raddr=0x9614ff64,
rport=0x9614ff6a) at 9vx/devip-posix.c:102
#3  0x0804e412 in ipopen (c=0x82cfb48, omode=2) at 9vx/devip.c:303
#4  0x08050bbb in kserve (kc=0x92e8eb50) at 9vx/kprocdev.c:98
#5  0x080515aa in kserver (v=0x827e668) at 9vx/kprocdev.c:155
#6  0x08056f7b in linkproc () at 9vx/trap.c:484
#7  0x00000000 in ?? ()

Thread 5 (Thread -1805624432 (LWP 2699)):
#0  0xb7f78424 in __kernel_vsyscall ()
#1  0x4e7da008 in accept () from /lib/libpthread.so.0
#2  0x0804f277 in so_accept (fd=40, raddr=0x93b81f64,
rport=0x93b81f6a) at 9vx/devip-posix.c:102
#3  0x0804e412 in ipopen (c=0x830ab00, omode=2) at 9vx/devip.c:303
#4  0x08050bbb in kserve (kc=0x925c7b50) at 9vx/kprocdev.c:98
#5  0x080515aa in kserver (v=0x82b2a10) at 9vx/kprocdev.c:155
#6  0x08056f7b in linkproc () at 9vx/trap.c:484
#7  0x00000000 in ?? ()


Thread 4 (Thread -1846002800 (LWP 2701)):
#0  0xb7f78424 in __kernel_vsyscall ()
#1  0x4e7d7206 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x08053686 in runproc () at 9vx/sched.c:259
#3  0x0808e2bb in sched () at 9vx/a/proc.c:165
#4  0x08052444 in squidboy (v=0x8309958) at 9vx/main.c:732
#5  0x4e7d344b in start_thread () from /lib/libpthread.so.0
#6  0x4e70180e in clone () from /lib/libc.so.6

Thread 3 (Thread -1866228848 (LWP 2702)):
#0  0xb7f78424 in __kernel_vsyscall ()
#1  0x4e7da108 in recv () from /lib/libpthread.so.0
#2  0x0804ecf1 in ipread (ch=0x83086c0, a=0x82ce098, n=2,
offset=53579) at 9vx/devip.c:404
#3  0x08050bf4 in kserve (kc=0x91640af4) at 9vx/kprocdev.c:104
#4  0x080515aa in kserver (v=0x82a07e0) at 9vx/kprocdev.c:155
#5  0x08056f7b in linkproc () at 9vx/trap.c:484
#6  0x00000000 in ?? ()

Thread 2 (Thread -1885140080 (LWP 2703)):
#0  0xb7f78424 in __kernel_vsyscall ()
#1  0x4e7d7206 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x08053686 in runproc () at 9vx/sched.c:259
#3  0x0808e2bb in sched () at 9vx/a/proc.c:165
#4  0x08052444 in squidboy (v=0x82deec8) at 9vx/main.c:732
#5  0x4e7d344b in start_thread () from /lib/libpthread.so.0
#6  0x4e70180e in clone () from /lib/libc.so.6

Thread 1 (Thread -1208650048 (LWP 2694)):
#0  0xb7f78424 in __kernel_vsyscall ()
#1  0x4e7d7206 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x080538e4 in idlehands () at 9vx/sched.c:259
#3  0x0808cf96 in _runproc () at 9vx/a/proc.c:542
#4  0x080535ca in runproc () at 9vx/sched.c:130
#5  0x0808e2bb in sched () at 9vx/a/proc.c:165
#6  0x08052812 in main (argc=<value optimized out>, argv=0xbf97757c)
at 9vx/main.c:208
#0  0xb7f78424 in __kernel_vsyscall ()
(gdb)

Is this another out of memory? This happened on a mk clean on kernel source

ron



* Re: [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar things this morning.
  2008-07-02 12:40 [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar things this morning ron minnich
@ 2008-07-02 13:16 ` Russ Cox
  2008-07-02 13:30   ` ron minnich
  0 siblings, 1 reply; 10+ messages in thread
From: Russ Cox @ 2008-07-02 13:16 UTC (permalink / raw)
  To: 9fans

> Is this another out of memory? This
> happened on a mk clean on kernel source

It would not surprise me if the new pager (post-0.12)
caused the problem, but in order for that
to happen the kernel would have had to print

	uh oh.  someone woke the pager

first.  Did that happen?  I've done a lot without
running out of memory (including building gs)
so it wouldn't be my first guess, unless that print
happened, preferably very close to when acme died.

Russ



* Re: [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar things this morning.
  2008-07-02 13:16 ` Russ Cox
@ 2008-07-02 13:30   ` ron minnich
  2008-07-02 13:40     ` [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar erik quanstrom
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: ron minnich @ 2008-07-02 13:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Jul 2, 2008 at 6:16 AM, Russ Cox <rsc@swtch.com> wrote:
>> Is this another out of memory? This
>> happened on a mk clean on kernel source
>
> It would not surprise me if the new pager (post-0.12)
> caused the problem, but in order for that
> to happen the kernel would have had to print
>
>        uh oh.  someone woke the pager

That I did not see. Just the no segment error.

Here's a bigger question, now that I've read the paper and briefly
scanned the code. Do you have some thoughts on the long term ability
of vx32 to get close to unity performance on a system (like Plan 9)
with a high rate of context switches between file server processes
(you allude to this cost in the paper).  It's an ideal terminal right
now. I don't see a need to use drawterm any more.

But running fossil and venti, it's got a ways to go in terms of
performance (i.e. mk clean in /sys/src/9/pc takes ~60 seconds).

At this point, the fastest virtualization system for kernel mk on my
x60 is still xen, at 12 seconds. I had expected lguest to beat that,
but it never has. There are claims that kvm is running at close to
unity, but that's probably for linux -- I have not tested kvm lately
with plan 9.

At the same time, in terms of effort, vx32 is far easier to run than
the alternatives, and hence is superior in the long term as a way to
get Plan 9 into people's hands.

Also, opteron. lguest on opteron should be ready soonish. But vx32 is
still a highly desirable alternative. Do you have thoughts on how to
sandbox on opteron, where you don't have the segment registers? Could
you use mctl to sandbox and then filter mmap system calls from the
sandboxed code to make sure the sandbox cannot be escaped?

enough rambling. I'm in a west coast brain on the east coast, still
not awake at this point.

Thanks

ron



* Re: [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar
  2008-07-02 13:30   ` ron minnich
@ 2008-07-02 13:40     ` erik quanstrom
  2008-07-02 14:17       ` ron minnich
  2008-07-02 15:42     ` vx32 and 9vx performance, and on x86-64 Russ Cox
  2008-07-02 15:43     ` [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar things this morning Russ Cox
  2 siblings, 1 reply; 10+ messages in thread
From: erik quanstrom @ 2008-07-02 13:40 UTC (permalink / raw)
  To: 9fans

> Here's a bigger question, now that I've read the paper and briefly
> scanned the code. Do you have some thoughts on the long term ability
> of vx32 to get close to unity performance on a system (like Plan 9)
> with a high rate of context switches between file server processes
> (you allude to this cost in the paper).  It's an ideal terminal right
> now. I don't see a need to use drawterm any more.
>
> But running fossil and venti, it's got a ways to go in terms of
> performance (i.e. mk clean in /sys/src/9/pc takes ~60 seconds).

import(1)'ed files, host files and ramfs files are similarly slow.
this is no benchmark, and it's a bit questionable because it's
right on the limit of time's measurement, but it does reflect
a lag i see that i don't see in drawterm:

drawterm (the file is 4368 bytes),
	>/dev/null time cat usps.jpg
	0.00u 0.00s 0.00r 	 cat usps.jpg
9vx
	>/dev/null time cat usps.jpg
	0.00u 0.00s 0.01r	cat usps.jpg

- erik




* Re: [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar
  2008-07-02 13:40     ` [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar erik quanstrom
@ 2008-07-02 14:17       ` ron minnich
  0 siblings, 0 replies; 10+ messages in thread
From: ron minnich @ 2008-07-02 14:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Jul 2, 2008 at 6:40 AM, erik quanstrom <quanstro@quanstro.net> wrote:
>> Here's a bigger question, now that I've read the paper and briefly
>> scanned the code. Do you have some thoughts on the long term ability
>> of vx32 to get close to unity performance on a system (like Plan 9)
>> with a high rate of context switches between file server processes
>> (you allude to this cost in the paper).  It's an ideal terminal right
>> now. I don't see a need to use drawterm any more.
>>
>> But running fossil and venti, it's got a ways to go in terms of
>> performance (i.e. mk clean in /sys/src/9/pc takes ~60 seconds).
>
> import(1)'ed files, host files and ramfs files are similarly slow.

I don't want to turn this into a "let's pile onto vx" discussion, but
rather a discussion of how far we can go with the approach
performance-wise. Some of the issues are well brought out in the
paper; I'm just rolling the questions around in my sleep-deprived
mental state.

I'm currently just trying to figure out how to (re)package THX to make
it easier for people to use, and which virtualization system to use
...
ron



* vx32 and 9vx performance, and on x86-64
  2008-07-02 13:30   ` ron minnich
  2008-07-02 13:40     ` [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar erik quanstrom
@ 2008-07-02 15:42     ` Russ Cox
  2008-07-04  0:26       ` [9fans] " erik quanstrom
  2008-07-02 15:43     ` [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar things this morning Russ Cox
  2 siblings, 1 reply; 10+ messages in thread
From: Russ Cox @ 2008-07-02 15:42 UTC (permalink / raw)
  To: 9fans

There's not likely anything in the guts of vx32 that hasn't been done
before.  What's new is the fact that we managed to package it up in
a way that runs on a variety of out-of-the-box OSes with
neither kernel modifications nor special privileges on any x86.
That portability is key to being able to deploy interesting apps,
like 9vx.

> Here's a bigger question, now that I've read the paper and briefly
> scanned the code. Do you have some thoughts on the long term ability
> of vx32 to get close to unity performance on a system (like Plan 9)
> with a high rate of context switches between file server processes
> (you allude to this cost in the paper).  It's an ideal terminal right
> now. I don't see a need to use drawterm any more.
>
> But running fossil and venti, it's got a ways to go in terms of
> performance (i.e. mk clean in /sys/src/9/pc takes ~60 seconds).

I have spent approximately no time at all measuring 9vx
performance other than the numbers in the paper, and even
those were just what it was the first time I measured, not
something I tuned for.  I've been much more focused on
correctness and functionality than speed.  That should be
encouraging, because there's probably a lot of room for
improvement.

Creating new processes and context switching is definitely slow.
One of the slowest parts of the kernel build for me is the line

	rc ../port/mksystab > ../port/systab.h

which invokes a sam script that does

	,x/SYS[A-Z0-9_]+,/ | tr A-Z a-z

which forks and execs tr many times.  There are three potential
sources of slowdown that I can think of right now:

	* context switches, which involve a lot of mmap/munmap
	  and trigger potentially many page faults

	* floating point: apps that use floating point are probably flushing
	  the vx32 translation cache more often than they need to.

	* the kprocdev framework.  all i/o into devip, devfs, and devdraw
	  is marshalled and handed off to a kproc running in a different
	  pthread, so that blocking i/o won't block the cpu0 pthread,
	  which is the only one that can run vx32.  this means that
	  all i/o gets copied one extra time inside the kernel.

All of these could be improved, but you'd have to profile 9vx to
figure out where the time is going first.

You can reduce the effect of context switches by maintaining more
than one user address space and by keeping track of which processes
have address spaces that differ only in their stack segment.
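
Roughly, the bookkeeping might look something like the following; this
is a made-up sketch for illustration, not code from 9vx (Uspace,
pickspace and the eviction policy are all invented names):

	/* keep a few mmap'ed user address spaces around and reuse one that
	   already belongs to the incoming process, so a context switch need
	   not munmap/mmap and re-fault every page. */
	typedef struct Proc Proc;	/* the per-process structure */

	enum { Nspace = 4 };

	typedef struct Uspace Uspace;
	struct Uspace {
		void	*base;		/* base of an mmap'ed user range */
		Proc	*owner;		/* process whose segments are mapped there */
	};

	static Uspace	spaces[Nspace];
	static int	lastused;

	/* remapping only happens on a miss, so switching back and forth
	   between two processes stays cheap. */
	static Uspace*
	pickspace(Proc *p)
	{
		int i;

		for(i = 0; i < Nspace; i++)
			if(spaces[i].owner == p)
				return &spaces[i];	/* hit: nothing to remap */
		i = lastused = (lastused+1) % Nspace;	/* simple round-robin eviction */
		spaces[i].owner = p;
		/* ... munmap the old segments and mmap p's segments at spaces[i].base ... */
		return &spaces[i];
	}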

You can rework the way vx32 signals "no floating point" exceptions
so that they wouldn't require flushing the translation cache.

If one is doing i/o into a kernel buffer, like in demand paging, that
doesn't need to be copied during the kprocdev switch, but it is.
If the i/o doesn't span a page boundary and an appropriate fault-free
physical mapping is known, kprocdev could use the kernel's physical
mapping of that page instead of doing a copy.

But again, you'd have to profile to figure out which of these is
worth doing, if any.  I don't have any sense of where the time is
going.  User-level profilers are going to be difficult to use, because
9vx wants to handle SIGVTALRM itself.  You should be able to do
pretty well with oprofile on Linux, or maybe dtrace on OS X.
That would also have the benefit of telling you how much kernel
effort 9vx is inducing.

I hope that people will do this.  I have very little time to put into
this for the rest of the summer, but I'm always happy to explain
things and process patches.

> At this point, the fastest virtualization system for kernel mk on my
> x60 is still xen, at 12 seconds. I had expected lguest to beat that,
> but it never has. There are claims that kvm is running at close to
> unity, but that's probably for linux -- I have not tested kvm lately
> with plan 9.

Notice that vx32 itself is not on my list above.  I think that
there are plenty of things 9vx is doing inefficiently that
dwarf the potential 1.8x performance hit in raw x86 execution
speed.  Also, inner loops tend to run close to 1.0 already.
Once everyone's x86 processors have hardware support
for virtualization and the operating systems allow arbitrary
user code to get at it, maybe it would make sense to let vx32
take advantage of that instead, but right now it's not a
priority for me.

> Also, opteron. lguest on opteron should be ready soonish. But vx32 is
> still a highly desirable alternative. Do you have thoughts on how to
> sandbox on opteron, where you don't have the segment registers? Could
> you use mctl to sandbox and then filter mmap system calls from the
> sandboxed code to make sure the sandbox can not be escaped?

Vx32 already runs on x86-64 hosts, but it can only run
x86-32 code.  I don't see any reasonable way around that
limitation right now, but it also doesn't bother me.
Maybe in a few years kvm would be an answer.

Right now, you can build 9vx with -m32 and get a binary
that will work on x86-64, assuming you already have a
32-bit libX11 or you give up graphics.  Plan 9, like many systems,
assumes that kernel pointers and user pointers are the same size.
A native x86-64 version of 9vx that ran 32-bit x86 user code
would be possible, but you'd have to remove that assumption
from the code.  That probably wouldn't be as bad as it sounds:
I removed the assumption that user 0 = kernel 0 already, and it
only took a day.  Also, while doing that I made sure that the
kernel never has a C pointer holding a user address (always a
ulong instead), and all the translations between kernel and user
pointers now either mention uzero or uvalidaddr.  So they should
be easy to find.
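
To illustrate the convention, here is a made-up sketch; only the names
uzero and uvalidaddr come from the real code, and their actual types
and signatures may differ:

	/* user addresses are carried as ulongs; the one place they become
	   host (C) pointers is a check-and-translate helper, so a host
	   pointer never masquerades as a user address. */
	extern uchar *uzero;		/* host address where user address 0 is mapped */

	static void*
	usertohost(ulong uaddr, ulong len)
	{
		if(uaddr+len < uaddr || uaddr+len > USTKTOP)	/* wrap and range check */
			error(Ebadarg);				/* the real check is uvalidaddr */
		return uzero + uaddr;	/* host pointer, valid only inside the kernel */
	}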

Again I hope that people will do this, but it won't be me
any time soon.

Russ



* Re: [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar things this morning.
  2008-07-02 13:30   ` ron minnich
  2008-07-02 13:40     ` [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar erik quanstrom
  2008-07-02 15:42     ` vx32 and 9vx performance, and on x86-64 Russ Cox
@ 2008-07-02 15:43     ` Russ Cox
  2 siblings, 0 replies; 10+ messages in thread
From: Russ Cox @ 2008-07-02 15:43 UTC (permalink / raw)
  To: 9fans

>> It would not surprise me if the new pager (post-0.12)
>> caused the problem, but in order for that
>> to happen the kernel would have had to print
>>
>>        uh oh.  someone woke the pager
>
> That I did not see. Just the no segment error.

You can run acid inside 9vx to get a stack trace of the
broken processes.  Maybe a pattern will emerge.
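
For example, from a shell inside 9vx, something along these lines,
with 118 being the pid from the fault message (lstk() prints the
stack of the attached process):

	% acid 118
	acid: lstk()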

Russ



* Re: [9fans] vx32 and 9vx performance, and on x86-64
  2008-07-02 15:42     ` vx32 and 9vx performance, and on x86-64 Russ Cox
@ 2008-07-04  0:26       ` erik quanstrom
  2008-07-04  2:03         ` andrey mirtchovski
  2008-07-04  2:15         ` Russ Cox
  0 siblings, 2 replies; 10+ messages in thread
From: erik quanstrom @ 2008-07-04  0:26 UTC (permalink / raw)
  To: 9fans

> 	* the kprocdev framework.  all i/o into devip, devfs, and devdraw
> 	  is marshalled and handed off to a kproc running in a different
> 	  pthread, so that blocking i/o won't block the cpu0 pthread,
> 	  which is the only one that can run vx32.  this means that
> 	  all i/o gets copied one extra time inside the kernel.

why can only one thread run vx32?

- erik




* Re: [9fans] vx32 and 9vx performance, and on x86-64
  2008-07-04  0:26       ` [9fans] " erik quanstrom
@ 2008-07-04  2:03         ` andrey mirtchovski
  2008-07-04  2:15         ` Russ Cox
  1 sibling, 0 replies; 10+ messages in thread
From: andrey mirtchovski @ 2008-07-04  2:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> why can only one thread run vx32?

i think i found part of the answer just now. see the comment above
9vx/main.c:^setsigsegv



* Re: [9fans] vx32 and 9vx performance, and on x86-64
  2008-07-04  0:26       ` [9fans] " erik quanstrom
  2008-07-04  2:03         ` andrey mirtchovski
@ 2008-07-04  2:15         ` Russ Cox
  1 sibling, 0 replies; 10+ messages in thread
From: Russ Cox @ 2008-07-04  2:15 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>>       * the kprocdev framework.  all i/o into devip, devfs, and devdraw
>>         is marshalled and handed off to a kproc running in a different
>>         pthread, so that blocking i/o won't block the cpu0 pthread,
>>         which is the only one that can run vx32.  this means that
>>         all i/o gets copied one extra time inside the kernel.
>
> why can only one thread run vx32?

9vx requires that the page fault handler runs on an alternate stack
during vx32 ("user") execution and on the kernel stack during
kernel execution.  That bit--whether or not to run the signal handler
on the pthread's alternate signal stack--is part of the struct sigaction
defining the signal-handling behavior, which is shared by all
pthreads in the process, not per-pthread.  9vx arranges that the
global bit is correct for all pthreads by only allowing one of the
pthreads--cpu0--to page fault.  The others, which run supporting
kprocs, arrange never to fault.  When user i/o is moved off cpu0
to the supporting kprocs, the i/o has to be done into fault-free
kernel buffers and then copied back into user space on cpu0.
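
In POSIX terms the mismatch looks like this; a minimal sketch of the
two calls involved, not the actual 9vx code:

	#include <signal.h>
	#include <stdlib.h>
	#include <string.h>

	static void
	onfault(int sig, siginfo_t *info, void *ctx)
	{
		/* page fault handling would go here */
	}

	void
	setupfaults(void)
	{
		stack_t ss;
		struct sigaction sa;

		/* per-pthread: each thread can install its own alternate stack */
		ss.ss_sp = malloc(SIGSTKSZ);
		ss.ss_size = SIGSTKSZ;
		ss.ss_flags = 0;
		sigaltstack(&ss, NULL);

		/* per-process: SA_ONSTACK here decides, for every pthread at
		   once, whether SIGSEGV is delivered on the alternate stack;
		   there is no per-thread version of this bit. */
		memset(&sa, 0, sizeof sa);
		sa.sa_sigaction = onfault;
		sigemptyset(&sa.sa_mask);
		sa.sa_flags = SA_SIGINFO|SA_ONSTACK;
		sigaction(SIGSEGV, &sa, NULL);
	}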

This is essentially a failure of vision in the pthreads interface:
sigaltstack is per-pthread, but sigaction is not.  Linux does make
it possible to have different sigactions per pthread, but you'd
have to hack up your own thread library.  If the Linux guys had
really understood the wisdom of rfork, one could just do
rfork(RFSIGHAND) at the start of each new pthread instead of
having to drag in a whole new library.  FreeBSD didn't get
this right either, for what it's worth.  In both cases you could
work around this by linking with a modified pthread library.
Ironically, OS X doesn't have this problem because its signal
handling was so bad 9vx has to reimplement it from scratch
in terms of Mach exceptions (see 9vx/osx/signal.c).

There are other simplifying assumptions, like having just one
address range for the "user address space", but they could be
removed if necessary.  The real difficulty is the sigaction
SA_ONSTACK bit.

None of this is terribly important for performance: right now there
are plenty of inefficiencies not related to having just one cpu on
which to run user code.

Russ


end of thread

Thread overview: 10+ messages
2008-07-02 12:40 [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar things this morning ron minnich
2008-07-02 13:16 ` Russ Cox
2008-07-02 13:30   ` ron minnich
2008-07-02 13:40     ` [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar erik quanstrom
2008-07-02 14:17       ` ron minnich
2008-07-02 15:42     ` vx32 and 9vx performance, and on x86-64 Russ Cox
2008-07-04  0:26       ` [9fans] " erik quanstrom
2008-07-04  2:03         ` andrey mirtchovski
2008-07-04  2:15         ` Russ Cox
2008-07-02 15:43     ` [9fans] 118 acme fault 0 no segment -- seeing a lot of simliar things this morning Russ Cox
