9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] fs performance
@ 2011-01-09 17:06 erik quanstrom
  2011-01-09 17:29 ` ron minnich
  0 siblings, 1 reply; 27+ messages in thread
From: erik quanstrom @ 2011-01-09 17:06 UTC (permalink / raw)
  To: 9fans

the new auth server, which uses the fs as its root rather than
a stand-alone fs,  happens to be faster than our now-old
cpu server, so i did a quick build test with a kernel including
the massive-fw myricom driver.  suspecting that latency kills
even on 10gbe, i tried a second build with NPROC=24. a
table comparing ken fs, fossil+venti, and ramfs follows.
unfortunately, i was not able to use the same system for the
fossil+venti tests, so each machine also gets a ramfs run as a
baseline, since the differences in processor generation,
network, &c. are large.  here's an example test:

	tyty; echo $NPROC
	4
	tyty; time mk>/dev/null && mk clean>/dev/null
	2.93u 1.30s 3.36r 	 mk
	tyty; NPROC=24 time mk >/dev/null && mk clean>/dev/null
	1.32u 0.22s 2.29r 	 mk

and here are the compiled results:

a	Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
	4 active cores (8 threads; 4 enabled);
	http://ark.intel.com/Product.aspx?id=35365
	intel 82598 10gbe nic; fs has myricom 10gbe nic; 54µs latency
b	Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
	4 active cores (4 threads; 4 enabled);
	http://www.intel.com/p/en_US/products/server/processor/xeon5000/specifications
	intel 82563-style gbe nic; 70µs latency

mach	fs	nproc	time
a	ken	4	2.93u 1.30s 3.36r 	 mk
		24	1.32u 0.22s 2.29r 	 mk
	ramfs	4	3.10u 1.67s 3.01r 	 mk
		24	2.98u 1.23s 2.42r 	 mk
b	venti	4	2.65u 3.44s 21.46r 	 mk
		24	2.98u 3.56s 21.58r 	 mk
	ramfs	4	3.55u 2.22s 9.08r 	 mk
		24	3.50u 2.67s 9.41r 	 mk

it's interesting that neither venti nor ramfs gets any faster
on machine b with NPROC set to 24, but both get
faster on machine a, and the fastest time of all is not
ramfs, but ken's fs with NPROC=24.  so i suppose the
64-bit question is, is that because moving data in and
out of user space is slower than 10gbe, or because ramfs
is single threaded and slow?

in any event, it's clear that even when the fs is good, latency
can kill on a 10gbe lan.  it would naively seem to me that
using the Tstream model would be too expensive, requiring
thousands of new streams and modifications to at
least 8c, 8l, mk, rc, awk (what am i forgetting?).  but
it would be worth a test.

- erik



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 17:06 [9fans] fs performance erik quanstrom
@ 2011-01-09 17:29 ` ron minnich
  2011-01-09 17:51   ` John Floren
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: ron minnich @ 2011-01-09 17:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

simple question: what's it take to set up a kenfs + coraid combo? Or
is there a howto somewhere on your site? I'd like to give this a go.

Those are interesting numbers. Actually, however, changing a program
to use the stream stuff is trivial. I would expect the streaming to be
a real loser in a site with 10GE but we can try it. As John has
pointed out the streaming only makes sense where the inherent network
latency is pretty high (10s of milliseconds), i.e. the wide area.

ron



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 17:29 ` ron minnich
@ 2011-01-09 17:51   ` John Floren
  2011-01-10 18:07     ` Francisco J Ballesteros
  2011-01-09 18:31   ` erik quanstrom
  2011-01-09 19:54   ` Bakul Shah
  2 siblings, 1 reply; 27+ messages in thread
From: John Floren @ 2011-01-09 17:51 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Jan 9, 2011 at 9:29 AM, ron minnich <rminnich@gmail.com> wrote:
[snipped]
> As John has
> pointed out the streaming only makes sense where the inherent network
> latency is pretty high (10s of milliseconds), i.e. the wide area.
>
> ron
>
>

Right, my results were that you get pretty much exactly the same
performance when you're working over a LAN whether you choose streams
or regular 9P. Streaming only really starts to help when you're up
into the multiple-millisecond RTT range.

One thing that definitely needs to be done is checking the scalability
of streaming. I think it can probably handle erik's case, though...
there are tens of thousands of ports available, and even over a
low-latency network I think we should be able to push data across fast
enough.


John



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 17:29 ` ron minnich
  2011-01-09 17:51   ` John Floren
@ 2011-01-09 18:31   ` erik quanstrom
  2011-01-09 19:54   ` Bakul Shah
  2 siblings, 0 replies; 27+ messages in thread
From: erik quanstrom @ 2011-01-09 18:31 UTC (permalink / raw)
  To: 9fans

On Sun Jan  9 12:41:37 EST 2011, rminnich@gmail.com wrote:
> simple question: what's it take to set up a kenfs + coraid combo? Or
> is there a howto somewhere on your site? I'd like to give this a go.

since i've done this a number of times, it's getting easier.
i've added some features to the fs and written a few programs
and a few scripts to make preparing the boot lun easy, and
make dual-booting the fs easy while one is preparing the fs.

unfortunately, there is still a bit of magic; i'll work up some
man pages.

the basic strategy is this.
1.  create 2 luns, one for the worm and one for the cache.
2.  use prep to partition the cache lun and create a "fscache"
partition that's 25gb or so.
3.  use the "mkboot" script and the configuration files to make
a bootable local disk.  i use a sata ssd with the ahci driver.
4.  boot file server and
- ream main,
- enter password information

if you're really anxious to get going, i can provide the boot
scripts & programs without proper documentation right now.

- erik



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 17:29 ` ron minnich
  2011-01-09 17:51   ` John Floren
  2011-01-09 18:31   ` erik quanstrom
@ 2011-01-09 19:54   ` Bakul Shah
  2011-01-09 20:25     ` ron minnich
  2011-01-09 21:14     ` erik quanstrom
  2 siblings, 2 replies; 27+ messages in thread
From: Bakul Shah @ 2011-01-09 19:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, 09 Jan 2011 09:29:04 PST ron minnich <rminnich@gmail.com>  wrote:
>
> Those are interesting numbers. Actually, however, changing a program
> to use the stream stuff is trivial. I would expect the streaming to be
> a real loser in a site with 10GE but we can try it. As John has
> pointed out the streaming only makes sense where the inherent network
> latency is pretty high (10s of milliseconds), i.e. the wide area.

For "streaming" at wirespeed you want to keep the pipe full
at all times.  For that to happen, by the time ack for byte#
N arrives, you would have sent at least RTT*bandwidth more
bytes. So it is the latency*bandwidth product and not just
latency that matters -- that is, streaming makes sense at
lower latencies for higher speed networks.
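
Just to put rough numbers on that, a little arithmetic-only sketch
(nothing measured; just bandwidth*RTT, using the RTTs quoted in this
thread plus a WAN-ish one):

	#include <stdio.h>

	/* bandwidth-delay product: bytes that must be in flight to keep the pipe full */
	int
	main(void)
	{
		double bw = 1.25e9;	/* ~10GbE payload rate, bytes/sec */
		double rtt[] = { 54e-6, 70e-6, 500e-6, 30e-3 };	/* rtts quoted in this thread + a wan-ish one */
		int i;

		for(i = 0; i < 4; i++)
			printf("rtt %8.0f us -> %10.0f bytes in flight\n",
				rtt[i]*1e6, bw*rtt[i]);
		return 0;
	}

At 54 us that is already about 67KB in flight, far more than a single
8K 9P message, which is why one outstanding RPC cannot fill a 10GbE
pipe even on a LAN.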

Ideally on a 10GbE you should get something like 1.1GBps of
TCP bandwidth.  If the throughput you get from local
filesystems is not in this neighborhood, that would mean
there are bigger bottlenecks than dealing with latency.  In
this case streaming won't help much.

On FreeBSD (on a 3.2Ghz w3550) dd </dev/md0 > /dev/null
yields 3.5GBps (@128k blocksize).  md0 is a ramdisk.  Write
to is at about 1/4th read speed.  dd </dev/zero >/dev/null
gives 15+GBps -- basically system overhead for IO. dd from an
uncached file on ZFS is about 350MBps. Once the file is
cached, the throughput increases to 2.5GBps. None of these
use any streaming (though there *is* readahead at the FS
level).

[GBps == 10^9 bytes/second. MBps == 10^6 bytes/second]

The point of mentioning FreeBSD numbers is to show what is
possible. To really improve plan9 fs performance one would
have to look at things like syscall overhead, number of data
copies made, number of syscalls and context switches etc. and
tune each component.

[Aside: has ttcp been ported to plan9?]




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 19:54   ` Bakul Shah
@ 2011-01-09 20:25     ` ron minnich
  2011-01-09 20:47       ` erik quanstrom
  2011-01-09 21:17       ` Bakul Shah
  2011-01-09 21:14     ` erik quanstrom
  1 sibling, 2 replies; 27+ messages in thread
From: ron minnich @ 2011-01-09 20:25 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Jan 9, 2011 at 11:54 AM, Bakul Shah <bakul+plan9@bitblocks.com> wrote:
>None of these
> use any streaming (though there *is* readahead at the FS
> level).

yes, all the systems that perform well do so via aggressive readahead
-- which, from one point of view, is one way of creating a stream from
a discrete set of requests.

If you think about it, a single 9p connection is a multiplexed stream
for managing file I/O requests. What john's work did is to create an
individual stream for each file. And, as Andrey's results and John's
results show, it can be a win. The existence of readahead supports the
idea that some form of streaming might work well in even the local
area.

But who knows. Until we try.

BTW, if certain lurkers still read this list, they may realize that
some of this work was inspired by work SGI did in 1994 or so called
"NFS bypass".

ron



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 20:25     ` ron minnich
@ 2011-01-09 20:47       ` erik quanstrom
  2011-01-09 21:04         ` ron minnich
  2011-01-09 21:17       ` Bakul Shah
  1 sibling, 1 reply; 27+ messages in thread
From: erik quanstrom @ 2011-01-09 20:47 UTC (permalink / raw)
  To: 9fans

> If you think about it, a single 9p connection is a multiplexed stream
> for managing file I/O requests. What john's work did is to create an
> individual stream for each file. And, as Andrey's results and John's
> results show, it can be a win. The existence of readahead supports the
> idea that some form of streaming might work well in even the local
> area.

however, i think we could do even better by modifying devmnt
to keep more than 1 outstanding message per channel, as a mount
option.  each 9p connection can stream without the overhead of
separate connections.

this is the strategy used by aoe.
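
a toy model of why a window of outstanding rpcs matters (nothing to do
with the real devmnt code; just arithmetic with made-up but
thread-flavored numbers: 8k rpcs, gbe bandwidth, 70µs rtt):

	#include <stdio.h>

	/* n fixed-size rpcs over a link with a given rtt and bandwidth.
	   serial 9p pays the rtt on every rpc; a window of w outstanding
	   rpcs pays it roughly once per w.  very rough model. */
	int
	main(void)
	{
		double rtt = 70e-6;		/* seconds; the gbe number from the table above */
		double bw = 125e6;		/* bytes/sec, roughly gbe */
		double msz = 8192;		/* bytes per rpc; a typical 9p iounit */
		double n = 128*1024*1024/msz;	/* rpcs needed to move 128mb */
		int w;

		printf("serial:    %.2fs\n", n*(rtt + msz/bw));
		for(w = 2; w <= 32; w *= 2)
			printf("window %2d: %.2fs\n", w, n/w*rtt + n*msz/bw);
		return 0;
	}

with those numbers the serial case spends about half its time waiting
on the wire; a window of 8 or so recovers most of it.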

- erik



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 20:47       ` erik quanstrom
@ 2011-01-09 21:04         ` ron minnich
  0 siblings, 0 replies; 27+ messages in thread
From: ron minnich @ 2011-01-09 21:04 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Jan 9, 2011 at 12:47 PM, erik quanstrom <quanstro@quanstro.net> wrote:

> however, i think we could do even better by modifying devmnt
> to keep more than 1 outstanding message per channel, as a mount
> option.  each 9p connection can stream without the overhead of
> separate connections.
>
> this is the strategy used by aoe.

sounds like fun! I'm up for it.

ron



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 19:54   ` Bakul Shah
  2011-01-09 20:25     ` ron minnich
@ 2011-01-09 21:14     ` erik quanstrom
  2011-01-09 21:38       ` Bakul Shah
  1 sibling, 1 reply; 27+ messages in thread
From: erik quanstrom @ 2011-01-09 21:14 UTC (permalink / raw)
  To: 9fans

> The point of mentioning FreeBSD numbers is to show what is
> possible. To really improve plan9 fs performance one would
> have to look at things like syscall overhead, number of data
> copies made, number of syscalls and context switches etc. and
> tune each component.

i don't see any evidence that plan 9 suffers in system call overhead
time, etc.  do you have some numbers that say it does?

i also think that your examples don't translate well into the
plan 9 world.  we trade performance for keeping ramfs out of
the kernel, etc. (620mb/s on my much slower machine, btw.)
comparing to a fuse ramfs would be more apt.

- erik



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 20:25     ` ron minnich
  2011-01-09 20:47       ` erik quanstrom
@ 2011-01-09 21:17       ` Bakul Shah
  2011-01-09 21:59         ` erik quanstrom
  2011-01-09 22:58         ` Charles Forsyth
  1 sibling, 2 replies; 27+ messages in thread
From: Bakul Shah @ 2011-01-09 21:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, 09 Jan 2011 12:25:41 PST ron minnich <rminnich@gmail.com>  wrote:
> On Sun, Jan 9, 2011 at 11:54 AM, Bakul Shah <bakul+plan9@bitblocks.com> wrote:
> >None of these
> > use any streaming (though there *is* readahead at the FS
> > level).
>
> yes, all the systems that perform well do so via aggressive readahead
> -- which, from one point of view, is one way of creating a stream from
> a discrete set of requests.

On freebsd there is no readahead when dding from /dev/zero or
/dev/md0 or any device for that matter (at least that used to
be the case). Readahead is only at the FS level.  That is why
I showed dd numbers for devices -- freebsd does well even
when there is no readahead.  Maybe plan9 does too -- I would
run these tests but I don't have plan9 on a fast machine.

Readahead is a bet that the next block will be needed.
Distinct from streaming (making sure we can *transfer* data
as fast as possible). You don't need readahead if you can
generate data fast enough!

> If you think about it, a single 9p connection is a multiplexed stream
> for managing file I/O requests. What john's work did is to create an
> individual stream for each file. And, as Andrey's results and John's
> results show, it can be a win. The existence of readahead supports the
> idea that some form of streaming might work well in even the local
> area.

Windowing overhead (as in TCP) is not necessary but file
readahead is a win in the local case.

There are really N separate knobs you can tune:
- read ahead
- windowing (when latency*bandwidth is large)
- parallelism
- other local optimizations (does plan9 pay marshalling,
  unmarshalling cost for node local communication?)
- pushing performance sensitive code into kernel (to reduce
  context switches).
- more generally, collapsing code paths that go through
  multiple servers
- zero copy data transfer

Of course, some are worth doing *only* if you want absolutely
the very best performance but they come at the cost of
reduced flexibility, increased complexity & codebloat.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 21:14     ` erik quanstrom
@ 2011-01-09 21:38       ` Bakul Shah
  2011-01-09 21:56         ` ron minnich
  2011-01-09 22:00         ` erik quanstrom
  0 siblings, 2 replies; 27+ messages in thread
From: Bakul Shah @ 2011-01-09 21:38 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, 09 Jan 2011 16:14:21 EST erik quanstrom <quanstro@quanstro.net>  wrote:
> > The point of mentioning FreeBSD numbers is to show what is
> > possible. To really improve plan9 fs performance one would
> > have to look at things like syscall overhead, number of data
> > copies made, number of syscalls and context switches etc. and
> > tune each component.
>
> i don't see any evidence that plan 9 suffers in system call overhead
> time, etc.  do you have some numbers that say it does?

I didn't say plan9 "suffers". Merely that one has to look at
other aspects as well (implying putting in Tstream may not
make a huge difference).

> i also think that your examples don't translate well into the
> plan 9 world.  we trade performance for keeping ramfs out of
> the kernel, etc. (620mb/s on my much slower machine, btw.)

This is for dd </dev/zero >/dev/null?  What do you get for
various block sizes?

If you are getting 620MBps that means you will definitely not
exceed that number for disk based filesystems.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 21:38       ` Bakul Shah
@ 2011-01-09 21:56         ` ron minnich
  2011-01-09 22:02           ` erik quanstrom
  2011-01-10 14:45           ` David Leimbach
  2011-01-09 22:00         ` erik quanstrom
  1 sibling, 2 replies; 27+ messages in thread
From: ron minnich @ 2011-01-09 21:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Jan 9, 2011 at 1:38 PM, Bakul Shah <bakul+plan9@bitblocks.com> wrote:

> I didn't say plan9 "suffers". Merely that one has to look at
> other aspects as well (implying putting in Tstream may not
> make a huge difference).

well, what we do know from one set of measurements is that it makes a
measurable difference when latency is measured in the tens of
milliseconds. :-)

I have done some of these other measurements, e.g. system call
overhead. Plan 9 system call time is quite a bit longer than Linux
nowadays, when Linux uses the SYSENTER support.
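
For the curious, the kind of crude measurement I mean is just a timed
loop around a cheap system call. A Plan 9 sketch (errstr is only a
convenient cheap call, nothing magic about it, and the figure includes
loop overhead):

	#include <u.h>
	#include <libc.h>

	/* crude estimate of syscall round-trip time: time a loop of a cheap call */
	void
	main(void)
	{
		char buf[ERRMAX];
		int i, n;
		vlong t0, t1;

		buf[0] = 0;
		n = 1000000;
		t0 = nsec();
		for(i = 0; i < n; i++)
			errstr(buf, sizeof buf);	/* swaps the error string; about as cheap as syscalls get */
		t1 = nsec();
		print("%lld ns per call\n", (t1-t0)/n);
		exits(nil);
	}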

At the same time, the Plan 9 "mon" device that Andrey wrote was
considerably faster than the procfs-based "mon" device I wrote: 30K
samples/second on Plan 9 vs. 12K samples/second on Linux.

John did do some measurement of file system times via the trace device
we wrote. I think it's fair to say that the IO path for fossil is
considerably slower than the IO path for kernel-based file systems in
Linux: slower as in multiples of 10, not multiples. There's a fair
amount of copying, allocation, and bouncing in and out of the kernel,
and this activity does not come cheap.

So, one speculation is that a kernel-based Plan 9 file system might be
quite fast. And that's enough random text for a Sunday.

ron



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 21:17       ` Bakul Shah
@ 2011-01-09 21:59         ` erik quanstrom
  2011-01-09 22:58         ` Charles Forsyth
  1 sibling, 0 replies; 27+ messages in thread
From: erik quanstrom @ 2011-01-09 21:59 UTC (permalink / raw)
  To: 9fans

> - other local optimizations (does plan9 pay marshalling,
>   unmarshalling cost for node local communication?)

not unless it hits the mount driver.  since a user-level fs is
a 9p server, its io clearly must go through the mnt driver;
kernel fileservers and pipes need not.

> - pushing performance sensitive code into kernel (to reduce
>   context switches).

i never understood the point of plan 9 to be to hold
the performance crown at all costs.

- erik



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 21:38       ` Bakul Shah
  2011-01-09 21:56         ` ron minnich
@ 2011-01-09 22:00         ` erik quanstrom
  1 sibling, 0 replies; 27+ messages in thread
From: erik quanstrom @ 2011-01-09 22:00 UTC (permalink / raw)
  To: 9fans

> > i also think that your examples don't translate well into the
> > plan 9 world.  we trade performance for keeping ramfs out of
> > the kernel, etc. (620mb/s on my much slower machine, btw.)
>
> This is for dd </dev/zero >/dev/null?  What do you get for
> various block sizes?

that's for dd -if /dev/zero -of /n/testramfs/bigfile -bs 128k -count `{aux/number 100m/128k}

the block size won't matter much, since ramfs will get 8k writes
no matter what.
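
(quick arithmetic: 620mb/s in 8k chunks is roughly 75,000 rpcs/s, or
about 13µs per write round trip through devmnt and the pipe to ramfs,
so whatever the bottleneck is, it is per-rpc, not per-byte.)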

> If you are getting 620MBps that means you will definitely not
> exceed that number for disk based filesystems.

wrong-o!

when i first started testing the myricom 10gbe in 2006, i was able
to get 470MB/s from an sr with the aoe driver with no caching
or readahead on the plan 9 side.  by today's standards, those were
very slow systems.

i don't have the time to set up this test rig right now, but i think
it's safe to assume that we can beat this performance with modern
hardware and a modern aoe appliance.

but even with 470MB/s from the disk, with memory caching and
ken fs's good performance (no context switches, no tlb flushes),
it's safe to assume that we can already beat 620mb/s from the
file server.

what you're seeing is the fact that ramfs is just plain slow.

- erik



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 21:56         ` ron minnich
@ 2011-01-09 22:02           ` erik quanstrom
  2011-01-10 14:45           ` David Leimbach
  1 sibling, 0 replies; 27+ messages in thread
From: erik quanstrom @ 2011-01-09 22:02 UTC (permalink / raw)
  To: 9fans

> John did do some measurement of file system times via the trace device
> we wrote. I think it's fair to say that the IO path for fossil is
> considerably slower than the IO path for kernel-based file systems in
> Linux: slower as in multiples of 10, not multiples. There's a fair
> amount of copying, allocation, and bouncing in and out of the kernel,
> and this activity does not come cheap.

that is, the pool library must die.

- erik



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 22:58         ` Charles Forsyth
@ 2011-01-09 22:55           ` ron minnich
  2011-01-09 23:50             ` Charles Forsyth
  2011-01-10  3:26           ` Bakul Shah
  1 sibling, 1 reply; 27+ messages in thread
From: ron minnich @ 2011-01-09 22:55 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Jan 9, 2011 at 2:58 PM, Charles Forsyth <forsyth@terzarima.net> wrote:
> it's curious that people are still worrying about "local" file systems
> when so much of most people's data increasingly is miles
> away on Google, S3, S3 via Drop Box, etc, which model is closer if anything to the
> original plan 9 model of dedicated file servers than the
> unix/linux model of "the whole world is in the box in front of you".
>
>

Yes. If you look at the Chrome laptop model, to me it just looks like
a Plan 9 terminal.

The way I move files to/from Dropbox and these other services is via
streams, btw :-)

ron



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 21:17       ` Bakul Shah
  2011-01-09 21:59         ` erik quanstrom
@ 2011-01-09 22:58         ` Charles Forsyth
  2011-01-09 22:55           ` ron minnich
  2011-01-10  3:26           ` Bakul Shah
  1 sibling, 2 replies; 27+ messages in thread
From: Charles Forsyth @ 2011-01-09 22:58 UTC (permalink / raw)
  To: 9fans

it's curious that people are still worrying about "local" file systems
when so much of most people's data increasingly is miles
away on Google, S3, S3 via Drop Box, etc, which model is closer if anything to the
original plan 9 model of dedicated file servers than the
unix/linux model of "the whole world is in the box in front of you".



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 22:55           ` ron minnich
@ 2011-01-09 23:50             ` Charles Forsyth
  0 siblings, 0 replies; 27+ messages in thread
From: Charles Forsyth @ 2011-01-09 23:50 UTC (permalink / raw)
  To: 9fans

>The way I move files to/from Dropbox and these other services is via
>streams, btw :-)

yes, and some streams are better than others, but i suspect (based on observed
behaviour and wireshark) that there are non-trivial delays and thus latency
visible within the stream. it isn't a nice stream of regularly-arriving data.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 22:58         ` Charles Forsyth
  2011-01-09 22:55           ` ron minnich
@ 2011-01-10  3:26           ` Bakul Shah
  1 sibling, 0 replies; 27+ messages in thread
From: Bakul Shah @ 2011-01-10  3:26 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, 09 Jan 2011 22:58:22 GMT Charles Forsyth <forsyth@terzarima.net>  wrote:
> it's curious that people are still worrying about "local" file systems
> when so much of most people's data increasingly is miles
> away on Google, S3, S3 via Drop Box, etc, which model is closer if anything to the
> original plan 9 model of dedicated file servers than the
> unix/linux model of "the whole world is in the box in front of you".

Peak local file access bandwidth is typically 50 to 100 MBps
times the number of disks; over the localnet it is about 80MBps. On
my internet connection I barely get 1MBps download (& 0.2MBps
upload) speeds. Not to mention server side slowdowns, loss of
control over one's files, sites you download from don't
always stick around (or change) etc. etc.  So I mostly use
local filesystems (as in on the same box or on a local
network) & that is where my interest lies.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 21:56         ` ron minnich
  2011-01-09 22:02           ` erik quanstrom
@ 2011-01-10 14:45           ` David Leimbach
  2011-01-10 15:06             ` Charles Forsyth
  1 sibling, 1 reply; 27+ messages in thread
From: David Leimbach @ 2011-01-10 14:45 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sunday, January 9, 2011, ron minnich <rminnich@gmail.com> wrote:
> On Sun, Jan 9, 2011 at 1:38 PM, Bakul Shah <bakul+plan9@bitblocks.com> wrote:
>
>> I didn't say plan9 "suffers". Merely that one has to look at
>> other aspects as well (implying putting in Tstream may not
>> make a huge difference).
>
> well, what we do know from one set of measurements is that it makes a
> measurable difference when latency is measured in the tens of
> milliseconds. :-)
>
> I have done some of these other measurements, e.g. system call
> overhead. Plan 9 system call time is quite a bit longer than Linux
> nowadays, when Linux uses the SYSENTER support.

Linux maps the kernel in the high 1GB of VM too, doesn't it?  What does
Plan 9 do? (haven't looked yet)

>
> At the same time, the Plan 9 "mon" device that Andrey wrote was
> considerably faster than the procfs-based "mon" device I wrote: 30K
> samples/second on Plan 9 vs. 12K samples/second on Linux.
>
> John did do some measurement of file system times via the trace device
> we wrote. I think it's fair to say that the IO path for fossil is
> considerably slower than the IO path for kernel-based file systems in
> Linux: slower as in multiples of 10, not multiples. There's a fair
> amount of copying, allocation, and bouncing in and out of the kernel,
> and this activity does not come cheap.
>
> So, one speculation is that a kernel-based Plan 9 file system might be
> quite fast. And that's enough random text for a Sunday.

Well, the number of syscalls needed to hand filesystem tasks off to a
userspace filesystem implementation is key.  Microkernels try to
optimize this, as do virtualization hypervisors, because, as observed,
bouncing around between kernel and userspace gives performance hopes
the beat-down.


>
> ron
>
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-10 14:45           ` David Leimbach
@ 2011-01-10 15:06             ` Charles Forsyth
  0 siblings, 0 replies; 27+ messages in thread
From: Charles Forsyth @ 2011-01-10 15:06 UTC (permalink / raw)
  To: 9fans

> I think it's fair to say that the IO path for fossil is
> considerably slower than the IO path for kernel-based file systems in
> Linux: slower as in multiples of 10, not multiples. There's a fair
> amount of copying, allocation, and bouncing in and out of the kernel,

for common applications you'd certainly hope that linux is
many times faster than some systems, because it's loading things like this:

% size *
   text	   data	    bss	    dec	    hex	filename
44,520,643	 105232	 350084	44975959	2ae4757	chrome
   9527	    472	      8	  10007	   2717	chrome-sandbox
2,049,594	  12784	1089440	3151818	 3017ca	libffmpegsumo.so
11,711,965	 221500	 192368	12125833	 b90689	libgcflashplayer.so
18,376,687	 122092	  96344	18595123	11bbd33	libpdf.so

% file chrome
chrome: i386 ELF executable

is it statically linked, i wonder? no, all that comes attached to this:

% ldd chrome
	linux-gate.so.1 =>  (0x00c08000)
	libX11.so.6 => /usr/lib/libX11.so.6 (0x00a2d000)
	libdl.so.2 => /lib/libdl.so.2 (0x0050c000)
	libXrender.so.1 => /usr/lib/libXrender.so.1 (0x0089a000)
	libXss.so.1 => /usr/lib/libXss.so.1 (0x0076e000)
	libXext.so.6 => /usr/lib/libXext.so.6 (0x00835000)
	librt.so.1 => /lib/librt.so.1 (0x00110000)
	libgtk-x11-2.0.so.0 => /usr/lib/libgtk-x11-2.0.so.0 (0x00119000)
	libgdk-x11-2.0.so.0 => /usr/lib/libgdk-x11-2.0.so.0 (0x00eea000)
	libgdk_pixbuf-2.0.so.0 => /usr/lib/libgdk_pixbuf-2.0.so.0 (0x00eae000)
	libpangocairo-1.0.so.0 => /usr/lib/libpangocairo-1.0.so.0 (0x004e9000)
	libpango-1.0.so.0 => /usr/lib/libpango-1.0.so.0 (0x00510000)
	libcairo.so.2 => /usr/lib/libcairo.so.2 (0x00552000)
	libgobject-2.0.so.0 => /usr/lib/libgobject-2.0.so.0 (0x00c92000)
	libgthread-2.0.so.0 => /usr/lib/libgthread-2.0.so.0 (0x004f5000)
	libglib-2.0.so.0 => /lib/libglib-2.0.so.0 (0x00689000)
	libnss3.so.1d => /usr/lib/libnss3.so.1d (0x008a4000)
	libnssutil3.so.1d => /usr/lib/libnssutil3.so.1d (0x00865000)
	libsmime3.so.1d => /usr/lib/libsmime3.so.1d (0x00605000)
	libplc4.so.0d => /usr/lib/libplc4.so.0d (0x004fa000)
	libnspr4.so.0d => /usr/lib/libnspr4.so.0d (0x00d5c000)
	libpthread.so.0 => /lib/libpthread.so.0 (0x0062a000)
	libz.so.1 => /lib/libz.so.1 (0x009d7000)
	libfontconfig.so.1 => /usr/lib/libfontconfig.so.1 (0x00644000)
	libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0x00772000)
	libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0x007e9000)
	libpng12.so.0 => /lib/libpng12.so.0 (0x0080a000)
	libgconf-2.so.4 => /usr/lib/libgconf-2.so.4 (0x009ec000)
	libresolv.so.2 => /lib/libresolv.so.2 (0x00674000)
	libcups.so.2 => /usr/lib/libcups.so.2 (0x00b4a000)
	libgcrypt.so.11 => /lib/libgcrypt.so.11 (0x00b94000)
	libbz2.so.1.0 => /lib/libbz2.so.1.0 (0x00758000)
	libasound.so.2 => /usr/lib/libasound.so.2 (0x00f81000)
	libexpat.so.1 => /lib/libexpat.so.1 (0x00c09000)
	libdbus-glib-1.so.2 => /usr/lib/libdbus-glib-1.so.2 (0x00845000)
	libdbus-1.so.3 => /lib/libdbus-1.so.3 (0x00c30000)
	libXdamage.so.1 => /usr/lib/libXdamage.so.1 (0x004ff000)
	libXtst.so.6 => /usr/lib/libXtst.so.6 (0x00503000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x21b78000)
	libm.so.6 => /lib/libm.so.6 (0x00c6c000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0087e000)
	libc.so.6 => /lib/libc.so.6 (0x1bf3d000)
	libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00d27000)
	/lib/ld-linux.so.2 (0x00dfe000)
	libXinerama.so.1 => /usr/lib/libXinerama.so.1 (0x0076a000)
	libXi.so.6 => /usr/lib/libXi.so.6 (0x009b5000)
	libXrandr.so.2 => /usr/lib/libXrandr.so.2 (0x009c3000)
	libXcursor.so.1 => /usr/lib/libXcursor.so.1 (0x009cb000)
	libXcomposite.so.1 => /usr/lib/libXcomposite.so.1 (0x0082f000)
	libXfixes.so.3 => /usr/lib/libXfixes.so.3 (0x00e88000)
	libatk-1.0.so.0 => /usr/lib/libatk-1.0.so.0 (0x00cd4000)
	libgio-2.0.so.0 => /usr/lib/libgio-2.0.so.0 (0x0c8cc000)
	libpangoft2-1.0.so.0 => /usr/lib/libpangoft2-1.0.so.0 (0x00cef000)
	libgmodule-2.0.so.0 => /usr/lib/libgmodule-2.0.so.0 (0x00a1d000)
	libpixman-1.so.0 => /usr/lib/libpixman-1.so.0 (0x00d90000)
	libxcb-shm.so.0 => /usr/lib/libxcb-shm.so.0 (0x00a21000)
	libxcb-render.so.0 => /usr/lib/libxcb-render.so.0 (0x00a25000)
	libpcre.so.3 => /lib/libpcre.so.3 (0x00e1c000)
	libplds4.so => /usr/lib/libplds4.so (0x00d15000)
	libORBit-2.so.0 => /usr/lib/libORBit-2.so.0 (0x19467000)
	libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0x00e51000)
	libgnutls.so.26 => /usr/lib/libgnutls.so.26 (0x06660000)
	libavahi-common.so.3 => /usr/lib/libavahi-common.so.3 (0x00d19000)
	libavahi-client.so.3 => /usr/lib/libavahi-client.so.3 (0x00d41000)
	libgpg-error.so.0 => /lib/libgpg-error.so.0 (0x00d51000)
	libXau.so.6 => /usr/lib/libXau.so.6 (0x00d56000)
	libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0x00df0000)
	libselinux.so.1 => /lib/libselinux.so.1 (0x00e8e000)
	libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x12d2b000)
	libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0x015b3000)
	libcom_err.so.2 => /lib/libcom_err.so.2 (0x00df6000)
	libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0x00e80000)
	libkeyutils.so.1 => /lib/libkeyutils.so.1 (0x00dfa000)
	libtasn1.so.3 => /usr/lib/libtasn1.so.3 (0x00ec7000)



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-09 17:51   ` John Floren
@ 2011-01-10 18:07     ` Francisco J Ballesteros
  2011-01-10 18:48       ` hiro
  0 siblings, 1 reply; 27+ messages in thread
From: Francisco J Ballesteros @ 2011-01-10 18:07 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>
> Right, my results were that you get pretty much exactly the same
> performance when you're working over a LAN whether you choose streams
> or regular 9P. Streaming only really starts to help when you're up
> into the multiple-millisecond RTT range.

This is weird. Didn't read the thesis yet, sorry, but, do you know the reason?
I think that when I measured op I found that even on lans, using get instead
of multiple rpcs was measurable. Of course users would not notice unless latency
gets higher or many rpcs add their times.
thanks



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-10 18:07     ` Francisco J Ballesteros
@ 2011-01-10 18:48       ` hiro
  2011-01-10 19:06         ` erik quanstrom
  2011-01-10 19:53         ` John Floren
  0 siblings, 2 replies; 27+ messages in thread
From: hiro @ 2011-01-10 18:48 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

What bandwidth? With a gbit I could notice a difference. But probably
the fault of the linux v9fs modules I used (half usec RTT).

On 1/10/11, Francisco J Ballesteros <nemo@lsub.org> wrote:
>>
>> Right, my results were that you get pretty much exactly the same
>> performance when you're working over a LAN whether you choose streams
>> or regular 9P. Streaming only really starts to help when you're up
>> into the multiple-millisecond RTT range.
>
> This is weird. Didn't read the thesis yet, sorry, but, do you know the
> reason?
> I think that when I measured op I found that even on lans, using get instead
> of multiple rpcs was measurable. Of course users would not notice unless
> latency
> gets higher or many rpcs add their times.
> thanks
>
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-10 18:48       ` hiro
@ 2011-01-10 19:06         ` erik quanstrom
  2011-01-10 19:53         ` John Floren
  1 sibling, 0 replies; 27+ messages in thread
From: erik quanstrom @ 2011-01-10 19:06 UTC (permalink / raw)
  To: 9fans

On Mon Jan 10 13:50:09 EST 2011, 23hiro@googlemail.com wrote:
> What bandwidth? With a gbit I could notice a difference. But probably
> the fault of the linux v9fs modules I used (half usec RTT).
>

could you perhaps have intended 0.5ms, not µs?  here's mellanox
bragging about 4µs latency for 10gbe:

http://www.highfrequencytraders.com/technology/2010/09/20/mellanox-and-arista-deliver-breakthrough-10gbe-latency-for-financial-services-applications/

- erik



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-10 18:48       ` hiro
  2011-01-10 19:06         ` erik quanstrom
@ 2011-01-10 19:53         ` John Floren
  2011-01-11 11:33           ` hiro
  1 sibling, 1 reply; 27+ messages in thread
From: John Floren @ 2011-01-10 19:53 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I was using a slightly weird configuration, partly because it's the
hardware I had available, and partly because I thought it might more
adequately represent a typical internet connection. On one side of the
Linux bridge was a 10 Mbit hub, on the other side, a 100 Mbit switch.

The average latency was 500 us RTT.

Results may vary greatly when you bump up to gbit, but since the
Internet isn't gbit (and I can't afford gbit) I used much slower
hardware.

The testing setup and the results are all in the document; I
generalized a bit in my earlier email, in that streaming 9P had a
*slight* edge over regular 9P at 500 us RTT, but it's very slight. I
think at that point the bottleneck was bandwidth rather than latency,
and 9P was still able to get RPCs over as quickly as streams.
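
(Rough arithmetic: at 10 Mbit/s the bandwidth-delay product for a 500
us RTT is only about 625 bytes, well under a single 8K 9P message, so
even one outstanding RPC keeps that link full and streaming has little
to add.)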


John

On Mon, Jan 10, 2011 at 10:48 AM, hiro <23hiro@googlemail.com> wrote:
> What bandwidth? With a gbit I could notice a difference. But probably
> the fault of the linux v9fs modules I used (half usec RTT).
>
> On 1/10/11, Francisco J Ballesteros <nemo@lsub.org> wrote:
>>>
>>> Right, my results were that you get pretty much exactly the same
>>> performance when you're working over a LAN whether you choose streams
>>> or regular 9P. Streaming only really starts to help when you're up
>>> into the multiple-millisecond RTT range.
>>
>> This is weird. Didn't read the thesis yet, sorry, but, do you know the
>> reason?
>> I think that when I measured op I found that even on lans, using get instead
>> of multiple rpcs was measurable. Of course users would not notice unless
>> latency
>> gets higher or many rpcs add their times.
>> thanks
>>
>>
>
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
  2011-01-10 19:53         ` John Floren
@ 2011-01-11 11:33           ` hiro
  0 siblings, 0 replies; 27+ messages in thread
From: hiro @ 2011-01-11 11:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Sorry, I wanted to say half a ms.
I also see 100us on another pc.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [9fans] fs performance
@ 2011-01-10  3:57 erik quanstrom
  0 siblings, 0 replies; 27+ messages in thread
From: erik quanstrom @ 2011-01-10  3:57 UTC (permalink / raw)
  To: 9fans

> Peak local file access bandwidth is typically 50 to 100 MBps
> times the number of disks; over the localnet it is about 80MBps. On
> my internet connection I barely get 1MBps download (& 0.2MBps
> upload) speeds.

interesting observation: when i first set up the diskless fileserver
at coraid, we had a mirror of the worm in another building across
an awful wireless connection.  we had no more than 1mbit.
at first i was a little worried about this, but then i realized that
128k/s * 86400s/day is 10.5gb/day.

btw, with aoe, you should saturate the network—125 mb/s/interface
for gbe so a typical el-cheapo computer these days can do 250mb/s
over aoe without breaking a sweat.  of course you'll get 10x that with
10gbe.

i agree with charles: network-attached, or even internet-attached,
storage seems like the way to go.  for internet-attached storage,
the amazing imbalance between very slow last-mile networks and very
cheap mass storage, plus the power of slow networks over time, leads me
to think there are some very interesting engineering tradeoffs to
be made.

- erik



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread

Thread overview: 27+ messages
2011-01-09 17:06 [9fans] fs performance erik quanstrom
2011-01-09 17:29 ` ron minnich
2011-01-09 17:51   ` John Floren
2011-01-10 18:07     ` Francisco J Ballesteros
2011-01-10 18:48       ` hiro
2011-01-10 19:06         ` erik quanstrom
2011-01-10 19:53         ` John Floren
2011-01-11 11:33           ` hiro
2011-01-09 18:31   ` erik quanstrom
2011-01-09 19:54   ` Bakul Shah
2011-01-09 20:25     ` ron minnich
2011-01-09 20:47       ` erik quanstrom
2011-01-09 21:04         ` ron minnich
2011-01-09 21:17       ` Bakul Shah
2011-01-09 21:59         ` erik quanstrom
2011-01-09 22:58         ` Charles Forsyth
2011-01-09 22:55           ` ron minnich
2011-01-09 23:50             ` Charles Forsyth
2011-01-10  3:26           ` Bakul Shah
2011-01-09 21:14     ` erik quanstrom
2011-01-09 21:38       ` Bakul Shah
2011-01-09 21:56         ` ron minnich
2011-01-09 22:02           ` erik quanstrom
2011-01-10 14:45           ` David Leimbach
2011-01-10 15:06             ` Charles Forsyth
2011-01-09 22:00         ` erik quanstrom
2011-01-10  3:57 erik quanstrom
