* [9fans] sept. town hall meeting @ 2005-09-09 20:55 Francisco Ballesteros
  2005-09-09 21:05 ` Uriel
  0 siblings, 1 reply; 11+ messages in thread

From: Francisco Ballesteros @ 2005-09-09 20:55 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs

What about sched'ing the next town hall meeting for the third Friday of
September? I'm willing to run it if there are no objections and nobody else
prefers to do it.

Preferences regarding time? Was the last one (8pm GMT?) convenient?

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [9fans] sept. town hall meeting
  2005-09-09 20:55 [9fans] sept. town hall meeting Francisco Ballesteros
@ 2005-09-09 21:05 ` Uriel
  2005-09-09 22:10 ` [9fans] reliability and failing over Francisco Ballesteros
  0 siblings, 1 reply; 11+ messages in thread

From: Uriel @ 2005-09-09 21:05 UTC (permalink / raw)
To: 9fans

On Fri, Sep 09, 2005 at 10:55:01PM +0200, Francisco Ballesteros wrote:
> What about sched'ing the next town hall meeting
> for the third friday of september?
> I'm willing to run it if there are no objections or anyone else
> prefers to do that.
>
> Preferences regarding time? Was the last one (8pm gmt?)
> convenient?

I had tentatively scheduled the next THM for Sep 10th, but I have been too
stressed to even send out an announcement.

After some research it seems that Saturdays are the best days for most people
(especially given the diversity of timezones involved), and 20:00 GMT seems
like a time that works reasonably well for most people.

For consistency (and to avoid having to come up with a new date/time each
month), I suggest having a THM on the third Saturday of the month at 20:00 GMT.

Any complaints? If not, I will make it 'official' in the Wiki.

For a summary of the last THM, see (thanks to Hyperion for writing it up):

http://plan9.bell-labs.com/wiki/plan9/thm_2005-08-15_Summary/

uriel
* [9fans] reliability and failing over.
  2005-09-09 21:05 ` Uriel
@ 2005-09-09 22:10 ` Francisco Ballesteros
  2005-09-09 22:24 ` Russ Cox
  2005-09-10  1:22 ` Eric Van Hensbergen
  0 siblings, 2 replies; 11+ messages in thread

From: Francisco Ballesteros @ 2005-09-09 22:10 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs

Funny. The 9P reliability project looks to me a lot like the redirfs that we
played with before introducing the Plan B volumes into the kernel. It provided
failover (on FSs replicated by some other means) and could recover the fids
just by keeping track of their paths.

The user-level process I'm working on now is quite similar to that (apart from
including the language to select particular volumes): it maintains a fid table
recording which server, and which path within that server, correspond to each
fid. It's what the Plan B kernel ns does, but within a server.

Probably the increase in latency you are seeing is the one I'm going to see in
volfs. The 2x penalty in performance is what one could expect, because you
have twice the latency. However, the in-kernel implementation has no penalty
at all, because the kernel can rewrite the mount tables.

Maybe we should talk about this. Eric? Russ? What do you say? Is it worth
paying the extra latency just to avoid a (serious, I admit) change in the
kernel?

On 9/9/05, Uriel <uriell@binarydream.org> wrote:
> [...]
> I had tentatively scheduled the next THM for Sep 10th, but I have been
> too stressed to even send out an announcement.
* Re: [9fans] reliability and failing over.
  2005-09-09 22:10 ` [9fans] reliability and failing over Francisco Ballesteros
@ 2005-09-09 22:24 ` Russ Cox
  2005-09-09 22:50 ` Gorka guardiola
  2005-09-10  1:22 ` Eric Van Hensbergen
  1 sibling, 1 reply; 11+ messages in thread

From: Russ Cox @ 2005-09-09 22:24 UTC (permalink / raw)
To: Francisco Ballesteros, Fans of the OS Plan 9 from Bell Labs

> Funny. The 9p reliability project looks to me a lot like the redirfs
> that we played with before introducing the Plan B volumes into the kernel.
> [...]
> Maybe we should talk about this.
> Eric? Russ? What do you say? Is it worth to pay the extra
> latency just to avoid a change (serious, I admit) in the kernel?

I keep seeing that 2x number, but I still don't believe it's reasonable to
measure the hit on an empty loopback file server. Do something over a 100Mbps
file server connection talking to fossil and see what performance hit you get
then.

Stuff in the kernel is much harder to change and debug. Unless there's a
compelling reason, I'd like to see it stay in user space. And I'm not yet
convinced that performance is a compelling reason.

Russ
* Re: [9fans] reliability and failing over.
  2005-09-09 22:24 ` Russ Cox
@ 2005-09-09 22:50 ` Gorka guardiola
  2005-09-09 23:02 ` Francisco Ballesteros
  0 siblings, 1 reply; 11+ messages in thread

From: Gorka guardiola @ 2005-09-09 22:50 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs

On 9/9/05, Russ Cox <rsc@swtch.com> wrote:
> > Funny. The 9p reliability project looks to me a lot like the redirfs
> > that we played with before introducing the Plan B volumes into the kernel.
> > It provided failover (on replicated FSs, by some other means) and could
> > recover the fids just by keeping track of their paths.

It is almost the same thing. The difference is that one is 9P-9P and the other
is 9P-syscall (read, write, ...). We didn't have a Plan 9 kernel on the other
side, so we couldn't use redirfs; that is why we ported recover. It is better
for Linux (p9p) too, as it doesn't require mounting the filesystem before
using it, so you don't depend on having someone serve the 9P files by mounting
them. I am not sure about Plan 9. On one side, recover knows more about the
stuff under it, so it has more granularity and can fail in a nicer way. On the
other side, redirfs is much, much simpler.

> [...]
> I keep seeing that 2x number but I still don't believe it's actually
> reasonable to measure the hit on an empty loopback file server.
> Do something over a 100Mbps file server connection
> talking to fossil and see what performance hit you get then.

Yes, this increase in latency is on the loopback. If you are using a network
it is probably completely lost in the noise, the network being probably 100
times slower than the loopback.

> Stuff in the kernel is much harder to change and debug.
> Unless there's a compelling reason, I'd like to see it stay
> in user space. And I'm not yet convinced that performance
> is a compelling reason.

I agree, though it depends on the application. For us (normal users) I agree
completely that it is not compelling. Some people out there are doing stuff
where performance is important (let them write the code? :-)).

--
- curiosity sKilled the cat
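The "lost in the noise" point is just a ratio, and a tiny sketch makes it
concrete. The round-trip figures below are invented for illustration, not
measurements:

```go
package main

import "fmt"

// overheadPct gives the extra latency as a percentage of the base RTT.
func overheadPct(extraMs, baseMs float64) float64 {
	return 100 * extraMs / baseMs
}

func main() {
	// Illustrative round-trip figures, not measurements:
	proxyHop := 0.01 // ms: one extra pass through a user-level 9P proxy
	loopback := 0.01 // ms: RTT of an empty loopback file server
	network := 1.0   // ms: RTT over a 100Mbps LAN to a real fossil

	fmt.Printf("loopback: +%.0f%%\n", overheadPct(proxyHop, loopback)) // the dreaded 2x
	fmt.Printf("network:  +%.0f%%\n", overheadPct(proxyHop, network))  // lost in the noise
}
```

With these (made-up) numbers, doubling a loopback RTT is a dramatic 2x, while
the same extra hop in front of a network RTT to fossil costs about 1%.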
* Re: [9fans] reliability and failing over.
  2005-09-09 22:50 ` Gorka guardiola
@ 2005-09-09 23:02 ` Francisco Ballesteros
  2005-09-09 23:06 ` Russ Cox
  2005-09-10 13:11 ` Sape Mullender
  0 siblings, 2 replies; 11+ messages in thread

From: Francisco Ballesteros @ 2005-09-09 23:02 UTC (permalink / raw)
To: paurea, Fans of the OS Plan 9 from Bell Labs

Beware of the latency.

Plan B usage shows that latency is the biggest problem. I admit it shows only
in unions, but if you join N different file servers in a directory and then
double the latency, it may be a problem. A workaround is just not to join too
many servers, but it's a workaround, not a fix.

Anyway, when I complete volfs we'll see whether it's bearable or not. If it
is, I'll be happy to discard the kernel changes. If it's not, we'll have to
see why.

On 9/10/05, Gorka guardiola <paurea@gmail.com> wrote:
> It is almost the same thing. The difference is that one is 9P-9P
> and the other is 9P-syscall (read, write, ...).
> [...]
> On the other side, redirfs is much, much simpler.
* Re: [9fans] reliability and failing over.
  2005-09-09 23:02 ` Francisco Ballesteros
@ 2005-09-09 23:06 ` Russ Cox
  2005-09-09 23:16 ` Francisco Ballesteros
  2005-09-10 13:11 ` Sape Mullender
  1 sibling, 1 reply; 11+ messages in thread

From: Russ Cox @ 2005-09-09 23:06 UTC (permalink / raw)
To: Francisco Ballesteros, Fans of the OS Plan 9 from Bell Labs

> Plan B usage shows that latency is the biggest problem.
> I admit that only in unions, but if you join N different file servers
> in a directory, and then you dup the latency, it may be a problem.
> A workaround is just not to join too many servers, but it's a workaround,
> not a fix.

Of course, one could issue walks in parallel to all members of a union and
then you'd get rid of this particular problem. And doing so would be easier
in user space.

Russ
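The parallel-walk idea can be sketched with stand-in servers. Everything here
is invented for illustration — walkOne and the member lists are stubs; a real
version would send Twalk messages over 9P to each member:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// walkOne stands in for a Twalk to one member of the union; a real
// version would speak 9P to that server.
func walkOne(member, name string, files map[string][]string) bool {
	for _, f := range files[member] {
		if f == name {
			return true
		}
	}
	return false
}

// unionWalk issues the walk to every member concurrently, then honors
// union order by returning the earliest member that succeeded.  Latency
// becomes one round trip to the slowest member instead of the sum of
// round trips over all members.
func unionWalk(members []string, name string, files map[string][]string) (string, error) {
	found := make([]bool, len(members))
	var wg sync.WaitGroup
	for i, m := range members {
		wg.Add(1)
		go func(i int, m string) {
			defer wg.Done()
			found[i] = walkOne(m, name, files)
		}(i, m)
	}
	wg.Wait()
	for i, ok := range found {
		if ok {
			return members[i], nil
		}
	}
	return "", errors.New(name + ": not found in union")
}

func main() {
	// Invented member names and contents, for illustration only.
	files := map[string][]string{
		"srvA": {"lib", "bin"},
		"srvB": {"x10", "gui"},
	}
	m, err := unionWalk([]string{"srvA", "srvB"}, "x10", files)
	fmt.Println(m, err) // prints: srvB <nil>
}
```

One caveat: union directories have ordered precedence, so the sketch waits for
all members and then takes the earliest hit; a smarter version could return as
soon as every member earlier in the order has answered.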
* Re: [9fans] reliability and failing over.
  2005-09-09 23:06 ` Russ Cox
@ 2005-09-09 23:16 ` Francisco Ballesteros
  0 siblings, 0 replies; 11+ messages in thread

From: Francisco Ballesteros @ 2005-09-09 23:16 UTC (permalink / raw)
To: Russ Cox, Fans of the OS Plan 9 from Bell Labs

Yep, I stand corrected, thanks. But let's try the simplest (unoptimized)
version first. If it works well under real usage, I prefer a simpler (maybe
slower) version.

On 9/10/05, Russ Cox <rsc@swtch.com> wrote:
> Of course, one could issue walks in parallel to all members
> of a union and then you'd get rid of this particular problem.
> And doing so would be easier in user space.
* Re: [9fans] reliability and failing over.
  2005-09-09 23:02 ` Francisco Ballesteros
  2005-09-09 23:06 ` Russ Cox
@ 2005-09-10 13:11 ` Sape Mullender
  2005-09-10 21:36 ` Francisco Ballesteros
  1 sibling, 1 reply; 11+ messages in thread

From: Sape Mullender @ 2005-09-10 13:11 UTC (permalink / raw)
To: nemo, 9fans

> Beware of the latency.
>
> Plan B usage shows that latency is the biggest problem.

That's funny. We've been working on a low-latency wireless comms protocol here
at the Labs that we've called Plan B (our wireless emulator runs on Plan 9, of
course), because we'd come to the conclusion that latency is the biggest
problem in current cellular systems.

Sape
* Re: [9fans] reliability and failing over.
  2005-09-10 13:11 ` Sape Mullender
@ 2005-09-10 21:36 ` Francisco Ballesteros
  0 siblings, 0 replies; 11+ messages in thread

From: Francisco Ballesteros @ 2005-09-10 21:36 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs

Funny, I admit. We saw the problem when combining multiple user-level file
servers (not fossil, but device drivers for X10, GUIs, and the like) within
the same dir, especially on wireless at home.

Issuing requests in parallel as Russ suggested will help, for sure, but if
things don't get too bad, the simplest user-level file server will be a good
option. We just happen to have the kernel changed because I wanted to be able
to say "no overhead, except for the unions", which required a kernel to
measure. That's why I was considering the kernel changes besides the
user-level alternative.

Guess I have an opportunity to experiment with a few things now that it's
going to be a user program :-)

On 9/10/05, Sape Mullender <sape@plan9.bell-labs.com> wrote:
> That's funny. We've been working on a low-latency wireless comms protocol
> here at the Labs that we've called Plan B (our wireless emulator runs on
> Plan 9, of course), because we'd come to the conclusion that latency is
> the biggest problem in current cellular systems.
* Re: [9fans] reliability and failing over.
  2005-09-09 22:10 ` [9fans] reliability and failing over Francisco Ballesteros
  2005-09-09 22:24 ` Russ Cox
@ 2005-09-10  1:22 ` Eric Van Hensbergen
  1 sibling, 0 replies; 11+ messages in thread

From: Eric Van Hensbergen @ 2005-09-10 1:22 UTC (permalink / raw)
To: Francisco Ballesteros, Fans of the OS Plan 9 from Bell Labs

On 9/9/05, Francisco Ballesteros <nemo@lsub.org> wrote:
> Funny. The 9p reliability project looks to me a lot like the redirfs
> that we played with before introducing the Plan B volumes into the kernel.

Yes, Gorka and I talked about this when he first started down the path. There
were some differences that seemed important at the time, but I can't recall
what they were.

> Probably, the increase in latency you are seeing is the one I'm going to
> see in volfs. The 2x penalty in performance is what one could expect,
> because you have twice the latency. However, the in-kernel implementation
> has no penalty at all, because the kernel can rewrite the mount tables.
>
> Maybe we should talk about this.
> Eric? Russ? What do you say? Is it worth to pay the extra
> latency just to avoid a change (serious, I admit) in the kernel?

It's something important to figure out. I'll admit we are testing the
worst-case scenario here -- but the application we started the work for
(inter-partition communication) needs very low latency, so it's desirable to
eliminate as much overhead as possible. Still, it's worth running local-area
network tests, though I'm going to go for gigabit rather than 100Mbit. Gorka
and I will work on this next week while we finish up the recover paper.

It's quite likely I'll look at integrating something like this service into
my library-OS version of the 9P client and/or v9fs. Still not sure whether it
would make sense in Plan 9 proper or not.

-eric
end of thread, other threads: [~2005-09-10 21:36 UTC | newest]

Thread overview: 11+ messages
2005-09-09 20:55 [9fans] sept. town hall meeting Francisco Ballesteros
2005-09-09 21:05 ` Uriel
2005-09-09 22:10 ` [9fans] reliability and failing over Francisco Ballesteros
2005-09-09 22:24 ` Russ Cox
2005-09-09 22:50 ` Gorka guardiola
2005-09-09 23:02 ` Francisco Ballesteros
2005-09-09 23:06 ` Russ Cox
2005-09-09 23:16 ` Francisco Ballesteros
2005-09-10 13:11 ` Sape Mullender
2005-09-10 21:36 ` Francisco Ballesteros
2005-09-10  1:22 ` Eric Van Hensbergen