9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] QTCTL?
@ 2007-10-31 18:40 Francisco J Ballesteros
  2007-10-31 18:56 ` Eric Van Hensbergen
  2007-10-31 19:42 ` erik quanstrom
  0 siblings, 2 replies; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-10-31 18:40 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hi,

while hunting yet another bug in the octopus, I've been thinking that
one problem we have in general in Plan 9
is that there are files that behave like files, and files that
do not.

For example, append-only files do not: offsets are ignored on writes.
Ctl files do not, either: you write a ctl string, and reading the file
back reports something else. Clone files are different files each time
they are opened.

This is a problem when you try to cache files (as we do in the octopus).
But it's also a problem for things like tar, and for anyone who tries to
use the file as a plain one.

Why not add a QTCTL bit to Qid.type?
It would mean "this file does not behave like a regular file; do not
cache, and handle with care".

I think the change could be incorporated without causing a nightmare,
and it would make things cleaner regarding what one can expect from a
file after looking at its directory entry.
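As a sketch, the proposal might look like this in C (standard C here, not the Plan 9 dialect); the QT* values below are the real 9P2000 Qid.type bits from intro(5), but QTCTL's value and the helper function are hypothetical, chosen purely for illustration:

```c
#include <assert.h>

/* Qid.type bits defined by 9P2000 (see intro(5)); QTCTL is the
 * proposed new bit, its value picked from the unused range for
 * illustration only -- it is not part of any protocol. */
enum {
	QTDIR    = 0x80,	/* directory */
	QTAPPEND = 0x40,	/* append-only file */
	QTEXCL   = 0x20,	/* exclusive-use file */
	QTMOUNT  = 0x10,	/* mounted channel */
	QTAUTH   = 0x08,	/* authentication file */
	QTTMP    = 0x04,	/* non-backed-up file */
	QTFILE   = 0x00,	/* plain file */

	QTCTL    = 0x02,	/* proposed: not a regular file; do not cache */
};

typedef struct Qid Qid;
struct Qid {
	unsigned long long path;
	unsigned long vers;
	unsigned char type;
};

/* A cache (or tar, or anything wanting plain-file semantics) would
 * consult the bit before trusting the file to behave like a file. */
int
behaveslikefile(Qid q)
{
	return (q.type & (QTCTL|QTAPPEND)) == 0;
}
```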

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-10-31 18:40 [9fans] QTCTL? Francisco J Ballesteros
@ 2007-10-31 18:56 ` Eric Van Hensbergen
  2007-10-31 19:13   ` Charles Forsyth
  2007-10-31 19:42 ` erik quanstrom
  1 sibling, 1 reply; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-10-31 18:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 10/31/07, Francisco J Ballesteros <nemo@lsub.org> wrote:
>
> while hunting yet another bug in the octopus, I've been thinking that
> one problem we have in general in Plan 9
> is that there are files that behave like files, and files that
> do not.
>
> For example, append-only files do not: offsets are ignored on writes.
> Ctl files do not, either: you write a ctl string, and reading the file
> back reports something else. Clone files are different files each time
> they are opened.
>
> This is a problem when you try to cache files (as we do in the octopus).
> But it's also a problem for things like tar, and for anyone who tries to
> use the file as a plain one.
>
> Why not add a QTCTL bit to Qid.type?
> It would mean "this file does not behave like a regular file; do not
> cache, and handle with care".
>

IIRC, qid.version == 0 is used to mark synthetics (like ctl) as
uncacheable, to be handled with care.
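As a sketch, the heuristic described here (whose existence the following messages dispute) would amount to something like this; the struct layout matches the 9P2000 qid, but the function name is illustrative:

```c
#include <assert.h>

typedef struct Qid Qid;
struct Qid {
	unsigned long long path;
	unsigned long vers;	/* bumped on each modification */
	unsigned char type;
};

/* Under this (disputed) convention, a server that never maintains
 * qid.vers leaves it at 0, so a zero version would flag the file
 * as synthetic and uncacheable. */
int
verssayscacheable(Qid q)
{
	return q.vers != 0;
}
```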

          -eric



* Re: [9fans] QTCTL?
  2007-10-31 18:56 ` Eric Van Hensbergen
@ 2007-10-31 19:13   ` Charles Forsyth
  2007-10-31 19:33     ` Eric Van Hensbergen
  2007-10-31 20:43     ` geoff
  0 siblings, 2 replies; 77+ messages in thread
From: Charles Forsyth @ 2007-10-31 19:13 UTC (permalink / raw)
  To: 9fans

> IIRC, qid.version == 0 is used to mark synthetics (like ctl) as
> uncacheable, to be handled with care.

i don't see that anywhere.  MCACHE allows things in a mounted space
to be cached; otherwise, i'd suppose not.



* Re: [9fans] QTCTL?
  2007-10-31 19:13   ` Charles Forsyth
@ 2007-10-31 19:33     ` Eric Van Hensbergen
  2007-10-31 19:39       ` erik quanstrom
  2007-10-31 20:43     ` geoff
  1 sibling, 1 reply; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-10-31 19:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 10/31/07, Charles Forsyth <forsyth@terzarima.net> wrote:
> > IIRC, qid.version == 0 is used to mark synthetics (like ctl) as
> > uncacheable, to be handled with care.
>
> i don't see that anywhere.  MCACHE allows things in a mounted space
> to be cached; otherwise, i'd suppose not.
>
>

hurumph.  Don't know where I got that from - I tried to base my v9fs
caching stuff on cfs.c, but I don't see any qid.version==0 checks
there.  Then I thought perhaps it was from a conversation with Russ
when I was doing caching in v9fs -- but on searching my gmail he was
saying that it didn't always hold true -- so perhaps something new is
needed...

          -eric



* Re: [9fans] QTCTL?
  2007-10-31 19:33     ` Eric Van Hensbergen
@ 2007-10-31 19:39       ` erik quanstrom
  0 siblings, 0 replies; 77+ messages in thread
From: erik quanstrom @ 2007-10-31 19:39 UTC (permalink / raw)
  To: 9fans

> hurumph.  Don't know where I got that from - I tried to base my v9fs
> caching stuff on cfs.c, but I don't see any qid.version==0 checks
> there.  Then I thought perhaps it was from a conversation with Russ
> when I was doing caching in v9fs -- but on searching my gmail he was
> saying that it didn't always hold true -- so perhaps something new is
> needed...

this is not true of devsd devices.  the version is used to deal with
removable devices.  if you have a cd open and the media is changed,
i/o to the device will return Echange.

raw /dev/aoe devices will do the same thing.
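A sketch of the mechanism described above, in standard C with made-up names (this is not the actual devsd code): the driver bumps a version count on every media change, and i/o through a fid opened under an older version fails with Echange:

```c
#include <assert.h>
#include <string.h>

typedef struct Drive Drive;
struct Drive {
	unsigned long vers;	/* incremented on each media change */
};

typedef struct Fid Fid;
struct Fid {
	Drive *drive;
	unsigned long vers;	/* drive version when the fid was opened */
};

static const char Echange[] = "media or partition has changed";

/* Returns an error string if the media changed under the open fid,
 * or 0 if i/o may proceed. */
const char*
driveio(Fid *f)
{
	if(f->vers != f->drive->vers)
		return Echange;
	return 0;
}
```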

- erik



* Re: [9fans] QTCTL?
  2007-10-31 18:40 [9fans] QTCTL? Francisco J Ballesteros
  2007-10-31 18:56 ` Eric Van Hensbergen
@ 2007-10-31 19:42 ` erik quanstrom
  2007-10-31 19:49   ` Eric Van Hensbergen
  1 sibling, 1 reply; 77+ messages in thread
From: erik quanstrom @ 2007-10-31 19:42 UTC (permalink / raw)
  To: 9fans

> Why not add a QTCTL bit to Qid.type?
> It would mean "this file does not behave like a regular file; do not
> cache, and handle with care".

why are the current namespace conventions insufficient?
/mnt, /net, and /dev hold most, if not all, of the special files.

- erik



* Re: [9fans] QTCTL?
  2007-10-31 19:42 ` erik quanstrom
@ 2007-10-31 19:49   ` Eric Van Hensbergen
  2007-10-31 20:03     ` erik quanstrom
  0 siblings, 1 reply; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-10-31 19:49 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 10/31/07, erik quanstrom <quanstro@quanstro.net> wrote:
> > Why not add a QTCTL bit to Qid.type?
> > It would mean "this file does not behave like a regular file; do not
> > cache, and handle with care".
>
> why are the current namespace conventions insufficient?
> /mnt, /net, and /dev hold most, if not all, of the special files.
>

The dynamic nature of namespaces works against such conventions.
Besides, it would be nice to have a mechanism that could work on other
systems that use 9P.  File servers should be able to convey whether a
file is cacheable or not.

             -eric



* Re: [9fans] QTCTL?
  2007-10-31 19:49   ` Eric Van Hensbergen
@ 2007-10-31 20:03     ` erik quanstrom
  2007-10-31 20:10       ` Latchesar Ionkov
                         ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: erik quanstrom @ 2007-10-31 20:03 UTC (permalink / raw)
  To: 9fans

> The dynamic nature of namespaces works against such conventions.
> Besides, it would be nice to have a mechanism that could work on other
> systems that use 9P.  File servers should be able to convey whether a
> file is cacheable or not.

i'm not sure i follow this argument.  plan 9 namespaces are dynamic.
one could put the network devices anywhere, but they are conventionally
put on /net.  there are no "regular" files in /net.

perhaps if you gave a concrete example of why conventions can't sort
this out it would make more sense to me.  (i'm slow.)

- erik



* Re: [9fans] QTCTL?
  2007-10-31 20:03     ` erik quanstrom
@ 2007-10-31 20:10       ` Latchesar Ionkov
  2007-10-31 20:12       ` Eric Van Hensbergen
  2007-10-31 20:17       ` Russ Cox
  2 siblings, 0 replies; 77+ messages in thread
From: Latchesar Ionkov @ 2007-10-31 20:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

You have a caching server (a separate machine) that caches files from  
servers that are far away. Some of the file servers it caches are  
synthetic. You mount one of them in /net. The server doesn't know that.

On Oct 31, 2007, at 2:03 PM, erik quanstrom wrote:

>> The dynamic nature of namespaces works against such conventions.
>> Besides, it would be nice to have a mechanism that could work on other
>> systems that use 9P.  File servers should be able to convey whether a
>> file is cacheable or not.
>
> i'm not sure i follow this argument.  plan 9 namespaces are dynamic.
> one could put the network devices anywhere, but they are conventionally
> put on /net.  there are no "regular" files in /net.
>
> perhaps if you gave a concrete example of why conventions can't sort
> this out it would make more sense to me.  (i'm slow.)
>
> - erik
>



* Re: [9fans] QTCTL?
  2007-10-31 20:03     ` erik quanstrom
  2007-10-31 20:10       ` Latchesar Ionkov
@ 2007-10-31 20:12       ` Eric Van Hensbergen
  2007-10-31 20:17       ` Russ Cox
  2 siblings, 0 replies; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-10-31 20:12 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 10/31/07, erik quanstrom <quanstro@quanstro.net> wrote:
> > The dynamic nature of namespaces works against such conventions.
> > Besides, it would be nice to have a mechanism that could work on other
> > systems that use 9P.  File servers should be able to convey whether a
> > file is cacheable or not.
>
> i'm not sure i follow this argument.  plan 9 namespaces are dynamic.
> one could put the network devices anywhere, but they are conventionally
> put on /net.  there are no "regular" files in /net.
>
> perhaps if you gave a concrete example of why conventions can't sort
> this out it would make more sense to me.  (i'm slow.)
>

/net.alt (for one)

While there is value in having conventions for where certain
synthetics are bound (like /net), that doesn't mean that alternate
synthetics aren't located in arbitrary places.  Even if you use
conventional locations, they may be elsewhere when transitively
mounted, e.g. as /n/remote/net.
Then there is the fact that Inferno has additional synthetics (like
/cmd) that don't match Plan 9 conventions.  And people using p9p and
v9fs on Linux may use yet another set of conventions.

Hardcoding a set of paths, or keeping such a set in a configuration
file, feels wrong; in my opinion, going down that path isn't the
right solution.

        -eric



* Re: [9fans] QTCTL?
  2007-10-31 20:03     ` erik quanstrom
  2007-10-31 20:10       ` Latchesar Ionkov
  2007-10-31 20:12       ` Eric Van Hensbergen
@ 2007-10-31 20:17       ` Russ Cox
  2007-10-31 20:29         ` Francisco J Ballesteros
  2 siblings, 1 reply; 77+ messages in thread
From: Russ Cox @ 2007-10-31 20:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Plan 9's default is not to cache,
making a "don't cache this" bit
unnecessary.  If the user explicitly
requests caching (by using cfs, say),
then he's responsible for making sure
it is appropriate.

If I tell the computer to cache /net,
that's not the computer's problem,
any more than if I bind /proc /net.

Since there's no coherence protocol
anyway, caching can't be done automatically.
It might give the right answer most of
the time, but it will screw up corner cases
and make the system more fragile.

This whole synthetic vs not mentality
is Unix brain-damage.  On Plan 9 there
is no distinction.  Everything is synthetic
(or everything is not, depending on your
point of view).

Russ



* Re: [9fans] QTCTL?
  2007-10-31 20:17       ` Russ Cox
@ 2007-10-31 20:29         ` Francisco J Ballesteros
  2007-10-31 20:48           ` Charles Forsyth
  0 siblings, 1 reply; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-10-31 20:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

It's not "synthetic vs stored on disk".
It's "behaves like a file vs does not".

For example, some files in omero may be large (images),
and it's nice to cache them. But some files do not behave as files
(you open with OTRUNC, rewrite, and might later read something else
from the file).

In the octopus we mount devices, yet we want to cache most
of their structure/data.

For a local area network, it's ok to say "do not cache at all".
But IMHO, for a wide area network or slow ADSL lines, it's not so ok.

Regarding coherency, you always have races: you have to reach
the server anyway, and that takes time. But in many cases this is not
a problem. When it is, I agree that you have to use something else, or
put something else within the file server involved.



On 10/31/07, Russ Cox <rsc@swtch.com> wrote:
> Plan 9's default is not to cache,
> making a "don't cache this" bit
> unnecessary.  If the user explicitly
> requests caching (by using cfs, say),
> then he's responsible for making sure
> it is appropriate.
>
> If I tell the computer to cache /net,
> that's not the computer's problem,
> any more than if I bind /proc /net.
>
> Since there's no coherence protocol
> anyway, caching can't be done automatically.
> It might give the right answer most of
> the time, but it will screw up corner cases
> and make the system more fragile.
>
> This whole synthetic vs not mentality
> is Unix brain-damage.  On Plan 9 there
> is no distinction.  Everything is synthetic
> (or everything is not, depending on your
> point of view).
>
> Russ
>


* Re: [9fans] QTCTL?
  2007-10-31 19:13   ` Charles Forsyth
  2007-10-31 19:33     ` Eric Van Hensbergen
@ 2007-10-31 20:43     ` geoff
  2007-10-31 21:32       ` Charles Forsyth
  2007-10-31 22:48       ` roger peppe
  1 sibling, 2 replies; 77+ messages in thread
From: geoff @ 2007-10-31 20:43 UTC (permalink / raw)
  To: 9fans

I remember something similar to what Eric remembers: qid.vers of zero
means `don't cache'.  It might not be written down; it may have just
been oral folklore at the labs.



* Re: [9fans] QTCTL?
  2007-10-31 20:29         ` Francisco J Ballesteros
@ 2007-10-31 20:48           ` Charles Forsyth
  2007-10-31 21:23             ` Francisco J Ballesteros
  0 siblings, 1 reply; 77+ messages in thread
From: Charles Forsyth @ 2007-10-31 20:48 UTC (permalink / raw)
  To: 9fans

> It's "behaves like a file vs does not".
> 
> ... But some files do not behave as files

that description seems to assume that there is a real, proper, honest-to-God
file (you know, that has the decency to be located on a proper disc somewhere)
that defines the "expected" behaviour.

a "file" in Plan 9 (or Inferno) is something you can name, open, and read and (perhaps) write.



* Re: [9fans] QTCTL?
  2007-10-31 20:48           ` Charles Forsyth
@ 2007-10-31 21:23             ` Francisco J Ballesteros
  2007-10-31 21:40               ` Russ Cox
  0 siblings, 1 reply; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-10-31 21:23 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Not exactly.

But I assume that
echo a > f
cat f
would yield
a
and I expect a file with a reported size of 0 to have 0 bytes in it when read.

That's (file) decency for me.

I know that most files do not have to behave that way; I've implemented some.
But the point is that more than once I have had to determine whether a
file was "decent" or not.

Isn't this a real problem with a simple fix?
Why shouldn't this be addressed?

I'd love to stand corrected, but I still think this is an actual problem.


On 10/31/07, Charles Forsyth <forsyth@terzarima.net> wrote:
> > It's "behaves like a file vs does not".
> >
> > ... But some files do not behave as files
>
> that description seems to assume that there is a real, proper, honest-to-God
> file (you know, that has the decency to be located on a proper disc somewhere)
> that defines the "expected" behaviour.
>
> a "file" in Plan 9 (or Inferno) is something you can name, open, and read and (perhaps) write.
>
>


* Re: [9fans] QTCTL?
  2007-10-31 20:43     ` geoff
@ 2007-10-31 21:32       ` Charles Forsyth
  2007-10-31 22:48       ` roger peppe
  1 sibling, 0 replies; 77+ messages in thread
From: Charles Forsyth @ 2007-10-31 21:32 UTC (permalink / raw)
  To: 9fans

> I remember something similar to what Eric remembers: qid.vers of zero
> means `don't cache'.  It might not be written down; it may have just
> been oral folklore at the labs.

it wasn't used by either cache.c or cfs, which seemed to be the main
candidates for cache management in practice.



* Re: [9fans] QTCTL?
  2007-10-31 21:23             ` Francisco J Ballesteros
@ 2007-10-31 21:40               ` Russ Cox
  2007-10-31 22:11                 ` Charles Forsyth
                                   ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Russ Cox @ 2007-10-31 21:40 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> But I assume that
> echo a > f
> cat f
> would yield
> a
> and I expect a file with a reported size of 0 to have 0 bytes in it when read.
>
> That's (file) decency for me.

But this isn't even true for disk files if someone
else or some other machine is writing to f around
the same time.

If I'm doing tail -f on a remote log file and tail -f
just does occasional reads at the end of the file,
then you will get the wrong answer, because once
the cache sees the eof it will never issue another
read.

It is a fundamental problem with implementing
caching atop a system that is not intended to be
cached.  Having a QTCTL bit (or a QTOKTOCACHE bit)
will not solve the problem.

Cfs is not magic.  It trades some of the reliability
of 9P for some performance.  It doesn't do a perfect
job.  If you choose to use cfs then you are accepting
those degradations in semantics, even for "disk files".

What you really need is a way to ask the server "can I
cache the following?" and have the server say yes or no
and then have some way to invalidate the cache, so that
you get coherent behavior, even in the above case.
We discussed various ways to add this to the protocol
but ultimately we didn't see any way that was simple
enough that the specification effort wasn't outweighed
by our not needing to solve the problem at that time.
(We did add QTAPPEND to fix one glaring cfs bug.)
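As a sketch, one hypothetical shape for the exchange described here (ask the server per file, get a yes/no, allow later invalidation) could look like the following. This is not 9P and not anything that was specified; every name is made up, and the negotiation is modelled as plain function calls rather than protocol messages:

```c
#include <assert.h>

typedef struct CacheGrant CacheGrant;
struct CacheGrant {
	int granted;	/* server said caching is allowed */
	int revoked;	/* server later invalidated the grant */
};

/* server side: grant caching only for files it knows are stable */
CacheGrant
askcache(int serverthinksstable)
{
	CacheGrant g = { serverthinksstable, 0 };
	return g;
}

/* server side: revoke when the file changes under the client */
void
invalidate(CacheGrant *g)
{
	g->revoked = 1;
}

/* client side: cached data may be served only while the grant holds */
int
mayservefromcache(CacheGrant *g)
{
	return g->granted && !g->revoked;
}
```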

By all means experiment with real caching protocols
using 9P.  Perhaps you will find a nice way to add it
and then 9P2010 can adopt it.  QTCTL isn't enough though:
it pushes your problems farther away but doesn't solve them.

Russ



* Re: [9fans] QTCTL?
  2007-10-31 21:40               ` Russ Cox
@ 2007-10-31 22:11                 ` Charles Forsyth
  2007-10-31 22:26                 ` Francisco J Ballesteros
  2007-11-01  6:21                 ` Bakul Shah
  2 siblings, 0 replies; 77+ messages in thread
From: Charles Forsyth @ 2007-10-31 22:11 UTC (permalink / raw)
  To: 9fans

> and then 9P2010 can adopt it.  QTCTL isn't enough though:
> it pushes your problems farther away but doesn't solve them.

something similar was suggested some time ago (marking files as seekable or not)
so it's probably worthwhile rereading that discussion if you can find it.  i think someone even started
adding the marker to various synthetic files until ... (and it was all undone).



* Re: [9fans] QTCTL?
  2007-10-31 21:40               ` Russ Cox
  2007-10-31 22:11                 ` Charles Forsyth
@ 2007-10-31 22:26                 ` Francisco J Ballesteros
  2007-10-31 22:37                   ` Charles Forsyth
  2007-10-31 23:54                   ` erik quanstrom
  2007-11-01  6:21                 ` Bakul Shah
  2 siblings, 2 replies; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-10-31 22:26 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>
> If I'm doing tail -f on a remote log file and tail -f
> just does occasional reads at the end of the file,
> then you will get the wrong answer, because once
> the cache sees the eof it will never issue another
> read.

If the file is "decent", the cache must still check
that the file is up to date. It might not do so every time
(as we do, trading correctness for performance, as you say). That means
the cache would fetch further file data as soon as it sees
a new qid.vers for the file. And tail -f would still work.

However, for some indecent files ;), the cache may have problems
even if it trusts the file length as reported by the server or the qid.vers.
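A sketch of that revalidation step, with hypothetical names (this is not octopus/Op code): compare the version a cache entry was filled under against the qid.vers the server currently reports, and invalidate on mismatch:

```c
#include <assert.h>

typedef struct CacheEntry CacheEntry;
struct CacheEntry {
	unsigned long vers;	/* qid.vers when the data was cached */
	int valid;		/* cached data may be served */
};

/* Drop the cached data when the server reports a newer version;
 * the next read will refetch and refill the entry. */
void
revalidate(CacheEntry *e, unsigned long serververs)
{
	if(e->vers != serververs){
		e->valid = 0;
		e->vers = serververs;
	}
}
```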

QTAPPEND is indeed something that says a file is weird; QTCTL would
just signal the general case, not just a +a file.

I can do a quick experiment using Op: fake up some QTCTLs in the Op
server, see whether the client can work with all files, even clone
ones, and see what happens. I'm not seeking coherency; I'd just like
to be able to cache what I can (keeping races as they are), to better
tolerate latency.

thanks a lot for all the comments, btw.

> It is a fundamental problem with implementing
> caching atop a system that is not intended to be
> cached.  Having a QTCTL bit (or a QTOKTOCACHE bit)
> will not solve the problem.
>
> Cfs is not magic.  It trades some of the reliability
> of 9P for some performance.  It doesn't do a perfect
> job.  If you choose to use cfs then you are accepting
> those degradations in semantics, even for "disk files".
>
> What you really need is a way to ask the server "can I
> cache the following?" and have the server say yes or no
> and then have some way to invalidate the cache, so that
> you get coherent behavior, even in the above case.
> We discussed various ways to add this to the protocol
> but ultimately we didn't see any way that was simple
> enough that the specification effort wasn't outweighed
> by our not needing to solve the problem at that time.
> (We did add QTAPPEND to fix one glaring cfs bug.)
>
> By all means experiment with real caching protocols
> using 9P.  Perhaps you will find a nice way to add it
> and then 9P2010 can adopt it.  QTCTL isn't enough though:
> it pushes your problems farther away but doesn't solve them.
>
> Russ
>


* Re: [9fans] QTCTL?
  2007-10-31 22:26                 ` Francisco J Ballesteros
@ 2007-10-31 22:37                   ` Charles Forsyth
  2007-10-31 22:43                     ` Francisco J Ballesteros
  2007-10-31 23:32                     ` Eric Van Hensbergen
  2007-10-31 23:54                   ` erik quanstrom
  1 sibling, 2 replies; 77+ messages in thread
From: Charles Forsyth @ 2007-10-31 22:37 UTC (permalink / raw)
  To: 9fans

> QTCL would ...

perhaps one of my points is that in the Plan 9/Inferno world, you'd
be better off marking the files you can cache, not trying to identify the ones
that you can't.  one reason is the practical one of having to touch perhaps two or
maybe three important servers instead of everything.



* Re: [9fans] QTCTL?
  2007-10-31 22:37                   ` Charles Forsyth
@ 2007-10-31 22:43                     ` Francisco J Ballesteros
  2007-10-31 23:32                     ` Eric Van Hensbergen
  1 sibling, 0 replies; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-10-31 22:43 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I was missing your point.
It seems much easier that way.
I shall try with QTDECENT instead of the other way around.


On 10/31/07, Charles Forsyth <forsyth@terzarima.net> wrote:
> > QTCL would ...
>
> perhaps one of my points is that in the Plan 9/Inferno world, you'd
> be better off marking the files you can cache, not trying to identify the ones
> that you can't.  one reason is the practical one of having to touch perhaps two or
> maybe three important servers instead of everything.
>
>



* Re: [9fans] QTCTL?
  2007-10-31 20:43     ` geoff
  2007-10-31 21:32       ` Charles Forsyth
@ 2007-10-31 22:48       ` roger peppe
  2007-10-31 23:35         ` erik quanstrom
  1 sibling, 1 reply; 77+ messages in thread
From: roger peppe @ 2007-10-31 22:48 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> I remember something similar to what Eric remembers: qid.vers of zero
> means `don't cache'.  It might not be written down; it may have just
> been oral folklore at the labs.

when the 9p2000 man pages were initially posted to this list for discussion,
i made a suggestion to that effect, and i seem to recall rob
saying "not a bad idea, but we haven't done it yet".

it's still not done (or documented, at any rate), but i'm still not
sure it's a bad idea.

> this is not true of devsd devices.  the version is used to deal with
> removable devices.  if you have a cd open and the media is changed,
> i/o to the device will return Echange.

using qid.version to indicate the status of the underlying device
rather than of the file data seems to me like an abuse of the system.
surely a status file would be a better way of indicating media change?

> Why not add a QTCTL bit to Qid.type?

if we ignore concurrent usage (which is, after all, a rare case),
one big issue is idempotency - can i read twice at
the same offset and get the same result? can i write two blocks
at different offsets out of order (a la fcp) and end up with the same file?

there are many possible ways that the read and write operations
can be used in the construction of interesting file systems, but
perhaps the semantics of "regular" files are sufficiently common and
useful that it would be worth knowing whether a given file adheres
to them.

those semantics being something like:
- read or write twice at the same offset will yield the same result
(modulo concurrent writers)
- read or write of several sequential items of data is the same
as one read or write of all the data.
- write, followed by a read at the same offset yields the same data
(modulo concurrent writers again)
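Those three properties can be stated as executable checks against a toy in-memory file (a sketch in standard C; the names are made up). A plain byte array trivially satisfies them, while a ctl-style file would not:

```c
#include <assert.h>
#include <string.h>

enum { Nbytes = 64 };

typedef struct File File;
struct File {
	unsigned char data[Nbytes];
};

/* Positional read: no hidden cursor, so reads at the same offset
 * are repeatable (modulo concurrent writers). */
long
pread_(File *f, void *buf, long n, long off)
{
	if(off < 0 || off >= Nbytes)
		return 0;
	if(off + n > Nbytes)
		n = Nbytes - off;
	memcpy(buf, f->data + off, n);
	return n;
}

/* Positional write: writes at different offsets commute, so
 * out-of-order writes (a la fcp) yield the same file. */
long
pwrite_(File *f, const void *buf, long n, long off)
{
	if(off < 0 || off >= Nbytes)
		return 0;
	if(off + n > Nbytes)
		n = Nbytes - off;
	memcpy(f->data + off, buf, n);
	return n;
}
```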

so i guess i'd argue for a QTREGULAR (QTDATA?) bit rather than a QTCTL bit.
that way, we could start off by adding that bit to those files that
definitely observe the given semantics, and avoid arguing about
which files were or were not "control" files. (and qid.version==0 could
still be useful as a "treat this file as if it always had a new version
number" signifier.) (and QTAPPEND is still useful to signify a
modification, but not a complete discarding, of the given semantics.)


* Re: [9fans] QTCTL?
  2007-10-31 22:37                   ` Charles Forsyth
  2007-10-31 22:43                     ` Francisco J Ballesteros
@ 2007-10-31 23:32                     ` Eric Van Hensbergen
  2007-10-31 23:41                       ` [V9fs-developer] " Charles Forsyth
       [not found]                       ` <606b6f003ae9f0ed3e8c3c5f90ddc720@terzarima.net>
  1 sibling, 2 replies; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-10-31 23:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs; +Cc: V9FS Developers

On 10/31/07, Charles Forsyth <forsyth@terzarima.net> wrote:
> > QTCL would ...
>
> perhaps one of my points is that in the Plan 9/Inferno world, you'd
> be better off marking the files you can cache, not trying to identify the ones
> that you can't.  one reason is the practical one of having to touch perhaps two or
> maybe three important servers instead of everything.
>

That makes a lot of sense -- particularly in a Plan 9 context.
It should be straightforward enough to modify appropriate "static"
content file servers to set this bit.

The current range of non-Plan 9 servers (u9fs, spfs, etc.) presents a
bit of difficulty, in that there isn't a good way to transitively
detect this sort of thing when, say, a p9p server is mounted on Linux
and then re-exported, mapping it through the Linux VFS space.  But then
I suppose that's just a matter of not using the aforementioned tools to
export "special" files or file systems.  I've got some ideas to fix
this that I want to play with using Lucho's new
in-Linux-kernel-9P-server.  Of course, all of this is probably a
corner case that most people don't see anyway.

In the context of FastOS, dealing with thousands of nodes, we are
going to need to implement more sophisticated forms of caching.  Since
this is going to be pretty critical to even booting at larger scale,
it's likely we'll be digging into this early next year.  We'll keep
folks in the loop as we get prototypes working.

             -eric



* Re: [9fans] QTCTL?
  2007-10-31 22:48       ` roger peppe
@ 2007-10-31 23:35         ` erik quanstrom
  2007-11-01  9:29           ` roger peppe
  0 siblings, 1 reply; 77+ messages in thread
From: erik quanstrom @ 2007-10-31 23:35 UTC (permalink / raw)
  To: 9fans

> > I remember something similar to what Eric remembers: qid.vers of zero
> > means `don't cache'.  It might not be written down; it may have just
> > been oral folklore at the labs.
> 
> when the 9p2000 man pages were initially posted to this list for discussion,
> i made a suggestion to that effect, and i seem to recall rob
> saying "not a bad idea, but we haven't done it yet".

here you argue that using the qid.version to infer something about the
file is a good idea.

>> this is not true of devsd devices.  the version is used to deal with
>> removable devices.  if you have a cd open and the media is changed,
>> i/o to the device will return Echange.
> 
> using qid.version to indicate the status of the underlying device
> rather than of the file data seems to me like an abuse of the system.
> surely a status file would be a better way of indicating media change?

yet here you argue that using the qid.version to indicate that the medium
underlying sdXX/data has changed is an "abuse".

to be a bit picky, the qid.version doesn't indicate the status of the device,
it indicates how many times the media have changed.

it doesn't make sense for a process to blithely continue writing to the
new medium without getting an error.

- erik



* Re: [V9fs-developer] [9fans] QTCTL?
  2007-10-31 23:32                     ` Eric Van Hensbergen
@ 2007-10-31 23:41                       ` Charles Forsyth
       [not found]                       ` <606b6f003ae9f0ed3e8c3c5f90ddc720@terzarima.net>
  1 sibling, 0 replies; 77+ messages in thread
From: Charles Forsyth @ 2007-10-31 23:41 UTC (permalink / raw)
  To: ericvh, 9fans; +Cc: v9fs-developer

> In the context of FastOS, dealing with thousands of nodes, we are
> going to need to implement more sophisticated forms of caching.  Since
> this is going to be pretty critical to even booting at larger scale,
> it's likely we'll be digging into this early next year.  We'll keep
> folks in the loop as we get prototypes working.

i didn't think that would lead to requiring new bits as markers in the
protocol as such, though



* Re: [9fans] QTCTL?
  2007-10-31 22:26                 ` Francisco J Ballesteros
  2007-10-31 22:37                   ` Charles Forsyth
@ 2007-10-31 23:54                   ` erik quanstrom
  2007-11-01  0:03                     ` Charles Forsyth
  2007-11-01  1:25                     ` Eric Van Hensbergen
  1 sibling, 2 replies; 77+ messages in thread
From: erik quanstrom @ 2007-10-31 23:54 UTC (permalink / raw)
  To: 9fans

> If the file is "decent", the cache must still check out
> that the file is up to date. It might not do so all the times
> (as we do, to trade for performance, as you say). That means
> that the cache would get further file data as soon as it sees
> a new qid.vers for the file. And tail -f would still work.

the problem is that no files are "decent" as long as concurrent
access is allowed.  "control" files have the decency at least
to behave the same way all the time.

if one goes down the road of client-side caching, i think
concurrency issues need to be taken seriously.  otherwise
it's like having a multiprogramming kernel without locks.

- erik


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-10-31 23:54                   ` erik quanstrom
@ 2007-11-01  0:03                     ` Charles Forsyth
  2007-11-01  1:25                     ` Eric Van Hensbergen
  1 sibling, 0 replies; 77+ messages in thread
From: Charles Forsyth @ 2007-11-01  0:03 UTC (permalink / raw)
  To: 9fans

> if one goes down the road of client-side caching, i think
> concurrency issues need to be taken seriously.  otherwise
> it's like having a multiprogramming kernel without locks.

it's probably somewhat worse. in the latter case the system as a whole
will probably fail quickly, badly.  with the former, it's possible the system will survive
and even that an application won't crash outright, but you'll see odd effects.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [V9fs-developer] [9fans] QTCTL?
       [not found]                       ` <606b6f003ae9f0ed3e8c3c5f90ddc720@terzarima.net>
@ 2007-11-01  1:13                         ` Eric Van Hensbergen
  0 siblings, 0 replies; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-11-01  1:13 UTC (permalink / raw)
  To: Charles Forsyth; +Cc: v9fs-developer, 9fans

On 10/31/07, Charles Forsyth <forsyth@terzarima.net> wrote:
> > In the context of FastOS, dealing with thousands of nodes, we are
> > going to need to implement more sophisticated forms of caching.  Since
> > this is going to be pretty critical to even booting at larger scale,
> > its likely we'll be digging into this early next year.  We'll keep
> > folks in the loop as we get prototypes working.
>
> i didn't think that would lead to requiring new bits as markers anywhere, though
> in the protocol as such
>

True - it was more of a general statement indicating we'd be looking
at the issues and probably experimenting with things like leases and
invalidations, etc.  I didn't really go into detail because we haven't
really discussed which paths we are going to explore.

            -eric


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-10-31 23:54                   ` erik quanstrom
  2007-11-01  0:03                     ` Charles Forsyth
@ 2007-11-01  1:25                     ` Eric Van Hensbergen
  2007-11-01  1:44                       ` erik quanstrom
  2007-11-01  7:34                       ` Skip Tavakkolian
  1 sibling, 2 replies; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-11-01  1:25 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 10/31/07, erik quanstrom <quanstro@quanstro.net> wrote:
> > If the file is "decent", the cache must still check out
> > that the file is up to date. It might not do so all the times
> > (as we do, to trade for performance, as you say). That means
> > that the cache would get further file data as soon as it sees
> > a new qid.vers for the file. And tail -f would still work.
>
> the problem is that no files are "decent" as long as concurrent
> access is allowed.  "control" files have the decency at least
> to behave the same way all the time.
>

Sure - however, there is a case for loose caches as well. For example,
lots of remote file data is essentially read-only, or at the very
worst it's updated very infrequently.  Brucee had
sessionfs, which although more specialized (I'm going to oversimplify
here Brzr, so don't shoot me), could essentially be thought of as
serving a snapshot of the system.  You could cache to your heart's
content because you'd always be reading from the same snapshot.  If
you ever wanted to roll the snapshot forward, you could blow away the
cache -- or for optimum safety, restart the entire node.  With such a
mechanism you could even keep the cache around on disk for long
periods of time (as long as the session was still exported by the file
server).

        -eric


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01  1:25                     ` Eric Van Hensbergen
@ 2007-11-01  1:44                       ` erik quanstrom
  2007-11-01  2:15                         ` Eric Van Hensbergen
  2007-11-01  7:34                       ` Skip Tavakkolian
  1 sibling, 1 reply; 77+ messages in thread
From: erik quanstrom @ 2007-11-01  1:44 UTC (permalink / raw)
  To: 9fans

> Sure - however, there is a case for loose caches as well. For example,
> lots of remote file data is essentially read-only, or at the very
> worst its updated very infrequently.  Brucee had

i might be speaking out of school.  but i worry about the qualifiers
"essentially" and "very infrequently".  they tend not to scale.

what about drawing a sharp line?  these mounts are static and
cachable.  these are not and need coherency.  perhaps the
data that needs cache coherency doesn't need full file semantics.

- erik


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01  1:44                       ` erik quanstrom
@ 2007-11-01  2:15                         ` Eric Van Hensbergen
  0 siblings, 0 replies; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-11-01  2:15 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Wed, 31 Oct 2007 8:44 pm, erik quanstrom wrote:
>>  Sure - however, there is a case for loose caches as well. For example,
>>  lots of remote file data is essentially read-only, or at the very
>>  worst its updated very infrequently.  Brucee had
>
> i might be speaking out of school.  but i worry about the qualifiers
> "essentially" and "very infrequently".  they tend not to scale.
>
> what about drawing a sharp line?  these mounts are static and
> cachable.  these are not and need coherency.

Yes - sessionfs satisfied the first case, items falling into the second 
class were served from a normal 9p server (w/no cache).

>
>  perhaps the
> data that needs cache coherency doesn't need full file semantics.
>

I think they are two separate issues.

    -eric



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-10-31 21:40               ` Russ Cox
  2007-10-31 22:11                 ` Charles Forsyth
  2007-10-31 22:26                 ` Francisco J Ballesteros
@ 2007-11-01  6:21                 ` Bakul Shah
  2007-11-01 14:28                   ` Russ Cox
  2 siblings, 1 reply; 77+ messages in thread
From: Bakul Shah @ 2007-11-01  6:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> What you really need is a way to ask the server "can I
> cache the following?" and have the server say yes or no
> and then have some way to invalidate the cache, so that
> you get coherent behavior, even in the above case.
> We discussed various ways to add this to the protocol
> but ultimately we didn't see any way that was simple
> enough that the specification effort wasn't outweighed
> by our not needing to solve the problem at that time.

Do you recall what the issues were?

Wouldn't something like load-linked/store-conditional suffice
if the common case is a single writer?  When the client does
a "read-linked" call, the server sends an ID along with the
data. The client can then do a "write-conditional" by passing
the original ID and new data.  If the ID is not valid anymore
(if someone else wrote in the meantime) the write fails.  The
server doesn't have to keep any client state or inform anyone
about an invalid cache.  Of course, if any client fails to
follow this protocol things fall apart but at least well
behaved clients can get coherency. And this would work for
cases such as making changes on a disconnected laptop and
resyncing to the main server on the next connect.  You
wouldn't use this for synthetic files.  This ID can be as
simple as a file "generation" number incremented on each
write or crypto strong checksum.

As someone (Terje Mathiesen?) said all programming is an
exercise in caching.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01  1:25                     ` Eric Van Hensbergen
  2007-11-01  1:44                       ` erik quanstrom
@ 2007-11-01  7:34                       ` Skip Tavakkolian
  1 sibling, 0 replies; 77+ messages in thread
From: Skip Tavakkolian @ 2007-11-01  7:34 UTC (permalink / raw)
  To: 9fans

a side note, intro(5) says
	"The version is
          a version number for a file; typically, it is incremented
          every time the file is modified."

wouldn't that mean that for devices, the version should change
every time you read them?

it's interesting that httpd (actually sendfd) already uses qid.path
and qid.vers to generate the entity tag (ETag) header, and pays
attention to the if-none-match header.  perhaps Op's Tget can have a
similar parameter using path/version.

i keep wanting to go back to our proposal regarding Text/Rext
extension messages.  for caching, instead of a Tread a Text("read
if-modified") request is sent.  an advantage of Op is that Tget, for
example, combines walk/open/read/clunk into one request to optimize
for high latency networks.  Text("get /a/b/foo") could do the same.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-10-31 23:35         ` erik quanstrom
@ 2007-11-01  9:29           ` roger peppe
  2007-11-01 11:03             ` Eric Van Hensbergen
  2007-11-01 12:11             ` erik quanstrom
  0 siblings, 2 replies; 77+ messages in thread
From: roger peppe @ 2007-11-01  9:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 10/31/07, erik quanstrom <quanstro@quanstro.net> wrote:
> > using qid.version to indicate the status of the underlying device
> > rather than of the file data seems to me like an abuse of the system.
> > surely a status file would be a better way of indicating media change?
>
> yet here you argue that using the qid.version to indicate that the medium
> underlying sdXX/data has changed is an "abuse".
>
> to be a bit picky, the qid.version doesn't indicate the status of the device,
> it indicates how many times the media have changed.
>
> it doesn't make sense for a process to blithely continue writing to the
> new medium without getting an error.

i agree with that, and for read-only devices, using the version
on the data file to indicate media change seems fine. but for
writable devices, surely the version number should increment
once every time the device has been written? for writable devices,
if there was a status file giving some information about the media,
perhaps the version number on that would be a better place to
record media changes.

regarding QTDECENT (or whatever it might be called), i recently
came up against this. i've implemented most of a filesystem to allow
latency lowering - kind of similar to fcp, but allows naive clients to
gain the benefits, and does it for streaming files too. it uses a filesystem
at both the server and client sides - the client sends several requests
at once, and the server gathers them into a coherent order.
the problem being that the server needs to make a decision as to
what kind of semantics a given file supports - whether it has to
preserve record boundaries as it reads, for example, or what to
do if the reader does a seek (for a "decent" file, it can just discard
the read-ahead data - for others, it should probably yield an error).
i don't really want to teach this filesystem about which files are
"conventionally" normal - and it would be nice to just run one instance
for an entire exported fs (accessed through another name, as in brian
stuart's example).


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01  9:29           ` roger peppe
@ 2007-11-01 11:03             ` Eric Van Hensbergen
  2007-11-01 11:19               ` Charles Forsyth
  2007-11-01 12:11             ` erik quanstrom
  1 sibling, 1 reply; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-11-01 11:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 11/1/07, roger peppe <rogpeppe@gmail.com> wrote:
> i don't really want to teach this filesystem about which files are
> "conventionally"
> normal - and it would be nice to just run one instance for an entire exported
> fs (accessed through another name, as brian stuart's example).
>

Yes - I think transitive mounts, the desire to be able to mount
pre-composed file systems, and even mixed mode synthetics (which have
both ctl files and cacheable data) all lean toward having a way of
identifying what should be  cache-able.  For instance,  my Libra
libraryOS environment currently only has a single channel with which
to mount all resources -- so it mounts the file system at the same
time as console and network files.  I imagine if we ran 9p directly on
top of one of the raw interconnect channels on Blue Gene we might be
in a similar situation (although we aren't currently looking at that).

      -eric


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 11:03             ` Eric Van Hensbergen
@ 2007-11-01 11:19               ` Charles Forsyth
  0 siblings, 0 replies; 77+ messages in thread
From: Charles Forsyth @ 2007-11-01 11:19 UTC (permalink / raw)
  To: 9fans

i'd often mount each distinct underlying thing separately, not just
for cache control but to select the right type of connection for each,
or to get better bandwidth than could be achieved with just one connection.
if a service is aggregating some others, and the connection classes
are otherwise the same, then i'd have that service talk to the client
side cache services to provide them with its own cache constraints.
(and so on)


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01  9:29           ` roger peppe
  2007-11-01 11:03             ` Eric Van Hensbergen
@ 2007-11-01 12:11             ` erik quanstrom
  1 sibling, 0 replies; 77+ messages in thread
From: erik quanstrom @ 2007-11-01 12:11 UTC (permalink / raw)
  To: 9fans

>> to be a bit picky, the qid.version doesn't indicate the status of the device,
>> it indicates how many times the media have changed.
>>
>> it doesn't make sense for a process to blithly continue writing to the
>> new medium without getting an error.
> 
> i agree with that, and for read-only devices, using the version
> on the data file to indicate media change seems fine. but for
> writable devices, surely the version number should increment
> once every time the device has been written? for writable devices,
> if there was a status file giving some information about the media,
> perhaps the version number on that would be a better place to
> record media changes.

i see the symmetry of what you're saying.  what would be the utility
of maintaining the version this way?  the version, as you describe it,
wouldn't survive reboot and, for network-attached storage, wouldn't be
coherent across machines.  i'm not sure that devices are either
read-only or read-write.  that might depend on the underlying
(hot-pluggable) medium.  and the device driver might not care.

i think it makes sense to use the medium changes (not connections, if
possible) to determine the version.  the marvell and aoe driver
consider a device changed if the serial# or number of sectors
change.that is something most io clients are interested in.  how many
times do you want to want to write a random subset of blocks to a
different device?

it does not seem too much of a stretch.  the stat.size field isn't the
"file size" of a stream (whatever that means).  so i think this is
well within the tradition.

- erik


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01  6:21                 ` Bakul Shah
@ 2007-11-01 14:28                   ` Russ Cox
  2007-11-01 14:38                     ` erik quanstrom
                                       ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: Russ Cox @ 2007-11-01 14:28 UTC (permalink / raw)
  To: 9fans

> Do you recall what the issues were?

The main proposal I remember was to give out read and write
tokens and then revoke them as appropriate.  The two big problems
with this were that it meant having spontaneous server->client->server
message exchange instead of the usual client->server->client
(a very large semantic break with existing 9P) and that once you
had granted a read token to a client the client could simply stop
responding (acknowledging the revocation) and slow down the
rest of the system.

I think that an ideal caching solution would have the following
properties.

1. For 9P servers that don't care to let their files be cached,
the effort in doing so should be minimal.  For example, perhaps
servers would simply respond to a Tcache message with Rerror,
like non-authenticating servers respond to Tauth with Rerror.
Anything more is not going to get implemented.

2. All 9P messages would still be client->server->client.
This fits with #1, but also excludes solutions that introduce
new server->client->server messages after a successful Tcache.

3. If a client that is caching some data stops responding, the rest
of the system can continue to function without it: slow clients
don't slow the entire system.

4. Except for timing, the cached behavior is identical to what 
you'd see in an uncached setting, not some relaxed semantics.
For example, suppose you adopted a model where each server
response could have some cache invalidations piggybacked on it.
This would provide a weak but precise consistency model in that
any cached behaviors observed interacting with that server
would be the same as shifting uncached behavior back in time a bit.
It could be made to appear that the machine was just a few seconds
behind the server, but otherwise fully consistent.  The problem with
this is when multiple machines are involved, and since Plan 9 is
a networked system, this happens.  For example, a common setup
is for one machine to spool mail into /mail/queue and then run
rx to another machine to kick the queue processor (the mail sender).
If the other machine is behaving like it's 5 seconds behind, then it
won't see the mail that just got spooled, making the rx kick worthless.

5. It is easy to get right.

#1 is trivial.  #2 and #3 are difficult and point to some kind of 
lease-based solution instead of tokens.  

#4 keeps us honest: weakened consistency like in my example
or in cfs(4) or in recover(4) might occasionally be useful, but it
will break important and subtle real-world cases and make the
system a lot more fragile.  If you pile up enough things that only
work 99% of the time, you very quickly end up with a crappy system.
(If that's what you want, might I suggest Linux?)

#5 is probably wishful thinking on my part.

> Wouldn't something like load-linked/store-conditional suffice
> if the common case is a single writer?  When the client does
> a "read-linked" call, the server sends an ID along with the
> data. The client can then do a "write-conditional" by passing
> the original ID and new data.  If the ID is not valid anymore
> (if someone else wrote in the meantime) the write fails.  The
> server doesn't have to keep any client state or inform anyone
> about an invalid cache.  Of course, if any client fails to
> follow this protocol things fall apart but at least well
> behaved clients can get coherency. And this would work for
> cases such as making changes on a disconnected laptop and
> resyncing to the main server on the next connect.  You
> wouldn't use this for synthetic files.  This ID can be as
> simple as a file "generation" number incremented on each
> write or crypto strong checksum.

This doesn't solve the problem of one client caching the file
contents and another writing to the file; how does the first
find out that the file has changed before it uses the cached 
contents again?

Russ


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 14:28                   ` Russ Cox
@ 2007-11-01 14:38                     ` erik quanstrom
  2007-11-01 14:41                     ` Charles Forsyth
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 77+ messages in thread
From: erik quanstrom @ 2007-11-01 14:38 UTC (permalink / raw)
  To: 9fans

i haven't thought this through, but perhaps this would
be an easier problem if we didn't change 9p, but we changed
the model of the caching server.

the current proposals assume that the caching servers don't
know that other caches are out there.  alternatively,
the caching servers participate in a coherency protocol with
9p clients none the wiser.

- erik

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 14:28                   ` Russ Cox
  2007-11-01 14:38                     ` erik quanstrom
@ 2007-11-01 14:41                     ` Charles Forsyth
  2007-11-01 15:26                     ` Sape Mullender
  2007-11-01 16:59                     ` Bakul Shah
  3 siblings, 0 replies; 77+ messages in thread
From: Charles Forsyth @ 2007-11-01 14:41 UTC (permalink / raw)
  To: 9fans

>with this were that it meant having spontaneous server->client->server
>message exchange instead of the usual client->server->client

> 2. All 9P messages would still be client->server->client.
> This fits with #1, but also excludes solutions that introduce
> new server->client->server messages after a successful Tcache.

for similar things, i typically have a file on which the client reads
messages from the service.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 14:28                   ` Russ Cox
  2007-11-01 14:38                     ` erik quanstrom
  2007-11-01 14:41                     ` Charles Forsyth
@ 2007-11-01 15:26                     ` Sape Mullender
  2007-11-01 15:51                       ` Latchesar Ionkov
  2007-11-01 16:59                     ` Bakul Shah
  3 siblings, 1 reply; 77+ messages in thread
From: Sape Mullender @ 2007-11-01 15:26 UTC (permalink / raw)
  To: 9fans

> 2. All 9P messages would still be client->server->client.
> This fits with #1, but also excludes solutions that introduce
> new server->client->server messages after a successful Tcache.

And there is the quandary.  Allowing server->client->server messages
(aka callbacks) complicates 9P beyond anything acceptable.  On the other
hand, these call-backs make the following possible:

1. Client obtains a lease to a file (say valid for exclusive access in the
   next five minutes)

2. Server needs the file for read or exclusive access by another client
   after one minute and wants the lease returned early.  It initiates
   a callback.

3. Client flushes all data back to the server (i.e., it performs a series
   of writes)

4. Client responds to the callback

5. Server gives a lease to another client.

This is the sequence of actions that maintains consistency for all parties
obeying the protocol.  Not obeying (e.g. ignoring callbacks, not writing back
dirty data) will slow the system down (server must wait for lease to expire)
or will just harm the client not obeying.  Leases are a really good idea, but
for the complexity of callbacks.

If we can have this functionality without callback, that would be really nice.

One could have only client-server-client calls like this:

Tcache	asks whether the server is prepared to cache
Rcache	makes lease available with parameters, Rerror says no.

Tlease	says, ok start my lease now (almost immediately follows Rcache)
Rlease	lease expired or lease needs to be given back early

Tcache	done with old lease (may immediately ask for a new lease)
etc.

So Tcache serves two purposes: it gives up an old lease if one existed
and immediately asks for a new one if one is needed.

This might give all the functionality we need without using callbacks.
(Of course, the client still needs a proc waiting for that Rlease while
doing its reads and writes).
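A toy model of that two-purpose Tcache exchange, with invented names for the client-side state; this only mimics the message sequence sketched above, not real 9P marshalling:

```c
#include <assert.h>

enum { Lnone, Lheld, Lrecalled };

/* client-side lease state; struct and names invented for the sketch */
typedef struct Lease Lease;
struct Lease {
	int state;
	int gen;	/* which lease instance we currently hold */
};

/* Tcache: give back any old lease and ask for a fresh one */
int
tcache(Lease *l)
{
	l->state = Lheld;
	return ++l->gen;
}

/* the pending Rlease comes back: server recalls the lease early
   (or it has expired) */
void
rlease(Lease *l)
{
	l->state = Lrecalled;
}

/* client reacts to Rlease: flush dirty data, then Tcache again */
int
renew(Lease *l)
{
	if(l->state != Lrecalled)
		return -1;
	/* ...write back cached data here before giving up the lease... */
	return tcache(l);
}
```

The point of the model: the only outstanding server-initiated action is the delayed Rlease reply, so the exchange stays client->server->client.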

	Sape
	


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 15:26                     ` Sape Mullender
@ 2007-11-01 15:51                       ` Latchesar Ionkov
  2007-11-01 16:04                         ` ron minnich
                                           ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Latchesar Ionkov @ 2007-11-01 15:51 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Nov 1, 2007, at 9:26 AM, Sape Mullender wrote:
> One could have only client-server-client calls like this:
>
> Tcache	asks whether the server is prepared to cache
> Rcache	makes lease available with parameters, Rerror says no.
>
> Tlease	says, ok start my lease now (almost immediately follows Rache)
> Rlease	lease expired or lease needs to be given back early
>
> Tcache	done with old lease (may immediately ask for a new lease)
> etc.
>
> So Tcache serves two purposes: it gives up an old lease if one existed
> and immediately asks for a new one if one is needed.
>
> This might give all the functionality we need without using callbacks.
> (Of course, the client still needs a proc waiting for that Rlease  
> while
> doing its reads and writes).

In the case of a read cache (which is probably going to be used more
often than a write cache), the client needs to send two RPCs every time
a writer modifies the cached file.  What if Rlease doesn't necessarily
break the lease, but has an option (negotiated in Tcache) to let the
client know that the file has changed without breaking the lease?

Thanks,
	Lucho
  


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 15:51                       ` Latchesar Ionkov
@ 2007-11-01 16:04                         ` ron minnich
  2007-11-01 16:16                           ` Latchesar Ionkov
                                             ` (3 more replies)
  2007-11-01 16:17                         ` Sape Mullender
  2007-11-01 16:58                         ` Sape Mullender
  2 siblings, 4 replies; 77+ messages in thread
From: ron minnich @ 2007-11-01 16:04 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Why not just have a file that a client reads that lets the client know
of changes to files?

client opens this server-provided file ("changes"? "dnotify"?)

Server agrees to send the client info about all fids the client has
active that are changing.  Form of the message?
fid[4]offset[8]len[4]

It's up to the client to figure out what to do.
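A minimal decoder for the proposed record, assuming 9P's usual little-endian byte order; the struct and function names are invented for the sketch:

```c
#include <assert.h>

typedef unsigned char uchar;
typedef unsigned long long uvlong;

/* one change record in the proposed fid[4]offset[8]len[4] form */
typedef struct Change Change;
struct Change {
	unsigned long fid;
	uvlong offset;
	unsigned long len;
};

/* gather an n-byte little-endian integer, as 9p encodes numbers */
static uvlong
gbit(uchar *p, int n)
{
	uvlong v;
	int i;

	v = 0;
	for(i = 0; i < n; i++)
		v |= (uvlong)p[i] << (8*i);
	return v;
}

/* fill *c from the wire form; returns bytes consumed (16) */
int
changeunpack(Change *c, uchar *p)
{
	c->fid = gbit(p, 4);
	c->offset = gbit(p+4, 8);
	c->len = gbit(p+12, 4);
	return 16;
}
```

The client's reader proc would loop reading the changes file, unpacking records, and invalidating (or refetching) the affected byte range of each cached fid.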

if the client doesn't care, no extra server overhead.

no new T*, no callbacks (which i can tell you are horrible when you
get to bigger machines -- having an 'ls' take 30 minutes is no fun).
No leases.

The fact is we have loose consistency now, we just don't call it that.
Anytime you are running a file from a server, you have loose
consistency. It works ok in most cases.

ron



* Re: [9fans] QTCTL?
  2007-11-01 16:04                         ` ron minnich
@ 2007-11-01 16:16                           ` Latchesar Ionkov
  2007-11-01 16:21                           ` Sape Mullender
                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 77+ messages in thread
From: Latchesar Ionkov @ 2007-11-01 16:16 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Leases are good for purposes other than caching, for example locking.
I don't see much difference whether the protocol defines a special
filename or a new message. There are other small details that need to
be solved -- the server and the client need to be extra careful that
no events fall through the cracks (i.e. between a Rread and the
subsequent Tread on the special file).

On Nov 1, 2007, at 10:04 AM, ron minnich wrote:

> Why not just have a file that a client reads that lets the client know
> of changes to files.
>
> client opens this server-provided file ("changes"? "dnotify"?)
>
> Server agrees to send client info about all FIDS which client has
> active that are changing. Form of the message?
> fid[4]offset[8]len[4]
>
> It's up to the client to figure out what to do.
>
> if the client doesn't care, no extra server overhead.
>
> no new T*, no callbacks (which i can tell you are horrible when you
> get to bigger machines -- having an 'ls' take 30 minutes is no fun).
> No leases.
>
> The fact is we have loose consistency now, we just don't call it that.
> Anytime you are running a file from a server, you have loose
> consistency. It works ok in most cases.
>
> ron



* Re: [9fans] QTCTL?
  2007-11-01 15:51                       ` Latchesar Ionkov
  2007-11-01 16:04                         ` ron minnich
@ 2007-11-01 16:17                         ` Sape Mullender
  2007-11-01 16:27                           ` Sape Mullender
  2007-11-01 16:58                         ` Sape Mullender
  2 siblings, 1 reply; 77+ messages in thread
From: Sape Mullender @ 2007-11-01 16:17 UTC (permalink / raw)
  To: 9fans

> In the case of a read cache (which is probably going to be used more
> often than a write cache), the client needs to send two RPCs every
> time a writer modifies the cached file. What if Rlease doesn't
> necessarily break the lease, but has an option (negotiated in Tcache)
> to let the client know that the file has changed without breaking the lease?

That breaks single-copy semantics:  A client may have acted on data after
it had been changed by somebody else.  Say, A and B are sharing the file.
B has a read-lease on the file, A obtains a write lease, modifies the file
and sends a message to A to read what was changed.  A reads the file (which
is still in the cache and has not been updated).  Meanwhile, just after B
obtained the write lease, the server notifies A that the lease is expiring
early, but this message travels slowly and doesn't arrive until the whole
exchange is over.

	Sape



* Re: [9fans] QTCTL?
  2007-11-01 16:04                         ` ron minnich
  2007-11-01 16:16                           ` Latchesar Ionkov
@ 2007-11-01 16:21                           ` Sape Mullender
  2007-11-01 16:58                             ` Francisco J Ballesteros
  2007-11-01 17:03                           ` Russ Cox
  2007-11-01 17:14                           ` Bakul Shah
  3 siblings, 1 reply; 77+ messages in thread
From: Sape Mullender @ 2007-11-01 16:21 UTC (permalink / raw)
  To: 9fans

> Why not just have a file that a client reads that lets the client know
> of changes to files.

A bit better, but the comment I just made about breaking single-copy
semantics still holds.  The point is that merely notifying the client
isn't enough.  The server should wait for an acknowledgement of that
notification (which possibly doesn't arrive until after the client has
flushed its updates from the cache).

	Sape



* Re: [9fans] QTCTL?
  2007-11-01 16:17                         ` Sape Mullender
@ 2007-11-01 16:27                           ` Sape Mullender
  0 siblings, 0 replies; 77+ messages in thread
From: Sape Mullender @ 2007-11-01 16:27 UTC (permalink / raw)
  To: 9fans

I type too fast — got A and B mixed up.  Below is the fix:

> That breaks single-copy semantics:  A client may have acted on data after
> it had been changed by somebody else.  Say, A and B are sharing the file.
> B has a read-lease on the file, A obtains a write lease, modifies the file
> and sends a message to B to read what was changed.  B reads the file (which
> is still in the cache and has not been updated).  Meanwhile, just after A
> obtained the write lease, the server notifies B that the lease is expiring
> early, but this message travels slowly and doesn't arrive until the whole
> exchange is over.



* Re: [9fans] QTCTL?
  2007-11-01 15:51                       ` Latchesar Ionkov
  2007-11-01 16:04                         ` ron minnich
  2007-11-01 16:17                         ` Sape Mullender
@ 2007-11-01 16:58                         ` Sape Mullender
  2 siblings, 0 replies; 77+ messages in thread
From: Sape Mullender @ 2007-11-01 16:58 UTC (permalink / raw)
  To: 9fans

> In the case of a read cache (which is probably going to be used more
> often than a write cache), the client needs to send two RPCs every
> time a writer modifies the cached file. What if Rlease doesn't
> necessarily break the lease, but has an option (negotiated in Tcache)
> to let the client know that the file has changed without breaking the lease?
> 
> Thanks,
> 	Lucho

Another point in this discussion:
1. Most files are not shared
2. Some files are read-shared
3. Very, very few files are read-write shared
(Satya did some research on this at CMU — quite some time ago)

Having said that, we do want correct semantics all the time,
especially for read-write sharing.
A file server can use heuristics to decide the time out for leases.
For example, it could always grant 10-minute leases to begin with.
Doesn't cost a thing unless the client refuses to return a lease early
(but clients will rarely be asked to do so).
With updates in the recent past, or with the first occurrence of
read-write sharing, lease times can be drastically reduced.

Note that for files not shared or read-shared, callbacks do not happen,
so lease calls will occur at the rate of the lease time, which is
once every few minutes.  Big deal.

	Sape



* Re: [9fans] QTCTL?
  2007-11-01 16:21                           ` Sape Mullender
@ 2007-11-01 16:58                             ` Francisco J Ballesteros
  2007-11-01 17:11                               ` Charles Forsyth
                                                 ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-11-01 16:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I was thinking about something like

Tinval  (asks the server for invalidations for files seen)
Rinval (reports new invalidations)
Tinval (asks for further invals, and lets the server know that Rinval
was seen by the client)

This is similar to the "changes" file proposed above, but it's simple, does not
require two new RPCs (a server that doesn't implement it would respond to a
Tinval with an Rerror (unknown request or whatever)), is not an upcall
(although it behaves as one), and may let the client know both which files
changed and which cache entries are invalid.

We did consider this, but having all terminals connected to slow links
to the central fs means that all fs activity might be slowed down by
the link with the worst latency.

However, my experience using this thing says that (at least in my
case) I'm using at most one remote terminal at a time, or a bunch of
well-connected terminals.  Which might suggest that this Tinval thing
might pay off.

Time to experiment, perhaps.


On 11/1/07, Sape Mullender <sape@plan9.bell-labs.com> wrote:
> > Why not just have a file that a client reads that lets the client know
> > of changes to files.
>
> A bit better, but the comment I just made about breaking single-copy
> semantics still holds.  The point is that merely notifying the client
> isn't enough.  The server should wait for an acknowledgement of that
> notification (which possibly doesn't arrive until after the client has
> flushed its updates from the cache).
>
>         Sape
>
>


* Re: [9fans] QTCTL?
  2007-11-01 14:28                   ` Russ Cox
                                       ` (2 preceding siblings ...)
  2007-11-01 15:26                     ` Sape Mullender
@ 2007-11-01 16:59                     ` Bakul Shah
  3 siblings, 0 replies; 77+ messages in thread
From: Bakul Shah @ 2007-11-01 16:59 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> > Do you recall what the issues were?
> 
> The main proposal I remember was to give out read and write
> tokens and then revoke them as appropriate.  The two big problems
> with this were that it meant having spontaneous server->client->server
> message exchange instead of the usual client->server->client
> (a very large semantic break with existing 9P) and that once you
> had granted a read token to a client the client could simply stop
> responding (acknowledging the revocation) and slow down the
> rest of the system.

Thanks.

> > Wouldn't something like load-linked/store-conditional suffice
> > if the common case is a single writer?  When the client does
> > a "read-linked" call, the server sends an ID along with the
> > data. The client can then do a "write-conditional" by passing
> > the original ID and new data.  If the ID is not valid anymore
> > (if someone else wrote in the meantime) the write fails.  The
> > server doesn't have to keep any client state or inform anyone
> > about an invalid cache.  Of course, if any client fails to
> > follow this protocol things fall apart but at least well
> > behaved clients can get coherency. And this would work for
> > cases such as making changes on a disconnected laptop and
> > resyncing to the main server on the next connect.  You
> > wouldn't use this for synthetic files.  This ID can be as
> > simple as a file "generation" number incremented on each
> > write or crypto strong checksum.
> 
> This doesn't solve the problem of one client caching the file
> contents and another writing to the file; how does the first
> find out that the file has changed before it uses the cached 
> contents again?

It can in effect find out if the file changed between two
calls to the server.

My thought was that there are many schemes for providing full
consistency, each with its own strengths and problems, so maybe
support for a specific scheme doesn't belong in 9P; but if it
provides the "relaxed" consistency of LL/SC, various full-consistency
schemes can be built on top of that.

So, for example, if you read a lockfile and it says `free', you
conditionally write `busy'.  If the write succeeds, you have
exclusive access to the file protected by this lockfile.  But if the
read says `busy', or if your conditional write fails, you try again.
Cooperating user processes can set up any other scheme as well, such
as shared-read/exclusive-write, a lease-based one, etc.

If a version number accompanies every read/write, one can
implement multi-versioned concurrency -- readers get a
consistent version but writers can only write the "head"
version.  It would be nice to be able to implement that, but I
wouldn't want it built in.



* Re: [9fans] QTCTL?
  2007-11-01 16:04                         ` ron minnich
  2007-11-01 16:16                           ` Latchesar Ionkov
  2007-11-01 16:21                           ` Sape Mullender
@ 2007-11-01 17:03                           ` Russ Cox
  2007-11-01 17:12                             ` Sape Mullender
                                               ` (3 more replies)
  2007-11-01 17:14                           ` Bakul Shah
  3 siblings, 4 replies; 77+ messages in thread
From: Russ Cox @ 2007-11-01 17:03 UTC (permalink / raw)
  To: 9fans

> The fact is we have loose consistency now, we just don't call it that.
> Anytime you are running a file from a server, you have loose
> consistency. It works ok in most cases.

Because all reads and writes go through to the
server, all file system operations on a particular
server are globally ordered, *and* that global ordering
matches the actual sequence of events in the
physical world (because clients wait for R-messages).
That's a pretty strong consistency statement!

Any revocation-based system has to have the server
wait for an acknowledgement from the client.
If there is no wait, then between the time that the server
sends the "oops, stop caching this" and the client
processes it, the client might incorrectly use the
now-invalid data.  That's why a change file doesn't
provide the same consistency guarantees as pushing
all reads/writes to the file server.  To get those, revocations
fundamentally must wait for the client.  

It's also why this doesn't work:

> Tcache	asks whether the server is prepared to cache
> Rcache	makes lease available with parameters, Rerror says no.
> 
> Tlease	says, ok start my lease now (almost immediately follows Rcache)
> Rlease	lease expired or lease needs to be given back early
> 
> Tcache	done with old lease (may immediately ask for a new lease)
> etc.

because the Rlease/Tcache sequence is a s->c->s
message.  If a client doesn't respond with the Tcache
to formally give up the lease, the server has no choice
but to wait.

If you are willing to assume that each machine has
a real-time clock that runs approximately at the
same rate (so that different machines agree on 
what 5 seconds means, but not necessarily what
time it is right now), then you can fix the above messages
by saying that the client lease is only good for a fixed
time period (say 5 seconds) from the time that the
client sent the Tlease.  Then the server can overestimate
the lease length as 5 seconds from when it sent the
Rlease, and everything is safe.  And if the server 
sends a Rlease and the client doesn't respond with
a Tcache to officially renounce the lease, the server
can just wait until Tlease + 5 seconds and go on.
But that means the client has to be renewing the
lease every 5 seconds (more frequently, actually).

Also, in the case where the lease is just expiring
but not being revoked, then you have to have some
mechanism for establishing the new lease before
the old one runs out.  If there is a time between
two leases when you don't hold any leases, then 
all your cached data becomes invalid.

The following works:

Tnewlease	asks for a new lease
Rnewlease	grants the lease, for n seconds starting at time of Tnewlease

Trenewlease	asks to renew the lease
Rrenewlease	grants the renewal for n seconds starting at time of Trenewlease

Now if the server needs to revoke the lease, it just 
refuses to renew and waits until the current lease expires.

You can add a pseudo-callback to speed up revocation
with a cooperative client:

Tneeditback	offers to give lease back to server early
Rneeditback	says I accept your offer, please do

Tdroplease	gives lease back
Rdroplease	says okay I got it (not really necessary)

The lease ends when the client sends Tdroplease, 
*not* when the server sends Rneeditback.  It can't end
at Rneeditback for the same reason change files don't work.
And this can *only* be an optimization, because it 
depends on the client sending Tdroplease.  To get 
something that works in the presence of misbehaved
clients you have to be able to fall back on the 
"wait it out" strategy.

One could, of course, use a different protocol with 
a 9P front end.  That's okay for clients, but you'd still
have to teach the back-end server (i.e. fossil) to speak
the other protocol directly in order to get any guarantees.
(If 9P doesn't cut it then anything that's just in front of
(not in place of) a 9P server can't solve the problem.)

Russ



* Re: [9fans] QTCTL?
  2007-11-01 16:58                             ` Francisco J Ballesteros
@ 2007-11-01 17:11                               ` Charles Forsyth
  2007-11-01 17:11                                 ` Francisco J Ballesteros
  2007-11-01 17:13                               ` Sape Mullender
  2007-11-01 17:38                               ` ron minnich
  2 siblings, 1 reply; 77+ messages in thread
From: Charles Forsyth @ 2007-11-01 17:11 UTC (permalink / raw)
  To: 9fans

> This is similar to the "changes" file proposed above, but it's simple, does not
> require two new RPCs (a server that doesn't implement it would respond to a
> Tinval with an Rerror (unknown request or whatever)), is not an upcall
> (although it behaves as one), and may let the client know both which files
> changed and which cache entries are invalid.

the advantage of the changes file is that it requires no new rpcs at all
so you can do it today



* Re: [9fans] QTCTL?
  2007-11-01 17:11                               ` Charles Forsyth
@ 2007-11-01 17:11                                 ` Francisco J Ballesteros
  0 siblings, 0 replies; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-11-01 17:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I suppose you could implement the same

Tinval
Rinval
Tinval ....

protocol just by issuing sequential reads on a changes file, but you'd
have to modify both the server and the client even if it's done with a
file rather than a new transaction in 9P.  I admit that the good thing
about using a file is that 9P remains untouched.

On 11/1/07, Charles Forsyth <forsyth@terzarima.net> wrote:
> > This is similar to the "changes" file proposed above, but it's simple, does not
> > require two new RPCs (a server that doesn't implement it would respond to a
> > Tinval with an Rerror (unknown request or whatever)), is not an upcall
> > (although it behaves as one), and may let the client know both which files
> > changed and which cache entries are invalid.
>
> the advantage of the changes file is that it requires no new rpcs at all
> so you can do it today
>
>


* Re: [9fans] QTCTL?
  2007-11-01 17:03                           ` Russ Cox
@ 2007-11-01 17:12                             ` Sape Mullender
  2007-11-01 17:35                               ` erik quanstrom
  2007-11-01 17:13                             ` Charles Forsyth
                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 77+ messages in thread
From: Sape Mullender @ 2007-11-01 17:12 UTC (permalink / raw)
  To: 9fans

> It's also why this doesn't work:
> 
>> Tcache	asks whether the server is prepared to cache
>> Rcache	makes lease available with parameters, Rerror says no.
>> 
>> Tlease	says, ok start my lease now (almost immediately follows Rcache)
>> Rlease	lease expired or lease needs to be given back early
>> 
>> Tcache	done with old lease (may immediately ask for a new lease)
>> etc.
> 
> because the Rlease/Tcache sequence is a s->c->s
> message.  If a client doesn't respond with the Tcache
> to formally give up the lease, the server has no choice
> but to wait.

Correct.  And if the server *does* wait in that case, single-copy
semantics are maintained.  My assumption is that clients will, in
general, be well-behaved and use Tcache to allow the server to reuse
the file earlier than indicated in the lease.

Any maliciousness on the part of clients in this scheme would result
in (possibly one-time only) temporary denial of service to users
sharing a file; such users are not usually maliciously inclined.


Indeed, the Rlease/Tcache sequence forms a s->c->s interaction
and the second half (c->s) is a necessary one for synchronizing
the client's release of the file to the next client's access to
that file.

This discussion reminds me of the distributed file system discussion
raging in the eighties.  I'm showing my age (and nothing has changed,
but much has been forgotten).

	Sape



* Re: [9fans] QTCTL?
  2007-11-01 17:03                           ` Russ Cox
  2007-11-01 17:12                             ` Sape Mullender
@ 2007-11-01 17:13                             ` Charles Forsyth
  2007-11-01 17:16                               ` Charles Forsyth
  2007-11-01 17:52                             ` Eric Van Hensbergen
  2007-11-01 18:00                             ` Latchesar Ionkov
  3 siblings, 1 reply; 77+ messages in thread
From: Charles Forsyth @ 2007-11-01 17:13 UTC (permalink / raw)
  To: 9fans

> That's why a change file doesn't
> provide the same consistency guarantees as pushing
> all reads/writes to the file server.  To get those, revocations

sorry: i was assuming that when needed the client (or rather a cache control agent on the client)
would acknowledge by writing back on the file.



* Re: [9fans] QTCTL?
  2007-11-01 16:58                             ` Francisco J Ballesteros
  2007-11-01 17:11                               ` Charles Forsyth
@ 2007-11-01 17:13                               ` Sape Mullender
  2007-11-01 17:38                               ` ron minnich
  2 siblings, 0 replies; 77+ messages in thread
From: Sape Mullender @ 2007-11-01 17:13 UTC (permalink / raw)
  To: 9fans

> Tinval  (asks the server for invalidations for files seen)
> Rinval (reports new invalidations)
> Tinval (asks for further invals, and lets the server know that Rinval
> was seen by the client)

That'll work.



* Re: [9fans] QTCTL?
  2007-11-01 16:04                         ` ron minnich
                                             ` (2 preceding siblings ...)
  2007-11-01 17:03                           ` Russ Cox
@ 2007-11-01 17:14                           ` Bakul Shah
  3 siblings, 0 replies; 77+ messages in thread
From: Bakul Shah @ 2007-11-01 17:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Why not just have a file that a client reads that lets the client know
> of changes to files.
> 
> client opens this server-provided file ("changes"? "dnotify"?)
> 
> Server agrees to send client info about all FIDS which client has
> active that are changing. Form of the message?
> fid[4]offset[8]len[4]

A changefile would be useful in many ways.  You can implement
Unix's select(2) or poll(2).  You can discover when devices
disappear or reappear for auto configuration. You can grab
new email as soon as it gets delivered to your mbox etc.
Change propagation is very handy.



* Re: [9fans] QTCTL?
  2007-11-01 17:13                             ` Charles Forsyth
@ 2007-11-01 17:16                               ` Charles Forsyth
  2007-11-01 17:20                                 ` Charles Forsyth
  0 siblings, 1 reply; 77+ messages in thread
From: Charles Forsyth @ 2007-11-01 17:16 UTC (permalink / raw)
  To: 9fans

> sorry: i was assuming that when needed the client (or rather a cache control agent on the client)
> would acknowledge by writing back on the file.

actually any service-specific agent on the client, it's not just for cache control,
which is one reason i like it more than cache-specific things actually in the protocol.



* Re: [9fans] QTCTL?
  2007-11-01 17:16                               ` Charles Forsyth
@ 2007-11-01 17:20                                 ` Charles Forsyth
  0 siblings, 0 replies; 77+ messages in thread
From: Charles Forsyth @ 2007-11-01 17:20 UTC (permalink / raw)
  To: 9fans

>> sorry: i was assuming that when needed the client (or rather an agent on the client)
>> would acknowledge by writing back on the file.

indeed in some cases a subsequent read might be defined to acknowledge the items
previously read.



* Re: [9fans] QTCTL?
  2007-11-01 17:12                             ` Sape Mullender
@ 2007-11-01 17:35                               ` erik quanstrom
  2007-11-01 18:36                                 ` erik quanstrom
  0 siblings, 1 reply; 77+ messages in thread
From: erik quanstrom @ 2007-11-01 17:35 UTC (permalink / raw)
  To: 9fans

> Any maliciousness on the part of clients in this scheme would result
> in (possibly one-time only) temporary denial of service to users
> sharing a file; such users are not usually maliciously inclined.

it doesn't take malice.  1 faulty client will do.

- erik



* Re: [9fans] QTCTL?
  2007-11-01 16:58                             ` Francisco J Ballesteros
  2007-11-01 17:11                               ` Charles Forsyth
  2007-11-01 17:13                               ` Sape Mullender
@ 2007-11-01 17:38                               ` ron minnich
  2007-11-01 17:56                                 ` Francisco J Ballesteros
  2 siblings, 1 reply; 77+ messages in thread
From: ron minnich @ 2007-11-01 17:38 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

right, but before the experiments start in earnest, see what they're
doing at lustre.org, in nfs v3, etc.

Much of this discussion is familiar.

<shameless plug> you can also see what I did in mnfs ca. 1992, if you
promise to ignore the use I put it to (DSM). I implemented invalidates
for shared pages. It took, like 15 minutes to implement. It required
that I run an nfs server on each client, however, and it worked
because NFS blocks on a node have a global name: <fhandle>:<offset>.
So the server tracked who had what pages, and an invalidate was
actually a simple RPC from servers to clients. yes, this broke the
c-s-c model, but hey ... I like it better than leases, personally.

But neither this nor leases seems to scale terribly well to 4096 or
more clients.

ron



* Re: [9fans] QTCTL?
  2007-11-01 17:03                           ` Russ Cox
  2007-11-01 17:12                             ` Sape Mullender
  2007-11-01 17:13                             ` Charles Forsyth
@ 2007-11-01 17:52                             ` Eric Van Hensbergen
  2007-11-01 18:00                             ` Latchesar Ionkov
  3 siblings, 0 replies; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-11-01 17:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 11/1/07, Russ Cox <rsc@swtch.com> wrote:
>
> One could, of course, use a different protocol with
> a 9P front end.  That's okay for clients, but you'd still
> have to teach the back-end server (i.e. fossil) to speak
> the other protocol directly in order to get any guarantees.
> (If 9P doesn't cut it then anything that's just in front of
> (not in place of) a 9P server can't solve the problem.)
>

Sorry, could you clarify what you mean by this?

         -eric



* Re: [9fans] QTCTL?
  2007-11-01 17:38                               ` ron minnich
@ 2007-11-01 17:56                                 ` Francisco J Ballesteros
  2007-11-01 18:01                                   ` Francisco J Ballesteros
                                                     ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-11-01 17:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

We did look at nfs v3 and lbfs before implementing op.

nfs v3 at least tried to address latency and not just bandwidth,
unlike lbfs, but it seemed to use more RPCs than needed for some tasks
(I don't remember now, but I have that written down somewhere).



On 11/1/07, ron minnich <rminnich@gmail.com> wrote:
> right, but before the experiments start in earnest, see what they're
> doing at lustre.org, in nfs v3, etc.
>
> Much of this discussion is familiar.
>
> <shameless plug> you can also see what I did in mnfs ca. 1992, if you
> promise to ignore the use I put it to (DSM). I implemented invalidates
> for shared pages. It took, like 15 minutes to implement. It required
> that I run an nfs server on each client, however, and it worked
> because NFS blocks on a node have a global name: <fhandle>:<offset>.
> So the server tracked who had what pages, and an invalidate was
> actually a simple RPC from servers to clients. yes, this broke the
> c-s-c model, but hey ... I like it better than leases, personally.
>
> But neither this nor leases seems to scale terribly well to 4096 or
> more clients.
>
> ron
>


* Re: [9fans] QTCTL?
  2007-11-01 17:03                           ` Russ Cox
                                               ` (2 preceding siblings ...)
  2007-11-01 17:52                             ` Eric Van Hensbergen
@ 2007-11-01 18:00                             ` Latchesar Ionkov
  2007-11-01 18:03                               ` Francisco J Ballesteros
  3 siblings, 1 reply; 77+ messages in thread
From: Latchesar Ionkov @ 2007-11-01 18:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

The 5-second lease might work in the local-network case, but then not
caching at all is going to work out pretty well too. What if you want
to cache over the Internet and your round-trip is 3-4 seconds :)

On Nov 1, 2007, at 11:03 AM, Russ Cox wrote:

>> The fact is we have loose consistency now, we just don't call it  
>> that.
>> Anytime you are running a file from a server, you have loose
>> consistency. It works ok in most cases.
>
> Because all reads and writes go through to the
> server, all file system operations on a particular
> server are globally ordered, *and* that global ordering
> matches the actual sequence of events in the
> physical world (because clients wait for R-messages).
> That's a pretty strong consistency statement!
>
> Any revocation-based system has to have the server
> wait for an acknowledgement from the client.
> If there is no wait, then between the time that the server
> sends the "oops, stop caching this" and the client
> processes it, the client might incorrectly use the
> now-invalid data.  That's why a change file doesn't
> provide the same consistency guarantees as pushing
> all reads/writes to the file server.  To get those, revocations
> fundamentally must wait for the client.
>
> It's also why this doesn't work:
>
>> Tcache	asks whether the server is prepared to cache
>> Rcache	makes lease available with parameters, Rerror says no.
>>
>> Tlease	says, ok start my lease now (almost immediately follows Rcache)
>> Rlease	lease expired or lease needs to be given back early
>>
>> Tcache	done with old lease (may immediately ask for a new lease)
>> etc.
>
> because the Rlease/Tcache sequence is a s->c->s
> message.  If a client doesn't respond with the Tcache
> to formally give up the lease, the server has no choice
> but to wait.
>
> If you are willing to assume that each machine has
> a real-time clock that runs approximately at the
> same rate (so that different machines agree on
> what 5 seconds means, but not necessarily what
> time it is right now), then you can fix the above messages
> by saying that the client lease is only good for a fixed
> time period (say 5 seconds) from the time that the
> client sent the Tlease.  Then the server can overestimate
> the lease length as 5 seconds from when it sent the
> Rlease, and everything is safe.  And if the server
> sends a Rlease and the client doesn't respond with
> a Tcache to officially renounce the lease, the server
> can just wait until Tlease + 5 seconds and go on.
> But that means the client has to be renewing the
> lease every 5 seconds (more frequently, actually).
>
> Also, in the case where the lease is just expiring
> but not being revoked, then you have to have some
> mechanism for establishing the new lease before
> the old one runs out.  If there is a time between
> two leases when you don't hold any leases, then
> all your cached data becomes invalid.
>
> The following works:
>
> Tnewlease	asks for a new lease
> Rnewlease	grants the lease, for n seconds starting at time of  
> Tnewlease
>
> Trenewlease	asks to renew the lease
> Rrenewlease	grants the renewal for n seconds starting at time of  
> Trenewlease
>
> Now if the server needs to revoke the lease, it just
> refuses to renew and waits until the current lease expires.
>
> You can add a pseudo-callback to speed up revocation
> with a cooperative client:
>
> Tneeditback	offers to give lease back to server early
> Rneeditback	says I accept your offer, please do
>
> Tdroplease	gives lease back
> Rdroplease	says okay I got it (not really necessary)
>
> The lease ends when the client sends Tdroplease,
> *not* when the server sends Rneeditback.  It can't end
> at Rneeditback for the same reason change files don't work.
> And this can *only* be an optimization, because it
> depends on the client sending Tdroplease.  To get
> something that works in the presence of misbehaved
> clients you have to be able to fall back on the
> "wait it out" strategy.
>
> One could, of course, use a different protocol with
> a 9P front end.  That's okay for clients, but you'd still
> have to teach the back-end server (i.e. fossil) to speak
> the other protocol directly in order to get any guarantees.
> (If 9P doesn't cut it then anything that's just in front of
> (not in place of) a 9P server can't solve the problem.)
>
> Russ
>
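To make the timing rule in the quoted message concrete, here is a tiny sketch (Python, purely illustrative; the helper names are invented, not from any implementation). The client counts the lease from the moment it sent Tlease; the server counts from the moment it sent Rlease, which is necessarily later, so the server's estimate always expires after the client's and "waiting it out" is safe:

```python
# Sketch (not code from the thread) of the lease timing rule, assuming
# only that both machines' clocks tick at roughly the same rate.
LEASE = 5.0  # lease length in seconds

def client_expiry(t_sent_tlease):
    # The client counts the lease from the moment it SENT Tlease.
    return t_sent_tlease + LEASE

def server_expiry(t_sent_rlease):
    # The server counts from the moment it SENT Rlease.  Rlease is
    # necessarily sent after Tlease was sent, so this overestimates the
    # client's expiry time: waiting this long is always safe.
    return t_sent_rlease + LEASE

# Tlease leaves the client at t=0; Rlease leaves the server at t=0.8
# after network and processing delay.  The server's deadline is later.
assert server_expiry(0.8) >= client_expiry(0.0)
```

The same arithmetic shows why the client must renew well before every 5 seconds: its usable window ends at its own send time plus the lease, not at the server's.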


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 17:56                                 ` Francisco J Ballesteros
@ 2007-11-01 18:01                                   ` Francisco J Ballesteros
  2007-11-01 18:52                                   ` Eric Van Hensbergen
  2007-11-01 23:24                                   ` ron minnich
  2 siblings, 0 replies; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-11-01 18:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

One thing that Op taught us is that it's quite convenient to add changes to
the protocol
by NOT changing the protocol, and instead putting special server-client pairs
in between as gateways.

We could implement an export for caching clients (and a caching client) so that
the export implements the Tinval-Rinval thing and serves the caching clients.
The export process may export its own namespace (perhaps with just the main
file server mounted) and the caching clients would behave as 9P servers to others
on their machines. This worked just fine in Inferno and could work if
we try it in Plan 9.

Also, I'm now thinking that we could take advantage of these new
export-cache links
to avoid some RPCs. To maintain 9P semantics we could not use Op in there, but
perhaps there's something in the middle. I have to give this all more thought.

Any comment on this, though?
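A toy sketch of the gateway idea follows (all names here are invented for illustration; real 9P marshalling, fids and tags are omitted). The export process speaks plain 9P to the main file server and the Tinval/Rinval dialect to the caching clients, remembering which client holds which qid so it can invalidate copies on writes:

```python
# Toy sketch of the export-gateway idea; not a real 9P implementation.

class FakeServer:
    """Stands in for the plain 9P file server behind the gateway."""
    def __init__(self):
        self.files = {}
    def read(self, qid):
        return self.files.get(qid, b"")
    def write(self, qid, data):
        self.files[qid] = data

class FakeCachingClient:
    """Stands in for a caching client speaking the extended dialect."""
    def __init__(self):
        self.invalidated = []
    def invalidate(self, qid):          # the Rinval leg, roughly
        self.invalidated.append(qid)

class CacheGateway:
    def __init__(self, server):
        self.server = server            # plain 9P server behind us
        self.cached = {}                # qid -> clients holding a copy

    def read(self, client, qid):
        self.cached.setdefault(qid, set()).add(client)
        return self.server.read(qid)

    def write(self, client, qid, data):
        # Invalidate every OTHER client's copy before answering, so
        # nobody keeps serving stale data on its machine.
        for c in self.cached.get(qid, set()) - {client}:
            c.invalidate(qid)
        self.server.write(qid, data)

gw = CacheGateway(FakeServer())
a, b = FakeCachingClient(), FakeCachingClient()
gw.read(a, "q1")
gw.read(b, "q1")
gw.write(a, "q1", b"new")
assert b.invalidated == ["q1"] and a.invalidated == []
```

The point of the sketch is only the division of labour: the main server stays an unmodified 9P server, and all the invalidation bookkeeping lives in the gateway.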


On 11/1/07, Francisco J Ballesteros <nemo@lsub.org> wrote:
> We did look at nfs v3 and lbfs before implementing op.
>
> nfs v3 at least tried to address latency and not just bandwidth as lbfs,
> but it seemed to use more RPCs than needed for some tasks (I don't remember
> now, but have that written down somewhere).
>
>
>
> On 11/1/07, ron minnich <rminnich@gmail.com> wrote:
> > right, but before the experiments start in earnest, see what they're
> > doing at lustre.org, in nfs v3, etc.
> >
> > Much of this discussion is familiar.
> >
> > <shameless plug> you can also see what I did in mnfs ca. 1992, if you
> > promise to ignore the use I put it to (DSM). I implemented invalidates
> > for shared pages. It took, like 15 minutes to implement. It required
> > that I run an nfs server on each client, however, and it worked
> > because NFS blocks on a node have a global name: <fhandle>:<offset>.
> > So the server tracked who had what pages, and an invalidate was
> > actually a simple RPC from servers to clients. yes, this broke the
> > c-s-c model, but hey ... I like it better than leases, personally.
> >
> > But neither this nor leases seems to scale terribly well to 4096 or
> > more clients.
> >
> > ron
> >
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 18:00                             ` Latchesar Ionkov
@ 2007-11-01 18:03                               ` Francisco J Ballesteros
  2007-11-01 18:08                                 ` Latchesar Ionkov
  0 siblings, 1 reply; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-11-01 18:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

But 5 seconds would be enough to be convinced that a client is not properly
responding to invalidation requests, and to revoke all its leases. Why make other
clients wait longer? I mean, assuming a central file server and clients
connected to it
in a star topology.

On 11/1/07, Latchesar Ionkov <lionkov@lanl.gov> wrote:
> The 5-second lease might work in the local network case, but then not
> caching at all is going to work out pretty well too. What if you want
> to cache over the Internet and your round-trip is 3-4 seconds :)


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 18:03                               ` Francisco J Ballesteros
@ 2007-11-01 18:08                                 ` Latchesar Ionkov
  2007-11-01 18:16                                   ` erik quanstrom
  2007-11-01 18:19                                   ` Francisco J Ballesteros
  0 siblings, 2 replies; 77+ messages in thread
From: Latchesar Ionkov @ 2007-11-01 18:08 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

The problem is that the clients with higher latencies badly need to
be able to cache, while the ones with better latencies can afford not
to cache :)
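A back-of-envelope way to see the tension (a sketch with invented names; the 3-4 second figure is from the message below): a renewal must be answered before the lease expires, so roughly one round-trip of every lease is dead time spent renewing, and short leases over a slow link are mostly dead time:

```python
def usable_fraction(lease, rtt):
    # Fraction of a lease the client can actually rely on: a renewal
    # must complete before expiry, so one RTT per lease is dead time.
    return max(0.0, (lease - rtt) / lease)

assert usable_fraction(5.0, 0.001) > 0.99  # LAN: almost the whole lease
assert usable_fraction(5.0, 3.5) < 0.5     # 3-4 s Internet RTT: mostly renewal
```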

On Nov 1, 2007, at 12:03 PM, Francisco J Ballesteros wrote:

> But 5 seconds would be enough to be convinced that a client is not  
> properly
> responding to invalidation requests, and cease all its leases. Why  
> make other
> clients wait for more? I mean, assuming a central FS and clients
> connected on star
> to it.
>
> On 11/1/07, Latchesar Ionkov <lionkov@lanl.gov> wrote:
>> The 5 seconds lease might work in the local network case, but not
>> caching at all is going to work out pretty well too. What if you want
>> to cache over Internet and you round-trip is 3-4 seconds :)


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 18:08                                 ` Latchesar Ionkov
@ 2007-11-01 18:16                                   ` erik quanstrom
  2007-11-01 18:19                                   ` Francisco J Ballesteros
  1 sibling, 0 replies; 77+ messages in thread
From: erik quanstrom @ 2007-11-01 18:16 UTC (permalink / raw)
  To: 9fans

> The problem is that the clients with higher latencies badly need to  
> be able to cache. And the ones with better latencies can afford not  
> caching :)

the irony is, the higher the latency, the greater the cost of synchronization.

- erik


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 18:08                                 ` Latchesar Ionkov
  2007-11-01 18:16                                   ` erik quanstrom
@ 2007-11-01 18:19                                   ` Francisco J Ballesteros
  2007-11-01 18:35                                     ` Sape Mullender
  1 sibling, 1 reply; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-11-01 18:19 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I know, I think I was not clear, sorry.

The point is that, referring to the
Tinval-Rinval-Tinval-...
sequence, a server could afford to consider its Rinval acknowledged
if the client happens not to respond (by issuing another Tinval)
within 5 seconds.
That would "freeze" clients for at most 5 seconds when a client fails
to respond,
but that should not happen often, and it would not make other clients slow.

Also, regarding
> the irony is, the higher the latency, the greater the cost of synchronization.

if we consider that this would happen only for rw files, and that read-only
files would be considered leased up to the next Rinval mentioning them, the
cost would probably not be too high. But I won't actually know before
implementing and trying it.
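The server-side rule proposed above can be pinned down in a few lines (a hypothetical sketch, not code from any implementation): an Rinval counts as settled either when the client's next Tinval arrives, or unilaterally after the timeout, at which point the server revokes all of that client's leases instead of stalling everyone else:

```python
def rinval_settled(acked, t_sent_rinval, now, timeout=5.0):
    # Hypothetical rule: an Rinval is settled once the client's next
    # Tinval arrives (acked), or after `timeout` seconds of silence,
    # after which the server drops all of that client's leases.
    return acked or (now - t_sent_rinval) >= timeout

assert not rinval_settled(False, 0.0, 2.0)  # still waiting for the client
assert rinval_settled(True, 0.0, 2.0)       # client answered promptly
assert rinval_settled(False, 0.0, 5.0)      # misbehaving client timed out
```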


On 11/1/07, Latchesar Ionkov <lionkov@lanl.gov> wrote:
> The problem is that the clients with higher latencies badly need to
> be able to cache. And the ones with better latencies can afford not
> caching :)
>
> On Nov 1, 2007, at 12:03 PM, Francisco J Ballesteros wrote:
>
> > But 5 seconds would be enough to be convinced that a client is not
> > properly
> > responding to invalidation requests, and cease all its leases. Why
> > make other
> > clients wait for more? I mean, assuming a central FS and clients
> > connected on star
> > to it.
> >
> > On 11/1/07, Latchesar Ionkov <lionkov@lanl.gov> wrote:
> >> The 5 seconds lease might work in the local network case, but not
> >> caching at all is going to work out pretty well too. What if you want
> >> to cache over Internet and you round-trip is 3-4 seconds :)

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 18:19                                   ` Francisco J Ballesteros
@ 2007-11-01 18:35                                     ` Sape Mullender
  2007-11-01 19:09                                       ` Charles Forsyth
  0 siblings, 1 reply; 77+ messages in thread
From: Sape Mullender @ 2007-11-01 18:35 UTC (permalink / raw)
  To: 9fans

> I know, I think I was not clear, sorry.
> 
> The point is that, referring to the
> Tinval-Rinval-Tinval-...
> sequence, a server could afford to consider its Rinval acknowledged
> if the client happens not to respond (by issuing another Tinval)
> within 5 seconds.
> That would "freeze" clients for at most 5 seconds when a client fails
> to respond,
> but that should not happen often, and it would not make other clients slow.
> 
> Also, regarding
>> the irony is, the higher the latency, the greater the cost of syncronization.
> 
> if we consider that this would happen only for rw files, and that rd files would
> be considered as leased up to the next Rinval mentioning them, the cost would
> not probably be too high. But I won´t actually know before
> implementing and trying it.

There's a funny obsession in this discussion with optimal performance
in the least common scenarios.
Let me reiterate:
1. 90% (a symbolic number) of files are not shared
2. Of the 10% remaining, 90% of files are read-shared
3. Of the 1% remaining, 90% of clients are well-behaved
4. In the 0.1% remaining, only the first request in a series
   of requests issued by a badly behaving client will result in
   making the server wait for the lease timeout (in all subsequent cases,
   the server just won't give that client a lease or an extremely short
   one)
5. That leaves 0.01% of files.  Optimize away, guys.
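As arithmetic, the estimate above is four factors of ten (the 90% figures being symbolic, as stated):

```python
# Sape's estimate as arithmetic; each step keeps 10% of the previous.
shared       = 0.10                 # files shared at all           (point 1)
write_shared = shared * 0.10        # ...and written while shared   (point 2)
misbehaved   = write_shared * 0.10  # ...with a misbehaving client  (point 3)
first_wait   = misbehaved * 0.10    # ...and it's the first offence (point 4)
assert abs(first_wait - 0.0001) < 1e-12  # 0.01% of files           (point 5)
```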

	Sape


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 17:35                               ` erik quanstrom
@ 2007-11-01 18:36                                 ` erik quanstrom
  0 siblings, 0 replies; 77+ messages in thread
From: erik quanstrom @ 2007-11-01 18:36 UTC (permalink / raw)
  To: 9fans

sorry i didn't see that email for a bit.  i'm at home working on
loading the dump (up to 25 feb 2006).  and i got distracted taking a look
at what i needed to implement the last bit of functionality on my list.

- erik


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 17:56                                 ` Francisco J Ballesteros
  2007-11-01 18:01                                   ` Francisco J Ballesteros
@ 2007-11-01 18:52                                   ` Eric Van Hensbergen
  2007-11-01 19:29                                     ` Francisco J Ballesteros
  2007-11-01 23:24                                   ` ron minnich
  2 siblings, 1 reply; 77+ messages in thread
From: Eric Van Hensbergen @ 2007-11-01 18:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 11/1/07, Francisco J Ballesteros <nemo@lsub.org> wrote:
> We did look at nfs v3 and lbfs before implementing op.
>
> nfs v3 at least tried to address latency and not just bandwidth as lbfs,
> but it seemed to use more RPCs than needed for some tasks (I don't remember
> now, but have that written down somewhere).
>

IIRC, NFSv4 has more lease negotiation stuff as well as compound operations.
Of course, it's something like a 500-page spec and looks to be more of an
example of what not to do, IMHO...

      -eric


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 19:09                                       ` Charles Forsyth
@ 2007-11-01 19:07                                         ` erik quanstrom
  0 siblings, 0 replies; 77+ messages in thread
From: erik quanstrom @ 2007-11-01 19:07 UTC (permalink / raw)
  To: 9fans

>> There's a funny obsession in this discussion with optimal performance
>> in the least common scenarios.
> 
> but surely that's absolutely typical of most such discussions in computing
> and perhaps sports.

oddly, it's the same with spam.

- erik


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 18:35                                     ` Sape Mullender
@ 2007-11-01 19:09                                       ` Charles Forsyth
  2007-11-01 19:07                                         ` erik quanstrom
  0 siblings, 1 reply; 77+ messages in thread
From: Charles Forsyth @ 2007-11-01 19:09 UTC (permalink / raw)
  To: 9fans

> There's a funny obsession in this discussion with optimal performance
> in the least common scenarios.

but surely that's absolutely typical of most such discussions in computing
and perhaps sports.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 18:52                                   ` Eric Van Hensbergen
@ 2007-11-01 19:29                                     ` Francisco J Ballesteros
  0 siblings, 0 replies; 77+ messages in thread
From: Francisco J Ballesteros @ 2007-11-01 19:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Sorry. My fault. I meant v4.

On 11/1/07, Eric Van Hensbergen <ericvh@gmail.com> wrote:
> On 11/1/07, Francisco J Ballesteros <nemo@lsub.org> wrote:
> > We did look at nfs v3 and lbfs before implementing op.
> >
> > nfs v3 at least tried to address latency and not just bandwidth as lbfs,
> > but it seemed to use more RPCs than needed for some tasks (I don't remember
> > now, but have that written down somewhere).
> >
>
> IIRC, NFSv4 has more lease negotiation stuff as well as compound operations.
> Of course its like a 500 page spec and looks to be more of an example
> of what not to do IMHO...
>
>       -eric
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
  2007-11-01 17:56                                 ` Francisco J Ballesteros
  2007-11-01 18:01                                   ` Francisco J Ballesteros
  2007-11-01 18:52                                   ` Eric Van Hensbergen
@ 2007-11-01 23:24                                   ` ron minnich
  2 siblings, 0 replies; 77+ messages in thread
From: ron minnich @ 2007-11-01 23:24 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 11/1/07, Francisco J Ballesteros <nemo@lsub.org> wrote:
> We did look at nfs v3 and lbfs before implementing op.
>
> nfs v3 at least tried to address latency and not just bandwidth as lbfs,
> but it seemed to use more RPCs than needed for some tasks (I don't remember
> now, but have that written down somewhere).

it's all new and better in nfs v4.1 and lustre!

more features!

ron


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [9fans] QTCTL?
@ 2007-11-01  4:21 Brian L. Stuart
  0 siblings, 0 replies; 77+ messages in thread
From: Brian L. Stuart @ 2007-11-01  4:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> If the file is "decent", the cache must still check
> that the file is up to date. It might not do so all the time
> (as we do, to trade for performance, as you say). That means
> that the cache would get further file data as soon as it sees
> a new qid.vers for the file. And tail -f would still work.

This is really starting to sound familiar.  A couple years ago,
I played around with a caching file system for laptops.  They
always bugged me, because they didn't play nice in a network with
a file server.  So it's meant to operate the same way whether
connected to the file server or not.  On reads, it checks the
file as described above, if connected to the file server.  When
connected, it implements a write-through cache, but records
writes when not connected.  When it connects again, it plays
back the writes, after a little optimizing.  And if the file
has already changed while we were disconnected, I flag it as
a conflict and invoke the powers of a human.
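The reconnect step described above can be sketched as follows (the structures are invented for illustration, not taken from that file system): play back the logged writes, but when the server's version of a file changed while we were disconnected, flag a conflict for a human instead of silently overwriting:

```python
# Sketch of write-log replay with conflict detection on reconnect.

class FakeFS:
    """Stands in for the file server; vers plays the role of qid.vers."""
    def __init__(self):
        self.data, self.vers = {}, {}
    def write(self, path, data):
        self.data[path] = data
        self.vers[path] = self.vers.get(path, 0) + 1

def replay(log, server):
    conflicts = []
    for path, data, vers_at_disconnect in log:
        if server.vers.get(path, 0) != vers_at_disconnect:
            conflicts.append(path)       # changed behind our back
        else:
            server.write(path, data)     # safe to play back
    return conflicts

fs = FakeFS()
fs.write("a", b"v1")                     # vers of "a" is 1 at disconnect
fs.write("b", b"v1")                     # vers of "b" is 1 at disconnect
fs.write("b", b"someone else")           # "b" changed while disconnected
log = [("a", b"mine", 1), ("b", b"mine", 1)]
assert replay(log, fs) == ["b"]          # "b" needs a human; "a" replayed
assert fs.data["a"] == b"mine"
```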

My first playing was in Plan 9, and then with fuse on Linux.
I've been using that version as the primary file system
for my home directory, caching a file server at home and
one at work.  It's not really ready for prime time, but it's
weaknesses are ones that I can live with at the moment.
So I haven't touched it in a while.

But to the question at hand, I never really thought in terms
of it being used for files that didn't connect to persistent
storage.  I always figured the persistent and the non-persistent
would be mostly in different directory trees, so I just ignored
the problem.  Another simplifying short cut was to not try
to implement the network import in it.  It just presents a
cache of one directory tree at another name.  So the authoritative
version is accessed through a normal remote mount.  It makes
some things simpler, but it also keeps it from seeing the
protocol traffic with the server.  In this scenario, it seems
the place to keep a cachability flag would be in the file
metadata, rather than in the protocol itself.  But then again,
my motivation wasn't to improve latency tolerance.

Of course, Russ is right that we can't be sure of coherence
without a way for the server to invalidate the cache of a
file.  But in my case, that wouldn't be available while
disconnected, and the way I use it, conflicts are more
likely when disconnected than while connected.  So I decided
to detect the conflict, rather than prevent it.  That meant
I didn't have to make any changes to the server side and
I could take the shortcut to just map one tree to another.

So this might not add anything to the discussion, but
it'd be nice to account for a wide range of caching
applications.

BLS




Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-31 18:40 [9fans] QTCTL? Francisco J Ballesteros
2007-10-31 18:56 ` Eric Van Hensbergen
2007-10-31 19:13   ` Charles Forsyth
2007-10-31 19:33     ` Eric Van Hensbergen
2007-10-31 19:39       ` erik quanstrom
2007-10-31 20:43     ` geoff
2007-10-31 21:32       ` Charles Forsyth
2007-10-31 22:48       ` roger peppe
2007-10-31 23:35         ` erik quanstrom
2007-11-01  9:29           ` roger peppe
2007-11-01 11:03             ` Eric Van Hensbergen
2007-11-01 11:19               ` Charles Forsyth
2007-11-01 12:11             ` erik quanstrom
2007-10-31 19:42 ` erik quanstrom
2007-10-31 19:49   ` Eric Van Hensbergen
2007-10-31 20:03     ` erik quanstrom
2007-10-31 20:10       ` Latchesar Ionkov
2007-10-31 20:12       ` Eric Van Hensbergen
2007-10-31 20:17       ` Russ Cox
2007-10-31 20:29         ` Francisco J Ballesteros
2007-10-31 20:48           ` Charles Forsyth
2007-10-31 21:23             ` Francisco J Ballesteros
2007-10-31 21:40               ` Russ Cox
2007-10-31 22:11                 ` Charles Forsyth
2007-10-31 22:26                 ` Francisco J Ballesteros
2007-10-31 22:37                   ` Charles Forsyth
2007-10-31 22:43                     ` Francisco J Ballesteros
2007-10-31 23:32                     ` Eric Van Hensbergen
2007-10-31 23:41                       ` [V9fs-developer] " Charles Forsyth
     [not found]                       ` <606b6f003ae9f0ed3e8c3c5f90ddc720@terzarima.net>
2007-11-01  1:13                         ` Eric Van Hensbergen
2007-10-31 23:54                   ` erik quanstrom
2007-11-01  0:03                     ` Charles Forsyth
2007-11-01  1:25                     ` Eric Van Hensbergen
2007-11-01  1:44                       ` erik quanstrom
2007-11-01  2:15                         ` Eric Van Hensbergen
2007-11-01  7:34                       ` Skip Tavakkolian
2007-11-01  6:21                 ` Bakul Shah
2007-11-01 14:28                   ` Russ Cox
2007-11-01 14:38                     ` erik quanstrom
2007-11-01 14:41                     ` Charles Forsyth
2007-11-01 15:26                     ` Sape Mullender
2007-11-01 15:51                       ` Latchesar Ionkov
2007-11-01 16:04                         ` ron minnich
2007-11-01 16:16                           ` Latchesar Ionkov
2007-11-01 16:21                           ` Sape Mullender
2007-11-01 16:58                             ` Francisco J Ballesteros
2007-11-01 17:11                               ` Charles Forsyth
2007-11-01 17:11                                 ` Francisco J Ballesteros
2007-11-01 17:13                               ` Sape Mullender
2007-11-01 17:38                               ` ron minnich
2007-11-01 17:56                                 ` Francisco J Ballesteros
2007-11-01 18:01                                   ` Francisco J Ballesteros
2007-11-01 18:52                                   ` Eric Van Hensbergen
2007-11-01 19:29                                     ` Francisco J Ballesteros
2007-11-01 23:24                                   ` ron minnich
2007-11-01 17:03                           ` Russ Cox
2007-11-01 17:12                             ` Sape Mullender
2007-11-01 17:35                               ` erik quanstrom
2007-11-01 18:36                                 ` erik quanstrom
2007-11-01 17:13                             ` Charles Forsyth
2007-11-01 17:16                               ` Charles Forsyth
2007-11-01 17:20                                 ` Charles Forsyth
2007-11-01 17:52                             ` Eric Van Hensbergen
2007-11-01 18:00                             ` Latchesar Ionkov
2007-11-01 18:03                               ` Francisco J Ballesteros
2007-11-01 18:08                                 ` Latchesar Ionkov
2007-11-01 18:16                                   ` erik quanstrom
2007-11-01 18:19                                   ` Francisco J Ballesteros
2007-11-01 18:35                                     ` Sape Mullender
2007-11-01 19:09                                       ` Charles Forsyth
2007-11-01 19:07                                         ` erik quanstrom
2007-11-01 17:14                           ` Bakul Shah
2007-11-01 16:17                         ` Sape Mullender
2007-11-01 16:27                           ` Sape Mullender
2007-11-01 16:58                         ` Sape Mullender
2007-11-01 16:59                     ` Bakul Shah
2007-11-01  4:21 Brian L. Stuart
