Re: [9fans] Acme mailreader - now: User mode filesystems in linux

9fans - fans of the OS Plan 9 from Bell Labs

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
@ 2004-12-17 15:31 bmaroshe
  0 siblings, 0 replies; 18+ messages in thread
From: bmaroshe @ 2004-12-17 15:31 UTC (permalink / raw)
  To: Russ Cox, Fans of the OS Plan 9 from Bell Labs

> You're stuck with the operating system you have,
> not the operating system you'd like to have.  If one were
> designing the system from scratch one could always do
> better.  Sadly this Coda discussion is about how to deal with
> what's already available on Unix.
So in your opinion it's better to design systems from scratch than to try to bring good ideas to kludgey ancient systems?

boris



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader
@ 2004-12-16 15:08 David Leimbach
  2004-12-16 23:22 ` geoff
  0 siblings, 1 reply; 18+ messages in thread
From: David Leimbach @ 2004-12-16 15:08 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

hmmm not sure where this all came from on this thread of discussion :).

On Wed, 15 Dec 2004 16:24:47 -0800, geoff@collyer.net <geoff@collyer.net> wrote:
> OS X is in no sense a micro-kernel.  The OS X kernel is huge:
> 
>         ; size /mach_kernel
>         __TEXT  __DATA  __OBJC  others  dec     hex
>         3022848 458752  0       643984  4125584 3ef390
> 
> and consists of a heavily-hacked Mach (3, I believe) kernel and a
> FreeBSD kernel (with bits from other BSDs), combined into a single
> kernel and running in a single address space.  The BSD kernel does not
> run in user mode.  Remember that Mach was, as far as I know, the
> largest ``micro-kernel'' ever produced, larger than most or all of its
> contemporary ``macro-kernels'', so that some of us called it a
> ``Machro-kernel''.

It's an OSF Mach3 with "optimizations" :).  The kernel is really
nothing like FreeBSD.  It's more like the BSD from NeXTStep with some
Free/Open/Net BSD stuff hacked in.  Also you are forgetting IOKit [the
C++ framework for device drivers].

The Apple marketing team is just putting rubbish on the internet when
it comes to claiming things are based on FreeBSD 5.

In fairness, some of the userland applications and command line tools
are, in fact, from FreeBSD but the amount of FreeBSD in XNU [the
kernel] and Darwin is exaggerated.  Porting things from FreeBSD 5 to
Mac OS X, however, is quite painful because you have to deal with
IOKit and the hardly FreeBSD-like bsd kernel portion.

> 
> I haven't looked very hard (one could check out the mount_* sources
> from the Darwin CVS servers), but mount(2) doesn't seem to have much
> that's new, except for union mounts, which surprised me.  I suspect
> that most of the mount_* commands either invoke kernel machinery
> (through the ``type'' argument to mount) or pretend to be NFS servers.
> I've never yet seen a (l)unix system other than late Research Unix
> that made user-mode file servers relatively easy and painless to write
> (though I'd love to be shown a counter-example!).  Of course, since
> many (l)unix systems only allow the super-user to mount anything,
> their maintainers may not see much utility in user-mode file servers.
> It's sort of a cascade of vision-failures.

Maybe because people don't know why Plan 9 is better than Unix they
think Unix is "the way".  Religion often overrides common sense.  Do
we need more plan 9 "missionaries"? [probably]

DragonflyBSD is working on making the VFS a message passing layer
instead of a system call layer so doing something like 9p is probably
already in their grand scheme of development.

http://www.dragonflybsd.org/goals/vfsmodel.cgi

This doesn't help Mac OS X of course.

> 
> Also, /sys/src/cmd/upas/README is a little dated:
> 
>         --rw-rw-r-- M 5174 sys sys 1041 Dec 11  1999 README
> 
> I'm not sure if it pre-dates upas/fs, but it describes how to port the
> parts of upas that don't rely on Plan 9 facilities (transport more
> than reading).  I ported Plan 9's upas back to Unix while at the labs
> (and also translated it into limbo), but some parts (e.g., upas/fs)
> didn't have an obvious implementation, other than painfully pretending
> to be an NFS server, at least at the time.
> 

Might be interesting to see how DragonFlyBSD has come along and if
it's possible to implement upas/fs with whatever they've done.

Again this doesn't really help Mac OS X.

I just think it's interesting.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader
  2004-12-16 15:08 [9fans] Acme mailreader David Leimbach
@ 2004-12-16 23:22 ` geoff
  2004-12-17  4:55   ` [9fans] Acme mailreader - now: User mode filesystems in linux Martin C.Atkins
  0 siblings, 1 reply; 18+ messages in thread
From: geoff @ 2004-12-16 23:22 UTC (permalink / raw)
  To: 9fans

I worked at Apple in the BSD group.

The XNU non-Mach code was clearly some BSD kernel and I don't really
care which.  My colleagues told me it started out with NetBSD but that
that was eventually dwarfed by FreeBSD with contributions from
elsewhere.  While I was there, there was talk of dragging in code from
the latest FreeBSD, notably the FFS with soft updates; I'm pretty sure
that happened.  Given that the group was (and probably still is)
headed by Jordan Hubbard of FreeBSD fame, I suspect that they're
continuing to pull in FreeBSD code and it isn't just hype.

Note too that the XNU BSD code, measured in source lines, is almost
exactly as huge as the Mach code, so the volume of *BSD code in XNU is
not, in my opinion, exaggerated: it is (or was in 2002) half the
kernel source.  (I don't remember which side of the fence IOKit was
counted against.)

Yes, the XNU kernel details are different from a stock BSD kernel.  It
co-exists with Mach, after all.  Porting graphical applications to
native OS X (avoiding X11) is a pain too; Apple do a lot of things
their own way, inheriting baggage from the pre-Unix Mac OS and
NextStep (netinfo is just the French spelling of `Yellow Pages', ugh).

Nevertheless, I stand by my statement that OS X is in no sense a
micro-kernel, and that user-mode file servers will not (as a result of
access to a micro-kernel) be easier to implement on OS X than on other
(l)unixes.

However, Martin Atkins has revealed the mystery kernel agent: coda.
Apparently it's somewhat specialised but lets user-mode file servers
catch opens and closes.

Anyone in (l)unixland for a filesystem switch?  Research Unix had one
~20 years ago, so it should be mouldy (er, mature) enough to be
acceptable to (l)unixland.  Throw in mounts by ordinary users and use
of 9P as a unifying filesystem protocol (now pretty well aged in Plan
9), and it becomes possible to push lots of code and some hacks out of
the kernel, while permitting some new and interesting work.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-16 23:22 ` geoff
@ 2004-12-17  4:55   ` Martin C.Atkins
  2004-12-17  9:54     ` Martin C.Atkins
  2004-12-17 15:44     ` Ronald G. Minnich
  0 siblings, 2 replies; 18+ messages in thread
From: Martin C.Atkins @ 2004-12-17  4:55 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hi all,

On Thu, 16 Dec 2004 15:22:59 -0800 geoff@collyer.net wrote:
>..
> However, Martin Atkins has revealed the mystery kernel agent: coda.
> Apparently it's somewhat specialised but lets user-mode file servers
> catch opens and closes.

I was going to write a longer message today, anyway :-).

Yes, the kernel-mode agent I use is the Coda filesystem driver, which
has been in the stock linux kernel for several years.

For those that don't already know: Coda is a remote filesystem that
copes (more or less well) with disconnection from, and reconnection
to, the fileserver, thus allowing clients to continue work in the
disconnected state. I'm not sure how successful it was at this - I've
never tried it - but it sounds like an interesting goal. This goal is
also shared by Intermezzo, which was (also) started by Peter Braam -
so presumably he felt Coda could be improved upon.

However, judging by the News pages on their web sites, more seems to
be happening with Coda than with Intermezzo recently.

Anyway, the interesting thing about both these systems, from the
point of view of this discussion, is that the real work is done in a
user-mode agent, which communicates with a kernel-mode stub driver.
In Coda the user-mode agent is, for some obscure reason, called Venus.

Thus it was only necessary to reverse-engineer the kernel module <->
Venus protocol. This was surprisingly easy: it is documented in some
detail on the Coda website, and Pavel Machek's podfuk had already
worked out some of the more obscure details (but this wasn't a
general-purpose library, didn't do some things I wanted, and was, I
thought, messy). The Coda driver is stable enough that I don't
remember any hangs/etc, even while I was going through the
trial-and-error process!

The user-mode agent simply opens /dev/cfsN, for some N, and reads and
writes messages down to the kernel module. My library for this is, as
I said, about 1400 lines of Python. Trivial fileserver applications
can be as small as 10-20 lines, and faulty fileservers (or library)
never crash/hang the kernel (which is how things should be! :-). It
is also possible to kill the fileserver, and restart with minimal
side effects.
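
A minimal sketch of such an agent loop, in Python since that is the
language of the library described above. This is not that library. The
header layout is an assumption (two leading 32-bit words, opcode and
unique, on each request; opcode, unique and result on each reply), and
the names parse_request, make_reply and serve are hypothetical. The
opcode numbers and payload formats are left to the caller.

```python
import errno
import struct

# ASSUMPTION: every kernel -> agent message starts with two native-endian
# 32-bit words (opcode, unique); every reply starts with three
# (opcode, unique, result). Check this against the driver you use.
IN_HDR = struct.Struct("=II")
OUT_HDR = struct.Struct("=III")


def parse_request(msg):
    """Split a raw message into (opcode, unique, payload).

    payload is everything after the first two words, including any
    further header fields the driver may send.
    """
    opcode, unique = IN_HDR.unpack_from(msg)
    return opcode, unique, msg[IN_HDR.size:]


def make_reply(opcode, unique, result, body=b""):
    """Build a reply that echoes opcode/unique and carries a result code."""
    return OUT_HDR.pack(opcode, unique, result) + body


def serve(dev, handlers, bufsize=4096):
    """Answer requests read from dev until read() returns b''.

    dev is any object with read(n) and write(bytes), for example
    open("/dev/cfs0", "r+b", buffering=0). handlers maps an opcode to a
    function payload -> (result, body). Unknown opcodes get EOPNOTSUPP.
    """
    while True:
        msg = dev.read(bufsize)
        if not msg:
            break
        opcode, unique, payload = parse_request(msg)
        handler = handlers.get(opcode)
        if handler is None:
            result, body = errno.EOPNOTSUPP, b""
        else:
            result, body = handler(payload)
        dev.write(make_reply(opcode, unique, result, body))
```

Because serve() only needs read() and write(), the dispatch logic can
be exercised against a fake device object before it is ever pointed at
a real /dev/cfsN.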

It would be easy to make libraries for other languages.

One curiosity, which is both an advantage, and a disadvantage
depending on what you want to do: The user-mode agent is not involved
in - does not even see - reads and writes. When a client opens a
file, the user-mode agent makes a file somewhere containing the
contents of the "virtual" file, opens it, and writes the file
descriptor down into the kernel. The kernel module returns this file
descriptor (more or less) to the client who reads and writes it as a
normal file, with no intervention of Coda. When the client closes the
file, the kernel driver tells the user-mode agent, which deals with
the (possibly new) contents of the file, and might then remove it
from the local filesystem.
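
The open/close lifecycle just described might look roughly like this
on the agent side. It is a sketch under assumptions: ContainerStore and
its method names are hypothetical, and where the real agent would write
an open file descriptor down to the kernel, this version simply returns
the path of the local container file.

```python
import os
import tempfile


class ContainerStore:
    """Hypothetical helper: backs each virtual file with a local temp file."""

    def __init__(self, contents):
        self.contents = dict(contents)   # virtual name -> bytes
        self.open_paths = {}             # virtual name -> local container path

    def on_open(self, name):
        """Materialise the whole virtual file locally and return its path."""
        fd, path = tempfile.mkstemp(prefix="cfs-")
        with os.fdopen(fd, "wb") as f:
            f.write(self.contents.get(name, b""))
        self.open_paths[name] = path
        return path

    def on_close(self, name):
        """Pick up the (possibly new) contents, then drop the local copy."""
        path = self.open_paths.pop(name)
        with open(path, "rb") as f:
            self.contents[name] = f.read()
        os.remove(path)
        return self.contents[name]
```

Between on_open() and on_close() the client works on an ordinary local
file, which is why the agent never sees the individual reads and writes.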

The advantage of this is that reads/writes happen at the same speed
as reads and writes to local files. The disadvantages are that the
open has to make a local copy of the entire contents of the file -
even if it is very big - and that the agent can't process individual
writes as commands, as is common in Plan 9. The user-mode agent might
also have to read the file to work out what changed.

However, you can rather easily process open+write+close as a command
to the user-mode agent, or have a file whose contents are different
every time it is opened. (So you do open+read+close, to read a status
value, for example)
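
Both patterns fit in one small object. The sketch below is
illustrative only: ControlFile, status and run are hypothetical names,
it handles one open at a time, and it treats any change to the file
contents between open and close as a command.

```python
import os
import tempfile


class ControlFile:
    """Hypothetical virtual file: fresh status on open, command on close."""

    def __init__(self, status, run):
        self.status = status   # () -> bytes, called on every open
        self.run = run         # (bytes) -> None, called with written data
        self.path = None
        self.shown = b""

    def on_open(self):
        """Generate new contents for this open and return the local path."""
        self.shown = self.status()
        fd, self.path = tempfile.mkstemp(prefix="ctl-")
        with os.fdopen(fd, "wb") as f:
            f.write(self.shown)
        return self.path

    def on_close(self):
        """If the client changed the file, hand the new contents to run()."""
        with open(self.path, "rb") as f:
            data = f.read()
        os.remove(self.path)
        if data != self.shown:
            self.run(data)
```

A client that only does open+read+close sees a new status value each
time; a client that does open+write+close triggers run() exactly once,
at close.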

Ideally, I'd like the processing of open to be able to decide whether
to send a file descriptor down into the kernel, or to receive
read/write messages - this seems to have been in previous versions of
Coda - such is life!

Re: Intermezzo - as Ron pointed out, its kernel driver could also
possibly be used for this purpose. However, it uses hooks into an
underlying journalled filesystem - a requirement that I couldn't
easily satisfy back when I started this work, and I wasn't sure that
the kernel-user space interface was so easy to apply to my purpose.
However, it might allow one to avoid the disadvantages mentioned
above.

I've been meaning to open-source the library for a while now - but
I'd like to clean it up in a few places, before a proper release.
This interest might be the spur to make me get around to it...

>...
> 9), and it becomes possible to push lots of code and some hacks out of
> the kernel, while permitting some new and interesting work.

No disagreements there!

Martin
-- 
Martin C. Atkins			martin_ml@parvat.com
Parvat Infotech Private Limited		http://www.parvat.com{/,/martin}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17  4:55   ` [9fans] Acme mailreader - now: User mode filesystems in linux Martin C.Atkins
@ 2004-12-17  9:54     ` Martin C.Atkins
  2004-12-17 10:22       ` geoff
                         ` (3 more replies)
  2004-12-17 15:44     ` Ronald G. Minnich
  1 sibling, 4 replies; 18+ messages in thread
From: Martin C.Atkins @ 2004-12-17  9:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Of course, nothing I wrote in the previous message avoids (l)unix's
limitations regarding mount, etc. Nor does it allow users to write
their own fileservers (without opening up various security holes, and
having root access).

However, thinking over lunch, I realised that there is a way of doing
something quite nice (in the linux sense, if not the Plan 9 sense!) with
what we already have.

On login, each user starts a user-filesystem-daemon, which uses setuid
to create a /dev/cfsN, if necessary, opens it to start serving, and
mounts it on a conventional place: $HOME/mnt, say.

When the user runs a program that wants to serve a filesystem, it attaches to
the daemon, which creates $HOME/mnt/servicename, and forwards all
requests for this directory hierarchy to the program. Replies are
adjusted to be consistent with security - setuid bits removed, ownership
forced to the user, etc., and sent back to the kernel.
When the program terminates, the daemon cleans up, and removes servicename.
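
The attach/forward/clean-up behaviour of such a daemon could be
sketched as follows. Everything here is hypothetical: the class and
method names are made up for illustration, a fileserver is modelled as
a plain callable rather than a separate process, and paths are taken
relative to the mount point ($HOME/mnt in the proposal).

```python
class UserFsDaemon:
    """Hypothetical per-user daemon: routes requests by first path component."""

    def __init__(self):
        self.services = {}   # servicename -> callable(relative_path, request)

    def attach(self, name, server):
        """Register a fileserver under <mountpoint>/<name>."""
        if name in self.services:
            raise FileExistsError(name)
        self.services[name] = server

    def detach(self, name):
        """Remove a service, e.g. when its program terminates."""
        self.services.pop(name, None)

    def forward(self, path, request):
        """Pass a request for 'name/rest/of/path' to the owning fileserver."""
        name, _, rest = path.strip("/").partition("/")
        server = self.services.get(name)
        if server is None:
            raise FileNotFoundError(path)
        try:
            return server(rest, request)
        except Exception:
            self.detach(name)   # clean up after a failed fileserver
            raise
```

A real daemon would also rewrite each reply before returning it to the
kernel; that step is left out here to keep the routing visible.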

Advantages over just hacking mount to allow anyone to mount anything on $HOME/mnt:

1) We don't need a /dev/cfsN for every filesystem, just one for each
concurrent user. Also client fileserving programs don't have to
compete to allocate /dev/cfsN's - which would need some sort of
setuid - only the daemons do this.

2) The user-filesystem-daemon only has to run as root during initialisation,
everything else runs as the user.

3) The user-filesystem-daemon can enforce file ownership (as the user) in
the served directory hierarchy. It can also force off setuid bits, etc.
Furthermore, users can only attach their fileservers to their own daemons!
(A bit like per-process mount tables - of course, linux has this already, but
not in a very user-friendly form)

4) The user-filesystem-daemon can clean up when/if a fileserver crashes.

5) Knowledge of the Coda protocol could be limited to the daemon, and a
higher-level protocol used with the "real" fileservers. Thus we could move to
other kernel mechanisms (e.g. fuse) if/when they become available.

6) All user filesystems are under $HOME/mnt - symlinks could be used
from elsewhere. (or is this a disadvantage?)
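
The enforcement in point 3 amounts to a small filter on the attributes
in every reply. A sketch, assuming attributes arrive as a dictionary
with mode, uid and gid keys (that representation and the function name
are assumptions, not part of any existing library):

```python
import stat


def sanitize_attrs(attrs, user_uid, user_gid):
    """Return a copy of attrs that is safe to pass back to the kernel.

    Clears the setuid and setgid bits and forces ownership to the
    daemon's user, whatever the fileserver claimed.
    """
    safe = dict(attrs)
    safe["mode"] = attrs["mode"] & ~(stat.S_ISUID | stat.S_ISGID)
    safe["uid"] = user_uid
    safe["gid"] = user_gid
    return safe
```

For example, a fileserver claiming a root-owned setuid executable
(mode 0o104755, uid 0) would be seen by clients as mode 0o100755 owned
by the user running the daemon.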

Disadvantages:
1) greater overhead - each fileserving message has to make the extra hops
from daemon to fileserver program, and back.

2) Complexity?

3) Others...?

The same approach would probably also work (better, easier?) with
p9fs - but some of these problems might already have been solved
there, in other ways.

What do people think?

Martin
-- 
Martin C. Atkins			martin_ml@parvat.com
Parvat Infotech Private Limited		http://www.parvat.com{/,/martin}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17  9:54     ` Martin C.Atkins
@ 2004-12-17 10:22       ` geoff
  2004-12-17 10:45         ` Martin C.Atkins
                           ` (3 more replies)
  2004-12-17 13:41       ` Derek Fawcus
                         ` (2 subsequent siblings)
  3 siblings, 4 replies; 18+ messages in thread
From: geoff @ 2004-12-17 10:22 UTC (permalink / raw)
  To: 9fans

Someone at the 9bof claimed that at least one of the BSDs already
permits users to mount things on any directory for which they have
write permission.  I suspect that the policy actually needs to be a
little stricter than that; you don't want people mounting
(system-wide) on /tmp.  Perhaps any directory that you own would make
more sense.  But we also heard that the maintainers of at least one of
the other BSDs or Linux have a religious aversion to users mounting
anything.  Certainly one would want to think through the interactions
of set-id and user mounts.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17 10:22       ` geoff
@ 2004-12-17 10:45         ` Martin C.Atkins
  2004-12-17 11:42         ` Andy Newman
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Martin C.Atkins @ 2004-12-17 10:45 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, 17 Dec 2004 02:22:22 -0800 geoff@collyer.net wrote:
> ...  But we also heard that the maintainers of at least one of
> the other BSDs or Linux have a religious aversion to users mounting
> anything.  ...

What was it you called it?... "a cascade of vision-failures." Very apt.

Martin
-- 
Martin C. Atkins			martin_ml@parvat.com
Parvat Infotech Private Limited		http://www.parvat.com{/,/martin}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17 10:22       ` geoff
  2004-12-17 10:45         ` Martin C.Atkins
@ 2004-12-17 11:42         ` Andy Newman
  2004-12-17 15:57           ` Ronald G. Minnich
  2004-12-17 12:30         ` Latchesar Ionkov
  2004-12-17 15:55         ` Ronald G. Minnich
  3 siblings, 1 reply; 18+ messages in thread
From: Andy Newman @ 2004-12-17 11:42 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

geoff@collyer.net wrote:
> Someone at the 9bof claimed that at least one of the BSDs already
> permits users to mount things

FreeBSD does, don't know about the others.

> Perhaps any directory that you own would make more sense.

Comments taken from mount (in vfs_syscalls.c)...

    /*
     * If the user is not root, ensure that they own the directory 
     * onto which we are attempting to mount.
     */

> Certainly one would want to think through the interactions
> of set-id and user mounts.

    /*
     * Silently enforce MNT_NOSUID and MNT_NODEV for non-root users
     */
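
Taken together, the two comments describe a policy that can be modelled
in a few lines. This is an illustration of the policy only, not the
FreeBSD code; the function name is hypothetical and the flag values are
illustrative.

```python
# Flag values are illustrative; take the real ones from <sys/mount.h>.
MNT_NOSUID = 0x08
MNT_NODEV = 0x10


def user_mount_flags(caller_uid, dir_owner_uid, flags):
    """Return the mount flags to use, or raise if the mount is refused.

    Root may mount anywhere with the flags requested. Anyone else must
    own the mount point and silently gets MNT_NOSUID and MNT_NODEV.
    """
    if caller_uid != 0:
        if dir_owner_uid != caller_uid:
            raise PermissionError("must own the directory being mounted on")
        flags |= MNT_NOSUID | MNT_NODEV
    return flags
```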


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17 11:42         ` Andy Newman
@ 2004-12-17 15:57           ` Ronald G. Minnich
  0 siblings, 0 replies; 18+ messages in thread
From: Ronald G. Minnich @ 2004-12-17 15:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs



On Fri, 17 Dec 2004, Andy Newman wrote:

> > Certainly one would want to think through the interactions
> > of set-id and user mounts.
> 
>     /*
>      * Silently enforce MNT_NOSUID and MNT_NODEV for non-root users
>      */


that's what I did on 2.0.36, and the fact that you could not express the 
suid and other such nasty bits in the attributes of a 9p file made it 
pretty easy to do that :-)

ron


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17 10:22       ` geoff
  2004-12-17 10:45         ` Martin C.Atkins
  2004-12-17 11:42         ` Andy Newman
@ 2004-12-17 12:30         ` Latchesar Ionkov
  2004-12-17 15:55         ` Ronald G. Minnich
  3 siblings, 0 replies; 18+ messages in thread
From: Latchesar Ionkov @ 2004-12-17 12:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

If you combine the restriction for mounting filesystems only on directories
you have write access to, with the (enforced) creation of private namespace
that Linux allows, mounting on /tmp is not a problem anymore.

	Lucho

On Fri, Dec 17, 2004 at 02:22:22AM -0800, geoff@collyer.net said:
> Someone at the 9bof claimed that at least one of the BSDs already
> permits users to mount things on any directory for which they have
> write permission.  I suspect that the policy actually needs to be a
> little stricter than that; you don't want people mounting
> (system-wide) on /tmp.  Perhaps any directory that you own would make
> more sense.  But we also heard that the maintainers of at least one of
> the other BSDs or Linux have a religious aversion to users mounting
> anything.  Certainly one would want to think through the interactions
> of set-id and user mounts.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17 10:22       ` geoff
                           ` (2 preceding siblings ...)
  2004-12-17 12:30         ` Latchesar Ionkov
@ 2004-12-17 15:55         ` Ronald G. Minnich
  3 siblings, 0 replies; 18+ messages in thread
From: Ronald G. Minnich @ 2004-12-17 15:55 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, 17 Dec 2004 geoff@collyer.net wrote:

> Someone at the 9bof claimed that at least one of the BSDs already
> permits users to mount things on any directory for which they have
> write permission. 

Russ Cox mentioned that NetBSD (??) allows you to mount on a directory 
you own. You have to own it. 

Still lotsa danger here. 

Linux has a religious aversion to users mounting things, and has since at 
least 1996 when I started asking them about this for a very early version 
of v9fs. They did not object to v9fs per se, just the concept of user 
mounts!

ron

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17  9:54     ` Martin C.Atkins
  2004-12-17 10:22       ` geoff
@ 2004-12-17 13:41       ` Derek Fawcus
  2004-12-17 14:42       ` Karl Magdsick
  2004-12-18  0:13       ` Tim Newsham
  3 siblings, 0 replies; 18+ messages in thread
From: Derek Fawcus @ 2004-12-17 13:41 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Dec 17, 2004 at 03:24:56PM +0530, Martin C.Atkins wrote:
> 2) The user-filesystem-daemon only has to run as root during initialisation,
> everything else runs as the user.
> 
> 3) The user-filesystem-daemon can enforce file ownership (as the user) in
> the served directory hierarchy. It can also force off setuid bits, etc.
> Furthermore, users can only attach their fileservers to their own daemons!
> (A bit like per-process mount tables - of course, linux has this already, but
> not in a very user-friendly form)

So while it's running, I can use gdb to attach to it and get around any security
it's trying to enforce (turn setuid back on, change ownership to root, etc).

DF


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17  9:54     ` Martin C.Atkins
  2004-12-17 10:22       ` geoff
  2004-12-17 13:41       ` Derek Fawcus
@ 2004-12-17 14:42       ` Karl Magdsick
  2004-12-17 14:56         ` Russ Cox
  2004-12-18  0:13       ` Tim Newsham
  3 siblings, 1 reply; 18+ messages in thread
From: Karl Magdsick @ 2004-12-17 14:42 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> On login, each user starts a user-filesystem-daemon, which uses setuid
> to create a /dev/cfsN, if necessary, opens it to start serving, and
> mounts it on a conventional place: $HOME/mnt, say.

Why are the setuid meta-daemons needed?  You'll want security/sanity
checks within the kernel code anyway, so the main reason for the
setuid meta-daemons is creating(??/destroying??) device files.  It
seems that you could get around this by using one device file for
everyone, which brings us to...

Why N different devices?  The kernel can distinguish between open
filehandles for /dev/cfs and use an array, tree, or map of structs to
keep track of which file handle goes with which mount point.  (Yay,
flyweight design pattern!)  This prevents nastiness with trying to
find an unused device, creating devices on demand, and "garbage
collecting" unused device files.  (You wouldn't want a trillion unused
devices sitting around.)  The kernel can free associated structs when
filehandles close.  Having the meta-daemons garbage collecting unused
device files seems like trouble.
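
The bookkeeping suggested here can be modelled in user space to show
how little state is involved. This is a toy model in Python, not kernel
code; MountTable and its methods are hypothetical, and an integer
stands in for an open file handle on the single device.

```python
class MountTable:
    """Toy model of per-handle state behind one shared device file."""

    def __init__(self):
        self.by_handle = {}      # handle -> {"uid": ..., "mountpoint": ...}
        self.by_mountpoint = {}  # mountpoint -> handle

    def open(self, handle, uid):
        """Record a new open of the device by user uid."""
        self.by_handle[handle] = {"uid": uid, "mountpoint": None}

    def mount(self, handle, mountpoint):
        """Associate an open handle with exactly one mount point."""
        state = self.by_handle[handle]
        if state["mountpoint"] is not None:
            raise ValueError("handle already mounted")
        if mountpoint in self.by_mountpoint:
            raise FileExistsError(mountpoint)
        state["mountpoint"] = mountpoint
        self.by_mountpoint[mountpoint] = handle

    def server_for(self, mountpoint):
        """Return the handle serving mountpoint, or None."""
        return self.by_mountpoint.get(mountpoint)

    def close(self, handle):
        """Free all state for a handle; nothing is left to garbage collect."""
        state = self.by_handle.pop(handle, None)
        if state and state["mountpoint"] is not None:
            del self.by_mountpoint[state["mountpoint"]]
```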

> When the user runs a program that wants to serve a filesystem, it attaches to
> the daemon, which creates $HOME/mnt/servicename, and forwards all
> requests for this directory hierarchy to the program. 

As mentioned in another post, you can prevent device files and setuid
executables in non-root owned filesystems and allow any user process
to mount a fs on any mount point they own.

The kernel needs its own cleanup code (what if one of the per-user
filesystem meta-daemons crashes?) and security/sanity checks (only
allowing root-owned processes to open filehandles to the device).

Granted, using a map of N structs and 1 device instead of N devices
each with 1 struct does increase kernel complexity, but only by a
small amount.  The kernel needs its own security/sanity checks and
cleanup code even in the case of per-user setuid filesystem
meta-daemons.  Setuid per-user filesystem meta-daemons seem like they
would greatly increase user-space complexity (and duplicate
functionality that MUST also be in the kernel) and only minimally
reduce kernel complexity.

Don't get me wrong, I'm a big fan of micro/nanokernels.  I dual booted
BeOS 5 and L4-Linux on my desktop, back when the latest port to L4
user-space was Linux 2.2.(??20??).  I got a warm fuzzy feeling every
time my flaky BeOS 3Com NIC driver crashed and BeOS asked me if I
wanted to restart the network driver.  Buggy drivers weren't so fun,
but a system that easily recovers from driver crashes is just cool! 
(I had a UDFS CD that would cause fs driver crashes in both Linux
(kernel panic) and Win2k (BSOD).  I really wished for user-space fs
drivers on "normal" systems.)  After a few days of 2 L4-Linux system
lock-ups per day, I went back to using Linux as a regular kernel. 
(The diagnostic counters kept spinning, so it seemed that the Linux
server had locked up, not L4, but all of my processes depended on the
Linux server anyway.)

In any case, I'd love to see user-space filesystem (and device)
drivers on mainstream OSes.

I don't have much knowledge of/experience with Plan9, but I've read
that the system is designed so that it's very easy to port drivers
between user-space and kernel-space.  Is this correct?  In a
"standard" setup, how many of the drivers are (mostly) in user-space?

-Karl

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17 14:42       ` Karl Magdsick
@ 2004-12-17 14:56         ` Russ Cox
  0 siblings, 0 replies; 18+ messages in thread
From: Russ Cox @ 2004-12-17 14:56 UTC (permalink / raw)
  To: Karl Magdsick, Fans of the OS Plan 9 from Bell Labs

> [Lots of good arguments]

You're stuck with the operating system you have,
not the operating system you'd like to have.  If one were
designing the system from scratch one could always do
better.  Sadly this Coda discussion is about how to deal with
what's already available on Unix.

> I don't have much knowledge of/experience with Plan9, but I've read
> that the system is designed so that it's very easy to port drivers
> between user-space and kernel-space.  Is this correct?  

No, it's not.  It's not hard (I can't think of any system-level
programming task I'd characterize as "hard" using Plan 9)
but it's not trivial either.

> In a "standard" setup, how many of the drivers are (mostly) in user-space?

Anything that touches hardware is typically in the kernel, though
in the case of particularly complicated hardware (like vga and usb),
the kernel part just makes it possible for user-mode programs to
get at the hardware and do the complicated stuff.

On the other hand, file system drivers are typically outside the
kernel, and the network and graphics devices have moved back
and forth a few times.

Russ

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17  9:54     ` Martin C.Atkins
                         ` (2 preceding siblings ...)
  2004-12-17 14:42       ` Karl Magdsick
@ 2004-12-18  0:13       ` Tim Newsham
  2004-12-18  0:13         ` boyd, rounin
  3 siblings, 1 reply; 18+ messages in thread
From: Tim Newsham @ 2004-12-18  0:13 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> However, thinking over lunch, I realised that there is a way of doing
> something quite nice (in the linux sense, if not the Plan 9 sense!) with
> what we already have.
>
> On login, each user starts a user-filesystem-daemon, which uses setuid
> to create a /dev/cfsN, if necessary, opens it to start serving, and
> mounts it on a conventional place: $HOME/mnt, say.
[...]

I think it would be better to just implement the v9fs protocol
and let users mount it in a similar way as nfs.  The v9fs layer
in the kernel would simply communicate the kernel filesystem requests
over some pipe to another machine (or the same machine) in much
the same way as nfs requests get sent out.  The layer could support
options to disable (or could strictly enforce disabling of) setuid bits
and/or file ownership.  A setuid mounting utility that enforced any
no-setuid options could allow users to perform mounts.  Userland
filesystems could be implemented by providing a service that adheres
to the v9fs protocol.

Adding a separate userland daemon that proxies a filesystem
protocol (probably the same one) through to the kernel seems
like an unneeded layer of complexity.

It seems like there are a lot of projects out there that have
interest in providing userland filesystems.  They typically use
nfs because it's the easiest vector into the existing infrastructure.
V9fs would definitely fill a useful niche.

> 2) The user-filesystem-daemon only has to run as root during initialisation,
> everything else runs as the user.

A network daemon talking v9fs shouldn't impose any ownership
restrictions.  The restrictions should be imposed at mount time.

> 3) The user-filesystem-daemon can enforce file ownership (as the user) in
> the served directory hierarchy. It can also force off setuid bits, etc.
> Furthermore, users can only attach their fileservers to their own daemons!
> (A bit like per-process mount tables - of course, linux has this already, but
> not in a very user-friendly form)

This can be done in-kernel.  In fact, there are already options in
nfs to neuter suid bits, device nodes and provide user mappings.
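
On Linux the mount options in question are nosuid and nodev.  As a rough
illustration of the policy they express, here is a sketch in Python that
applies it to a stat mode word.  The function name sanitize_mode is made
up for this example, and a real mount does not rewrite modes this way
(it ignores the bits at exec/open time); this only shows which bits are
involved.

```python
import stat


def sanitize_mode(mode: int) -> int:
    """Apply a nosuid/nodev-like policy to a stat mode word.

    Hypothetical helper: rejects character and block device nodes and
    clears the setuid and setgid bits, leaving everything else alone.
    """
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        raise PermissionError("device nodes are not allowed on user mounts")
    return mode & ~(stat.S_ISUID | stat.S_ISGID)


if __name__ == "__main__":
    # A setuid regular file, printed before and after the filter.
    print(oct(0o104755), "->", oct(sanitize_mode(0o104755)))
```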

> 5) Knowledge of the Coda protocol could be limited to the daemon, and a
> higher-level protocol used with the "real" fileservers. Thus we could move to
> other kernel mechanisms (e.g. fuse) if/when they become available.

V9fs already provides a useful protocol.

> 6) All user filesystems are under $HOME/mnt - symlinks could be used
> from elsewhere. (or is this a disadvantage?)

Restrictions as to where mount points could be placed could be
put in any setuid binary that mediates user mounts.
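
A sketch of the kind of check such a mediating binary might make, in
Python for brevity.  The $HOME/mnt policy is borrowed from Martin's
proposal; the function name is an assumption of this example, and a real
setuid helper would also have to guard against races between the check
and the mount.

```python
import os


def is_allowed_mount_point(path: str, home: str) -> bool:
    """Return True if path resolves to home/mnt or somewhere beneath it.

    Hypothetical policy check: symlinks and ".." are resolved first, so
    a path that escapes home/mnt after resolution is refused.
    """
    base = os.path.realpath(os.path.join(home, "mnt"))
    target = os.path.realpath(path)
    return target == base or target.startswith(base + os.sep)


if __name__ == "__main__":
    home = os.path.expanduser("~")
    for candidate in (os.path.join(home, "mnt", "mail"), "/etc"):
        print(candidate, is_allowed_mount_point(candidate, home))
```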

Tim N.


* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-18  0:13       ` Tim Newsham
@ 2004-12-18  0:13         ` boyd, rounin
  2004-12-18  3:49           ` Ronald G. Minnich
  0 siblings, 1 reply; 18+ messages in thread
From: boyd, rounin @ 2004-12-18  0:13 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> I think it would be better to just implement the v9fs protocol
> and let users mount it in a similar way as nfs.

been there, done that.  nfs is no fun.  you have to get right up into the vfs layer.




* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-18  0:13         ` boyd, rounin
@ 2004-12-18  3:49           ` Ronald G. Minnich
  2004-12-23 16:04             ` boyd, rounin
  0 siblings, 1 reply; 18+ messages in thread
From: Ronald G. Minnich @ 2004-12-18  3:49 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 18 Dec 2004, boyd, rounin wrote:

> > I think it would be better to just implement the v9fs protocol
> > and let users mount it in a similar way as nfs.
> 
> been there, done that.  nfs is no fun.  you have to get right up into
> the vfs layer.

ah, boyd, you gotta read what he's saying cause it's what we want.  All
Tim is saying is "let's just go 9p2000 from Linux VFS layer to userland
servers". This is good. And since v9fs does that, and it's basically there 
on 2.6, it's the right way to go. 
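
For a feel of what a userland server sees on the wire, here is a small
sketch of encoding the first 9P2000 message, Tversion, in Python.  The
layout (size[4] type[1] tag[2] msize[4] version[s], little-endian, with
strings carrying a 2-byte length prefix) follows the 9P2000 protocol;
the function names and the default msize of 8192 are choices made for
this example only.

```python
import struct

TVERSION = 100   # 9P2000 message type for Tversion
NOTAG = 0xFFFF   # tag value used for version negotiation


def pack_string(s: str) -> bytes:
    """Encode a 9P string: 2-byte little-endian length, then UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def pack_tversion(msize: int = 8192, version: str = "9P2000") -> bytes:
    """Build a Tversion message: size[4] type[1] tag[2] msize[4] version[s]."""
    body = struct.pack("<BHI", TVERSION, NOTAG, msize) + pack_string(version)
    # The size field counts itself as well as the rest of the message.
    return struct.pack("<I", 4 + len(body)) + body


if __name__ == "__main__":
    print(pack_tversion().hex())
```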

ron


* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-18  3:49           ` Ronald G. Minnich
@ 2004-12-23 16:04             ` boyd, rounin
  0 siblings, 0 replies; 18+ messages in thread
From: boyd, rounin @ 2004-12-23 16:04 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> ah, boyd, you gotta read what he's saying cause it's what we want.

that's what i meant.  i wasn't being explicit enough.  if you stick it into the VFS
(which i looked at once for the ULTRIX GFS, but it was too horrible) then
everything is cool.

btw: in brussels on my way back from Twente.  gotta find me a prepaid WiFi login.  i have some photos; i'll stick 'em up at some point.


* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17  4:55   ` [9fans] Acme mailreader - now: User mode filesystems in linux Martin C.Atkins
  2004-12-17  9:54     ` Martin C.Atkins
@ 2004-12-17 15:44     ` Ronald G. Minnich
  2004-12-18 12:35       ` Martin C.Atkins
  1 sibling, 1 reply; 18+ messages in thread
From: Ronald G. Minnich @ 2004-12-17 15:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, 17 Dec 2004, Martin C.Atkins wrote:

> 
> For those that don't already know: Coda is a remote filesystem that
> copes (more or less well) with disconnection from, and reconnection
> to the fileserver. Thus allowing clients to continue work in the
> disconnected state. I'm not sure how successful it was at this - I've
> never tried it - but it sounds like an interesting goal. This goal is
> also shared by Intermezzo, which was (also) started by Peter Braam -
> so presumably he felt Coda could be improved upon.

As peter used to put it, 5KLOC (intermezzo) was in his mind better than 
500KLOC (coda). He brought both file systems to fruition. 

> However, judging by the News pages on their web sites, more seems to
> be happening with Coda, than with Intermezzo, recently.

yeah, intermezzo limped along for 5 years, never quite worked, then died. 
But the kernel->user interface of intermezzo is perfectly usable. Sounds 
like you've gotten far with coda, so this is just an FYI.

I only know a bit about this because I did a lot of work with imezzo early 
in the game, and got to the point where I could boot a linux node with 
imezzo as the root file system. That was interesting.

ron


* Re: [9fans] Acme mailreader - now: User mode filesystems in linux
  2004-12-17 15:44     ` Ronald G. Minnich
@ 2004-12-18 12:35       ` Martin C.Atkins
  0 siblings, 0 replies; 18+ messages in thread
From: Martin C.Atkins @ 2004-12-18 12:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Gosh - lots of comments..

On Fri, 17 Dec 2004 08:44:52 -0700 (MST) "Ronald G. Minnich" <rminnich@lanl.gov> wrote:
> > so presumably he felt Coda could be improved upon.
> 
> As peter used to put it, 5KLOC (intermezzo) was in his mind better than 
> 500KLOC (coda). He brought both file systems to fruition. 

Good reason! But the kernel modules were only a small fraction of
that total size, right?

> yeah, intermezzo limped along for 5 years, never quite worked, then died. 
> But the kernel->user interface of intermezzo is perfectly usable. Sounds 
> like you've gotten far with coda, so this is just an FYI.

Pity about Intermezzo - trying it (for its designed purpose) was on
my todo list...

Thanks for your assessment of the Intermezzo kernel->user interface.
I've often wondered if I could have used it for my driver
instead of coda, but like you say, I've gotten thus far with Coda...

> in the game, and got to the point where I could boot a linux node with 
> imezzo as the root file system. That was interesting.

Sounds fun!

On Fri, 17 Dec 2004 13:41:09 +0000 Derek Fawcus <dfawcus@cisco.com> wrote:
> > 3) The user-filesystem-daemon can enforce file ownership (as the user) in
> > the served directory hierarchy. It can also force off setuid bits, etc.
> > Furthermore, users can only attach their fileservers to their own daemons!
> > (A bit like per-process mount tables - of course, linux has this already, but
> > not in a very user-friendly form)
> 
> So while it's running,  I can use gdb to attach to it and get around any security
> it's trying to enforce (turn setuid back on,  change ownership to root,  etc).

Good point! I guess it would have to stay setuid to avoid that - pity. Is there
another way of stopping gdb, et al? I see that EPERM on ptrace can be caused if
the debuggee is already being debugged. Is there any way that a program can
ptrace itself, just to stop anyone else from debugging it? (1/2 :-))
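
One Linux mechanism that bears on this question is the process
"dumpable" flag: ptrace(2) documents that an unprivileged attach is
refused when the target is not dumpable.  The sketch below clears and
reads that flag through prctl(2) via ctypes.  It assumes Linux; the
helper names are made up for this example; and it is not a complete
defence (root, or a caller with CAP_SYS_PTRACE, can still attach).

```python
import ctypes
import os
import sys

PR_GET_DUMPABLE = 3  # values from <linux/prctl.h>
PR_SET_DUMPABLE = 4


def _libc():
    """Load the C library of the running process (Linux only)."""
    if not sys.platform.startswith("linux"):
        raise OSError("prctl is Linux-specific")
    return ctypes.CDLL(None, use_errno=True)


def set_dumpable(flag: bool) -> None:
    """Set or clear the calling process's dumpable flag."""
    if _libc().prctl(PR_SET_DUMPABLE, int(flag), 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_dumpable() -> int:
    """Return the current dumpable flag of the calling process."""
    return _libc().prctl(PR_GET_DUMPABLE, 0, 0, 0, 0)


if __name__ == "__main__":
    set_dumpable(False)
    print("dumpable:", get_dumpable())
```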

On Fri, 17 Dec 2004 09:42:15 -0500 Karl Magdsick <kmagnum@gmail.com> wrote:
> > On login, each user starts a user-filesystem-daemon, which uses setuid
> > to create a /dev/cfsN, if necessary, opens it to start serving, and
> > mounts it on a conventional place: $HOME/mnt, say.
> 
> Why are the setuid meta-daemons needed?  You'll want security/sanity
> checks within the kernel code anyway, so the main reason for the
> setuid meta-daemons is creating(??/destroying??) device files.  It
> seems that you could get around this by using one device file for
> everyone, which brings us to...
> 
> Why N different devices?  The kernel can distinguish between open
> filehandles for /dev/cfs and use an array, tree, or map of structs to
> [ plus more good points]

Two reasons:
1) I was trying to see how far we could get without having to change *anything*
in the stock linux kernel distribution. This makes distributing the result
much easier...
2) When you mount a coda device, you are talking to the fileserver that
opened that coda device. If you multiplexed the device, you would have
to have some other way of saying *which* user-mode fileserver you were
trying to mount.

Otherwise, I agree.

> As mentioned in another post, you can prevent device files and setuid
> executables in non-root owned filesystems and allow any user process
> to mount a fs on any mount point they own.

Yes, and these mechanisms are probably best.

> The kernel needs its own cleanup code (what if one of the per-user
> filesystem meta-daemons crashes?) and security/sanity checks (only

If the per-user filesystem meta-daemon crashes, then, like I said,
with coda, the kernel is pretty safe already. However I'd like the user's
view of things to be cleaned up too.

and:
On Fri, 17 Dec 2004 14:13:35 -1000 (HST) Tim Newsham <newsham@lava.net> wrote:
> I think it would be better to just implement the v9fs protocol
> and let users mount it in a similar way as nfs.  The v9fs layer

In the future, I totally agree. Unfortunately my requirement was for
something that would work with all the linux systems already out
there.

> Adding a separate userland daemon that proxies a filesystem
> protocol (probably the same one) through to the kernel seems
> like an unneeded layer of complexity.

So it now seems - I was just thinking aloud, and the result was
interesting, and informative for me.

> It seems like there are a lot of projects out there that have
> interest in providing userland filesystems.  They typically use
> nfs because it's the easiest vector into the existing infrastructure.
> V9fs would definitely fill a useful niche.

Yes. The lack of close messages (until recently) in nfs certainly
restricted its usefulness for this purpose though.

> V9fs already provides a useful protocol.

No disagreements there, however there have already been discussions
in other threads pointing out that V9fs doesn't (yet?) have messages
for linux things that Plan 9 doesn't have - such as symlinks.

> Restrictions as to where mount points could be placed could be
> put in any setuid binary that mediates user mounts.

I was trying to get around the restriction that user mounts could
only be under $HOME/mnt, imposed by my scheme. Given access to a more
general mount, I don't see any reason to restrict it unnecessarily!

Thanks all for interesting feedback!

Martin

-- 
Martin C. Atkins			martin_ml@parvat.com
Parvat Infotech Private Limited		http://www.parvat.com{/,/martin}

