From mboxrd@z Thu Jan 1 00:00:00 1970
Message-Id: <200207110200.WAA26141@math.psu.edu>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] blanks in file names
In-Reply-To: Your message of "Wed, 10 Jul 2002 19:01:29 EDT." <200207102301.g6AN1UV00727@dave2.dave.tj>
From: Dan Cross
Date: Wed, 10 Jul 2002 22:00:41 -0400
Topicbox-Message-UUID: c8b900b4-eaca-11e9-9e20-41e7f4b1d025

> > I don't think it would be simpler; I think it would be more
> > complicated.  You're replacing a simple, textual representation of
> > an object with a binary representation; you have to have some way
> > to do canonicalization in the common case, but even that path is
> > fraught with danger.
>
> Manipulating text with all sorts of dynamic buffers is substantially
> more complicated than simply replacing a node in a linked list.
> The canonicalization is all being done by the kernel, or a library.

How could this possibly be in the kernel?  After all, you're talking
about changing the interface for opening a file; I pass a file name
via some mechanism to a user-level application that wants to call open
on it.  What's it supposed to do?  Does the shell now pass a linked
list as an argument to main somehow?  How does the system know that
it's a file?  Do we have to replace the argument vector with some more
complex representation that encapsulates type information (e.g., this
argument is a file, this next one is a string, etc.)?  Does the shell
change to represent file names as lists?  Does the user suffer the
indignity of having to specify a list of path components to represent
a file?  Or do we provide a canonicalization library for shell
arguments, in which case you have the exact same problem as supporting
spaces now, since most programs are going to expect to get file name
arguments in the canonical representation?  If you do that, who calls
it?  The shell or the library?
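To make the "who calls it?" problem concrete, here's a sketch of the
piece of code that would have to exist somewhere (the names `Node' and
`joinpath' are mine, and it's portable C rather than Plan 9 C, purely
for illustration): something has to turn the component list back into
the one string that open actually takes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list-of-components representation of a path. */
typedef struct Node Node;
struct Node {
	char *name;
	Node *next;
};

/*
 * Join a component list back into the textual form that every
 * existing program (and open itself) expects.  Somebody has to
 * call this -- the shell, a library, or each program by hand.
 */
char *
joinpath(Node *list)
{
	Node *n;
	size_t len;
	char *s;

	len = 1;				/* trailing '\0' */
	for (n = list; n != NULL; n = n->next)
		len += strlen(n->name) + 1;	/* '/' plus component */
	if ((s = malloc(len)) == NULL)
		return NULL;
	s[0] = '\0';
	for (n = list; n != NULL; n = n->next) {
		strcat(s, "/");
		strcat(s, n->name);
	}
	return s;
}
```

So the list ("usr", "cross", "file") joins back to "/usr/cross/file" --
and every caller of open has to agree on exactly this convention.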
I for one am going to be *very* unhappy if I have to type:

	cat ('' 'usr' 'cross' 'file')

Instead of:

	cat /usr/cross/file

Or do you make every program that wants to open a file call a function
to canonicalize a filename into the internal format before it calls
open?

> > But they change an already well-established interface.  Have you
> > thought through the implications of this, in all their macabre
> > glory?  What you propose--changing the most basic interface for
> > opening a file in a system where everything looks more or less
> > like a file--has huge implications.  And all this just to support
> > a strange edge case, which is adequately solved by substitutions
> > in the filename.  Sure, it's not perfect in some weird
> > pathological case, but how often is this going to come up in
> > practice?  Remember: optimize for the common case.
>
> Optimization for the common case is good, but creating a system
> where the uncommon case will cause major mayhem at the system level
> is evidence of a very unclean approach.  (When you consider the
> reasoning behind the problem (namely, spaces and slashes in
> filenames kill our ability to separate nodes easily), it makes
> perfect sense that our solution isn't very clean.  The only clean
> solution is to restore the ancient UNIX ideal of being able to
> easily separate nodes.  In other words, either kill spaces
> altogether and damn interoperability, or promote spaces to full
> citizenship.)

But Plan 9 can handle this.  One of the beautiful things about Plan 9
is that it provides a solution that's workable with little effort.
The various substitution file systems provide a workable solution
without introducing any additional complexity.  If you want a
total--100% complete--solution, then a `urlifyfs' can be written that
uses URL escaping as a canonical representation, or something similar.
The system interface doesn't have to be changed, though.  *That* is
the mark of a clean system design.
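For what it's worth, the escaping such a `urlifyfs' would need is
tiny.  A sketch (my names, portable C, and only spaces handled -- a
real version would also escape '%' itself and any other reserved
bytes):

```c
#include <stdlib.h>
#include <string.h>

/*
 * URL-escape a file name: each space becomes "%20", so names with
 * blanks round-trip unambiguously through a plain path string and
 * the system interface is untouched.
 */
char *
urlify(const char *name)
{
	const char *s;
	char *out, *t;

	/* worst case: every byte expands to three */
	if ((out = malloc(3 * strlen(name) + 1)) == NULL)
		return NULL;
	for (s = name, t = out; *s != '\0'; s++) {
		if (*s == ' ') {
			memcpy(t, "%20", 3);
			t += 3;
		} else
			*t++ = *s;
	}
	*t = '\0';
	return out;
}
```

E.g., "my file" comes out as "my%20file", which is a perfectly
ordinary path component to everything else in the system.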
The Unix `ideal' was eliminated because it was overly complex, without
a commensurate gain in functionality.  Besides, the inode system
didn't really fit in well with the idea of 9P.

> > > There's plenty of experience with other systems working on
> > > linked lists (including a huge amount of kernel code in my Linux
> > > box that I'm typing from, ATM).  Most of the problems with
> > > linked lists have been pretty well documented, by now.
> >
> > It's the huge amount of kernel code that Plan 9 is trying to avoid.
>
> String manipulation is more complex than linked list manipulation.

No, it's really not.  Consider passing a linked list as an argument to
a function you're calling, versus passing an argument vector of
strings.  How do you do that?  Do you muck with all the C startup code
to make sure you get the linking and so on right, in such a way that
the list is in a contiguous memory block so it doesn't get stomped by
the image read by exec?  Do you pass each node in the list to main as
a separate string in the argument vector?  If so, how do you tell when
one file name ends and another begins?  Do we introduce some
convention for delineating the beginning and end of a filename in a
list representation, effectively creating a protocol that every
program has to follow to take a filename as an argument?  Surely the
former option is significantly easier....

Consider a possible canonicalization routine that might be used in a
substitution FS:

	char *
	canonical(char *str)
	{
		char *p, *s, *t;

		if (str == nil || (p = malloc(2 * strlen(str) + 1)) == nil)
			return nil;
		for (s = str, t = p; *s != '\0'; s++, t++) {
			if (isspace(*s)) {
				*t++ = '+';	/* Or whatever. */
				*t = '2';
				continue;
			}
			*t = *s;
		}
		*t = '\0';	/* Terminate before the strlen below. */
		if ((s = realloc(p, strlen(p) + 1)) == nil)
			free(p);
		return s;
	}

That's pretty straightforward; just inserting into a linked list would
be just as hard.
Doing so in a contiguous memory block would be, I think, harder (you'd
have to step over the list, keep a count of how much memory you
needed, then allocate the block, copy each node, and set the links.
That's a pain).

> > Being forced to conform to a lot of external interfaces *will*
> > kill the system.
>
> I don't dispute that point, but the interface I propose is most
> unlike any other interface currently known to man (not trying to
> conform to any external interface).  I'm simply pointing out that
> failing to provide at least a 1-1 mapping with capabilities that are
> already widely used in external systems that must interoperate with
> ours *will* kill us.

Well, if you *really* want 100% 1-to-1 mappings, use the URL encoding
others have mentioned, or something similar.  As it is, it seems that
this mostly works; about 80% of what's needed is there.

> > Besides, the point Nemo was trying to make umpteen posts ago was
> > that, yes, you can roll back changes using the dump filesystem,
> > which gives you temporal mobility.  He is right.
>
> You can do a lot of things if you're prepared to get involved in the
> functions that your OS should be doing automatically.  Try running
> an FTP mirror to a busy site that way, though, and you'll quickly
> discover why automation is a good thing.  The worst part about our
> system is that the "solution" you eventually find for an FTP mirror
> will be useless on an HTTP proxy.  When "solutions" need to be
> modified for each individual application, you know that the system
> isn't clean.

Yesterday is a wonderful tool, and can be scripted to do whatever you
want.  E.g., copying all files that changed on June 14th back to the
cache isn't very difficult.  I don't see what running a big FTP mirror
has to do with it.  netlib is a big FTP site; it runs on Plan 9.
Maybe it's not a mirror, but so what?  I also don't see how you can't
leverage whatever you did for FTP with HTTP.
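Incidentally, the contiguous-block flattening I called a pain earlier
really does take two passes and some pointer surgery.  A sketch, with
made-up types and nothing from any real exec path:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list-of-components representation of a path. */
typedef struct Node Node;
struct Node {
	char *name;
	Node *next;
};

/*
 * Flatten a component list into one contiguous allocation, as an
 * exec-style interface would need: one pass to count, one malloc,
 * one pass to copy the nodes and re-point the links into the block.
 */
Node *
flatten(Node *list)
{
	Node *n, *block;
	char *strings;
	size_t nnodes, nbytes, i;

	if (list == NULL)
		return NULL;
	nnodes = nbytes = 0;
	for (n = list; n != NULL; n = n->next) {
		nnodes++;
		nbytes += strlen(n->name) + 1;
	}
	if ((block = malloc(nnodes * sizeof(Node) + nbytes)) == NULL)
		return NULL;
	strings = (char *)(block + nnodes);	/* strings follow the nodes */
	for (n = list, i = 0; n != NULL; n = n->next, i++) {
		block[i].name = strcpy(strings, n->name);
		strings += strlen(n->name) + 1;
		block[i].next = (n->next != NULL) ? &block[i + 1] : NULL;
	}
	return block;
}
```

Compare that with the dozen-line string routine above: the string wins.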
The substitution-style FS gives you a *lot* of flexibility in this
area.

	- Dan C.