9fans - fans of the OS Plan 9 from Bell Labs
From: Dave <dave@dave.tj>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] blanks in file names
Date: Thu, 11 Jul 2002 02:14:59 -0400
Message-ID: <200207110614.g6B6ExM18574@dave2.dave.tj>
In-Reply-To: <200207110200.WAA26141@math.psu.edu>

Reply inline:

 - Dave

Dan Cross wrote:
> 
> > > I don't think it would be simpler; I think it would be more
> > > complicated.  You're replacing a simple, textual representation of an
> > > object with a binary representation; you have to have some way to do
> > > canonicalization in the common case, but even that path is fraught with
> > > danger.
> > 
> > Manipulating text with all sorts of dynamic buffers is substantially
> > more complicated than simply replacing a node in a linked list.
> > The canonicalization is all being done by the kernel, or a library.
> 
> How could this possibly be in the kernel?  After all, you're talking
> about changing the interface to open a file; I pass a file name via
> some mechanism to a user level application that wants to call open on
> it.  What's it supposed to do?  Does the shell now pass a linked list
> as an argument to main somehow?  How does the system know that it's a
> file?  Do we have to replace the argument vector with some more complex
> representation that encapsulates type information (e.g., this argument
> is a file, this next one is a string, etc)?  Does the shell change to
> represent file names as lists?  Does the user suffer the indignity of
> having to specify a list of path components to represent a file?  Or do
> we provide a canonicalization library for shell arguments, in which
> case, you have the exact same problem as supporting spaces now, since
> most programs are going to expect to get file name arguments in the
> canonical representation?  If you do that, who calls it?  The shell or
> the library?
That's an interesting point I didn't quite consider ... we'll have to
change the exec interface a lot more than I suspected at first glance.
I wasn't planning that until much later, because it'll require very
fundamental changes to the shell.  (/me hates proposing incremental
changes, because they invariably depend on other fundamental changes in
order for people to see their utility.)

> 
> I for one am going to be *very* unhappy if I have to type:
> 
> 	cat ('' 'usr' 'cross' 'file')
> 
> Instead of:
> 
> 	cat /usr/cross/file
> 
> Or do you make every program that wants to open a file call a function
> to canonicalize a filename into the internal format before it calls
> open?
My image of a shell is a user interface.  It should translate
all output from programs into a format that's easy for a human to
understand, and should offer to translate data entered by the user
from an easy-for-a-human-to-input format into the machine format.
If you want to print /"usr"/"cross"/"file", you should be able to
type something like "cat /usr/cross/file" and have the shell translate
that into the collection of lists (/usr/bin/cat and /usr/cross/file)
required for the underlying system.  The shell should also translate
the output of the ls command, for instance, so it prints filenames
in an easy-for-humans-to-understand format.  The ls command, though,
should only print filenames in an easy-for-machine-to-understand
format.  Basically, the shell is the bidirectional translator between
computer-speak and human-speak.  That's its raison d'être.
Getting the kernel away from plain text doesn't mean getting the shell
away from plain text.  The shell can choose to support any method(s)
it wants to represent filenames in an "easy-for-machine-to-understand"
format, since it'll be converting the filenames into linked lists for
the kernel.  Utilities like find or ls or whatever output filenames in
a format that your shell can read.  (I envision an rc file supplied by
the shell to let other programs know what formats it supports.)

> 
> > > But they change an already well-established interface.  Have you
> > > thought through the implications of this, in all their macabre glory?
> > > What you propose--changing the most basic interface for opening a file
> > > in a system where everything looks more or less like a file--has huge
> > > implications.  And all this just to support a strange edge-case, which
> > > is adequately solved by substitutions in the filename.  Sure, it's not
> > > perfect in some weird pathological case, but how often is this going to
> > > come up in practice?  Remember: Optimize for the common case.
> > 
> > Optimization for the common case is good, but creating a system where the
> > uncommon case will cause major mayhem at the system level is evidence
> > of a very unclean approach.  (When you consider the reasoning behind
> > the problem (namely, spaces and slashes in filenames kill our ability
> > to separate nodes easily), it makes perfect sense that our solution
> > isn't very clean.  The only clean solution is to restore the ancient
> > UNIX ideal of being able to easily separate nodes.  In other words,
> > either kill spaces altogether and damn interoperability, or promote
> > spaces to full citizenship.)
> 
> But Plan 9 can handle this.
> 
> One of the beautiful things about Plan 9 is that it provides a solution
> that's workable with little effort.  The various substitution file
> systems provide a workable solution without introducing any additional
> complexity.  If you want a total--100% complete--solution, then a
> `urlifyfs' can be written that uses URL escaping as a canonical
> representation, or something similar.  The system interface doesn't
> have to be changed, though.  *That* is the mark of a clean system
> design.
The only way to have the urlifyfs concept providing a 100% complete
solution is to use it as the default filesystem for your own stuff.
The reason?  Imagine downloading a file "blah%apos;" from an FTP server;
now, you download a file "blah'" from an FTP server (which your urlifyfs
faithfully translates into "blah%apos;" without realizing that it's
destroying a different file).  Guess what?  You've just clobbered your
original.  Now, if you're going to use urlifyfs for your own stuff
on your Plan 9 system, you're going to have to deal with the same
shell-interaction issues that my system has to deal with.  The only
difference is that my system doesn't break if somebody forgets to use
urlifyfs on a new filesystem, because my system moves text representation
of filenames over to the shell, where it belongs, rather than dumping
that burden on a filesystem translation hack.
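
The clobbering scenario above can be sketched in C.  This is only an
illustration under stated assumptions: escape_name() and its
escape_percent flag are hypothetical names, not part of any urlifyfs,
and the sketch writes the apostrophe in the hex form %27 rather than
the "%apos;" spelling used above.  With the flag off, a name that
already looks escaped and a name that needs escaping land on the same
stored name; with the flag on, the escape character is itself escaped
and the two stay distinct.

```c
#include <stdio.h>

/*
 * Escape ' ' and '\'' as %XX into dst, which holds dstlen bytes.
 * If escape_percent is nonzero, '%' itself is escaped as well.
 * Returns 0 on success, -1 if dst is too small.
 */
int
escape_name(char *dst, size_t dstlen, const char *src, int escape_percent)
{
	size_t used = 0;

	if (dstlen == 0)
		return -1;
	for (; *src != '\0'; src++) {
		unsigned char c = (unsigned char)*src;
		int special = c == ' ' || c == '\'' || (escape_percent && c == '%');
		size_t need = special ? 3 : 1;

		/* Leave room for this character plus the final NUL. */
		if (used + need + 1 > dstlen)
			return -1;
		if (special)
			snprintf(dst + used, 4, "%%%02X", (unsigned)c);
		else
			dst[used] = (char)c;
		used += need;
	}
	dst[used] = '\0';
	return 0;
}
```

With escape_percent set to 0, both "blah%27" and "blah'" are stored
as "blah%27", which is the collision.  With escape_percent set to 1,
they are stored as "blah%2527" and "blah%27" respectively.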

> 
> The Unix `ideal' was eliminated because it was overly complex, without
> a commensurate gain in functionality.  Besides, the inode system didn't
> really fit in well with the idea of 9p.
> 
> > > > There's plenty of experience with other systems working on linked lists
> > > > (including a huge amount of kernel code in my Linux box that I'm typing
> > > > from, ATM).  Most of the problems with linked lists have been pretty
> > > > well documented, by now.
> > > 
> > > It's the huge amount of kernel code that Plan 9 is trying to avoid.
> > 
> > String manipulation is more complex than linked list manipulation.
> 
> No, it's really not.  Consider passing a linked list as an argument to
> a function you're calling, versus passing an argument vector of
> strings.  How do you do that?  Do you muck with all the C startup code
> to make sure you get the linking and so on right in such a way that the
> list is in a contiguous memory block so it doesn't get stomped by the
> image read by exec?  Do you pass each node in the list to main as a
> separate string in the argument vector?  If so, how do you tell when
> a file name ends and another begins?  Do we introduce some convention
> for delineating the beginning and ends of a filename in a list
> representation, effectively creating a protocol that every program has
> to follow to take a filename as an argument?  Surely the former option
> is significantly easier....
This is only true with our current exec family, which has been essentially
carried over unchanged from UNIX.  It's based on strings, not on lists.
IMHO, arguments should be objects.  Those objects can be filenames,
options with or without arguments of their own, subcommands, just plain
strings, etc.  This makes arguments a lot more representative of what
they actually are, and eliminates the need for complex argument-handling
libraries.  Obviously, this whole change can be totally transparent to
the user, because his shell is doing the necessary translations back
and forth.  However, you get an extremely powerful system as the payoff,
a system that makes it rather easy to reimplement all our current syscalls
as tiny library functions, possibly in an emulation library.
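
One way to picture "arguments as objects" is a tagged record per
argument.  The sketch below is hypothetical: the Arg type, its field
names, and count_files() are invented for illustration and do not
describe any existing exec interface.  It only shows that a program
receiving such objects can tell file names from plain strings without
parsing text.

```c
#include <stddef.h>

/* Hypothetical argument kinds, taken from the list in the paragraph above. */
typedef enum { ARG_STRING, ARG_FILE, ARG_OPTION } ArgKind;

typedef struct Arg Arg;
struct Arg {
	ArgKind kind;
	const char *text;		/* ARG_STRING: the string; ARG_OPTION: the option name */
	const char *const *path;	/* ARG_FILE: NULL-terminated array of path components */
	const Arg *value;		/* ARG_OPTION: the option's own argument, or NULL */
};

/* Count file-name arguments, including ones carried as option values. */
size_t
count_files(const Arg *args, size_t nargs)
{
	size_t i, n = 0;

	for (i = 0; i < nargs; i++) {
		if (args[i].kind == ARG_FILE)
			n++;
		else if (args[i].kind == ARG_OPTION && args[i].value != NULL)
			n += count_files(args[i].value, 1);
	}
	return n;
}
```

A file argument here carries its components directly, so a component
such as "my file" needs no quoting or escaping on the way to the
program.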

> 
> Consider a possible canonicalization routine that might be used in
> a substitution FS:
> 
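> char *
> canonical(char *str)
> {
> 	char	*p, *s, *t;
>  
> 	/* Worst case: every byte is a space and expands to two bytes. */
> 	if (str == nil || (p = malloc(2 * strlen(str) + 1)) == nil) {
> 		return(nil);
> 	}
> 	for (s = str, t = p; *s != '\0'; s++, t++) {
> 		if (isspace((unsigned char)*s)) {
> 			*t++ = '+';	/*  Or whatever.  */
> 			*t = '2';
> 			continue;
> 		}
> 		*t = *s;
> 	}
> 	*t = '\0';	/* terminate the copy before strlen(p) below */
> 	/* Trim the allocation to the bytes actually used. */
> 	if ((s = realloc(p, strlen(p) + 1)) == nil) {
> 		free(p);
> 	}
>  
> 	return(s);
> }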
> 
> That's pretty straight-forward; just inserting into a linked list
> would be just as hard.  Doing so in a contiguous memory block would
> be, I think, harder (you'd have to step over the list, keep a count
> of how much memory you needed, then allocate the list, copy each
> node and set the links.  That's a pain).
strlen() is an expensive operation.  realloc() sucks in a multithreaded
environment.  To top it all off, that algorithm doesn't take into account
the expansion which is ABSOLUTELY NECESSARY in order to achieve 100%
coverage.  (If you're not going to achieve a 1-1 mapping, it's silly to
even bother with this.)  Also, I'd like to mention again that I'm not
asking the kernel to allocate memory.  The userland program provides
a block of memory, and the kernel manipulates that block, returning an
error if the block is too small.
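
The caller-supplied-block contract described above can be sketched as
follows.  Everything here is an assumption made for illustration: the
BNode layout, the name path_split_into(), and the choice to split on
'/' are hypothetical, and the code runs in user space rather than in
any kernel.  It builds a component list inside a block the caller
provides, performs no allocation of its own, and fails if the block
is too small.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node layout: a link followed by the component text. */
typedef struct BNode BNode;
struct BNode {
	BNode *next;
	char   name[];	/* NUL-terminated component, stored inline */
};

/* Round n up to a multiple of the node alignment. */
static size_t
round_up(size_t n)
{
	size_t a = _Alignof(BNode);

	return (n + a - 1) / a * a;
}

/*
 * Build the component list for path inside block (size bytes).
 * Returns the head of the list, or NULL if the block is too small.
 * block must be suitably aligned for a BNode (malloc'd memory is).
 */
BNode *
path_split_into(void *block, size_t size, const char *path)
{
	char *base = block;
	size_t used = 0, len, need;
	BNode *head = NULL, **tail = &head, *n;
	const char *end;

	for (;;) {
		end = strchr(path, '/');
		len = end != NULL ? (size_t)(end - path) : strlen(path);
		need = round_up(offsetof(BNode, name) + len + 1);
		if (need > size - used)
			return NULL;	/* caller must retry with a larger block */
		n = (BNode *)(base + used);
		memcpy(n->name, path, len);
		n->name[len] = '\0';
		n->next = NULL;
		*tail = n;
		tail = &n->next;
		used += need;
		if (end == NULL)
			break;
		path = end + 1;
	}
	return head;
}
```

Because every node lives in one contiguous block, the whole list can
be copied or released as a unit, and the only failure mode is the
"block too small" error.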

> 
> > > Being forced to conform to a lot of external interfaces *will* kill the
> > > system.
> > 
> > I don't dispute that point, but the interface I propose is most unlike
> > any other interface currently known to man (not trying to conform to any
> > external interface).  I'm simply pointing out that failing to provide
> > at least a 1-1 mapping with capabilities that are already widely used
> > in external systems that must interoperate with ours *will* kill us.
> 
> Well, if you *really* want 100% 1 to 1 mappings, use the URL encoding
> others have mentioned, or something similar.  As it is, it seems that
> this mostly works; about 80% of what's needed is there.
URL encoding _will_ work if it's implemented right (except for the
uncleanliness I mentioned above, and some more problems I mention below).
However, using URL encoding makes the resulting system just as ugly as
the one I'm proposing from the user's perspective, but much much uglier
from a system perspective.

> 
> > > Besides, the point Nemo was trying to make umpteen posts ago was that,
> > > yes, you can roll back changes using the dump filesystem, which gives
> > > you temporal mobility.  He is right.
> > 
> > You can do a lot of things if you're prepared to get involved in the
> > functions that your OS should be doing automatically.  Try running an FTP
> > mirror to a busy site that way, though, and you'll quickly discover why
> > automation is a good thing.  The worst part about our system is that the
> > "solution" you eventually find for an FTP mirror will be useless on an
> > HTTP proxy.  When "solutions" need to be modified for each individual
> > application, you know that the system isn't clean.
> 
> Yesterday is a wonderful tool, and can be scripted to do whatever you
> want.  Eg, copying all files that changed on June 14th back to the
> cache isn't very difficult.
Yesterday can't be used to update the relative references in all the
README files in the FTP archives to the urlified versions.

> 
> I don't see what running a big FTP mirror has to do with it.  netlib is
> a big FTP site; it runs on Plan 9.  Maybe it's not a mirror, but so what?
Since it's not a mirror, it doesn't have to contend with all the spaceful
filenames you find in the non-Plan9 world.

> I also don't see how you can't leverage whatever you did for FTP with
> HTTP.  The substitution-style FS gives you a *lot* of flexibility in this
> area.
What you did in FTP was scanning the README files for references.
What you do in HTTP is updating all the href and src attributes in HTML
files (and hoping none of the Java programs have embedded URLs that you
can't change at all), so you don't get broken links everywhere.

...unless you want to implement the transformation/detransformation
code in the FTP and HTTP servers, as well ... in which case your
system becomes one step worse than my system, because you have
transformation/detransformation code in two places on your system :-(

> 
> 	- Dan C.
> 


