From mboxrd@z Thu Jan 1 00:00:00 1970
Message-Id: <200207110200.WAA26141@math.psu.edu>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] blanks in file names
In-Reply-To: Your message of "Wed, 10 Jul 2002 19:01:29 EDT." <200207102301.g6AN1UV00727@dave2.dave.tj>
From: Dan Cross
Date: Wed, 10 Jul 2002 22:00:41 -0400
Topicbox-Message-UUID: c8b900b4-eaca-11e9-9e20-41e7f4b1d025

> > I don't think it would be simpler; I think it would be more
> > complicated.  You're replacing a simple, textual representation of
> > an object with a binary representation; you have to have some way
> > to do canonicalization in the common case, but even that path is
> > fraught with danger.
>
> Manipulating text with all sorts of dynamic buffers is substantially
> more complicated than simply replacing a node in a linked list.
> The canonicalization is all being done by the kernel, or a library.

How could this possibly be in the kernel?  After all, you're talking
about changing the interface for opening a file; I pass a file name
via some mechanism to a user-level application that wants to call open
on it.  What's it supposed to do?  Does the shell now pass a linked
list as an argument to main somehow?  How does the system know that
it's a file?  Do we have to replace the argument vector with some more
complex representation that encapsulates type information (e.g., this
argument is a file, this next one is a string, etc.)?  Does the shell
change to represent file names as lists?  Does the user suffer the
indignity of having to specify a list of path components to represent
a file?  Or do we provide a canonicalization library for shell
arguments, in which case you have the exact same problem as supporting
spaces now, since most programs are going to expect to get file name
arguments in the canonical representation?  If you do that, who calls
it?  The shell or the library?
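To make the "who calls it?" problem concrete, here's a sketch of the
piece of code that would have to exist somewhere (the names `Node' and
`joinpath' are mine, and it's portable C rather than Plan 9 C, purely
for illustration): something has to turn the component list back into
the one string that open actually takes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list-of-components representation of a path. */
typedef struct Node Node;
struct Node {
	char *name;
	Node *next;
};

/*
 * Join a component list back into the textual form that every
 * existing program (and open itself) expects.  Somebody has to
 * call this -- the shell, a library, or each program by hand.
 */
char *
joinpath(Node *list)
{
	Node *n;
	size_t len;
	char *s;

	len = 1;				/* trailing '\0' */
	for (n = list; n != NULL; n = n->next)
		len += strlen(n->name) + 1;	/* '/' plus component */
	if ((s = malloc(len)) == NULL)
		return NULL;
	s[0] = '\0';
	for (n = list; n != NULL; n = n->next) {
		strcat(s, "/");
		strcat(s, n->name);
	}
	return s;
}
```

So the list ("usr", "cross", "file") joins back to "/usr/cross/file" --
and every caller of open has to agree on exactly this convention.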
I for one am going to be *very* unhappy if I have to type:

	cat ('' 'usr' 'cross' 'file')

Instead of:

	cat /usr/cross/file

Or do you make every program that wants to open a file call a function
to canonicalize a filename into the internal format before it calls
open?

> > But they change an already well-established interface.  Have you
> > thought through the implications of this, in all their macabre
> > glory?  What you propose--changing the most basic interface for
> > opening a file in a system where everything looks more or less
> > like a file--has huge implications.  And all this just to support
> > a strange edge case, which is adequately solved by substitutions
> > in the filename.  Sure, it's not perfect in some weird
> > pathological case, but how often is this going to come up in
> > practice?  Remember: optimize for the common case.
>
> Optimization for the common case is good, but creating a system
> where the uncommon case will cause major mayhem at the system level
> is evidence of a very unclean approach.  (When you consider the
> reasoning behind the problem (namely, spaces and slashes in
> filenames kill our ability to separate nodes easily), it makes
> perfect sense that our solution isn't very clean.  The only clean
> solution is to restore the ancient UNIX ideal of being able to
> easily separate nodes.  In other words, either kill spaces
> altogether and damn interoperability, or promote spaces to full
> citizenship.)

But Plan 9 can handle this.  One of the beautiful things about Plan 9
is that it provides a solution that's workable with little effort.
The various substitution file systems provide a workable solution
without introducing any additional complexity.  If you want a
total--100% complete--solution, then a `urlifyfs' can be written that
uses URL escaping as a canonical representation, or something similar.
The system interface doesn't have to be changed, though.  *That* is
the mark of a clean system design.
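For what it's worth, the escaping such a `urlifyfs' would need is
tiny.  A sketch (my names, portable C, and only spaces handled -- a
real version would also escape '%' itself and any other reserved
bytes):

```c
#include <stdlib.h>
#include <string.h>

/*
 * URL-escape a file name: each space becomes "%20", so names with
 * blanks round-trip unambiguously through a plain path string and
 * the system interface is untouched.
 */
char *
urlify(const char *name)
{
	const char *s;
	char *out, *t;

	/* worst case: every byte expands to three */
	if ((out = malloc(3 * strlen(name) + 1)) == NULL)
		return NULL;
	for (s = name, t = out; *s != '\0'; s++) {
		if (*s == ' ') {
			memcpy(t, "%20", 3);
			t += 3;
		} else
			*t++ = *s;
	}
	*t = '\0';
	return out;
}
```

E.g., "my file" comes out as "my%20file", which is a perfectly
ordinary path component to everything else in the system.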
The Unix `ideal' was eliminated because it was overly complex, without
a commensurate gain in functionality.  Besides, the inode system
didn't really fit in well with the idea of 9P.

> > > There's plenty of experience with other systems working on
> > > linked lists (including a huge amount of kernel code in my Linux
> > > box that I'm typing from, ATM).  Most of the problems with
> > > linked lists have been pretty well documented, by now.
> >
> > It's the huge amount of kernel code that Plan 9 is trying to avoid.
>
> String manipulation is more complex than linked list manipulation.

No, it's really not.  Consider passing a linked list as an argument to
a function you're calling, versus passing an argument vector of
strings.  How do you do that?  Do you muck with all the C startup code
to make sure you get the linking and so on right, in such a way that
the list is in a contiguous memory block so it doesn't get stomped by
the image read by exec?  Do you pass each node in the list to main as
a separate string in the argument vector?  If so, how do you tell when
one file name ends and another begins?  Do we introduce some
convention for delineating the beginning and end of a filename in a
list representation, effectively creating a protocol that every
program has to follow to take a filename as an argument?  Surely the
former option is significantly easier....

Consider a possible canonicalization routine that might be used in a
substitution FS:

	char *
	canonical(char *str)
	{
		char *p, *s, *t;

		if (str == nil || (p = malloc(2 * strlen(str) + 1)) == nil)
			return nil;
		for (s = str, t = p; *s != '\0'; s++, t++) {
			if (isspace(*s)) {
				*t++ = '+';	/* Or whatever. */
				*t = '2';
				continue;
			}
			*t = *s;
		}
		*t = '\0';	/* Terminate before the strlen below. */
		if ((s = realloc(p, strlen(p) + 1)) == nil)
			free(p);
		return s;
	}

That's pretty straightforward; just inserting into a linked list would
be just as hard.
Doing so in a contiguous memory block would be, I think, harder (you'd
have to step over the list, keep a count of how much memory you
needed, then allocate the block, copy each node, and set the links.
That's a pain).

> > Being forced to conform to a lot of external interfaces *will*
> > kill the system.
>
> I don't dispute that point, but the interface I propose is most
> unlike any other interface currently known to man (not trying to
> conform to any external interface).  I'm simply pointing out that
> failing to provide at least a 1-1 mapping with capabilities that are
> already widely used in external systems that must interoperate with
> ours *will* kill us.

Well, if you *really* want 100% 1-to-1 mappings, use the URL encoding
others have mentioned, or something similar.  As it is, it seems that
this mostly works; about 80% of what's needed is there.

> > Besides, the point Nemo was trying to make umpteen posts ago was
> > that, yes, you can roll back changes using the dump filesystem,
> > which gives you temporal mobility.  He is right.
>
> You can do a lot of things if you're prepared to get involved in the
> functions that your OS should be doing automatically.  Try running
> an FTP mirror to a busy site that way, though, and you'll quickly
> discover why automation is a good thing.  The worst part about our
> system is that the "solution" you eventually find for an FTP mirror
> will be useless on an HTTP proxy.  When "solutions" need to be
> modified for each individual application, you know that the system
> isn't clean.

Yesterday is a wonderful tool, and can be scripted to do whatever you
want.  E.g., copying all files that changed on June 14th back to the
cache isn't very difficult.  I don't see what running a big FTP mirror
has to do with it.  netlib is a big FTP site; it runs on Plan 9.
Maybe it's not a mirror, but so what?  I also don't see how you can't
leverage whatever you did for FTP with HTTP.
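Incidentally, the contiguous-block flattening I called a pain earlier
really does take two passes and some pointer surgery.  A sketch, with
made-up types and nothing from any real exec path:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list-of-components representation of a path. */
typedef struct Node Node;
struct Node {
	char *name;
	Node *next;
};

/*
 * Flatten a component list into one contiguous allocation, as an
 * exec-style interface would need: one pass to count, one malloc,
 * one pass to copy the nodes and re-point the links into the block.
 */
Node *
flatten(Node *list)
{
	Node *n, *block;
	char *strings;
	size_t nnodes, nbytes, i;

	if (list == NULL)
		return NULL;
	nnodes = nbytes = 0;
	for (n = list; n != NULL; n = n->next) {
		nnodes++;
		nbytes += strlen(n->name) + 1;
	}
	if ((block = malloc(nnodes * sizeof(Node) + nbytes)) == NULL)
		return NULL;
	strings = (char *)(block + nnodes);	/* strings follow the nodes */
	for (n = list, i = 0; n != NULL; n = n->next, i++) {
		block[i].name = strcpy(strings, n->name);
		strings += strlen(n->name) + 1;
		block[i].next = (n->next != NULL) ? &block[i + 1] : NULL;
	}
	return block;
}
```

Compare that with the dozen-line string routine above: the string wins.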
The substitution-style FS gives you a *lot* of flexibility in this
area.

	- Dan C.