From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dave <dave@dave.tj>
Subject: Re: [9fans] blanks in file names
In-reply-to: <1515958.1025710742@GOLD>
To: 9fans@cse.psu.edu
Message-id: <200207070402.g6742Ch06066@dave2.dave.tj>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Date: Sun,  7 Jul 2002 00:02:10 -0400
Topicbox-Message-UUID: c4047bde-eaca-11e9-9e20-41e7f4b1d025

How about representing paths internally as connected structures (linked lists, if you will), each of which identifies a particular node?  The method of representation of a given node can be quite flexible (allowing node completion - or as I would term it, canonization - to be done by the kernel), and user programs could establish their own conventions for dealing with user input and user output (which seems logical to me, since the kernel really isn't intended to be a human user interface, only a program "user" interface).

Here's an off-the-top-of-my-head example to illustrate what I mean.  I'm sure we can come up with a much better system if we all think about this a bit, and figure out how it can be improved (or how it can be replaced with something even better - the purpose of this example is simply to get us thinking in a particular direction that I believe is quite promising):

typedef struct node_t node_t; /* lets us write plain node_t below */
struct node_t {
 char* name; size_t name_l; /* the node name, and the length of the name */
 /* An error should probably be returned by the kernel if hint is ambiguous. */
 char* hint; /* a regexp the kernel can use to attempt to "autocomplete" name */
 size_t inode; /* another unique identifier for a node - should we allow it? */
 node_t* next_node; /* the next component of the path, or NULL at the end */
};

/* canonize(2) converts a node_t chain into its canonical form, expanding all
 *  hints into the corresponding names.  If there are any possible
 *  canonizations and dst is not NULL, dst is filled with node_t chains
 *  representing all the possible canonizations of the node_t chain given.
 * PARAMETERS
 *  src is the node_t chain that you're attempting to canonize.  It's probably
 *   obtained by parsing the input to our shell, and deciding how to interpret
 *   it.
 *  On calling canonize(2), dst should point to a memory block large enough to
 *   accommodate *dst_l node_t structs.  canonize(2) will use the memory block to
 *   store all the nodes of all the node_t chains it creates.
 *  On return from canonize(2), *dst_l will be changed to the number of node_t
 *   chains representing possible canonizations (i.e., it's a value-result
 *   parameter), and the first *dst_l locations in dst will contain the starting
 *   nodes of the node_t chains representing the possible canonizations.
 * RETURN VALUE
 *  0 on success
 *  -1 on failure, with errno set appropriately
 *  Multiple matches count as an error condition, since canonize(2) failed
 *   to canonize the node_t chain.
 *  If dst is too small, dst will be filled as much as possible, but an error
 *   will still be returned.
 * NOTES
 *  Having the self-referencing pointer in node_t refer to the previous node
 *   instead of the next node would allow canonize(2) to save lots of buffer
 *   space in dst for ambiguities that occur deep inside a long node_t chain.
 *   However, I don't believe we should do that, since (a) ambiguities that
 *   occur relatively early in a long node_t chain allow canonize(2) to save
 *   lots of buffer space in dst in the current implementation; and more
 *   importantly, (b) any other routine in the kernel that deals with node_t
 *   chains will have to walk the entire linked list before processing anything
 *   if we implement that change, since the pointer it'll see won't be to the
 *   start of the "pathname," but rather to the end.
 */
int canonize (const node_t* src, node_t dst[], size_t* dst_l);
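To make the "ambiguous hint is an error" rule concrete, here is a rough user-space sketch of resolving a single hint.  It is only an illustration under assumptions of my own: resolve_hint() is a hypothetical helper, the candidate names are passed in as an array rather than read from a directory, and POSIX regcomp(3)/regexec(3) with extended syntax stand in for whatever regexp flavor the kernel would really use.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the proposal): resolve one hint against
 * a list of candidate names.  Returns 0 and stores the unique match in *out.
 * Returns -1 if the regexp is invalid, matches nothing, or matches more than
 * one name, mirroring the "ambiguity is an error" rule for canonize(2). */
static int resolve_hint(const char *hint, const char *const names[],
                        size_t n_names, const char **out)
{
    regex_t re;
    const char *found = NULL;
    size_t matches = 0;

    assert(hint != NULL && out != NULL);

    /* REG_NOSUB: we only need match/no-match, not submatch offsets. */
    if (regcomp(&re, hint, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (size_t i = 0; i < n_names; i++) {
        if (regexec(&re, names[i], 0, NULL, 0) == 0) {
            found = names[i];
            matches++;
        }
    }
    regfree(&re);

    if (matches != 1)
        return -1;      /* no match, or an ambiguous hint */
    *out = found;
    return 0;
}
```

With the names "my file.txt", "my notes.txt" and "other", the hint "^my f" resolves to "my file.txt", while "^my " is rejected as ambiguous.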

int new_open (const node_t* path, int flags, int mode);
/* The old open(2) can instead become open(3), with a rather trivial
 *  implementation:
 * int open (const char* path, int flags, int mode) {
 *  split path on unescaped '/' chars;
 *  create a node_t for each of the components obtained above;
 *  link the nodes together into a chain;
 *  return new_open(our_node_t_chain,flags,mode);
 * }
 */
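The splitting step of that open(3) wrapper could look roughly like the sketch below.  The details are my assumptions, not settled design: split_path() and free_chain() are hypothetical names, a backslash is taken as the escape character (so "\/" is a literal slash inside a name), empty components are skipped, and the hint and inode fields are simply left zeroed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct node_t node_t;
struct node_t {
    char *name; size_t name_l;
    char *hint;             /* left NULL in this sketch */
    size_t inode;           /* left 0 in this sketch */
    node_t *next_node;
};

/* Free a chain built by split_path(). */
static void free_chain(node_t *n)
{
    while (n != NULL) {
        node_t *next = n->next_node;
        free(n->name);
        free(n);
        n = next;
    }
}

/* Split path on unescaped '/' characters into a node_t chain.
 * Returns NULL if path has no components or if allocation fails. */
static node_t *split_path(const char *path)
{
    node_t *head = NULL, **tail = &head;
    size_t n = 0;           /* length of the component collected so far */
    char *buf;

    assert(path != NULL);
    buf = malloc(strlen(path) + 1);
    if (buf == NULL)
        return NULL;

    for (size_t i = 0; ; i++) {
        char c = path[i];
        if (c == '\\' && path[i + 1] != '\0') {
            buf[n++] = path[++i];   /* keep the escaped char literally */
            continue;
        }
        if (c == '/' || c == '\0') {
            if (n > 0) {            /* end of a non-empty component */
                node_t *node = calloc(1, sizeof *node);
                if (node == NULL)
                    goto fail;
                node->name = malloc(n + 1);
                if (node->name == NULL) {
                    free(node);
                    goto fail;
                }
                memcpy(node->name, buf, n);
                node->name[n] = '\0';
                node->name_l = n;
                *tail = node;       /* append to the end of the chain */
                tail = &node->next_node;
                n = 0;
            }
            if (c == '\0')
                break;
            continue;
        }
        buf[n++] = c;
    }
    free(buf);
    return head;

fail:
    free(buf);
    free_chain(head);
    return NULL;
}
```

Note that once the path is a chain, a name such as "a b" or "my/file" needs no quoting at all; the escaping convention lives entirely in this user-level wrapper.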

/* Here's one final example - the reason I resist the name openv(2): */
int new_execve (const node_t* path, char* const argv[], char* const envp[]);
/* The "v" here refers to a different aspect of the function.  I don't like
 *  overloading the meaning, because we'd have to call our new execve(2)
 *  execvev(2) or something, and that'd be a little insane, IMHO.  That's why I
 *  chose the "new_" prefix.
 * In reality, I'd much rather see the old open(2) be renamed to old_open(2),
 *  and have the new_open(2) be renamed to just plain open(2), damning backward
 *  compatibility (and likewise with the exec(2) family, as well as the stat(2)
 *  call, etc.).  Programs that expect open(2) to be the old open and don't feel
 *  like asking their authors for a rewrite can simply use a runtime library to
 *  rename old_open(2) to open(2) and hide our new open(2) entirely (since
 *  they're obviously not going to use it, anyway).  However, I'm afraid this
 *  type of drastic change may catch some people off-guard.
 */

Dave Cohen <dave@dave.tj>


rob pike, esq. wrote:
> 
> You guys are all arguing about system stuff but it's the
> *user interface* that you're really arguing about, and
> breaking.  You are opening a can of worms you will never
> get closed again.  Change space! Change the file delimiter!
> The shell will never recover.  The system will break.
> I will mourn.
> 
> -rob
>