9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-10 14:49 G.David
  0 siblings, 0 replies; 29+ messages in thread
From: G.David @ 1998-04-10 14:49 UTC (permalink / raw)


From: geoff@plan9.bell-labs.com

>I'll second td's comments and note that he described your application
>as `one weird little application'; NNTP, IMAP and CIFS are network
>protocols, not applications, and their implementations may or may not

I also listed the "applications"

>>the application to live with the OS.  Remember INN, CYRUS and SAMBA
>>are applications that already exist...

>NNTP may not be little but it's certainly weird or at least
>irrelevant.

Not to the internet.  Not yet.

>             I normally prefer not to talk about my shadey past, but
>as senior author of C News and inventor of nov, the common

Thanks for C News.  I was an early adopter and helped to create dbz.

>news-overview database used by the reader software, I've seen a lot of
>netnews and its growth over time, and it's hard to see why anyone
>would want to read netnews any more (I quit years ago).  The signal in
>the noise is so faint it's almost undetectable and the volume is
>ludicrously high.  If you did want to receive a very small subset of

The volume is there because it is *used*.  News is different now,
things change.

>netnews, you'd surely be better off with forsyth's plan 9 netnews
>implementation than with INN and its CERT alert.  As may be obvious,

I've never seen it.  Is it available somewhere?

>I've always felt that NNTP is poorly suited to news transport and news
>reading.

humm...

>The Internet needs fewer but better protocols.  One that it doesn't
>need is NNTP.  (SMTP is another, and I'm tackling it first.)  For a
>lot of things, my preference is to use filesystem protocols like 9P or
>Styx.  Even NFS is a better protocol than NNTP for reading news.  For
>example, it wasn't necessary to change NFS nor issue a revision of the
>NFS RFC(s) when nov was invented; both were necessary for NNTP (the

Ok, why didn't you and the other news movers and shakers do something
about it?  Could it be that not everybody likes NFS?  Sure, you could
implement a NFS server to optimize news type accesses, but it dictates
certain underlying OS support (e.g. file links, 255 character file
names, etc.)

Don't you find 9P (or Styx) a little restrictive regarding the length
of file names (ignore the OS for now, just look at the protocol.)  Since
the protocol allows 8k blocks, why not add a length indicator (like
count[2] in Twrite) that allows ~8k for long descriptive file names.
There is precedence, Twrite and Tread have offset[8] (64bits) even
though the OS only uses 32 bits of it.

>news readers had to change either way to exploit nov).  This business
>of inventing a new protocol (and RFC or six) every time someone has an

[snip explanation of the current crazy world of RFCs]

Commercial interests now drive the RFC process.  For example, INN is
free but to participate you are "supposed" to pgp verify newgroup and
rmgroup messages.  How many news servers on the internet do you think
have RSA and PGP licenses?

[end of rant]

I spent $350 on two books and a CD that introduce some interesting
concepts in computing.  The authors published the information in
hopes that the ideas would catch on and grow.  From the Plan9 FAQ
(http://plan9.bell-labs.com/plan9/faq.html)  "... to succeed it
must be used ...".  I'm trying to demonstrate that it can be used
for applications that are considered "hard" at scale.  At first
I'm going to provide a free NNTP server with a few hundred gigabytes
of storage at the end of a DS3 connected to the internet, for research.
If I can keep it together and it functions well, I will try (yet again)
to get a reasonable license from Lucent that allows me to charge
customers to connect to the server.  If all that happens I will then
add SMTP, POP, IMAP (including IMAP access to NEWS) and HTTP servers.
This is an attempt to start a business of providing robust and cost
effective server platforms to the internet.

You may not like me mucking with the core of the system, but if
that is what it takes...

David Butler
gdb@dbSystems.com




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14 15:18 Russ
  0 siblings, 0 replies; 29+ messages in thread
From: Russ @ 1998-04-14 15:18 UTC (permalink / raw)


> There's been some mention of Styx in this discussion. Is there anywhere I can
> get an up-to-date specification of Styx (and the other parts of Inferno) without
> paying? I'm interested particularly in what changes, if any, there have been
> since 9p (assuming, as I do, that Styx is more-or-less 9p).

Sure.  Go download the inferno docs off http://inferno.lucent.com (register to
get the free binary distribution and then download just the documents).





^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14 15:16 Tom
  0 siblings, 0 replies; 29+ messages in thread
From: Tom @ 1998-04-14 15:16 UTC (permalink / raw)


> An alternative (taken by a friend of mine in his modified version of
> the other rc) is to make the shell glob more carefully. So the file
> called "Hello World" globbed as Hello' 'World.

Uhh, the real rc always had this property.  If the ersatz product failed
to do this, the perpetrator wasn't paying attention.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14 13:50 Rob
  0 siblings, 0 replies; 29+ messages in thread
From: Rob @ 1998-04-14 13:50 UTC (permalink / raw)


Leaving the null byte in made the specification and the code easier.
The Unix thing of 14 bytes with a null sometimes was a) harder and
b) gotten wrong by a lot of programs.

-rob




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14 13:35 Elliott.Hughes
  0 siblings, 0 replies; 29+ messages in thread
From: Elliott.Hughes @ 1998-04-14 13:35 UTC (permalink / raw)


> > One thing I don't get is why "Fields that contain names are 28-byte
> > strings (including a terminal NUL (zero) byte)". Why send the zero,
> > if it's always zero and it's always there? Why not use these 28 bytes
> > as a sort of persistant fid?
> 
> If you don't send the zero, how do you indicate filenames less
> than 27 bytes?  How would "x.c" get transmitted?

As { 'x', '.', 'c', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }.
Not sure I counted that right, but you get the idea, I'm sure!

> I suppose you
> could increase the name length by one by adding the hack that
> there's an implicit zero on byte 29 that isn't transmitted,
> but it's an ugly hack for a questionable cause (one byte?) and
> it would make things like strlen, etc. annoying.

I assume that ease of dealing with the message (you know the name you're
pulling out of it is already a C string) is the reason for the waste. I was just
wondering out loud because most of us don't know what discussion went
on at the labs before such decisions were made. (And if outsiders question
the wisdom of these decisions and wonder about rejected alternatives, they
like as not get insulted for daring to question AT&T wisdom. That's
how it looks to me, anyway.)

But increasing the name length by one byte wasn't really my intention (though
maybe then that real-audio configuration file that's in my NFS home directory
would be available to Plan 9). I was thinking more of using all 28 bytes -- seeing
as we send them anyway -- with no NUL termination. You wouldn't use
them as UTF-8 strings, rather as 28 * 8 bit persistant fids that the server
keeps track of. [Something like the inode + device number from the old days.]

You could use a different mechanism to work out what persistant fid it
was you were looking for: you could have filenames if you chose, or you
could have a database interface (perhaps to pictures, for which filenames
aren't really sufficient).

Anyway, the point is that you'd then have a larger and much more
convenient space to play about with alternative mappings in.

There's been some mention of Styx in this discussion. Is there anywhere I can
get an up-to-date specification of Styx (and the other parts of Inferno) without
paying? I'm interested particularly in what changes, if any, there have been
since 9p (assuming, as I do, that Styx is more-or-less 9p).

-- 
http://users.ch.genedata.com/~enh/




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14 13:04 Russ
  0 siblings, 0 replies; 29+ messages in thread
From: Russ @ 1998-04-14 13:04 UTC (permalink / raw)


> One thing I don't get is why "Fields that contain names are 28-byte
> strings (including a terminal NUL (zero) byte)". Why send the zero,
> if it's always zero and it's always there? Why not use these 28 bytes
> as a sort of persistant fid?

If you don't send the zero, how do you indicate filenames less
than 27 bytes?  How would "x.c" get transmitted?  I suppose you
could increase the name length by one by adding the hack that
there's an implicit zero on byte 29 that isn't transmitted,
but it's an ugly hack for a questionable cause (one byte?) and
it would make things like strlen, etc. annoying.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14  9:50 Elliott.Hughes
  0 siblings, 0 replies; 29+ messages in thread
From: Elliott.Hughes @ 1998-04-14  9:50 UTC (permalink / raw)


> Hello?  Where did everybody go...?

Easter holiday?

> In that case the overhead for 9p now gets quite high given all
> the Twalks.  So how about a uchar length followed by data?  Makes
> sense.

One thing I don't get is why "Fields that contain names are 28-byte
strings (including a terminal NUL (zero) byte)". Why send the zero,
if it's always zero and it's always there? Why not use these 28 bytes
as a sort of persistant fid?

> And another thing, there is enough about Plan9 that the user needs
> to be aware of, might as well add spaces to file names so the DOS
> weenys don't eat your lunch when you do IMAP and CIFS...

Is it really worth it? Who is going to take advantage of it? Nearly all my
files have short names ending .c or .java, and I suspect (as Tom Duff
mentioned) that this will be true for most of us.

I don't know. Is "there are more important things to think about" sufficient
reason not to do something? Maybe I'd have done more and pontificated
less in my time if I hadn't thought like this!

-- 
http://users.ch.genedata.com/~enh/




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14  9:08 Elliott.Hughes
  0 siblings, 0 replies; 29+ messages in thread
From: Elliott.Hughes @ 1998-04-14  9:08 UTC (permalink / raw)


Tom Duff wrote:
> On Apr 8,  1:08pm, Russ Cox wrote:
> > i don't know the official reasons that space isn't allowed,
> > but in general file names with spaces (which you have to deal
> > with in Unix and Windows) are a pain for oodles of reasons.  the most
> > noticeable one is that it messes up scripts and the like:
> > ls -l | awk '{print $10}' is no longer guaranteed to give
> > you filenames.

> ... characters other than letters, digits,
> underscore, minus, plus and dot were so little used that
> forbidding them would not impact any important use of the
> system.  Obviously people stick to those characters to
> avoid colliding with the shell's syntax characters.  I suggested
> (or at least considered) formalizing the restriction, specifically
> to make file names easier to find by programs like awk.

An alternative (taken by a friend of mine in his modified version of
the other rc) is to make the shell glob more carefully. So the file
called "Hello World" globbed as Hello' 'World (in a use of free
careting that I never liked). Actually, free careting means that rc
is the shell in which it's least painful to deal with spaces in
filenames: you don't need to go back to the start of the filename
to insert the quote character that you forgot.

You could easily change the utilities to output filenames in this
protected form. [Personally I thought that '/' as directory separator
was a bad choice compared to DOS' '\\'. I've never seen anyone
want to use '\\' in a filename, but I've seen plenty want to use '/'.]

But I think this is missing the point. Filesystems in the 9P mode are
for programmers, not for users. No-one really cares whether they
have 27 bytes worth of filename or 255 ISO-Latin-1 characters,
or even what the computer has to do to to ensure you can have
chess symbols in filenames: all they're bothered about is whether
or not they can find their files again. Is a simple name the best way
to do this? I'm not convinced that it is.

If you want to implement a collection of repulsive protocols that
assume a certain implementation, write some library functions or
a user-level filesystem to provide this functionality. Despite what
the advertising may have claimed, having 31-character names
or even 255-character names is neither here nor there.

-- 
http://users.ch.genedata.com/~enh/




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14  6:38 Nigel
  0 siblings, 0 replies; 29+ messages in thread
From: Nigel @ 1998-04-14  6:38 UTC (permalink / raw)


I use the non-breaking space when translating Win95 VFAT names in a
hacked dossrv. Win95 fienames are stuffed full of spaces because
Microsoft don't care whether the DOS shell can parse path names anymore.


> -----Original Message-----
> once when i had to worry about conveying names with spaces (x.400
> addresses)
> i used iso no-break space (+U'00A0' i think).  it was adequate,
> which is more than i can say for some of the RFCs i've read or had
> to implement over the years.  i found i nearly always had to check
> someone else's code to see what clients or servers actually expected.
> implement the rfc precisely (or as accurately as you can determine
> it),
> and you often hit problems.
> 




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14  6:24 forsyth
  0 siblings, 0 replies; 29+ messages in thread
From: forsyth @ 1998-04-14  6:24 UTC (permalink / raw)


>>But I think this is missing the point. Filesystems in the 9P mode are
>>for programmers, not for users. No-one really cares whether they
>>have 27 bytes worth of filename or 255 ISO-Latin-1 characters,
>>or even what the computer has to do to to ensure you can have
>>chess symbols in filenames: all they're bothered about is whether
>>or not they can find their files again. Is a simple name the best way

that has been my point of view in this as well.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-14  5:33 forsyth
  0 siblings, 0 replies; 29+ messages in thread
From: forsyth @ 1998-04-14  5:33 UTC (permalink / raw)


>>Thus, for example, whatever weirdness INN does to store news could be
>>exported through a file system protocol and mounted locally or remotely
i

someone wrote an article for Dr Dobbs several years ago demonstrating
something along those lines for news, using QNX's file system primitives.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-13 23:03 geoff
  0 siblings, 0 replies; 29+ messages in thread
From: geoff @ 1998-04-13 23:03 UTC (permalink / raw)


I thought it was obvious that the data exported through a file system
protocol (a file system interface) need not be a conventional file system.
Thus, for example, whatever weirdness INN does to store news could be
exported through a file system protocol and mounted locally or remotely
instead of accessing the data via NNTP.  You can even do this via NFS with
a user-mode NFS server.

Geoff Collyer




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-13 13:50 G.David
  0 siblings, 0 replies; 29+ messages in thread
From: G.David @ 1998-04-13 13:50 UTC (permalink / raw)


From: geoff@plan9.bell-labs.com

>                             I'm told that a `full feed' runs to 1 - 2
>gigabytes per day, depending on who you ask for numbers, and that
>information is undoubtedly out of date; it's surely much more now.

It is more.

>I would have thought my record of activism against NNTP, especially for
>news reading, spoke for itself.  To hit some highlights: the C News crew

It does.

>ignored NNTP for years; publicly encouraged use of NFS or other, better
>remote file systems instead of NNTP for news reading, despite aversion to

Speaking from my own experience, news on a filesystem is ok inside an
organization where you don't need logging (of news access patterns) or
per user authenication.  Also, as the volume has increased, the techniques
of storing the data has moved away from the file system (look at cfs in
INN).  Since the access pattern of reading news is different than the
access pattern of general file access, it makes sense to separate the
access from the implementation.  But NNTP, as a protocol, is lacking.

What about IMAP?  It looks very much like a filesystem protocol that
makes few implementation restrictions.  The problem here is that it
also puts some client side functions in the server (e.g. searching files
for pattern matches.)  Since IMAP is not well adopted yet, perhaps
9P (or Styx) could be put forward as an alternative?

[snip more news/NNTP stuff]

Enough about news.  This is the Plan9 list.  The reason I picked News
is because it is hard, has insane volume and the users look at the
output and tell me it doesn't work.  In other words I'm using News to
test my operating system!  My file servers contain all the performance
(thin OS, efficient protocol, btree indexed directories, RAID-0) and
robustness (RAID-1, checksumed data blocks, full logging including data)
to make it work.  The cpu servers simply convert NNTP, IMAP, etc. to 9P.

Think about it.  ISPs spend a *lot* of good money on big UNIX iron,
giant NFS file servers, and very large RAID boxes to handle this
application at enormous cost.  So I gather together a bunch of
inexpensive PCs, hard drives and networking and do a *better* job
because I use Plan9!  That is what I call fun!

-------------------------------------------------------------------
G. David Butler   | Who I? Zathras, a Plan9er. Nobody uses Zathras'
                  | system, but Zathras not mind. Zathras used to 
                  | having others ignore Zathras. Besides, Zathras
gdb@dbSystems.com | have best system, so Zathras happy.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-13  6:00 geoff
  0 siblings, 0 replies; 29+ messages in thread
From: geoff @ 1998-04-13  6:00 UTC (permalink / raw)


David, you're welcome to the last word; I don't want to drag this out.

Friends who read netnews and maintain it tell me that things have indeed
changed, for the worse: not only is volume ever-increasing, but the vast
preponderance of netnews these days is porno and spam.  Ignoring all that,
you've still got to find the signal in what's left (as always).  I think
the design criteria of A News were about right: 1 to 2 useful articles per
day, it's just much harder to find them now.  The bandwidth is being used,
all right, but to what end?  I'm told that a `full feed' runs to 1 - 2
gigabytes per day, depending on who you ask for numbers, and that
information is undoubtedly out of date; it's surely much more now.

I would have thought my record of activism against NNTP, especially for
news reading, spoke for itself.  To hit some highlights: the C News crew
ignored NNTP for years; publicly encouraged use of NFS or other, better
remote file systems instead of NNTP for news reading, despite aversion to
NFS as an inferior network file system; denounced the (buggy) `relaynews
daemon' hackery done elsewhere to inject (pointlessly) redundant incoming
NNTP connections directly into the guts of C News; and when uucp began to
fade (never to be fully replaced) and our netnews neighbours insisted that
we exchange news via NNTP, we wrote and shipped a pair of simple,
non-munging, exchange-only NNTP programs.  When I invented nov and did the
first proof-of-concept implementations, I deliberately made it easy to get
the nov files by merely importing /usr/spool/news but deliberately ignored
access to nov data via NNTP.  Some people can't take a hint (nor a strong
suggestion).

Indeed, despite common sense and all our efforts, some people professed to
prefer NNTP to file systems.  In hindsight, we may have incorrectly
expected that people would figure out for themselves that NNTP is
defective; that given the Internet, you normally only need one news feed
and thus can use remote file access or even uucp or FTP or rsh or a bare
TCP connection to exchange news; and that it's simpler and better for news
readers to just read from files instead of having to contain code to read
from files and completely different code to request and read articles via
NNTP sockets.  Perhaps we should have declared a jihad on NNTP, but life
is short and the Internet is full of defective protocols (take a look at
FTP, never mind the obnoxious interactive Unix client), with more popping
up every day.  What can you do?  You can lead a hacker to wisdom, but you
can't make him think.

Now that 9P and Styx specifications and implementations are available
outside Lucent, we're all in an even better position to push for use of
good remote file systems instead of Yet Another Dopey Internet Protocol.
Any protocol that ends in TP for `Transfer Protocol' is an obvious
candidate for being eliminated by using remote file access instead.  In
the short run, there may still be some utility in replacing some of the
existing clunkers (e.g. SMTP, LDAP, TELNET, DNS) with improved protocols,
but surely the long-term goal should be Fewer But Better Protocols, and
the Better Protocols should certainly include (at least) one file system
protocol.

Geoff Collyer
NNTP Non-Proliferation Task Force




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-12  3:27 ozan
  0 siblings, 0 replies; 29+ messages in thread
From: ozan @ 1998-04-12  3:27 UTC (permalink / raw)


> The Internet needs fewer but better protocols.

well, you know the process..





^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-10 14:43 forsyth
  0 siblings, 0 replies; 29+ messages in thread
From: forsyth @ 1998-04-10 14:43 UTC (permalink / raw)


i've done various forms of nntp support, client and server, without too much trouble.
getting a reliable feed was rather more of a problem, as was the volume,
and the volume of noise.

once when i had to worry about conveying names with spaces (x.400 addresses)
i used iso no-break space (+U'00A0' i think).  it was adequate,
which is more than i can say for some of the RFCs i've read or had
to implement over the years.  i found i nearly always had to check
someone else's code to see what clients or servers actually expected.
implement the rfc precisely (or as accurately as you can determine it),
and you often hit problems.

pop3 is one example:  several famous clients made undocumented assumptions
(i think one was `there is a space following +OK').
spaces and file name lengths were the least of my problems.

i'm surprised that nfs requires that the underlying file system
actually have links, as opposed to rejecting requests to form or manipulate
them (especially given that there are valid reasons for refusing such
requests even on file systems that have got links).




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-10  4:04 geoff
  0 siblings, 0 replies; 29+ messages in thread
From: geoff @ 1998-04-10  4:04 UTC (permalink / raw)


I'll second td's comments and note that he described your application
as `one weird little application'; NNTP, IMAP and CIFS are network
protocols, not applications, and their implementations may or may not
be related to the size of their specifications (one would hope not,
given the size of many protocol specifications).

NNTP may not be little but it's certainly weird or at least
irrelevant.  I normally prefer not to talk about my shadey past, but
as senior author of C News and inventor of nov, the common
news-overview database used by the reader software, I've seen a lot of
netnews and its growth over time, and it's hard to see why anyone
would want to read netnews any more (I quit years ago).  The signal in
the noise is so faint it's almost undetectable and the volume is
ludicrously high.  If you did want to receive a very small subset of
netnews, you'd surely be better off with forsyth's plan 9 netnews
implementation than with INN and its CERT alert.  As may be obvious,
I've always felt that NNTP is poorly suited to news transport and news
reading.

The Internet needs fewer but better protocols.  One that it doesn't
need is NNTP.  (SMTP is another, and I'm tackling it first.)  For a
lot of things, my preference is to use filesystem protocols like 9P or
Styx.  Even NFS is a better protocol than NNTP for reading news.  For
example, it wasn't necessary to change NFS nor issue a revision of the
NFS RFC(s) when nov was invented; both were necessary for NNTP (the
news readers had to change either way to exploit nov).  This business
of inventing a new protocol (and RFC or six) every time someone has an
idea is looney and contributes to the ever-increasing proliferation of
RFCs (not to mention the difficulty of speaking with authority; a new
RFC obsoleting the ones you've read may have been issued since you
checked last month or week or hour).  To see what I mean about RFCs,
try to find the complete set of current (not yet obsoleted) RFCs
pertaining to mail, including the dozens of SMTP extensions and the
dozens of MIME RFCs.  Now find all the current draft RFCs pertaining
to mail.  Do it again a month later to make sure more RFCs (and
drafts) haven't been bred in the sewers while you were doing real
work.  Read all the RFCs you found, rinse and repeat until dizzy or
you need to get back to work.

Geoff Collyer
RFC Non-Proliferation League




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-10  1:03 G.David
  0 siblings, 0 replies; 29+ messages in thread
From: G.David @ 1998-04-10  1:03 UTC (permalink / raw)


From: "Tom Duff" <td@pixar.com>

>It's ludicrous to make an incompatible change like this with, as you noted,
>such far-reaching consequences, just for one weird little application.

NNTP (look at the new draft rfc1036)? IMAP? CIFS?  These are not "little"
nor "weird" applications.

>If your application doesn't like the names the file system gives it,
>keep a little name-mapping table somewhere and write open and create
>routines that use it.

How do you map a dense UTF-8 encoding on another with 255 octets
to 27?  If there is a character that can be used as a ubiquitous
terminator (like 0x7f perhaps?) then

a-very-long-file-name-that-is-much-longer-than-27-octets

could be

a-very-long-file-name-that-/is-much-longer-than-27-octe/ts{0x7f}

or you could trust MD5 and turn it into 16 bytes mapped to (16/3)*4=24
base64 characters.  But md5 is only one way so you have to create a
file of mappings (your suggestion).  How do you update a file without
file locking by 100's of cpu servers?  That would be very hard with
exclusive access files that don't block (that can be fixed too)!
(BTW: There are also 100's of file servers.)  Also the failure
scenarios are crazy.

We are told Plan9 is dead.  Inferno is alive.  The last time I
talked to Lucent about licensing, they will not license Plan9 for
re-distribution.  So I'm alone anyway if I use Plan9.  It is *much*
easier to change the OS to host the application than it is to twist
the application to live with the OS.  Remember INN, CYRUS and SAMBA
are applications that already exist...

I'm open to suggestions...

Thanks again.

David Butler
gdb@dbSystems.com




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-09 23:44 Tom
  0 siblings, 0 replies; 29+ messages in thread
From: Tom @ 1998-04-09 23:44 UTC (permalink / raw)


> NAMELEN = 28
> FNAMELEN = 64? 128? 256!?  Hell, might as well fix it now. 256.
>
> In that case the overhead for 9p now gets quite high given all
> the Twalks.  So how about a uchar length followed by data?  Makes
> sense.
>
> Boy, the fileserver needs a much bigger inode.  I don't see any way
> around it.  Of course DIRPERBUF gets a lot smaller.  It's good I
> have indexes.
>
> And another thing, there is enough about Plan9 that the user needs
> to be aware of, might as well add spaces to file names so the DOS
> weenys don't eat your lunch when you do IMAP and CIFS...

It's ludicrous to make an incompatible change like this with, as you noted,
such far-reaching consequences, just for one weird little application.
If your application doesn't like the names the file system gives it,
keep a little name-mapping table somewhere and write open and create
routines that use it.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-09 22:08 G.David
  0 siblings, 0 replies; 29+ messages in thread
From: G.David @ 1998-04-09 22:08 UTC (permalink / raw)


>>From: "G. David Butler" <gdb@dbSystems.com>
>
>>I think each component limited to 27 octets is ok.
>
>What am I saying?!?  A 9 "character" file name (with up to 3 octets
>per "character") is a little anemic!  This neeeds to be at least 20.
>
>So do we change NAMELEN from 28 to 64, or bigger?  Anybody from Japan
>have an opinion?

Hello?  Where did everybody go...?

[continuing to talk to self...]

Self, first you need to split user name lengths from file name
lengths.  How about FNAMELEN?  Seems reasonable.

NAMELEN = 28
FNAMELEN = 64? 128? 256!?  Hell, might as well fix it now. 256.

In that case the overhead for 9p now gets quite high given all
the Twalks.  So how about a uchar length followed by data?  Makes
sense.

Boy, the fileserver needs a much bigger inode.  I don't see any way
around it.  Of course DIRPERBUF gets a lot smaller.  It's good I
have indexes.

And another thing, there is enough about Plan9 that the user needs
to be aware of, might as well add spaces to file names so the DOS
weenys don't eat your lunch when you do IMAP and CIFS...

Off to work.

-------------------------------------------------------------------
G. David Butler   | Who I? Zathras, a Plan9er. Nobody uses Zathras'
                  | system, but Zathras not mind. Zathras used to 
                  | having others ignore Zathras. Besides, Zathras
gdb@dbSystems.com | have best system, so Zathras happy.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-09  3:10 G.David
  0 siblings, 0 replies; 29+ messages in thread
From: G.David @ 1998-04-09  3:10 UTC (permalink / raw)


>From: "G. David Butler" <gdb@dbSystems.com>

>I think each component limited to 27 octets is ok.

What am I saying?!?  A 9 "character" file name (with up to 3 octets
per "character") is a little anemic!  This neeeds to be at least 20.

So do we change NAMELEN from 28 to 64, or bigger?  Anybody from Japan
have an opinion?

David Butler
gdb@dbSystems.com




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-08 23:56 G.David
  0 siblings, 0 replies; 29+ messages in thread
From: G.David @ 1998-04-08 23:56 UTC (permalink / raw)


>From: "Rob Pike" <rob@plan9.bell-labs.com>
>
>This discussion isn't pin-headed enough yet.

Ok.

>Let me point out that you can map the characters into
>UTF-8, and i believe with a little craft you could squeeze
>28 characters into 27 bytes, since you only have 96 or
>so characters valid.

A quick huffman encoding might do it.  Should we also add
some LZW?

db




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-08 22:05 Rob
  0 siblings, 0 replies; 29+ messages in thread
From: Rob @ 1998-04-08 22:05 UTC (permalink / raw)


This discussion isn't pin-headed enough yet.
Let me point out that you can map the characters into
UTF-8, and i believe with a little craft you could squeeze
28 characters into 27 bytes, since you only have 96 or
so characters valid.

-rob




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-08 21:54 G.David
  0 siblings, 0 replies; 29+ messages in thread
From: G.David @ 1998-04-08 21:54 UTC (permalink / raw)


>From: "Russ Cox" <rsc@plan9.bell-labs.com>

>> The problem is that the RFC2060 "International Mailbox Naming
>> Convention" has one more character in its alphabet than Plan9
>> allows in names so there is no place to map the space.
>
>There are more problems than that.  I'm pretty sure RFC2060 doesn't
>specify a name length, and Plan9 limits you to 27 bytes.  So you're
>going to have to keep some sort of translation table between IMAP
>names and Plan9 names anyway.  

I think each component limited to 27 octets is ok.  Now what is fun
is the namespace starting with a '#'!  (not a real problem, just strip
it or use ./#blah.)

>If this limit wasn't there and you were content to keep the mailbox
>names in encoded form (probably a nice choice, since then you never
>have to decode or encode them) you could substitute 0x7F for space
>in the Plan9 names and you'd be all set -- both are one byte in
>UTF-8, and RFC2060 can't have raw 7Fs while Plan9 can't have raw spaces.

Hold on, /sys/src/9/port/chan.c excludes 0x7f even though
/sys/src/fs/port/dentry.c doesn't.  Should we allow 0x7f?

-------------------------------------------------------------------
G. David Butler   | Who I? Zathras, a Plan9er. Nobody uses Zathras'
                  | system, but Zathras not mind. Zathras used to 
                  | having others ignore Zathras. Besides, Zathras
gdb@dbSystems.com | have best system, so Zathras happy.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-08 20:32 Russ
  0 siblings, 0 replies; 29+ messages in thread
From: Russ @ 1998-04-08 20:32 UTC (permalink / raw)


> The problem is that the RFC2060 "International Mailbox Naming
> Convention" has one more character in its alphabet than Plan9
> allows in names so there is no place to map the space.

There are more problems than that.  I'm pretty sure RFC2060 doesn't
specify a name length, and Plan9 limits you to 27 bytes.  So you're
going to have to keep some sort of translation table between IMAP
names and Plan9 names anyway.  

If this limit wasn't there and you were content to keep the mailbox
names in encoded form (probably a nice choice, since then you never
have to decode or encode them) you could substitute 0x7F for space
in the Plan9 names and you'd be all set -- both are one byte in
UTF-8, and RFC2060 can't have raw 7Fs while Plan9 can't have raw spaces.

Russ




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-08 19:58 G.David
  0 siblings, 0 replies; 29+ messages in thread
From: G.David @ 1998-04-08 19:58 UTC (permalink / raw)


>From: "Russ Cox" <rsc@plan9.bell-labs.com>

>not having spaces in the filenames in plan9 is a wonderful
>blessing.

True.

>come to think of it, i think i'd rather have ms-dos filenames than
>have imap.  use import and edmail.  or pop3 | sendmail.

You would also rather use Plan9 instead of ms-dos or NT.  What's
your point?  :-)

>From: "Tom Duff" <td@pixar.com>

>On Apr 8,  1:08pm, Russ Cox wrote:
>> noticeable one is that it messes up scripts and the like:
>> ls -l | awk '{print $10}' is no longer guaranteed to give
>> you filenames.
>
>Yes, this is certainly the reason.  When I was working on the
[snip]
>Probably rob took the more liberal road of forbidding del, space
>and controls, the first because it is particularly hard to type,
>and the rest because, as Russ noted, they confound the usual
>line- and field-breaking rules.

I agree that it is nice that Plan9 isn't as libral as UNIX in this
regard.

The problem is that the RFC2060 "International Mailbox Naming
Convention" has one more character in its alphabet than Plan9
allows in names so there is no place to map the space.

I guess I could invoke implementation limitations of
"no ASCII space and 27 UTF-8 octets" in names.

-------------------------------------------------------------------
G. David Butler   | Who I? Zathras, a Plan9er. Nobody uses Zathras'
                  | system, but Zathras not mind. Zathras used to 
                  | having others ignore Zathras. Besides, Zathras
gdb@dbSystems.com | have best system, so Zathras happy.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-08 17:45 Tom
  0 siblings, 0 replies; 29+ messages in thread
From: Tom @ 1998-04-08 17:45 UTC (permalink / raw)


On Apr 8,  1:08pm, Russ Cox wrote:
> Subject: re: [9fans] allowing space (ASCII 0x20) in file names
> intro(5) says "Plan 9 names may contain any printable character
> (that is, any character outside hexadecimal 00-1F and 80-9F) except
> slash and blank", so yes it looks like the idea is to disallow
> unprintables.
>
> i don't know the official reasons that space isn't allowed,
> but in general file names with spaces (which you have to deal
> with in Unix and Windows) are a pain for oodles of reasons.  the most
> noticeable one is that it messes up scripts and the like:
> ls -l | awk '{print $10}' is no longer guaranteed to give
> you filenames.

Yes, this is certainly the reason.  When I was working on the
plan 9 shell, I did a survey of all the file names on all the
unix machines that I could conveniently look at, and discovered,
unsurprisingly, that characters other than letters, digits,
underscore, minus, plus and dot were so little used that
forbidding them would not impact any important use of the
system.  Obviously people stick to those characters to
avoid colliding with the shell's syntax characters.  I suggested
(or at least considered) formalizing the restriction, specifically
to make file names easier to find by programs like awk.
Probably rob took the more liberal road of forbidding del, space
and controls, the first because it is particularly hard to type,
and the rest because, as Russ noted, they confound the usual
line- and field-breaking rules.




^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-08 17:08 Russ
  0 siblings, 0 replies; 29+ messages in thread
From: Russ @ 1998-04-08 17:08 UTC (permalink / raw)


intro(5) says "Plan 9 names may contain any printable character
(that is, any character outside hexadecimal 00-1F and 80-9F) except
slash and blank", so yes it looks like the idea is to disallow
unprintables.

i don't know the official reasons that space isn't allowed,
but in general file names with spaces (which you have to deal
with in Unix and Windows) are a pain for oodles of reasons.  the most
noticeable one is that it messes up scripts and the like:
ls -l | awk '{print $10}' is no longer guaranteed to give 
you filenames.  that's just the tip of the iceberg.
"B <echo *.c" in sam would fail.  there is much more.
not having spaces in the filenames in plan9 is a wonderful
blessing.

just my 2¢, but i'd rather have ms-dos filenames than have spaces.
come to think of it, i think i'd rather have ms-dos filenames than
have imap.  use import and edmail.  or pop3 | sendmail.





^ permalink raw reply	[flat|nested] 29+ messages in thread

* [9fans] allowing space (ASCII 0x20) in file names
@ 1998-04-08 16:56 G.David
  0 siblings, 0 replies; 29+ messages in thread
From: G.David @ 1998-04-08 16:56 UTC (permalink / raw)


Why is the space (ASCII 0x20) excluded from file names?  This is done
both in /sys/src/9/port/chan.c and /sys/src/fs/port/dentry.c.

/sys/src/fs/port/dentry.c disallows <= 0x20.
/sys/src/9/port/chan.c disallows <= 0x20, '/' (for obvious reasons)
and 0x7f (not so obvious).

Is the idea to disallow unprintables?

I was just looking at IMAP (RFC2060), and it specifies modified
UTF-7 for mailbox names.  The only incompatability is the ASCII
space (0x20).

So, I was wondering, is there a problem allowing the space?

P.S.  I plagiarized the idea for this sig from another, but it is cute.

-------------------------------------------------------------------
G. David Butler   | Who I? Zathras, a Plan9er. Nobody uses Zathras'
                  | system, but Zathras not mind. Zathras used to 
                  | having others ignore Zathras. Besides, Zathras
gdb@dbSystems.com | have best system, so Zathras happy.




^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~1998-04-14 15:18 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-04-10 14:49 [9fans] allowing space (ASCII 0x20) in file names G.David
  -- strict thread matches above, loose matches on Subject: below --
1998-04-14 15:18 Russ
1998-04-14 15:16 Tom
1998-04-14 13:50 Rob
1998-04-14 13:35 Elliott.Hughes
1998-04-14 13:04 Russ
1998-04-14  9:50 Elliott.Hughes
1998-04-14  9:08 Elliott.Hughes
1998-04-14  6:38 Nigel
1998-04-14  6:24 forsyth
1998-04-14  5:33 forsyth
1998-04-13 23:03 geoff
1998-04-13 13:50 G.David
1998-04-13  6:00 geoff
1998-04-12  3:27 ozan
1998-04-10 14:43 forsyth
1998-04-10  4:04 geoff
1998-04-10  1:03 G.David
1998-04-09 23:44 Tom
1998-04-09 22:08 G.David
1998-04-09  3:10 G.David
1998-04-08 23:56 G.David
1998-04-08 22:05 Rob
1998-04-08 21:54 G.David
1998-04-08 20:32 Russ
1998-04-08 19:58 G.David
1998-04-08 17:45 Tom
1998-04-08 17:08 Russ
1998-04-08 16:56 G.David

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).