9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] spaces, separators, and utf-8
@ 2002-06-01 23:01 Geoff Collyer
  2002-06-05 10:03 ` Douglas A. Gwyn
  0 siblings, 1 reply; 5+ messages in thread
From: Geoff Collyer @ 2002-06-01 23:01 UTC (permalink / raw)
  To: 9fans

Michael may only be arguing for admitting the space character in file
names, but I believe that others will go farther once space is
admitted, having witnessed various wheels of reincarnation in the
past.  Yes, tab is a control character, but a tab sometimes appears as
only a single space or two and so some people will argue that tab
should be admitted too, since it's just another form of whitespace and
visually similar to space.  And once tab is admitted, some people will
wonder why other whitespace should be excluded, and so will lobby for
return, newline, form feed and vertical tab.  About this time,
somebody will assert that any utf-8 string should be admitted as a
file name.  Others *may* be able to argue successfully to exclude NUL
characters in general and slashes from individual components.  And
there's the tricky business of the '#' namespace.  Then, seeking ever
greater generality, somebody will suggest that any sequence of bytes
should be acceptable as a file name.  Again, there will be debate
about slashes and NULs.  And now we're back to the situation on Unix,
where names were indeed fairly unrestrained, though variants
experimented with restrictions.  Berkeley at one time forbade
characters with the high bit set in file names.
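
The successive positions in that progression can be written down as a check
on a single path component.  This is only a sketch in Python: the function
name component_ok and the three policy names are hypothetical labels for the
positions described above, not the rule of any actual system.

```python
import unicodedata

def component_ok(name, policy="no-controls"):
    """Check one path component against a hypothetical naming policy.

    Policies (illustrative labels only):
      "no-space"    - reject space as well as control characters
      "no-controls" - admit space, reject controls (tab, newline, ...)
      "unix-like"   - reject only NUL and slash
    """
    # Empty names, NUL and slash are rejected under every policy here.
    if name == "" or "\0" in name or "/" in name:
        return False
    if policy == "unix-like":
        return True
    for ch in name:
        # Category "Cc" covers the C0/C1 controls, including tab and newline.
        if unicodedata.category(ch) == "Cc":
            return False
        if policy == "no-space" and ch == " ":
            return False
    return True
```

Under "no-controls" the name with spaces passes and the one with newlines
does not; under "unix-like" both pass, which is the situation the exercises
below explore.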

Let's try a few exercises to see what the brave new world looks like.
I created a file called

	michael's mother's recipes

on Mac OS X. To refer to this file by name from rc, let's see what
we'd have to type:

	; cd /n/imac/tmp/zoo
	; ls
	'michael''s mother''s recipes'

Not impossible, but not something I'd want to type often.  Next I
created a file with a similar name, but with spaces replaced by
newlines:

	: imac; ls -v1
	michael's
	mother's
	recipes
	michael's mother's recipes

Plain ls prints this:

	: imac; ls -1
	michael's?mother's?recipes
	michael's mother's recipes

I can't manipulate this latest file via u9fs currently:

	; ls
	ls: .: bad character in file name: 'michael''s
	mother''s
	recipes'
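
The quoted spellings in these listings follow rc's quoting rule: the word is
wrapped in single quotes and each embedded single quote is doubled.  A small
Python sketch of that rule follows; rc_quote and rc_unquote are hypothetical
helper names, and unlike a real listing program this sketch quotes every
word, whether it needs it or not.

```python
def rc_quote(word):
    """Quote a word rc-style: wrap in '...' and double embedded quotes."""
    return "'" + word.replace("'", "''") + "'"

def rc_unquote(quoted):
    """Invert rc_quote for a string that it produced."""
    return quoted[1:-1].replace("''", "'")

print(rc_quote("michael's mother's recipes"))
```

The same rule covers the newline case; the quoted word simply spans three
lines, as in the error message above.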

du and find on Unix naïvely print the names, which tends to confuse
programs that want to process the names, thus leading to ``find
-print0'' and a corresponding xargs option to cope with one common
case, but there hasn't been any general solution, particularly where
the file names are just one column of a program's output.

	: imac; du -a
	0	./michael's
	mother's
	recipes
	0	./michael's mother's recipes
	0	.

I suppose one could universally adopt Mike Lesk's solution of using
BEL (control-G, \a) or some character in the private-use space as a
column delimiter.
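
To make the problem concrete, here is a Python sketch that emits the two
names from the exercise with a newline after each and then with a NUL after
each (the ``find -print0'' convention), and splits them back apart.  The
helper names are hypothetical.  The last function combines BEL as a column
delimiter with NUL as a record terminator; that combination is an assumption
made for illustration, not a description of any existing tool.

```python
names = [b"michael's\nmother's\nrecipes", b"michael's mother's recipes"]

def emit(names, sep):
    """Join names the way a listing program would, ending each with sep."""
    return b"".join(n + sep for n in names)

def parse(output, sep):
    """Split a listing back into names, dropping the final empty piece."""
    return output.split(sep)[:-1]

# Newline-terminated output cannot be split back into the original names.
newline_listing = parse(emit(names, b"\n"), b"\n")

# NUL-terminated output round-trips, since NUL cannot occur in a name.
nul_listing = parse(emit(names, b"\0"), b"\0")

def parse_columns(output):
    """Parse du-like records: size BEL name NUL (an assumed format)."""
    records = output.split(b"\0")[:-1]
    return [tuple(r.split(b"\a", 1)) for r in records]

du_output = b"".join(b"0\a./" + n + b"\0" for n in names)
print(len(newline_listing), len(nul_listing), parse_columns(du_output))
```

The newline version yields four "names" where there are two files; the NUL
version yields two.  The column version still depends on BEL never appearing
in the size field and on NUL never appearing in a name.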


I am indeed working on UTF-8 issues (among others) in OS X. The most
recent version of Terminal I've tried does better at displaying UTF-8
than 10.1.4's but there's still some odd interaction with locale
files.  Unfortunately, OS X has to deal with UTF-8 as just one of
several supported encodings, though I believe it's the most common,
and we have to support locale files.  If we could get agreement on
UTF-8 as the standard encoding, with tcs-like transliterations at the
edges, and get ANSI, ISO and IEEE to drop the whole idea of locales
from their standards, things would eventually get better (as we phased
out support for the deprecated locale notion).

[If it isn't obvious why locales don't work, it's for pretty much the
same reasons that you want a single large alphabet and encoding
(Unicode and UTF-8) rather than a bunch of local encodings (e.g., Big
Five).  A professor of Japanese studies in Greece, writing in Greek
about Japanese should be able to freely intermix those characters.
Locales pretend to describe a geographic area and its culture,
language, and other conventions.  But people move and take some of
those things with them.  So what locale are newly-arrived Koreans
living in California in?  They aren't in Korea's time zone but they
may not yet speak the primary language(s) of California.  Locales
don't fit multiculturalism (programs need to be prepared to synthesize
them on the fly, but then a big catalogue of them isn't very useful),
and proliferate if you try to honestly describe the situations of
people away from their places of origin.  I end up mixing British and
American conventions when configuring my machines, since an English
Canadian locale doesn't seem to be widely recognised.]

Has anybody figured out how (or if) to cope with Unicode 3?  They've
broken their promise to stick to 16 bits, which UTF-8 can cope with,
if we crank up UTFmax.  Is switching to 32-bit runes only a minor
performance hit?
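
For reference, the byte counts involved can be sketched in Python: code
points that fit in 16 bits need at most three bytes in UTF-8, and those
beyond U+FFFF need four.  The function name utf8_len is hypothetical, and the
sketch assumes UTFmax means the maximum number of bytes per encoded rune.

```python
def utf8_len(cp):
    """Number of bytes UTF-8 uses for code point cp (0 .. 0x10FFFF)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3        # everything representable in 16 bits
    return 4            # code points beyond U+FFFF

# Cross-check against Python's own encoder for a few sample code points
# (surrogates are skipped, since they are not encodable on their own).
for cp in (0x41, 0x3B1, 0x65E5, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_len(cp) == len(chr(cp).encode("utf-8"))
```

So the encoding side only needs the limit raised from three bytes to four;
the cost of widening the in-memory rune type is a separate question that
this sketch does not measure.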



* Re: [9fans] lures
@ 2002-06-01 14:59 Lucio De Re
  2002-06-01 17:54 ` [9fans] spaces, separators, and utf-8 Michael Baldwin
  0 siblings, 1 reply; 5+ messages in thread
From: Lucio De Re @ 2002-06-01 14:59 UTC (permalink / raw)
  To: 9fans

On Sat, Jun 01, 2002 at 10:34:13AM -0400, Michael Baldwin wrote:
>
> yeah, that's what i like about geoff.  if you can't have animated
> discussions of such fringe stuff as cursor keys vs. mouse or spaces in
> filenames on 9fans, then it would be a duller world.

The clincher is that the space is useful both as a separator of
command line arguments and as a joiner of filename "words".  Seeing
as even Michael Baldwin does not suggest using spaces as path
separators (why not?) I would be tempted to go along with the school
of thought that proposes using a teeny dot in filenames.

The rationale being that long filenames, GUIs and Internationalisation
are all the _new_ rage and may as well be lumped into a single
paradigm change.

That we should need a keyboard key for the new pseudospace (it was
very useful in Wang Word Processors, others may remember it from
MultiMate days) in as convenient a position as the present space
bar, well, that is a little harder to address.

Also, I think it was dhog suggesting proportional spacing fonts in
program source (I shudder!) but in my mind a diminutive dot would
look nice as a linking space in the proportional space representation
of a long file name.

Finally, the space as a command line argument separator could be
sacrificed, but the result would not be aesthetically pleasing, in
my opinion.  And the diminutive dot would look wrong in this context,
especially if using a proportional font.

To give Michael his due, spaces in filenames can no longer be
suppressed.  But if they become very common, it will become more
convenient to use a GUI than command line composition.  That will
be a sad day.

++L

PS:  It's tempting to dig up the arguments in favour of Oberon's flat
filespace, I wonder how it would be received by the proponents of a
namespace _based_ on file paths :-)



end of thread, other threads:[~2002-06-05 10:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-06-01 23:01 [9fans] spaces, separators, and utf-8 Geoff Collyer
2002-06-05 10:03 ` Douglas A. Gwyn
  -- strict thread matches above, loose matches on Subject: below --
2002-06-01 14:59 [9fans] lures Lucio De Re
2002-06-01 17:54 ` [9fans] spaces, separators, and utf-8 Michael Baldwin
2002-06-01 18:21   ` Scott Schwartz
2002-06-01 22:00   ` Dan Cross
