9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] blanks already handled properly?
@ 2002-07-04 14:53 FJ Ballesteros
  0 siblings, 0 replies; 6+ messages in thread
From: FJ Ballesteros @ 2002-07-04 14:53 UTC (permalink / raw)
  To: 9fans


I may be going mad, but isn't the space problem already solved?

We know that

1 file names may contain spaces, thus ' ' is a legal file name char.
2 spaces are used both as delimiters in command lines and as part of
  file names.

1 just means that most programs using [^ \n\t]* as a regexp for a file
  name should just use [^\n\t]* instead. For example, I tried in acme to
  select "a b" and button-3 on it: nothing happened. I didn't look at the
  source, but acme probably stopped at the blank.

2 means that file formats should consider that ambiguity.
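
A small illustration of point 1, written in Python rather than rc or C only so that it can be run anywhere. The two patterns are the ones quoted above; the sample text and the variable names are made up for the example.

```python
import re

# The two candidate patterns for "a file name" discussed above.
stops_at_blank = re.compile(r"[^ \n\t]*")
allows_blank = re.compile(r"[^\n\t]*")

# Hypothetical selection: the file name "a b" followed by a newline.
text = "a b\n"

print(repr(stops_at_blank.match(text).group()))  # 'a'
print(repr(allows_blank.match(text).group()))    # 'a b'
```

The first pattern gives up at the blank, which is consistent with the guess about acme above; the second takes the whole name.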

The shell, IMHO, is already happy with spaces (to my surprise). I tried
with something like:

	touch x 'a b' a2
	files=a*
	for (f in $files) echo $f
	script_that_prints_one_arg_per_line $files

it all worked fine. Thus the shell is already aware that
an already expanded name is just a name, and it doesn't
matter if it contains a ' ' (yes, don't laugh, I learned this today).
The use of lists in rc is a good thing, now I know.
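
The same point, sketched in Python as an analogy (this is not how rc is implemented; the list contents simply mirror the touch example above). A list keeps each expanded name as one element, while flattening to a string and splitting again on blanks loses the boundaries.

```python
# Names as the shell would hold them after expanding a*: one element per file.
files = ["a b", "a2"]

# Iterating over the list sees each name whole, blank included.
for name in files:
    print(name)

# Flattening to one string and splitting on blanks again breaks "a b" apart.
flattened = " ".join(files)
resplit = flattened.split()
print(resplit)  # ['a', 'b', 'a2']
```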

So, isn't the answer just that
1. all programs must consider ' ' as a valid char while scanning for
   names (like the shell already seems to do).
2. [file] formats must be chosen so that they are not ambiguous if the
   file names contained in them contain blanks.
3. programs not outputting commands don't need to quote the printed names.
4. programs outputting commands need to quote the printed names.

All this would require is just Rob's %q and a shell q command to
do things like

	touch x 'a b' a2
	files=a*
	for (f in $files) echo cp `{q $f} /other/place/`{q $f}

 [ But note that this other line works without requiring quoting:
	for (f in $files) cp $f /other/place/$f          ]
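
A rough sketch of what such a quoting step could do, in Python for the sake of a runnable example. The function name rc_quote is hypothetical, and the set of characters treated as safe is a deliberately conservative assumption, not the rule used by %q. The quoting itself follows rc's convention: wrap the word in single quotes and double any embedded single quote.

```python
# Characters assumed safe to leave unquoted (a conservative guess).
SAFE = set("abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           "0123456789._-/")

def rc_quote(name: str) -> str:
    """Return name in a form rc would read back as a single word."""
    if name and all(c in SAFE for c in name):
        return name
    # rc convention: single quotes around the word, '' for an embedded '.
    return "'" + name.replace("'", "''") + "'"

# Print the commands instead of running them, as in the echo example above.
for f in ["a b", "a2"]:
    print("cp", rc_quote(f), "/other/place/" + rc_quote(f))
# cp 'a b' /other/place/'a b'
# cp a2 /other/place/a2
```

Quoting more than strictly necessary is harmless here, which is why the safe set is kept small.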

In case this solves the problem, we would only have to search for
programs that don't handle ' ' within file names and fix them; then
remove quoting from the output of programs that do not output commands;
then add quoting to the output of programs that output commands.

What do you say?



* Re: [9fans] blanks already handled properly?
@ 2002-07-05  9:22 okamoto
  0 siblings, 0 replies; 6+ messages in thread
From: okamoto @ 2002-07-05  9:22 UTC (permalink / raw)
  To: 9fans

>#
>#	Plan9 3ed
>#
>cpu% cat >''''
>alice
>cpu% ls -l
>--rw-rw-r-- M 49654 arisawa arisawa      6 Jul  5 17:54 '

I don't see this.  My result is the same as on release 4.
I don't know why, though...

Kenji




* Re: [9fans] blanks already handled properly?
  2002-07-05  8:14 Fco.J.Ballesteros
@ 2002-07-05  9:02 ` arisawa
  0 siblings, 0 replies; 6+ messages in thread
From: arisawa @ 2002-07-05  9:02 UTC (permalink / raw)
  To: 9fans

Hello,

I am surprised (or should I be ashamed?).

#
#	Plan9 3ed
#
cpu% cat >''''
alice
cpu% ls -l
--rw-rw-r-- M 49654 arisawa arisawa      6 Jul  5 17:54 '
...
cpu% cat ''''
alice
cpu%

#
#	Plan9 4ed
#
term% cat>''''
bob
term% ls -l
--rw-rw-rw- M 8 arisawa arisawa       4 Jul  5 17:59 ''''
...
term% cat ''''
bob
term%

Kenji Arisawa



* Re: [9fans] blanks already handled properly?
@ 2002-07-05  8:14 Fco.J.Ballesteros
  2002-07-05  9:02 ` arisawa
  0 siblings, 1 reply; 6+ messages in thread
From: Fco.J.Ballesteros @ 2002-07-05  8:14 UTC (permalink / raw)
  To: 9fans

:  > In case this solves the problem, we would only have to search for
:  > programs that don't handle ' ' within file names and fix them; then
:  > remove quoting from the output of programs that do not output
:  > commands; then add quoting to the output of programs that output
:  > commands.
:  
:  But this assumes that programs know when they are and aren't
:  manipulating file names, which is not true in general (see Software

I think I misunderstood your comment.

I'm not talking about letting programs recognize that ' ' is part of
a field in a file (again, if your file format contains file names, it
should probably be defined so that's not a problem).

I'm talking about programs assuming that something is a file name
and not considering ' ' as part of the name. For example, the regexps
for file names in your lib/plumbing that do not include ' ' as a valid
char, and similar pieces within C programs.


PS: Sorry about attaching the post twice in my last mail; I forgot to
delete the Include line.




* Re: [9fans] blanks already handled properly?
@ 2002-07-05  7:52 Fco.J.Ballesteros
  0 siblings, 0 replies; 6+ messages in thread
From: Fco.J.Ballesteros @ 2002-07-05  7:52 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1509 bytes --]

:  > In case this solves the problem, we would only have to search for
:  > programs that don't handle ' ' within file names and fix them; then
:  > remove quoting from the output of programs that do not output
:  > commands; then add quoting to the output of programs that output
:  > commands.
:  
:  But this assumes that programs know when they are and aren't
:  manipulating file names, which is not true in general (see Software

My argument doesn't imply that all programs know when they are handling
file names...

:  Tools for a fuller exposition of this philosophy).  sort and awk don't
:  know if field 4 is a file name or not.  Plus, as Charles observed, one
:  might want to quote fields other than file names that contain blanks.

...  because in cases like this, I'd say you have to use a sensible
file format so that sort and awk could be used later on it.

Only programs that know they are producing command lines (which
include file names) would need to quote their output file names.

Agreement on this?

:  In some ways, it seems that adopting a non-space delimiter might be
:  the least painful of the alternatives to deal with file formats.

Exactly. As I said in my previous mail, if space is causing a problem
within your file format, we could just say that it's not a good file
format and that it should be changed (if you want quoting, you can still
include that as part of your format, independently of file names).
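
As a sketch of the non-space delimiter idea, here is a made-up two-field format (file name, size) that uses a tab between fields. It is in Python only so that it can be run as is; the records and names are invented for the example.

```python
# Hypothetical records: (file name, size). One name contains a blank.
records = [("a b", 6), ("a2", 4)]

# Write one record per line, fields separated by a tab instead of a blank.
lines = [f"{name}\t{size}" for name, size in records]

# Reading back: split on the tab only, so the blank stays inside the name.
parsed = [tuple(line.split("\t")) for line in lines]
print(parsed)  # [('a b', '6'), ('a2', '4')]

# Splitting the same lines on any white space gets the first record wrong.
naive = [line.split() for line in lines]
print(naive)   # [['a', 'b', '6'], ['a2', '4']]
```

A tab in a file name would of course break this format in turn; that is the case where quoting inside the format would still be needed.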

Any other problem with this approach?

[-- Attachment #2: Type: message/rfc822, Size: 3103 bytes --]

From: Geoff Collyer <geoff@collyer.net>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] blanks already handled properly?
Date: Thu, 4 Jul 2002 16:29:15 -0700
Message-ID: <163def7f5ee028605e6ccee38bc7c57f@collyer.net>

* Re: [9fans] blanks already handled properly?
@ 2002-07-04 23:29 Geoff Collyer
  0 siblings, 0 replies; 6+ messages in thread
From: Geoff Collyer @ 2002-07-04 23:29 UTC (permalink / raw)
  To: 9fans

> In case this solves the problem, we would only have to search for
> programs that don't handle ' ' within file names and fix them; then
> remove quoting from the output of programs that do not output
> commands; then add quoting to the output of programs that output
> commands.

But this assumes that programs know when they are and aren't
manipulating file names, which is not true in general (see Software
Tools for a fuller exposition of this philosophy).  sort and awk don't
know if field 4 is a file name or not.  Plus, as Charles observed, one
might want to quote fields other than file names that contain blanks.

In some ways, it seems that adopting a non-space delimiter might be
the least painful of the alternatives to deal with file formats.


While I'm here, this is the script I use to print hex values and
glyphs (if we have any) of characters listed in /lib/unicode:

#!/bin/rc
# uniquery pattern... - print hex & glyph of any chars matching pattern
#	in /lib/unicode
sts=''
for (pat) {
	hexes = `{grep $pat /lib/unicode | column 1}
	if (~ $#hexes 0) {
		echo $0: no such unicode chars: $pat >[1=2]
		sts='no such chars'
	}
	if not
		for (hex in $hexes)
			unicode $hex-$hex
}
exit $sts

You can replace "column 1" with "awk '{print $1}'".  This is column:

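#!/bin/rc
# column [-F sep] [n...] - print n'th column(s)
rfork e
# pick up an optional field separator, given as "-F sep" or "-Fsep"
switch ($1) {
case -F
	sep=-F^$2; shift 2
case -F?*
	sep=-F^`{echo $1 | sed 's/^-F//'}; shift
}
# with no column numbers, default to column 1;
# otherwise the first one must be a number of 1 to 3 digits
switch ($#*) {
case 0
	* = 1
case *
	if (! ~ $1 [0-9] [0-9][0-9] [0-9][0-9][0-9]) {
		echo usage: $0 '[-F sep] [n...]' >[1=2]
		exit usage
	}
}
# turn "1 3" into "$1, $3" and let awk do the printing
arglist=`{echo $* | sed -e 's/[0-9]+/$&,/g' -e 's/,$//'}
exec awk $sep '{print '^$"arglist^'}'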

Here's a sample use of uniquery:

: cpu; uniquery space
0008
0020  
00a0  
2002  
2003  
2004  
2005  
2006  
2007  
2008  
2009  
200a  
200b ​
2408 ␈
2420 ␠
3000  
303f 〿
feff 



