From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <887f8d1b34047ae49b20087db9b2f055@plan9.escet.urjc.es>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] blanks already handled properly?
From: Fco.J.Ballesteros <nemo@plan9.escet.urjc.es>
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="upas-azfqrbkyukrzspnsjekytvgddo"
Date: Fri,  5 Jul 2002 09:52:24 +0200
Topicbox-Message-UUID: c2bf48bc-eaca-11e9-9e20-41e7f4b1d025

This is a multi-part message in MIME format.
--upas-azfqrbkyukrzspnsjekytvgddo
Content-Disposition: inline
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

:  > In case this solves the problem, we would only have to search for
:  > programs that don't handle ' ' within file names and fix them; then
:  > remove quoting from the output of programs that do not output
:  > commands; then add quoting to the output of programs that output
:  > commands.
:  
:  But this assumes that programs know when they are and aren't
:  manipulating file names, which is not true in general (see Software

My argument doesn't imply that all programs know when they are handling
file names...

:  Tools for a fuller exposition of this philosophy).  sort and awk don't
:  know if field 4 is a file name or not.  Plus, as Charles observed, one
:  might want to quote fields other than file names that contain blanks.

...  because in cases like this, I'd say you have to use a sensible
file format so that sort and awk can be used on it later.

Only programs that know they are producing command lines (which
include file names) would need to quote the file names they output.
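
A minimal sketch of the distinction, written for a POSIX sh so it can
be tried outside Plan 9. rcquote is a hypothetical helper, not an
existing tool; it assumes rc-style quoting (wrap the word in single
quotes and double any embedded quote).

```shell
#!/bin/sh
# rcquote is a hypothetical helper, not an existing tool.
# It wraps its argument in single quotes and doubles any embedded
# single quote, which is how rc quotes words.
rcquote() {
	printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# A program that emits a command line quotes the file name ...
echo rm "$(rcquote 'my file')"
# ... while a program that just lists names prints them untouched.
echo 'my file'
```

The first echo prints a line that can be fed back to the shell; the
second prints plain data that other tools can process as they like.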

Agreement on this?

:  In some ways, it seems that adopting a non-space delimiter might be
:  the least painful of the alternatives to deal with file formats.

Exactly.  As I said in my previous mail, if space is causing a problem
within your file format, we could simply say that it's not a good file
format and that it should be changed.  (If you want quoting, you can
still include that as part of your format, independently of file names.)
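
As a rough illustration, assume a made-up record format of file name,
tab, size. The records function and its data are invented for the
example, and the script is POSIX sh rather than rc so it can be tried
anywhere.

```shell
#!/bin/sh
# Hypothetical record format: file name, TAB, size in bytes.
# The names and sizes are made up for the example.
records() {
	printf 'my file\t120\n'
	printf 'notes\t30\n'
	printf 'a b c\t75\n'
}

# Sort numerically on field 2 and print field 1; the blanks inside
# the names are just data because the separator is a tab.
TAB=$(printf '\t')
records | sort -t "$TAB" -k 2,2n | awk -F '\t' '{ print $1 }'
```

With tab as the separator, neither sort nor awk needs to know that
field 1 is a file name, and no quoting is involved.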

Any other problem with this approach?

--upas-azfqrbkyukrzspnsjekytvgddo
Content-Type: message/rfc822
Content-Disposition: inline

Received: from mail.cse.psu.edu ([130.203.4.6]) by aquamar; Fri Jul  5 01:30:17 MDT 2002
Received: from psuvax1.cse.psu.edu (psuvax1.cse.psu.edu [130.203.6.6])
	by mail.cse.psu.edu (CSE Mail Server) with ESMTP
	id CBD501998C; Thu,  4 Jul 2002 19:30:07 -0400 (EDT)
Delivered-To: 9fans@cse.psu.edu
Received: from collyer.net (adsl-63-192-14-226.dsl.snfc21.pacbell.net [63.192.14.226])
	by mail.cse.psu.edu (CSE Mail Server) with SMTP id 98F4419988
	for <9fans@cse.psu.edu>; Thu,  4 Jul 2002 19:29:45 -0400 (EDT)
Message-ID: <163def7f5ee028605e6ccee38bc7c57f@collyer.net>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] blanks already handled properly?
From: Geoff Collyer <geoff@collyer.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Sender: 9fans-admin@cse.psu.edu
Errors-To: 9fans-admin@cse.psu.edu
X-BeenThere: 9fans@cse.psu.edu
X-Mailman-Version: 2.0.11
Precedence: bulk
Reply-To: 9fans@cse.psu.edu
List-Id: Fans of the OS Plan 9 from Bell Labs <9fans.cse.psu.edu>
List-Archive: <https://lists.cse.psu.edu/archives/9fans/>
Date: Thu, 4 Jul 2002 16:29:15 -0700

> In case this solves the problem, we would only have to search for
> programs that don't handle ' ' within file names and fix them; then
> remove quoting from the output of programs that do not output
> commands; then add quoting to the output of programs that output
> commands.

But this assumes that programs know when they are and aren't
manipulating file names, which is not true in general (see Software
Tools for a fuller exposition of this philosophy).  sort and awk don't
know if field 4 is a file name or not.  Plus, as Charles observed, one
might want to quote fields other than file names that contain blanks.

In some ways, it seems that adopting a non-space delimiter might be
the least painful of the alternatives to deal with file formats.


While I'm here, this is the script I use to print hex values and
glyphs (if we have any) of characters listed in /lib/unicode:

#!/bin/rc
# uniquery pattern... - print hex & glyph of any chars matching pattern
#	in /lib/unicode
sts=''
for (pat) {
	hexes = `{grep $pat /lib/unicode | column 1}
	if (~ $#hexes 0) {
		echo $0: no such unicode chars: $pat >[1=2]
		sts='no such chars'
	}
	if not
		for (hex in $hexes)
			unicode $hex-$hex
}
exit $sts

You can replace "column 1" with "awk '{print $1}'".  This is column:

#!/bin/rc
# column [-F sep] [n...] - print n'th column(s)
rfork e
switch ($1) {
case -F
	sep=-F^$2; shift 2
case -F?*
	sep=-F^`{echo $1 | sed 's/^-F//'}; shift
}
switch ($#*) {
case 0
	* = 1
case *
	if (! ~ $1 [0-9] [0-9][0-9] [0-9][0-9][0-9]) {
		echo usage: $0 '[-F sep] [n...]' >[1=2]
		exit usage
	}
}
arglist=`{echo $* | sed -e 's/[0-9]+/$&,/g' -e 's/,$//'}
exec awk $sep '{print '^$"arglist^'}'
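
To see what the sed step in column produces, here is a small sketch in
POSIX sh. fieldlist is a hypothetical name used only for this
illustration, and sed -E is an assumption needed outside Plan 9, where
sed does not accept + in its default regular expressions.

```shell
#!/bin/sh
# fieldlist is a hypothetical name for the sed step in column.
# It turns column numbers into a comma-separated list of awk field
# references and drops the trailing comma.
fieldlist() {
	echo "$@" | sed -E -e 's/[0-9]+/$&,/g' -e 's/,$//'
}

fieldlist 1 3	# prints: $1, $3
```

The result is pasted between '{print ' and '}' to form the awk program.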

Here's a sample use of uniquery:

: cpu; uniquery space
0008
0020  
00a0  
2002  
2003  
2004  
2005  
2006  
2007  
2008  
2009  
200a  
200b ​
2408 ␈
2420 ␠
3000 　
303f 〿
feff ﻿

--upas-azfqrbkyukrzspnsjekytvgddo--