From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <887f8d1b34047ae49b20087db9b2f055@plan9.escet.urjc.es>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] blanks already handled properly?
From: Fco.J.Ballesteros
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="upas-azfqrbkyukrzspnsjekytvgddo"
Date: Fri, 5 Jul 2002 09:52:24 +0200
Topicbox-Message-UUID: c2bf48bc-eaca-11e9-9e20-41e7f4b1d025

This is a multi-part message in MIME format.
--upas-azfqrbkyukrzspnsjekytvgddo
Content-Disposition: inline
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

: > In case this solves the problem, we would only have to search for
: > programs that don't handle ' ' within file names and fix them; then
: > remove quoting from the output of programs that do not output
: > commands; then add quoting to the output of programs that output
: > commands.
:
: But this assumes that programs know when they are and aren't
: manipulating file names, which is not true in general (see Software

My argument doesn't imply that all programs know when they are
handling file names...

: Tools for a fuller exposition of this philosophy). sort and awk don't
: know if field 4 is a file name or not. Plus, as Charles observed, one
: might want to quote fields other than file names that contain blanks.

... because in cases like this, I'd say you have to use a sensible
file format so that sort and awk can be used on it later. Only
programs that know they are producing command lines (which include
file names) would need to quote their output file names. Agreement on
this?

: In some ways, it seems that adopting a non-space delimiter might be
: the least painful of the alternatives to deal with file formats.

Exactly. As I said in my previous mail, if space is causing a problem
within your file format, we could just say that it's not a good file
format and that it should be changed (if you want quoting, you can
still include it as part of your format, independently of file names).
Any other problem with this approach?

--upas-azfqrbkyukrzspnsjekytvgddo
Content-Type: message/rfc822
Content-Disposition: inline

Received: from mail.cse.psu.edu ([130.203.4.6]) by aquamar; Fri Jul 5 01:30:17 MDT 2002
Received: from psuvax1.cse.psu.edu (psuvax1.cse.psu.edu [130.203.6.6])
	by mail.cse.psu.edu (CSE Mail Server) with ESMTP
	id CBD501998C; Thu, 4 Jul 2002 19:30:07 -0400 (EDT)
Delivered-To: 9fans@cse.psu.edu
Received: from collyer.net (adsl-63-192-14-226.dsl.snfc21.pacbell.net [63.192.14.226])
	by mail.cse.psu.edu (CSE Mail Server) with SMTP id 98F4419988
	for <9fans@cse.psu.edu>; Thu, 4 Jul 2002 19:29:45 -0400 (EDT)
Message-ID: <163def7f5ee028605e6ccee38bc7c57f@collyer.net>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] blanks already handled properly?
From: Geoff Collyer
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Sender: 9fans-admin@cse.psu.edu
Errors-To: 9fans-admin@cse.psu.edu
X-BeenThere: 9fans@cse.psu.edu
X-Mailman-Version: 2.0.11
Precedence: bulk
Reply-To: 9fans@cse.psu.edu
List-Id: Fans of the OS Plan 9 from Bell Labs <9fans.cse.psu.edu>
List-Archive:
Date: Thu, 4 Jul 2002 16:29:15 -0700

> In case this solves the problem, we would only have to search for
> programs that don't handle ' ' within file names and fix them; then
> remove quoting from the output of programs that do not output
> commands; then add quoting to the output of programs that output
> commands.

But this assumes that programs know when they are and aren't
manipulating file names, which is not true in general (see Software
Tools for a fuller exposition of this philosophy). sort and awk don't
know if field 4 is a file name or not. Plus, as Charles observed, one
might want to quote fields other than file names that contain blanks.

In some ways, it seems that adopting a non-space delimiter might be
the least painful of the alternatives to deal with file formats.
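
As a rough illustration of the non-space delimiter idea, here is a
minimal sketch that is not taken from the thread. It assumes a POSIX
shell and awk rather than rc, and the tab-separated listing, its file
names, and the variable names are all hypothetical.

```shell
# Hypothetical two-column listing: a size, a tab, then a file name.
# The first file name contains a blank.
listing=$(printf '120\tmy notes.txt\n7\ttodo\n')

# Default awk splitting treats the blank inside the name as a field
# boundary, so field 2 holds only the first word of that name.
default_split=$(printf '%s\n' "$listing" | awk '{print $2}')

# With tab as the field separator, the blank is ordinary data and
# field 2 holds the whole file name.
tab_split=$(printf '%s\n' "$listing" | awk -F'\t' '{print $2}')

printf '%s\n' "$tab_split"
```

Nothing here needs quoting: awk never has to know that field 2 is a
file name, only that tab separates fields.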

While I'm here, this is the script I use to print hex values and
glyphs (if we have any) of characters listed in /lib/unicode:

#!/bin/rc
# uniquery pattern... - print hex & glyph of any chars matching pattern
#	in /lib/unicode
sts=''
for (pat) {
	hexes = `{grep $pat /lib/unicode | column 1}
	if (~ $#hexes 0) {
		echo $0: no such unicode chars: $pat >[1=2]
		sts='no such chars'
	}
	if not
		for (hex in $hexes)
			unicode $hex-$hex
}
exit $sts

You can replace "column 1" with "awk '{print $1}'". This is column:

#!/bin/rc
# column [-F sep] [n...] - print n'th column(s)
rfork e
switch ($1) {
case -F
	sep=-F^$2; shift 2
case -F?*
	sep=-F^`{echo $1 | sed 's/^-F//'}; shift
}
switch ($#*) {
case 0
	* = 1
case *
	if (! ~ $1 [0-9] [0-9][0-9] [0-9][0-9][0-9]) {
		echo usage: $0 '[-F sep] [n...]' >[1=2]
		exit usage
	}
}
arglist=`{echo $* | sed -e 's/[0-9]+/$&,/g' -e 's/,$//'}
exec awk $sep '{print '^$"arglist^'}'

Here's a sample use of uniquery:

: cpu; uniquery space
0008
0020  
00a0  
2002  
2003  
2004  
2005  
2006  
2007  
2008  
2009  
200a  
200b ​
2408 ␈
2420 ␠
3000 　
303f 〿
feff ﻿

--upas-azfqrbkyukrzspnsjekytvgddo--