rc-list - mailing list for the rc(1) shell
* Field splitting in backquote substitutions
@ 1991-06-28 16:54 Mark-Jason Dominus
  0 siblings, 0 replies; 3+ messages in thread
From: Mark-Jason Dominus @ 1991-06-28 16:54 UTC
  To: rc; +Cc: mjd


    The field splitting rule in rc is very consistent.  If $ifs is 
(' ' '\t' '\n'), then any space, tab, or newline starts a new field.

This means that if you do something like

	for (i in `{echo 'foo  to   you'} )
		echo $i

you get

	foo

	to


	you

which is probably not what you wanted, and maybe not what you expected
either since the csh and sh backquotes would have yielded

	foo
	to
	you

instead.
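
For comparison, a minimal sketch of the sh behaviour mentioned above,
assuming a POSIX sh with the default IFS (space, tab, newline).  The
function name split_words is made up for illustration; rc itself is not
involved here.

```shell
# Print each field of $1 on its own line, using sh field splitting.
split_words() {
    # $1 is deliberately left unquoted so the shell splits it on IFS.
    # (Unquoted expansion also globs; the sample text has no wildcards.)
    for i in $1; do
        printf '%s\n' "$i"
    done
}

# Runs of blanks collapse into one separator, so this prints three lines.
split_words 'foo  to   you'
```

No empty fields appear, however many blanks separate the words.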

    In awk, FS can be an arbitrary regular expression, and any string
which matches FS is taken as a field separator.  But there's a special
case: if FS is a single blank, then it really means that any sequence of
spaces, tabs, and/or newlines constitutes a field separator.
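
The two awk behaviours described above can be sketched as follows,
assuming a POSIX awk.  The function names count_default and count_single
are made up for illustration.  Writing the separator as the bracket
expression [ ] is one way to ask for "exactly one space" rather than the
special single-blank case.

```shell
# Default FS (a single blank): any run of blanks is one separator.
count_default() {
    printf '%s\n' "$1" | awk '{ print NF }'
}

# FS given as the regular expression [ ]: every space is a separator,
# so adjacent spaces produce empty fields.
count_single() {
    printf '%s\n' "$1" | awk -F'[ ]' '{ print NF }'
}

count_default 'foo  to   you'   # prints 3
count_single  'foo  to   you'   # prints 6
```

The six fields in the second case are the three words plus the three
empty strings that appear in the rc output shown earlier.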

    It sems like a mistake to put a fully general regular-expression
handler into rc just to allow nice field separators, but at the same
time it's hard to see a good way to get the expected backquote behavior.
Does anyone have any good ideas?

    A related question: Does anyone know a sed pattern which replaces
arbitrary sequences of blanks and tabs with a single blank each?  Note
that neither 's/[ \t][ \t]+/ /' nor 's/[ \t][ \t]+/ /g' works.

  The night is pleasing to us because, like memory, it erases idle details.
Mark-Jason Dominus 	  			    mjd@central.cis.upenn.edu 



* RE: Field splitting in backquote substitutions
       [not found] <9106281654.AA14063@saul.cis.upenn.edu>
@ 1991-07-03  1:05 ` Dave Mason
  0 siblings, 0 replies; 3+ messages in thread
From: Dave Mason @ 1991-07-03  1:05 UTC
  To: rc; +Cc: mjd

>     The field splitting rule in rc is very consistent.  If $ifs is 
> (' ' '\t' '\n'), then any space, tab, or newline starts a new field.
>[...]
>     In awk, FS can be an arbitrary regular expression, and any string
> which matches FS is taken as a field separator.  But there's a special
> case: if FS is a single blank, then it really means that any sequence of
> spaces, tabs, and/or newlines constitutes a field separator.
> 
>     It seems like a mistake to put a fully general regular-expression
> handler into rc just to allow nice field separators, but at the same
> time it's hard to see a good way to get the expected backquote behavior.
> Does anyone have any good ideas?

I'd really like both the new and old behaviours on demand.  If you're
splitting things like /etc/passwd or a path name, you actually want all the
empty strings from adjacent terminators.  How about a special first
character in ifs, for example:
	`` : {echo 'a::b:c'}
would give:
	'a' '' 'b' 'c'
but:
	`` '*.*:' {echo '...This...****is:::*.a.test::::'}
would give:
	'This' 'is' 'a' 'test'
and ifs would be initialized to '* \t\n', and a '*' in any other position
would be treated as a separator character.

Note we don't need full regular expressions, so why put them in?  This
simple addition would give both of the commonly useful behaviours.
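
A rough sketch of the proposed rule, written in sh and awk rather than
inside rc.  Everything here is an assumption made for illustration: the
helper name split_ifs, the one-field-per-line output, and the choice to
drop a trailing empty field.  None of it is existing rc behaviour.

```shell
# Hypothetical helper, not part of rc: split TEXT by the proposed rule
# and print one field per line.
#   split_ifs IFS_SPEC TEXT
split_ifs() {
    SPEC=$1 TEXT=$2 awk 'BEGIN {
        spec = ENVIRON["SPEC"]; text = ENVIRON["TEXT"]
        # A leading "*" selects collapsing mode; the rest are separators.
        collapse = (substr(spec, 1, 1) == "*")
        seps = collapse ? substr(spec, 2) : spec
        field = ""
        for (i = 1; i <= length(text); i++) {
            c = substr(text, i, 1)
            if (index(seps, c) == 0) { field = field c; continue }
            # Separator seen: collapsing mode drops empty fields.
            if (!collapse || field != "") print field
            field = ""
        }
        # Assumption: a trailing empty field is dropped in both modes.
        if (field != "") print field
    }'
}

split_ifs ':' 'a::b:c'                               # a, empty, b, c
split_ifs '*.*:' '...This...****is:::*.a.test::::'   # This, is, a, test
```

The separators are matched one character at a time with index(), so no
regular expressions are needed, in line with the point made above.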

	../Dave



* Re: Field splitting in backquote substitutions
@ 1991-06-28 17:10 John Mackin
  0 siblings, 0 replies; 3+ messages in thread
From: John Mackin @ 1991-06-28 17:10 UTC
  To: Mark-Jason Dominus; +Cc: The rc Mailing List

        The field splitting rule in rc is very consistent.  If $ifs is 
    (' ' '\t' '\n'), then any space, tab, or newline starts a new field.

    This means that if you do something like [...] which is probably not
    what you wanted, and maybe not what you expected either [...]

Damn right it's not what you wanted or what you expected.  Byron and I
discussed this in mail just a few days ago; he considered it a feature
initially, but has agreed that it is a feature that will be removed.
ifs substitution in rc will work like that in other shells.  That,
after all, is the only sensible way for it to work.  To have leading
whitespace introduce leading null strings, and embedded successive
ones do the same thing, is just right out.  It may well be fine
for awk -- as I said to Byron, if you want to split the password
file, set FS to : and use awk!  In a shell, when I do

	t = `{ ls -l file }

I don't care how many spaces separated the columns; I want the columns
to be the elements of the resulting lists.  If you think about it,
you will see that it _must_ be this way.

    A related question: Does anyone know a sed pattern which replaces
    arbitrary sequences of blanks and tabs with a single blank each?  Note
    that neither 's/[ \t][ \t]+/ /' nor 's/[ \t][ \t]+/ /g' works.

I am confused.  Is there some subtlety here I have missed, or do you
(Mark) have a truly bizarre version of sed, or did you just not RTFM,
or what?  Allowing for the fact that you have adopted "\t" here
merely as a typographical convention to make the tabs visible in
your mail, since no version of sed known to me will interpret \t
as a tab -- even allowing for that, this is too easy.  For starters,
no version of sed allows + as a metacharacter; and if it did, those
patterns would be wrong anyway, since they would replace only
runs of MULTIPLE blanks and tabs with a single space, which is not
what I read "arbitrary sequences" to mean.  If sed did allow +,
the solution would be:

	sed 's/[ \t]+/ /g'

(where, of course, \t means you type a literal tab character).  But,
sed doesn't allow +, so you do the same thing using *:

	sed 's/[ \t][ \t]*/ /g'

I have tested this.  It works.  If it doesn't work for you, then
either I have misunderstood the question (more likely) or your sed
is broken (less likely, but anything is possible).
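
A runnable form of the * pattern above, as a sketch assuming a POSIX sh
and sed.  The function name squeeze_blanks is made up for illustration.
The literal tab is produced with printf, so the script does not depend
on sed understanding \t.

```shell
# Replace every run of blanks and tabs on stdin with a single blank.
squeeze_blanks() {
    tab=$(printf '\t')                 # a literal tab character
    sed "s/[ $tab][ $tab]*/ /g"
}

# Mixed runs of spaces and tabs each become one space.
printf 'foo \t  to\t\tyou\n' | squeeze_blanks   # prints: foo to you
```

A single tab on its own is also turned into a space, since the pattern
matches runs of length one.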

OK,
John.



Thread overview: 3+ messages
1991-06-28 16:54 Field splitting in backquote substitutions Mark-Jason Dominus
1991-06-28 17:10 John Mackin
     [not found] <9106281654.AA14063@saul.cis.upenn.edu>
1991-07-03  1:05 ` Dave Mason
