rc-list - mailing list for the rc(1) shell
* Match operator puzzlement
@ 1992-01-31 21:01 Tom Culliton x2278
  1992-02-01 16:02 ` John Mackin
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Culliton x2278 @ 1992-01-31 21:01 UTC
  To: rc

Reply-To: srg!culliton@uunet.uu.net

OK maybe it's just 'cuz it's Friday but my poor brain refuses to come
up with a good answer to this one.  While writing some fairly large and
complex rc scripts I had a requirement to match something against a
list of patterns specified at the command line and figured it'd be easy
given rc's ~ match operator.

The first attempt was something like this:

	patterns=`{echo $1}	# go from '*.o *.a' to (*.o *.a)

	# lots of stuff to generate a list of file names....

	for (i in $list) {
		if (~ $i $patterns) {
			dealwith $i
		} else {
			handle $i
		}
	}

Which didn't work as planned for semi-obvious reasons involving
re-scanning.  This didn't distress me too much because I mostly
understood why after a bit of thought.  The next most obvious thing to
try was something like this:

	patterns=$1	# we don't care if it's a list for this

	# lots of stuff to generate a list of file names....

	for (i in $list) {
		if (eval ~ $i $patterns) {	# etc...

OOPS! I encountered a file name with a $ in it so make that

		if (eval ~ '$i' $patterns) { 	# etc...

But what about patterns with $ and so forth in them?  If we make
$patterns a list again and say '$patterns' to protect against that, we
get literal matching.  I haven't been able to find a way out of the
swamp here; any words of wisdom would be appreciated.

Thanks for listening.

Tom



* Re: Match operator puzzlement
  1992-01-31 21:01 Match operator puzzlement Tom Culliton x2278
@ 1992-02-01 16:02 ` John Mackin
  0 siblings, 0 replies; 8+ messages in thread
From: John Mackin @ 1992-02-01 16:02 UTC
  To: The rc Mailing List; +Cc: Tom Culliton

Tom Culliton raised some interesting points about pattern matching.

    Which didn't work as planned for semi-obvious reasons involving
    re-scanning.

The reason a straightforward attempt doesn't work isn't really anything
to do with rescanning at all, since there IS no rescanning -- don't
forget that that is rc's main principle: in the absence of 'eval',
which exists to break the rule, there is NEVER rescanning.

The reason it doesn't work is, to quote Byron, for metacharacters in
a ~ pattern to behave as metacharacters, they must appear _literally_
and _unquoted_.  Nothing else will serve; no subterfuge, however subtle,
will make them match unless they are literal and not quoted.

Usually this doesn't present a problem, since a simple eval suffices.
Tom, however, has either a weird application (if this really is a
practical problem) or a curious bent of mind (if it's just a
theoretical one), since he posits:

    OOPS! I encountered a file name with a $ in it so make that

    		if (eval ~ '$i' $patterns) { 	# etc...

    But what about patterns with $ and so forth in them?

Hmm.  Filenames with $ in them?  I didn't know rc had been ported to
VMS :).  Seriously, filenames with $ in them are not a good idea.
Still, the above does deal with that.  As to patterns with $ in
them, that's what makes this an interesting question.  In fact,
let's leave eval aside for a moment, and consider just the question
of how to match a pattern with $ in it.  Now,

	~ 'get$down' 'get$down'

does work, naturally, and as naturally,

	~ 'get$down' get$down

does not, since the $down in the pattern is variable-expanded (into
nothing since I don't have that set).  Everything you would expect
to work, does work.  All these match:

	~ 'get$down' *n
	~ 'get$down' *down
	~ 'get$down' get?down

And this doesn't:

	~ 'get$down' '*$down'

Recall the basic principle: the metacharacter must be literal and
unquoted to be effective.  So leaving eval aside, we have to ask
this question: how can the metacharacter be unquoted, to be effective,
and the $ be quoted, to prevent variable expansion?  When we know
the question, the answer is obvious:

	~ 'get$down' * ^ '$down'

which does indeed match as expected.
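For readers more at home in sh than in rc, a rough analogue may help.  POSIX
sh's case patterns follow a similar rule: quoted pattern text is literal,
while an unquoted * ? or [ stays active.  This is sh, not rc, and the two
function names are invented for the illustration; it only mirrors the last
two ~ examples.

```shell
#!/bin/sh
# sh analogue only; rc's ~ operator is not involved here.

# Star unquoted (active), dollar quoted (literal): like  * ^ '$down'
split_quoting() {
    case $1 in
        *'$down') return 0 ;;
    esac
    return 1
}

# Whole pattern quoted, so the star is literal too: like  '*$down'
all_quoted() {
    case $1 in
        '*$down') return 0 ;;
    esac
    return 1
}

split_quoting 'get$down' && echo 'split quoting: match'
all_quoted 'get$down' || echo 'all quoted: no match'
```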

The answer to Tom's question is simply to use the same mechanism
along with eval, using the exact code of his last example:

	patterns = $1
	...
		if ( eval ~ '$i' $patterns )

The point, though, is that if the pattern is to contain any of rc's
syntax characters, appropriate quoting must be used.  $ is not the
only character that causes these problems; consider a pattern
containing '=' -- similar hassles arise there.  So one cannot
just write

	cmd '*.o *.a *$bar'

but must rather write

	cmd '*.o *.a * ^ ''$bar'''

I am willing to admit that this is a touch cumbersome.  However,
in closing I'd like to stick up for the way rc works here.  It is
simple and clean and _predictable_, unlike other shells.  I'd hate
to even imagine trying something like this in csh.  And I'd like
to just beat a little harder on an earlier point: UNIX gives us a
hell of a lot of power in many ways.  Not the least of those is
our ability to put any character in a pathname segment other than
NUL or slash.  But, as always, the converse of power is responsibility;
being a properly responsible UNIX citizen means being aware that if
we are going to put characters in pathnames that don't, by all rights,
reasonably belong there (like $), we have to accept the consequences
(our tools get harder to create, and have more work to do).

Of course, the beauty of UNIX is that as long as we _are_ willing
to accept the consequences, we _can_ do it.

And the beauty of rc is that it's easy to see how.

OK,
John.



* Re: Match operator puzzlement
@ 1992-02-03 17:09 Byron Rakitzis
  0 siblings, 0 replies; 8+ messages in thread
From: Byron Rakitzis @ 1992-02-03 17:09 UTC
  To: rc

Re: evaluating *'s. The rule is simple. Globbing happens after
nearly everything else, so if you have a command that begins
with a ~, the ~ will "steal" the metacharacters:

	foo='*'

	eval ~ bar $foo

gets rescanned as

	~ bar *

which returns true.



* Re: Match operator puzzlement
@ 1992-02-02 20:20 Tom Culliton x2278
  0 siblings, 0 replies; 8+ messages in thread
From: Tom Culliton x2278 @ 1992-02-02 20:20 UTC
  To: rc

Reply-To: srg!culliton@uunet.uu.net

John Mackin writes:

> The reason a straightforward attempt doesn't work isn't really anything
> to do with rescanning at all, since there IS no rescanning

I thought about adding "or lack thereof", honest I did!  But since it
seemed to imply that not rescanning was wrong, it got left out.  The
real difficulty, which I saw quite clearly, is that ~ doesn't accept a
list of patterns (primarily so you can match the empty list), so the
eval is needed to flatten the command out and rescan it.  (Flattening
the list doesn't work because it becomes a single item.)  But at this
point we get undesired rescanning (in a filename context?) which is not
easily quoted away.

There also seems to be a minor ambiguity between string matching
contexts and filename expansion contexts. In most contexts $ gives the
contents of a variable (or something related) and patterns (using * ? [])
give a list of filenames.  In matching contexts $ works the same but
patterns are compared against strings.  Where the two collide it gets a
bit messy.  Given:

	patterns=('*.o' '*.a')		# or patterns='*.o *.a'
	eval ~ '$i' $patterns

It's not clear to me which context *.o will be evaluated in.  Will it
filename expand to a whole list of patterns which are then matched? 
This matters for obvious reasons.  Maybe Byron can clarify this?

For the curious the application is to scan a development directory tree
automatically generating a makefile/target for configuration management
purposes.  One of the requirements was to be able to exclude files
matching a certain template or set of templates (given on the command
line) from the target, typically things like *.o *.a and other
generated or binary files.  While testing it I actually did encounter a
filename with a $ in it (a backup or temporary file for some app) 8-P
and while I don't like names like that, my program can't just choke on
them!

My less than satisfactory solution was to carefully document that the
patterns had to be quoted, just so, when they are given on the command
line.  Byron's sed script seems like it will solve most of the problem
and John's newline hack should cover any remaining glitches.

Tom



* Re: Match operator puzzlement
@ 1992-02-01 19:46 malte
  0 siblings, 0 replies; 8+ messages in thread
From: malte @ 1992-02-01 19:46 UTC
  To: rc

It seems to me that weird problems deserve weird solutions.
I'd suggest trying self-modifying rc code, something like

prog = `function_generating_rc_script

eval $prog

Malte.




* Re: Match operator puzzlement
@ 1992-02-01 18:43 malte
  0 siblings, 0 replies; 8+ messages in thread
From: malte @ 1992-02-01 18:43 UTC
  To: rc

Could someone please give a more elaborate example? I'm afraid
I don't see the point here.

Malte.




* Re: Match operator puzzlement
  1992-02-01 17:49 Byron Rakitzis
@ 1992-02-01 18:02 ` John Mackin
  0 siblings, 0 replies; 8+ messages in thread
From: John Mackin @ 1992-02-01 18:02 UTC
  To: The rc Mailing List

Byron gives us:

	sed -e 's/\([^[*?]\)/''\1''/g' -e 's/''''//g'

Damn good thinking.  Top notch in fact.  It is possible to make
this work for input containing ', but in an indirect manner.  I think you
have to be indirect since I don't think a grep-style RE can do that.
In practice, there is a neat enough solution: assume that newline
doesn't appear in the input string and initially map ' into newline,
then map that back into '' on output.  As long as the character you
pick doesn't occur in the input it's fine.  Yes, it's a kludge, but
I would be very surprised to find people with real applications for
command-line patterns containing newline.

I'm not prepared to say that it can't be done with an egrep-style (full)
RE but I can't see a solution.  If anyone has one please mail it to the list.

Extending Byron's sed produces this (with $nl being a newline, as usual):

	sed	-e 's/''/\' ^ $nl ^ '/g'	\
		-e 's/\([^[*?]\)/''\1''/g'	\
		-e 's/''''//g'			\
		-e 's/\n/''''/g'
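The same pipeline can be exercised from POSIX sh.  The sketch below is a
transcription with sh quoting rather than the rc text itself, and the
function name quote_pattern is invented.  It keeps the stated assumption
that the input contains no newline, and it relies on sed accepting \n in a
pattern to match the embedded newline.

```shell
#!/bin/sh
# A literal newline, playing the role of $nl.
nl='
'

# Hypothetical helper: quote everything except *, ? and [ using rc's
# quoting rules, where a single quote inside '...' is written as ''.
quote_pattern() {
    printf '%s\n' "$1" |
        sed -e "s/'/\\$nl/g" \
            -e "s/\([^[*?]\)/'\1'/g" \
            -e "s/''//g" \
            -e "s/\n/''/g"
}

quote_pattern "it's*"    # prints 'it''s'*
```

The first expression hides each ' as a newline, the middle two are the
original two steps, and the last turns each hidden newline back into the
doubled quote that rc reads as one literal quote.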

OK,
John.



* Re: Match operator puzzlement
@ 1992-02-01 17:49 Byron Rakitzis
  1992-02-01 18:02 ` John Mackin
  0 siblings, 1 reply; 8+ messages in thread
From: Byron Rakitzis @ 1992-02-01 17:49 UTC
  To: rc

Hm. I think I can do a little better than John here. The two following
sed substitutions should be ok to quote all characters in a word except
the metacharacters *, ? and [. The sed scripts will FAIL if there are
single quotes in the input, but I'm sure there's a way around that
problem too; I just haven't thought of a good way yet.

Anyway:

First you turn all single non-meta characters into quoted characters:
	
	sed 's/\([^[*?]\)/''\1''/g'	# using rc's quoting rules, of course.

e.g., "Hello?" goes to "'H''e''l''l''o'?"

Now you need to remove all '' sequences:

	sed 's/''''//g'

Now "Hello?" becomes "'Hello'?", which is exactly what we want. Not bad.
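The two steps can be tried outside rc as well.  A sketch in POSIX sh, where
the function name quote_nonmeta is made up and the sed expressions are
requoted for sh (rc doubles a quote inside '...', sh does not).  As stated,
it assumes the input contains no single quote.

```shell
#!/bin/sh
# Hypothetical helper: wrap every run of non-metacharacters in single
# quotes, leaving *, ? and [ bare. Input must not contain a single quote.
quote_nonmeta() {
    printf '%s\n' "$1" |
        sed -e "s/\([^[*?]\)/'\1'/g" -e "s/''//g"
}

quote_nonmeta 'Hello?'    # prints 'Hello'?
quote_nonmeta '*.o'       # prints *'.o'
```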

(BTW, I think I can prove by induction that this method works for all
strings which do not contain a ', but I do not have enough room
to write the proof here :-)

So the ~ example becomes something like:

	eval ~ 'subject' `{echo $funky_pattern | sed garbage}

