rc-list - mailing list for the rc(1) shell
* pattern matching
@ 1993-06-03 11:17 Brynjulv Hauksson
From: Brynjulv Hauksson @ 1993-06-03 11:17 UTC
  To: rc

rc has been my login shell for about a year, and I don't understand
how I managed so long using csh.  I've read most of the 
discussions on this list about things users want added (and sometimes
removed).  I've had my own ideas about this from time to time, and
forgotten most of them.  There is, however, one idea I can't quite
get rid of, and which I don't think has been mentioned on the list.
(I'm not sure if the following qualifies as a serious wish for an rc
extension; take it as a suggestion of a possible feature in some
future shell if you wish.)

I think the builtin ~ is very useful, but there are times when I wish
it would do more:  leave a record somewhere of *what* matched and *how* the
match succeeded. This "record" should be a list, with one element for each
meta-character or simple string segment that matched, in the order they
occurred in the "subject".  Example:
	subject	pattern	record
	-Php4	-?*	('-' 'P' 'hp4')
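
A minimal sketch of how such a record could be computed, written in
Python rather than in rc's own C.  The names split_pattern and
match_record are hypothetical, only `*', `?' and literal text are
handled (no character classes), and `*' tries the shortest expansion
first, which is an arbitrary choice the message does not specify:

```python
def split_pattern(pattern):
    """Split a glob pattern into segments: '*', '?', or a run of literal text."""
    segments = []
    literal = ""
    for ch in pattern:
        if ch in "*?":
            if literal:
                segments.append(literal)
                literal = ""
            segments.append(ch)
        else:
            literal += ch
    if literal:
        segments.append(literal)
    return segments


def match_record(subject, pattern):
    """Return one list element per pattern segment, or None if there is no match."""
    segments = split_pattern(pattern)

    def walk(seg_index, pos):
        # All segments consumed: succeed only if the subject is consumed too.
        if seg_index == len(segments):
            return [] if pos == len(subject) else None
        seg = segments[seg_index]
        if seg == "?":
            if pos < len(subject):
                rest = walk(seg_index + 1, pos + 1)
                if rest is not None:
                    return [subject[pos]] + rest
            return None
        if seg == "*":
            # Assumption: shortest expansion first, growing on backtrack.
            for end in range(pos, len(subject) + 1):
                rest = walk(seg_index + 1, end)
                if rest is not None:
                    return [subject[pos:end]] + rest
            return None
        # Literal segment: must appear verbatim at the current position.
        if subject.startswith(seg, pos):
            rest = walk(seg_index + 1, pos + len(seg))
            if rest is not None:
                return [seg] + rest
        return None

    return walk(0, 0)


print(match_record("-Php4", "-?*"))
```

For the subject and pattern in the table above this yields the three
elements '-', 'P' and 'hp4'; a subject that does not match yields None.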

I sometimes find myself doing things like:
	~ $s pattern && var = `{echo $s | sed 'sed pattern'}

This is a bit clumsy:  
- I need to specify a pattern twice, and in two different notations,
  since the standard utilities for pattern extraction use
  regular expressions and not "glob"-notation.
- if I really want a list result, this can be surprisingly tricky:
	var = ``($special){echo $s | 
		sed 'sed pattern using $special as a field separator'}
  can sometimes be made to work, but I need to be very careful about 
  potential empty strings, and the choice of $special.

I wrote an external command called `match' which could be used like:
	; eval var '=(' ``(){match string pattern ...} ')' 
using a slightly modified version of the match-routine in the
rc-source (the wanted list result is "almost available" as a side
effect of the ~ command, in the sense that rc's match-routine could
keep track of the needed information easily and cheaply).  If string
matches one of the patterns, match succeeds and prints the matching
string to standard output in a format suitable for eval. If there is
no match, it fails and prints nothing.
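
What "a format suitable for eval" might look like can be sketched as
follows.  This is not the original `match' program, whose output
format is not shown; rc_quote and rc_words are hypothetical names.
The sketch relies only on rc's quoting rule: a word goes in single
quotes, and a single quote inside it is written twice.  Quoting every
element also keeps empty strings intact:

```python
def rc_quote(word):
    """Quote one word for rc: wrap in single quotes, doubling embedded quotes."""
    return "'" + word.replace("'", "''") + "'"


def rc_words(words):
    """Render a list of strings as space-separated rc words, ready for eval."""
    return " ".join(rc_quote(w) for w in words)


# The caller supplies the parentheses, as in: eval var '=(' ... ')'
print("var =(" + rc_words(["-", "P", "hp4"]) + ")")
```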

There are some problems with this - you really need to invent yet 
another syntax and semantics for "glob"-pattern matching: 
- inserting literal "metacharacters" in the pattern needs a syntax different
  from the one rc uses.
- you need to decide how to handle, and distinguish, 
  various borderline cases.
- matching lists against lists is messy
- I still frequently need to specify patterns *twice*, once
  for checking if there was a match (using `~') and once for doing
  the extraction (using `match').
In the end `match' did not turn out to be all that useful, although
I still think "glob"-based pattern extraction could be very useful,
provided it was built into the shell.

Could this facility be grafted on to rc (or an rc-descendant)? Perhaps:

1) pattern matching in switch- and ~-commands could keep track of
how they matched, and quietly assign a suitable list to some special
variable, say `$**', on a successful match. I don't think the 
cost of doing this would be prohibitive, but it would be a feature 
you'd have to pay for, whether you used it or not.
(You could possibly reduce the cost somewhat by doing bookkeeping and
assignment only under certain conditions, like if $** was undefined.)
Anyway, this sort of "magic" variable and quiet side effect 
has a slightly perl-ish flavour, which I'm not sure I like.

2) you could add a new operator, say `=~', 
with syntax and semantics like a cross between rc's `~' and `=':
	var =~ subject pattern
could, if it succeeded, assign the resulting list to var. Example:
	; x =~ -abcdef -?* && whatis x
	x=(- a bcdef)
(One could conceivably make the existing ~-operator take an optional 
prefixed variable name. I haven't looked into what sort of complexities
that would introduce into rc's grammar).

Some examples of usage (assuming the $** hack):
	# crude basename
	fn basename { name = $1 suffix = $2 {
		while (~ $name */*) name = $**(3)
		~ $name * ^ $suffix && name = $**(1)
		echo $name
	}}
	# remove a trailing newline from a string
	; ~ $str * ^ $nl && str = $**(1)
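
The two usage examples above can be tried out in a Python sketch that
stands in for the hypothetical $**.  STAR, ANY and match_segments are
made-up names.  A pattern is given as a list, so that a literal taken
from a variable (like $suffix) is never reinterpreted as a
metacharacter, and each segment becomes one regular-expression group.
Note that `*' is greedy here, which is an assumption, not something
the message specifies:

```python
import re

STAR = object()  # stands for an unquoted * in the pattern
ANY = object()   # stands for an unquoted ? in the pattern


def match_segments(subject, pattern):
    """Match subject against a list of STAR, ANY, or literal strings.

    Returns one element per pattern segment, or None if there is no match.
    """
    groups = []
    for seg in pattern:
        if seg is STAR:
            groups.append("(.*)")
        elif seg is ANY:
            groups.append("(.)")
        else:
            groups.append("(" + re.escape(seg) + ")")
    m = re.fullmatch("".join(groups), subject, re.DOTALL)
    return list(m.groups()) if m else None


def basename(name, suffix=""):
    """Crude basename, following the rc function in the message."""
    while True:
        record = match_segments(name, [STAR, "/", STAR])
        if record is None:
            break
        name = record[2]  # $**(3): the part after the slash
    record = match_segments(name, [STAR, suffix])
    if record is not None:
        name = record[0]  # $**(1): the part before the suffix
    return name


print(basename("/usr/lib/libc.a", ".a"))

# Remove a trailing newline from a string.
text = "hello\n"
record = match_segments(text, [STAR, "\n"])
if record is not None:
    text = record[0]
```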

Potential gain from simple builtin pattern extraction:

- most of *my* uses of sed/awk/expr for pattern based string handling
would disappear. I could use the same syntax and matching semantics as
rc uses for matching. Simple pattern extraction would become
accessible in a very convenient fashion, since I would normally need
to specify a pattern *once* only.

- you would get a limited builtin string-handling capability. It seems
that most shells, rc included, have builtin facilities for
concatenating strings, while picking them apart is much harder. This
seems like an asymmetry to me, which a builtin "glob" pattern
extraction facility could partially remedy.

- brynjulv

