* pattern matching
@ 1993-06-03 11:17 Brynjulv Hauksson
From: Brynjulv Hauksson @ 1993-06-03 11:17 UTC (permalink / raw)
To: rc
rc has been my login shell for about a year, and I don't understand
how I managed so long using csh. I've read most of the
discussions on this list about things users want added (and sometimes
removed). I've had my own ideas about this from time to time, and
forgotten most of them. There is however one idea I can't quite
get rid of, and which I don't think has been mentioned on the list.
(I'm not sure if the following qualifies as a serious wish for an rc
extension; take it as a suggestion for a possible feature in some
future shell if you wish.)
I think the builtin ~ is very useful, but there are times when I wish
it would do more: leave a record somewhere of *what* matched and *how*
the match succeeded. This "record" should be a list, with one element for
each meta-character or simple string segment that matched, in the order
they occurred in the "subject". Example:
    subject    pattern    record
    -Php4      -?*        ('-' 'P' 'hp4')
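
A minimal sketch of the record idea, written in Python rather than rc. The name match_record is hypothetical, only `*', `?' and literal text are handled (no character classes), and letting `*' take the shortest possible match first is an assumption; the message does not say how ambiguous matches should be resolved.

```python
def match_record(subject, pattern):
    """Return one list element per '*', '?' or literal run in pattern,
    or None if subject does not match. Hypothetical helper."""
    # Split the pattern into tokens: '*', '?', or a run of literal characters.
    tokens = []
    for ch in pattern:
        if ch in "*?":
            tokens.append(ch)
        elif tokens and tokens[-1] not in ("*", "?"):
            tokens[-1] += ch
        else:
            tokens.append(ch)

    def walk(ti, si):
        # All tokens consumed: succeed only if the subject is consumed too.
        if ti == len(tokens):
            return [] if si == len(subject) else None
        tok = tokens[ti]
        if tok == "?":
            if si < len(subject):
                rest = walk(ti + 1, si + 1)
                if rest is not None:
                    return [subject[si]] + rest
            return None
        if tok == "*":
            # Assumption: try the shortest candidate first, then backtrack.
            for end in range(si, len(subject) + 1):
                rest = walk(ti + 1, end)
                if rest is not None:
                    return [subject[si:end]] + rest
            return None
        # Literal segment: must appear verbatim at the current position.
        if subject.startswith(tok, si):
            rest = walk(ti + 1, si + len(tok))
            if rest is not None:
                return [tok] + rest
        return None

    return walk(0, 0)


print(match_record("-Php4", "-?*"))
```

For the example above this yields the three-element list '-', 'P', 'hp4'; a subject that does not match yields None instead of a list.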
I sometimes find myself doing things like:
~ $s pattern && var = `{echo $s | sed 'sed pattern'}
This is a bit clumsy:
- I need to specify a pattern twice, and in two different notations,
since the standard utilities for pattern extraction use
regular expressions and not "glob"-notation.
- if I really want a list result, this can be surprisingly tricky:
var = ``($special){echo $s |
sed 'sed pattern using $special as a field separator'}
can sometimes be made to work, but I need to be very careful about
potential empty strings, and the choice of $special.
I wrote an external command called `match' which could be used like:
; eval var '=(' ``(){match string pattern ...} ')'
using a slightly modified version of the match-routine in the
rc-source (the wanted list result is "almost available" as a side
effect of the ~ command, in the sense that rc's match-routine could
keep track of the needed information easily and cheaply). If string
matches one of the patterns, match succeeds and prints the matching
string to standard output in a format suitable for eval. If there is
no match, it fails and prints nothing.
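
The exact output format of `match' is not given here. As a rough illustration of what "suitable for eval" could mean, the Python sketch below quotes each element the way rc quotes words (single quotes, with an embedded quote doubled), so that empty strings and special characters survive the eval. The names rc_quote and match_output are hypothetical, and the status codes 0 and 1 are an assumption standing in for "succeeds" and "fails".

```python
def rc_quote(word):
    """Quote one word for rc: wrap in single quotes, double any embedded quote."""
    return "'" + word.replace("'", "''") + "'"


def match_output(record):
    """Return (exit_status, text) for a match record, or for None on no match.

    Assumed contract: status 0 plus the quoted words on a match,
    status 1 and no output otherwise.
    """
    if record is None:
        return 1, ""
    return 0, " ".join(rc_quote(w) for w in record)


status, text = match_output(["-", "P", "hp4"])
print(status, text)
```

An empty element comes out as '' and so still occupies a position in the resulting list, which is the case that is hard to get right with a field separator.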
There are some problems with this - you really need to invent yet
another syntax and semantics for "glob"-pattern matching:
- inserting literal "metacharacters" in the pattern needs a syntax different
from the one rc uses.
- you need to decide how to handle, and distinguish,
various borderline cases.
- matching lists against lists is messy
- I still frequently need to specify patterns *twice*, once for
checking if there was a match (using `~') and once for doing
the extraction (using `match').
In the end `match' did not turn out to be all that useful, although
I still think "glob"-based pattern extraction could be very useful,
provided it was built into the shell.
Could this facility be grafted on to rc (or an rc-descendant)? Perhaps:
1) pattern matching in switch- and ~-commands could keep track of
how they matched, and quietly assign a suitable list to some special
variable, say `$**', on a successful match. I don't think the
cost of doing this would be prohibitive, but it would be a feature
you'd have to pay for, whether you used it or not.
(You could possibly reduce the cost somewhat by doing bookkeeping and
assignment only under certain conditions, like if $** was undefined.)
Anyway, this sort of "magic" variable and quiet side effect
has a slightly perl-ish flavour, which I'm not sure I like.
2) you could add a new operator, say `=~',
with syntax and semantics like a cross between rc's `~' and `=':
var =~ subject pattern
could, if it succeeded, assign a list of the result to var. Examples:
; x =~ -abcdef -?* && whatis x
x=(- a bcdef)
(One could conceivably make the existing ~-operator take an optional
prefixed variable name. I haven't looked into what sort of complexities
that would introduce into rc's grammar).
Some examples of usage (assuming the $** hack):
# crude basename
fn basename { name = $1 suffix = $2 {
	while (~ $name */*) name = $**(3)
	~ $name * ^ $suffix && name = $**(1)
	echo $name
}}
# remove a trailing newline from a string
; ~ $str * ^ $nl && str = $**(1)
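
Since `$**' does not exist, the two examples above cannot be run as written. The Python sketch below mimics them under stated assumptions: match_record, STAR and QMARK are hypothetical names, the pattern is passed as a list so that literal text (such as the contents of $suffix) is never treated as a metacharacter, and each `*' prefers the shortest match. The matcher is built on regular-expression capture groups, which is a convenience of this sketch and not how rc matches.

```python
import re

# Hypothetical markers for an unquoted '*' and '?' in a pattern.
STAR, QMARK = object(), object()


def match_record(subject, parts):
    """parts mixes STAR/QMARK markers with literal strings.
    Returns one record element per part, or None if there is no match."""
    pieces = []
    for p in parts:
        if p is STAR:
            pieces.append("(.*?)")   # assumption: shortest match first
        elif p is QMARK:
            pieces.append("(.)")
        else:
            pieces.append("(" + re.escape(p) + ")")
    m = re.fullmatch("".join(pieces), subject, re.DOTALL)
    return list(m.groups()) if m else None


def basename(name, suffix=""):
    # while (~ $name */*) name = $**(3)
    while True:
        rec = match_record(name, [STAR, "/", STAR])
        if rec is None:
            break
        name = rec[2]
    # ~ $name * ^ $suffix && name = $**(1)
    rec = match_record(name, [STAR, suffix])
    if rec is not None:
        name = rec[0]
    return name


def strip_newline(s):
    # ~ $str * ^ $nl && str = $**(1)
    rec = match_record(s, [STAR, "\n"])
    return rec[0] if rec is not None else s


print(basename("/usr/lib/libc.a", ".a"))
print(repr(strip_newline("hello\n")))
```

Like the rc version it is crude: a name ending in a slash reduces to the empty string.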
Potential gain from simple builtin pattern extraction:
- most of *my* uses of sed/awk/expr for pattern based string handling
would disappear. I could use the same syntax and matching semantics as
rc uses for matching. Simple pattern extraction would become
accessible in a very convenient fashion, since I would normally need
to specify a pattern *once* only.
- you would get a limited builtin string-handling capability. It seems
that most shells, rc included, have builtin facilities for
concatenating strings, while picking them apart is much harder. This
seems like an asymmetry to me, which a builtin "glob" pattern
extraction facility could partially remedy.
- brynjulv