rc-list - mailing list for the rc(1) shell
Subject: read; string manipulation
From: "D. Hugh Redelmeier"
Date: 1993-09-20 19:08 UTC
To: rc

| I feel that the man page for and implementation of a shell level scanf
| will be horrid.

As is the manual for the C scanf!  What a crock.  [I'm an ex-X3J11
member.]

================

| The people who have expressed a desire for a built in read (including
| myself) have normally used performance as their justification (in the
| same way that echo is justified).

I want it for performance reasons.  But unlike most performance
reasons, the difference is so great that it is a qualitative
difference.  The performance difference is great enough to
legitimately affect one's programming style.

I also want one for the practical reason of making it part of the
common language of rc.  This is different from making it a
primitive in the implementation sense.

In my experience with sh, I have found "read" to be quite useful.
On the other hand, very few of my uses of read do anything but stuff
the whole line, with no interpretation, into a variable.  I think
that using IFS or scanf-like semantics is overkill.

================

WARNING: the following is not politically correct.  It is also
veering off topic.  Furthermore, it is a poorly thought-out "dump"
of my thoughts -- I don't have the time to put them in order.  I
hope that it inspires some thought.

I do see a need for more string manipulation in rc, and not tied to
"read" as scanf would be.  I realize that going too far would be a
mistake, but sed is so awkward and slow.  I don't have a concrete
suggestion.  What I have most felt the need for is a facility like
ed's (and sed's) substitute command, including the \( and \)
capability.

I see three parts in the design of such a construct:
- how to analyze a string (parse it)
- how to synthesize a string (build a new string that depends on
  parts of the old)
- how to jam this into rc

Analysis facility: to make it more rc-like, I would use the less-powerful
sh/rc regular expressions, not ed's.

We need a clean and simple notation for string synthesis.  This is a
difficult area in which to strike the right balance between
simplicity and power.  I favour simplicity -- if you really need
power, use another tool.

I don't have a recommendation for synthesis, but I do have a few
thoughts.  I welcome any others.

====

I recently discovered the MSDOS command "rename" has an interesting
feature that perhaps indicates a way.  An example first:

	ren dog.* cat.*

This means: for each file matching dog.*, rename it to cat, with the
same extension as it had before.  Any file whose name matches
the first regular expression is renamed to the second regular
expression with each operator in the second replaced with the
characters that matched the corresponding (!) operator in the first
expression.
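
The correspondence rule just described can be sketched in Python (not rc).
This is only an illustration under assumptions: the name rename_by_pattern
is hypothetical, only the * and ? operators are handled, * is greedy, and
the code follows the description in this message rather than the actual
behaviour of the MSDOS command.

```python
import re


def rename_by_pattern(src, dst, name):
    """Return the new name for `name`, or None if `name` does not match `src`.

    Hypothetical helper: each * or ? in `dst` is replaced by the text that
    the operator in the same ordinal position of `src` matched.
    """
    # Translate the glob into a regex with one capture group per operator.
    regex = "".join(
        "(.*)" if c == "*" else "(.)" if c == "?" else re.escape(c)
        for c in src
    )
    m = re.fullmatch(regex, name, re.DOTALL)
    if m is None:
        return None
    captured = iter(m.groups())
    out = []
    for c in dst:
        if c in "*?":
            # Assumption: an operator in dst with no partner in src yields "".
            out.append(next(captured, ""))
        else:
            out.append(c)
    return "".join(out)


print(rename_by_pattern("dog.*", "cat.*", "dog.txt"))
```

Because operators are paired strictly by position, the components cannot
be reordered, which is the loss of power noted below.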

Clearly this does not suit rc-style command execution since
globbing is done before the command is invoked (except for a few
built-in commands in rc).

What it does is present a cute notation for string synthesis where a
correspondence is imputed between regular expression operators.
This would obviate the need for the \( and \) notation, at some
expense in power, and *perhaps* an increase in intuitiveness and
visual clarity.  The main loss of power is that components cannot be
freely reordered.  [Warning: this kind of logic might lead to
something like C's bizarre declaration syntax.]

I leave this path unexplored.

====

A regular expression match in rc could have the side effect of
setting variables with the matched components.

- There are too many components to have a variable for each.
  Perhaps a variable for each explicit RE operator would make
  sense.  The other parts are literal, so they can simply be retyped.

- The variables could reasonably be a single list variable, with
  subscripting used to access the desired component.  There needs to
  be a convention for numbering the components.  The ed \( \)
  convention of numbering left to right seems obvious.

- On the other hand, SNOBOL allows naming of matched components.
  This is quite useful and powerful, but in my opinion it makes the
  patterns messier.  I think we can afford to live without this
  power.

- What if several strings are matched by a regular expression?
  Where could the matches for each be recorded?  A matrix would be
  needed.  A different list variable could be used for each matched
  string, or, transposing, a different list variable could be used
  for each regular expression operator.  Simpler would be to only
  record the last match's parse.
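
The "one list entry per explicit operator, numbered left to right" idea
can be sketched in Python (not rc, which has no such facility).  The
assumptions are loud ones: glob_components is a made-up name, only * and ?
are handled (character classes are left out), * is greedy, and the list is
0-based where a subscript such as $match(1) would count from 1.

```python
import re


def glob_components(pattern, subject):
    """Return the text matched by each glob operator, left to right.

    Hypothetical helper.  Returns None when `subject` does not match.
    Literal parts of the pattern are not recorded, since they can simply
    be retyped.
    """
    regex = "".join(
        "(.*)" if c == "*" else "(.)" if c == "?" else re.escape(c)
        for c in pattern
    )
    m = re.fullmatch(regex, subject, re.DOTALL)
    return list(m.groups()) if m else None


print(glob_components("*.c", "main.c"))
print(glob_components("?at.*", "cat.tar.gz"))
print(glob_components("*.c", "main.h"))
```

Recording only the last match's parse corresponds to keeping the result
of the most recent call and discarding earlier ones.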

====

Where would such a feature hook in?

- The ~ command could have the side-effect of setting the positional
  variables:

	if ( ~ $x *.c )
		dest=$match(1).o

- The switch statement could have that side-effect:

	switch ( $x ) {
	case *.c
		dest=$match(1).o
	}

- The for statement could have the side-effect, or could even set its
  control variable to a synthetic expression.  (Perhaps someone knows
  how Miranda list comprehensions work and can import some inspiration.)

	for (x is $match(1).o in *.c) {
		...
	}
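
For comparison, here is roughly what the for example would compute,
written as a Python list comprehension (a close relative of Miranda's).
It is a sketch only: object_names and C_SOURCE are hypothetical names, and
the pattern *.c has been translated to a regular expression by hand.

```python
import re

# Hand translation of the glob *.c, with one group for the * operator.
C_SOURCE = re.compile(r"(.*)\.c", re.DOTALL)


def object_names(names):
    """Map each name matching *.c to the same stem with a .o suffix."""
    # m.group(1) plays the role of $match(1); non-matching names are skipped.
    return [m.group(1) + ".o" for m in map(C_SOURCE.fullmatch, names) if m]


print(object_names(["main.c", "util.c", "README", "lex.c"]))
```

The match and the synthesis sit in a single expression here, which is the
functional flavour the for form above is reaching for.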

I wish there were a pretty functional notation.

The way ~ takes multiple patterns seems a bit weak.  For our
purpose, it would be nice if the syntax of ~ were:

	~ subject pattern [synthesis]

so that the result could optionally be specified by some sort of
synthesizing expression (perhaps even the MSDOS rename kind).  Of
course, the result is in the result variable, so we still don't have
a functional notation.

Hugh Redelmeier
hugh@mimosa.com or {utcsri, uunet!attcan, utzoo, scocan}!redvax!hugh
When all else fails: hugh@csri.toronto.edu
voice: +1 416 482-8253

