The Unix Heritage Society mailing list
* [TUHS] RegExp decision for meta characters: Circumflex
From: markus schnalke @ 2021-09-17  8:52 UTC
  To: tuhs

Hoi,

I'm interested in the early design decisions for meta characters
in REs, mainly regarding Ken's RE implementation in ed.

Two questions:

1) Circumflex

As far as I can see, the circumflex (^) is the only meta character that
has two different special meanings in REs: first, it is the
beginning-of-line anchor; second, it inverts a character class.
Why was it chosen for the second one? Why not the exclamation mark
in that case? (Sure, C didn't exist by then, but the bang was probably
used to negate in other languages of the time, I think.)

2) Symbol for the end of line anchor

What is the reason that the beginning-of-line and end-of-line
anchors are different symbols? Is there a reason why a single
symbol, say the circumflex, was not chosen to represent both? I
currently see no disadvantages of such a design. (Circumflexes
aren't likely to end lines of text, either.)
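
For concreteness, the two roles of the circumflex and the separate
end-of-line anchor can be sketched in a modern dialect. This is only an
illustration using Python's re module, not Ken's ed implementation; the
pattern names and the demo() function are made up for the example.

```python
import re

# Role 1: at the start of a pattern, ^ anchors the match to the beginning of the line.
anchored = re.compile(r"^ab")

# Role 2: as the first character inside brackets, ^ inverts the character class.
negated = re.compile(r"[^ab]")

# The end-of-line anchor is a different symbol, $.
end_anchored = re.compile(r"ab$")


def demo():
    """Return what each pattern does on two short sample strings."""
    return {
        "anchor_hit": bool(anchored.search("abc")),    # "ab" starts the line
        "anchor_miss": bool(anchored.search("cab")),   # "ab" is not at the start
        "negated_first": negated.search("abc").group(),  # first char that is neither a nor b
        "end_hit": bool(end_anchored.search("cab")),   # "ab" ends the line
        "end_miss": bool(end_anchored.search("abc")),  # "ab" is not at the end
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value)
```

The position of ^ decides which meaning applies: first in the pattern it
is an anchor, first inside brackets it is a negation.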

I would appreciate it if you could help me understand these design
decisions better. Maybe there existed RE notations that were simply
copied ...


meillo

* [TUHS] RegExp decision for meta characters: Circumflex
From: Douglas McIlroy @ 2021-09-17 16:40 UTC
  To: TUHS main list

> Maybe there existed RE notations that were simply copied ...

Ed was derived from Ken's earlier qed. Qed's descendant in Multics was
described in a 1969 GE document:
http://www.bitsavers.org/pdf/honeywell/multics/swenson/6906.multics-condensed-guide.pdf.
Unfortunately it describes regular expressions only sketchily by
example. However, alternation, symbolized by | with grouping by
parentheses, was supported in qed, whereas alternation was omitted
from ed. The GE document does not mention character classes; an
example shows how to use alternation for the same purpose.
Beginning-of-line is specified by a logical-negation symbol. In
apparent contradiction, the v1 manual says the meanings of [ and ^ are
the same in ed and (an unspecified version of) qed. My guess about the
discrepancies is no better than yours.

(I am amused by the title "condensed guide" for a manual in which each
qed request gets a full page of explanation. It exemplifies how Unix
split from Multics in matters of taste.)

Doug


Thread overview: 8+ messages
2021-09-17  8:52 [TUHS] RegExp decision for meta characters: Circumflex markus schnalke
2021-09-17  9:32 ` Rob Pike
2021-09-17 10:10   ` markus schnalke
2021-09-17 16:40 Douglas McIlroy
2021-09-17 20:40 ` Chris Torek
2021-09-18  1:03   ` Greg 'groggy' Lehey
2021-09-18  1:23     ` Bakul Shah
