The Unix Heritage Society mailing list
* [TUHS] RegExp decision for meta characters: Circumflex
From: markus schnalke @ 2021-09-17  8:52 UTC (permalink / raw)
  To: tuhs

Hoi,

I'm interested in the early design decisions for meta characters
in REs, mainly regarding Ken's RE implementation in ed.

Two questions:

1) Circumflex

As far as I see, the circumflex (^) is the only meta character that
has two different special meanings in REs: first as the
beginning-of-line anchor, and second for inverting a character class.
Why was it chosen for the second one? Why not the exclamation mark
in that case? (Sure, C didn't exist yet, but I think the bang was
probably used for negation in other languages of the time.)

2) Symbol for the end of line anchor

What is the reason that the beginning of line and end of line
anchors are different symbols? Is there a reason why a single
symbol, say the circumflex, was not chosen to represent both? I
currently see no disadvantages of such a design. (Circumflexes
aren't likely to end lines of text, either.)
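To make question 1 concrete, here is a small sketch of the roles of ^, using Python's `re` module as a stand-in for ed (an assumption: for these simple patterns Python and ed agree, though the two engines differ in other respects). The sample lines are made up for illustration.

```python
import re

# Made-up sample lines for illustration.
lines = ["apple", "banana", "a", "x^y"]

# Outside brackets, ^ anchors the match to the beginning of the line.
starts_with_a = [s for s in lines if re.search(r"^a", s)]

# As the first character inside brackets, ^ complements the class:
# [^a] matches any single character other than 'a'.
has_non_a = [s for s in lines if re.search(r"[^a]", s)]

# Anywhere else inside brackets, ^ is just an ordinary literal character.
has_caret = [s for s in lines if re.search(r"[z^]", s)]

print(starts_with_a)  # ['apple', 'a']
print(has_non_a)      # ['apple', 'banana', 'x^y']
print(has_caret)      # ['x^y']
```

Which meaning applies is decided purely by position, so the two uses never collide.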

I would appreciate it if you could help me understand these design
decisions better. Maybe there existed RE notations that were simply
copied ...


meillo

* Re: [TUHS] RegExp decision for meta characters: Circumflex
From: Rob Pike @ 2021-09-17  9:32 UTC (permalink / raw)
  To: markus schnalke; +Cc: TUHS main list

You'd have to ask ken why he chose the characters he did, but I can answer
the second question. The beginning and end of line are the same. If you
make ^ mean both beginning and end of line, what does this ed command do:

s/^/x/

Which end gets the x?

-rob
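Rob's question can be made concrete with a small sketch that uses Python's `re` module in place of ed (an assumption: for these two anchors on a single line without a trailing newline, Python behaves like ed). The pattern `^|$` stands in for the hypothetical "one symbol for both ends"; it is not real ed syntax.

```python
import re

line = "hello"

# Like ed's s/^/x/: ^ is a zero-width match at the start, so x goes first.
at_start = re.sub(r"^", "x", line, count=1)

# Like ed's s/$/x/: $ is a zero-width match at the end, so x goes last.
at_end = re.sub(r"$", "x", line, count=1)

# Hypothetical single anchor meaning "either end", modelled here as ^|$.
# It has two zero-width matches on the line, so the result depends on a
# rule the notation itself does not state: replace the first match only,
# or replace every match.
first_only = re.sub(r"^|$", "x", line, count=1)
every_match = re.sub(r"^|$", "x", line)

print(at_start, at_end, first_only, every_match)
# xhello hellox xhello xhellox
```

With distinct ^ and $ there is nothing to decide; with one shared symbol, the substitution needs an extra convention just to say which end gets the x.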


* Re: [TUHS] RegExp decision for meta characters: Circumflex
From: Rob Pike @ 2021-09-17  9:32 UTC (permalink / raw)
  To: markus schnalke; +Cc: TUHS main list

*NOT* the same. Sorry....

I hope the example explains better than my prose.

-rob


* Re: [TUHS] RegExp decision for meta characters: Circumflex
From: markus schnalke @ 2021-09-17 10:10 UTC (permalink / raw)
  To: Rob Pike; +Cc: TUHS main list

Hoi.

[2021-09-17 11:32] Rob Pike <robpike@gmail.com>
>
> You'd have to ask ken why he chose the characters he did, but I can answer the
> second question. The beginning and end of line are the same. If you make ^ mean
> both beginning and end of line, what does this ed command do:
> 
> s/^/x/
> 
> Which end gets the x?

Perfect answer! I just never thought about replacing. *oops*
Now that's obvious. ;-)


meillo

* Re: [TUHS] RegExp decision for meta characters: Circumflex
From: Bakul Shah @ 2021-09-18  1:23 UTC (permalink / raw)
  To: Greg 'groggy' Lehey; +Cc: tuhs

On Sep 17, 2021, at 6:03 PM, Greg 'groggy' Lehey <grog@lemis.com> wrote:
> 
> On Friday, 17 September 2021 at 13:40:25 -0700, Chris Torek wrote:
>> Also worth noting, though the precise history predates my own
>> experience: it's common in grammar theory to use `$` as the end
>> symbol.  Was this from REs using `$` as an end symbol as well,
>> or did REs adopt `$` from here, or ...?
> 
> Weren't there programming languages that used $ as a statement
> terminator instead of ;?

IIRC Macsyma used ; as well as $ as statement terminators:
; if you wanted to print the output of an expression,
$ if you wanted to suppress the output. But I suspect
it is not related to this.

* Re: [TUHS] RegExp decision for meta characters: Circumflex
From: Greg 'groggy' Lehey @ 2021-09-18  1:03 UTC (permalink / raw)
  To: Chris Torek; +Cc: tuhs

On Friday, 17 September 2021 at 13:40:25 -0700, Chris Torek wrote:
> Also worth noting, though the precise history predates my own
> experience: it's common in grammar theory to use `$` as the end
> symbol.  Was this from REs using `$` as an end symbol as well,
> or did REs adopt `$` from here, or ...?

Weren't there programming languages that used $ as a statement
terminator instead of ;?

Greg

* Re: [TUHS] RegExp decision for meta characters: Circumflex
From: Chris Torek @ 2021-09-17 20:40 UTC (permalink / raw)
  To: tuhs

Also worth noting, though the precise history predates my own
experience: it's common in grammar theory to use `$` as the end
symbol.  Was this from REs using `$` as an end symbol as well,
or did REs adopt `$` from here, or ...?

Chris

* [TUHS] RegExp decision for meta characters: Circumflex
From: Douglas McIlroy @ 2021-09-17 16:40 UTC (permalink / raw)
  To: TUHS main list

> Maybe there existed RE notations that were simply copied ...

Ed was derived from Ken's earlier qed. Qed's descendant in Multics was
described in a 1969 GE document:
http://www.bitsavers.org/pdf/honeywell/multics/swenson/6906.multics-condensed-guide.pdf.
Unfortunately it describes regular expressions only sketchily by
example. However, alternation, symbolized by | with grouping by
parentheses, was supported in qed, whereas alternation was omitted
from ed. The GE document does not mention character classes; an
example shows how to use alternation for the same purpose.
Beginning-of-line is specified by a logical-negation symbol. In
apparent contradiction, the v1 manual says the meanings of [ and ^ are
the same in ed and (an unspecified version of) qed. My guess about the
discrepancies is no better than yours.
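How alternation can stand in for a character class can be sketched with Python's `re` module. This is an assumption on two counts: the syntax is Python's, not qed's actual notation (which the GE document shows only by example), and the helper name `same_single_char_language` is hypothetical. Over printable ASCII, the two patterns accept exactly the same single characters.

```python
import re
import string

# qed-style: alternation of single characters, grouped by parentheses.
alternation = re.compile(r"(a|e|i|o|u)")
# ed-style: the same set written as a bracketed character class.
char_class = re.compile(r"[aeiou]")


def same_single_char_language(p, q, alphabet=string.printable):
    """True if p and q fully match exactly the same one-character strings."""
    return all(bool(p.fullmatch(c)) == bool(q.fullmatch(c)) for c in alphabet)


print(same_single_char_language(alternation, char_class))  # True
```

The class is just a compact spelling of a one-character alternation, which is presumably why ed could drop | and still cover this common case.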

(I am amused by the title "condensed guide" for a manual in which each
qed request gets a full page of explanation. It exemplifies how Unix
split from Multics in matters of taste.)

Doug
