From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from plan9.att.com ([192.20.225.252]) by hawkwind.utcs.utoronto.ca with SMTP id <24103>; Wed, 26 Jul 1995 11:31:50 -0400
From: rob@plan9.att.com
To: sam-fans@hawkwind.utcs.toronto.edu
Date: Wed, 26 Jul 1995 11:29:33 -0400
Subject: Re: s cmd lossage
Message-Id: <95Jul26.113150edt.24103@hawkwind.utcs.utoronto.ca>

some time ago - i'm catching up on old sam mail - quanstro@sartre.minerva.bah.com said:

    this s command lost with the latest (straight from at&t) version of sam.

    s:(([A-Z][a-z]*[ ]?)+)[ ]+([0-9]+):NAME \1\nPHONE 205 \2\n:g
                    ^- space tab

    what happened was that \2 was set to \1.

i've tried it here and it works as written. let me explain what's going on, because i think you're also seeing it work correctly but it's confusing you. i tried this source:

    ABCD 01234

and got

    NAME ABCD
    PHONE 205 D

which is correct. as it says in the manual, the \digit operators on the right side of a substitution refer to the text matched by the subexpression beginning at the digit-th left parenthesis. here \1 would refer to the match of

    (([A-Z][a-z]*[ ]?)+)

which would be

    ABCD

and \2 would refer to the most recent match of

    ([A-Z][a-z]*[ ]?)

which is

    D

confusion comes because of the nesting -- whose meaning is defined by the manual -- and the repetition operator (+) -- whose meaning is not but should be clear from any thought about the implementation. referring to the implementation is the last refuge of the writer of incomplete documentation, but i believe the behavior is reasonable. you wrote a near-nonsense expression and got near-nonsense results. i see no bug here.

now you may have some input text that shows other behavior, but if so please interpret the answer carefully before deciding there's a bug. i don't deny there could be one, but nested repeated regexps can be fertile sources of confusion as well as errors.

-rob
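[editor's sketch] the two rules above (groups numbered by left parenthesis, a repeated group keeping only its last iteration) can be tried outside sam. this uses Python's re module as a stand-in; it is not sam's regexp engine, so treat it as an illustration under that assumption rather than a statement of what sam does internally. the [ \t] brackets assume the quoted command's brackets held a space and a tab, per the "^- space tab" note. the second sample string is made up, and its results are what Python produces, not something checked against sam. the helper names groups and substitute are hypothetical.

```python
import re

# Python's re stands in for sam's regexp engine here (an assumption):
# it also numbers groups by left parenthesis and keeps only the last
# iteration of a repeated group.
# [ \t] assumes the quoted command's brackets held a space and a tab.
PATTERN = re.compile(r'(([A-Z][a-z]*[ \t]?)+)[ \t]+([0-9]+)')

# Same right-hand side as the quoted s command; re.sub turns \n into a newline.
REPLACEMENT = r'NAME \1\nPHONE 205 \2\n'


def groups(text):
    """Return (\\1, \\2, \\3) for the first match in text (hypothetical helper)."""
    m = PATTERN.search(text)
    # group 1: the whole repeated run of capitalized words
    # group 2: only the most recent iteration of the inner group
    # group 3: the digits
    return m.group(1), m.group(2), m.group(3)


def substitute(text):
    """Apply the substitution to every match, like sam's trailing g flag."""
    return PATTERN.sub(REPLACEMENT, text)


for sample in ('ABCD 01234', 'John Q Public 5551234'):
    print(groups(sample))
    print(substitute(sample), end='')
```

for ABCD 01234 this gives \1 = ABCD and \2 = D, matching the result above. for the made-up name it gives \2 = Public, the last word the repetition consumed. in both cases the digits land in group 3, because the inner parenthesis is the second left parenthesis.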