From: Peter Stephenson
To: zsh-workers@math.gatech.edu
Subject: Strange substring search behaviour
Date: Thu, 10 Dec 1998 16:52:52 +0100
Message-Id: <9812101552.AA30992@ibmth.df.unipi.it>
In-Reply-To: "Peter Stephenson"'s message of "Wed, 09 Dec 1998 18:04:27 NFT." <9812091704.AA42751@ibmth.df.unipi.it>
X-Mailing-List: archive/latest/4743

Peter Stephenson wrote:
> In fact, the internals are pretty much all there to be able to replace
> the shortest match instead of the longest match for the pattern. The
> only thing missing is the syntax.

I decided on a syntax: S for shortest substring; the substring flag is
not otherwise used for substitutions.

However, I discovered an ambiguity I wasn't aware of.  The form
${(S)foo#bar} is supposed to find substrings in $foo, using the shortest
match (## would give the longest match).  But look at this (the M flag
means print the portion actually matched rather than the string with
that portion deleted; it doesn't affect what actually matches):

% foo="twinkle twinkle little star"
% print ${(M)foo#t*e}      # shortest match of t*e at head
twinkle                    # so far so good
% print ${(MS)foo#t*e}     # same but look for substrings
tle

This surprised me.
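The two possible readings of "shortest substring match" can be modelled outside the shell. What follows is a minimal Python sketch, not zsh's implementation: it assumes Python's fnmatch.fnmatchcase is a fair stand-in for zsh's matching of a simple pattern such as t*e, and the function names (all_matches, globally_shortest, leftmost_shortest, substitute_shortest) are hypothetical. globally_shortest orders every matching substring by length, ties broken by position, which reproduces the `tle' result above; leftmost_shortest takes the shortest match at each starting position, scanning from the head; substitute_shortest performs a global replacement under that leftmost-shortest rule.

```python
from fnmatch import fnmatchcase


def all_matches(s, pat):
    """Every (start, end) such that s[start:end] matches the glob pat.

    fnmatchcase anchors the pattern at both ends, so each candidate
    substring is tested as a whole.  Empty substrings are skipped.
    """
    return [(i, j)
            for i in range(len(s))
            for j in range(i + 1, len(s) + 1)
            if fnmatchcase(s[i:j], pat)]


def globally_shortest(s, pat):
    """All matches, shortest first; equal lengths are ordered by position."""
    ordered = sorted(all_matches(s, pat), key=lambda m: (m[1] - m[0], m[0]))
    return [s[i:j] for i, j in ordered]


def leftmost_shortest(s, pat):
    """The shortest match starting at each position, scanning from the head."""
    found = []
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            if fnmatchcase(s[i:j], pat):
                found.append(s[i:j])
                break  # shortest match at this position; move on
    return found


def substitute_shortest(s, pat, rep):
    """Global substitution using the leftmost-shortest rule."""
    out, i = [], 0
    while i < len(s):
        for j in range(i + 1, len(s) + 1):
            if fnmatchcase(s[i:j], pat):
                out.append(rep)
                i = j  # resume after the match, so text inside it is not rescanned
                break
        else:  # no match starts at i: copy one character and advance
            out.append(s[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    foo = "twinkle twinkle little star"
    print(globally_shortest(foo, "t*e")[0])        # tle
    print(leftmost_shortest(foo, "t*e"))           # ['twinkle', 'twinkle', 'ttle', 'tle']
    print(substitute_shortest(foo, "t*e", "spy"))  # spy spy lispy star
```

Sorting by (length, start) is only one way of breaking ties between equally short matches, but on this string it yields the same sequence shown for the I.n. flag: tle, ttle, twinkle, twinkle, twinkle little, twinkle twinkle, twinkle twinkle little.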
I would have expected it to start from the head, look for the shortest
string that matches there, and carry on down the string looking for the
shortest match from each position.  Instead it looks for the shortest
*possible* match *anywhere*.  Maybe I should have guessed?  It makes
shortest-match substitution difficult, since that has to start from the
beginning and go down the string (i.e., I wanted ${(S)foo//t*e/spy} to
print `spy spy lispy star', and this posting came about because it
didn't).

Furthermore, this makes it a little strange when used with the I.n.
flag, which tells you to use the n'th match.

% print ${(MSI.1.)foo#t*e}
tle                        # first match: shortest
% print ${(MSI.2.)foo#t*e}
ttle                       # second match: second shortest
% print ${(MSI.3.)foo#t*e}
twinkle                    # first occurrence of third shortest
% print ${(MSI.4.)foo#t*e}
twinkle                    # the other twinkle
% print ${(MSI.5.)foo#t*e}
twinkle little             # all rather interesting...
% print ${(MSI.6.)foo#t*e}
twinkle twinkle            # ...in its own way...
% print ${(MSI.7.)foo#t*e}
twinkle twinkle little     # ...but is it right?
                           # (in fact, that's the *longest* match)

I would have expected `twinkle', `twinkle', `ttle' and `tle', i.e. the
shortest match from each position, in the order found.  (If you're doing
a global substitution, `tle' has already gone by then, as part of
`ttle', so it doesn't get replaced.)

I'd quite like to rewrite the whole thing the way my original
inclinations told me.  Any comments?  In other words, does anyone think
that they, or anyone else, would expect to find the globally shortest
match first?  Should I ask for a vote on zsh-users?

-- 
Peter Stephenson                   Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarroti 2, 56127 Pisa, Italy