From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-request@math.gatech.edu>
Received: (qmail 8005 invoked from network); 10 Dec 1998 23:42:20 -0000
Received: from ns2.primenet.com.au (HELO primenet.com.au) (7795@203.24.36.3)
  by ns1.primenet.com.au with SMTP; 10 Dec 1998 23:42:19 -0000
Received: (qmail 12396 invoked from network); 10 Dec 1998 16:14:54 -0000
Received: from math.gatech.edu (list@130.207.146.50)
  by ns2.primenet.com.au with SMTP; 10 Dec 1998 16:14:54 -0000
Received: (from list@localhost)
	by math.gatech.edu (8.9.1/8.9.1) id LAA00006;
	Thu, 10 Dec 1998 11:09:18 -0500 (EST)
Resent-Date: Thu, 10 Dec 1998 11:09:18 -0500 (EST)
Message-Id: <9812101552.AA30992@ibmth.df.unipi.it>
To: zsh-workers@math.gatech.edu
Subject: Strange substring search behaviour
In-Reply-To: "Peter Stephenson"'s message of "Wed, 09 Dec 1998 18:04:27 NFT."
             <9812091704.AA42751@ibmth.df.unipi.it> 
Date: Thu, 10 Dec 1998 16:52:52 +0100
From: Peter Stephenson <pws@ibmth.df.unipi.it>
Resent-Message-ID: <"VQCGH2.0.iK7.k8_Rs"@math>
Resent-From: zsh-workers@math.gatech.edu
X-Mailing-List: <zsh-workers@math.gatech.edu> archive/latest/4743
X-Loop: zsh-workers@math.gatech.edu
Precedence: list
Resent-Sender: zsh-workers-request@math.gatech.edu

Peter Stephenson wrote:
> In fact, the internals are pretty much all there to be able to replace
> the shortest match instead of the longest match for the pattern.  The
> only thing missing is the syntax.

I decided on a syntax:  S for shortest substring; the substring flag
is not used for substitutions otherwise.

However, I discovered an ambiguity I wasn't aware of.  The form
${(S)foo#bar} is supposed to find substrings in $foo, using the
shortest match (## would give the longest match).  But consider the
following (the M flag prints the portion actually matched rather than
the string with that portion deleted; it doesn't affect what matches):

% foo="twinkle twinkle little star"
% print ${(M)foo#t*e}                # shortest match of t*e at head
twinkle                              # so far so good
% print ${(MS)foo#t*e}               # same but look for substrings
tle

This surprised me.  I would have expected it to start from the head,
and look for the shortest string that matches there, and carry on down
the string looking for the shortest match from any position.  Instead
it looks for the shortest *possible* match *anywhere*.  Maybe I should
have guessed?  It makes it difficult for shortest-match substitution,
since that has to start from the beginning and go down the string
(i.e., I wanted ${(S)foo//t*e/spy} to print `spy spy lispy star' and
this posting came about because it didn't).
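
The left-to-right behaviour wanted for ${(S)foo//t*e/spy} can be
sketched outside the shell.  This is only a model in Python, not zsh's
implementation: the helper names are hypothetical, and the standard
fnmatch module stands in for zsh's pattern matcher.  At each position
it takes the shortest match starting there, replaces it, and resumes
after the match.

```python
from fnmatch import fnmatchcase


def shortest_match_at(text, start, pattern):
    """Return the end index of the shortest match starting at `start`, or None."""
    for end in range(start + 1, len(text) + 1):
        if fnmatchcase(text[start:end], pattern):
            return end
    return None


def substitute_shortest(text, pattern, replacement):
    """Replace non-overlapping shortest matches, scanning from the head."""
    out = []
    pos = 0
    while pos < len(text):
        end = shortest_match_at(text, pos, pattern)
        if end is None:
            # No match starts here: copy one character and move on.
            out.append(text[pos])
            pos += 1
        else:
            out.append(replacement)
            pos = end
    return "".join(out)


foo = "twinkle twinkle little star"
print(substitute_shortest(foo, "t*e", "spy"))  # prints: spy spy lispy star
```

Because the scan resumes after each replaced match, the `tle' inside
`ttle' is never considered on its own.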

Furthermore, this makes it a little strange when used with the I.n. flag,
which tells you to use the n'th match.

% print ${(MSI.1.)foo#t*e} 
tle                                  # first match: shortest
% print ${(MSI.2.)foo#t*e} 
ttle                                 # second match: second shortest
% print ${(MSI.3.)foo#t*e} 
twinkle                              # first occurrence of third shortest
% print ${(MSI.4.)foo#t*e} 
twinkle                              # the other twinkle
% print ${(MSI.5.)foo#t*e} 
twinkle little                       # all rather interesting...
% print ${(MSI.6.)foo#t*e} 
twinkle twinkle                      # ...in its own way...
% print ${(MSI.7.)foo#t*e} 
twinkle twinkle little               # ...but is it right?
                                     # (in fact, that's the *longest* match).

I would have expected `twinkle', `twinkle', `ttle' and `tle' (the last
has already gone by then if you're doing a global substitution so
doesn't get replaced), i.e. the shortest matches from each position in
order of finding.
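
The two orderings can be compared with a small Python model.  It is a
sketch under assumptions, not a description of the zsh source: the
function names are hypothetical, fnmatch stands in for zsh patterns,
and sorting by (length, start position) is simply an ordering that
reproduces the seven results listed above.

```python
from fnmatch import fnmatchcase


def all_matches(text, pattern):
    """Every (start, end) span whose substring matches the pattern."""
    return [
        (start, end)
        for start in range(len(text))
        for end in range(start + 1, len(text) + 1)
        if fnmatchcase(text[start:end], pattern)
    ]


def by_global_length(text, pattern):
    """Order matches by length, then by start: the behaviour observed."""
    spans = sorted(all_matches(text, pattern), key=lambda s: (s[1] - s[0], s[0]))
    return [text[a:b] for a, b in spans]


def shortest_per_position(text, pattern):
    """Shortest match from each start position, in order of finding."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if fnmatchcase(text[start:end], pattern):
                found.append(text[start:end])
                break
    return found


foo = "twinkle twinkle little star"
print(by_global_length(foo, "t*e"))
# ['tle', 'ttle', 'twinkle', 'twinkle', 'twinkle little',
#  'twinkle twinkle', 'twinkle twinkle little']
print(shortest_per_position(foo, "t*e"))
# ['twinkle', 'twinkle', 'ttle', 'tle']
```

The second list is the sequence expected above; a global substitution
would skip its last entry because the preceding `ttle' overlaps it.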

I'd quite like to rewrite the whole thing the way my original
inclinations told me.  Any comments?  In other words, does anyone
think they or anyone else is expecting to find the globally shortest
match first?  Should I ask for a vote on zsh-users?

-- 
Peter Stephenson <pws@ibmth.df.unipi.it>       Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarroti 2, 56127 Pisa, Italy