zsh-workers
 help / color / mirror / code / Atom feed
* a='foo"'; echo ${a/foo"/"bar} outputs bar
@ 2022-12-11 17:50 Stephane Chazelas
  2022-12-13  6:21 ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Stephane Chazelas @ 2022-12-11 17:50 UTC (permalink / raw)
  To: Zsh hackers list

$ a='foo"'
$ echo ${a/foo"/"bar}
bar

Whether quotes should escape the "/" is not clearly documented,
though the doc does tell us to use backslash for that.

However the fact that the first " is taken as being part of the
pattern while the second one is removed doesn't make much sense.

In ksh93, bash and mksh, quotes can be used to escape the /,
i.e. one can do:

$ a='/x/y/z' bash -c 'echo "${a/"/y/"/+}"'
/x+z
$ a='/x/y/z' ksh -c 'echo "${a/"/y/"/+}"'
/x+z
$ a='/x/y/z' mksh -c 'echo "${a/"/y/"/+}"'
/x+z

Maybe zsh could align with them?

The csh-style one looks better:

$ a='foo"' zsh -c 'echo ${a:s/foo"/"bar}'
bar"

quotes don't escape the / either but at least the " is to taken  as part of the
pattern.

Better than tcsh anyway:

$ a='foo"' tcsh -c 'echo ${a:s/foo"/"bar}'
^C^C^C^Z
zsh: suspended  a='foo"' tcsh -c 'echo ${a:s/foo"/"bar}'
(148)$ kill %

(ended up using all my RAM).

-- 
Stephane



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: a='foo"'; echo ${a/foo"/"bar} outputs bar
  2022-12-11 17:50 a='foo"'; echo ${a/foo"/"bar} outputs bar Stephane Chazelas
@ 2022-12-13  6:21 ` Bart Schaefer
  2023-08-07  2:36   ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2022-12-13  6:21 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Dec 11, 2022 at 9:50 AM Stephane Chazelas <stephane@chazelas.org> wrote:
>
> $ a='foo"'
> $ echo ${a/foo"/"bar}
> bar
>
> Whether quotes should escape the "/" is not clearly documented,
> though the doc does tell us to use backslash for that.

The whole expression only works because the quotes are balanced:

% echo ${x/foo"/bar}
braceparam dquote>

Clearly something a little wacky is going on here.

> However the fact that the first " is taken as being part of the
> pattern while the second one is removed doesn't make much sense.

This is the fault of singsub() called from line 2670 of
compgetmatch(), which eventually calls remnulargs() via prefork(),
which deletes the Dnull representing the double-quote.  This is only a
problem because an unbalanced quote was able to sneak through.

This was actually caught at paramsubst() line 3118:
                haserr = parse_subst_string(s);
This returned haserr == 1, but the only effect of that is that the
string is retokenized at line 3124.  If you use ${(X)x/foo"/"bar} you
get
  zsh: unmatched "

Next question is where this should be fixed.  The following works for
double-quotes, but not single because of special handling required for
$'...' -- so should be considered only a proof of how the substitution
could work with more conventional quoting.  All tests still pass with
the below.

diff --git a/Src/subst.c b/Src/subst.c
index 0f98e6ea3..86ee1dad8 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -2911,6 +2911,9 @@ paramsubst(LinkList l, LinkNode n, char **str,
int qt, int pf_flags,
             else
             ptr++;
         }
+        if (c == Dnull)
+            while (ptr[1] && ptr[1] != c)
+            ptr++;
         }
         replstr = (*ptr && ptr[1]) ? ptr+1 : "";
         *ptr = '\0';


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: a='foo"'; echo ${a/foo"/"bar} outputs bar
  2022-12-13  6:21 ` Bart Schaefer
@ 2023-08-07  2:36   ` Bart Schaefer
  2023-10-04  4:20     ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2023-08-07  2:36 UTC (permalink / raw)
  To: Zsh hackers list

Circling back to this ...

On Mon, Dec 12, 2022 at 10:21 PM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> On Sun, Dec 11, 2022 at 9:50 AM Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > $ a='foo"'
> > $ echo ${a/foo"/"bar}
> > bar
> >
> > Whether quotes should escape the "/" is not clearly documented,
> > though the doc does tell us to use backslash for that.
>
> The whole expression only works because the quotes are balanced:
>
> % echo ${x/foo"/bar}
> braceparam dquote>
>
> Clearly something a little wacky is going on here.

With an increased understanding of lex.c gained by stepping through
variations of ${|...} a thousand times, I think I have identified two
possible ways to address this.

> > However the fact that the first " is taken as being part of the
> > pattern while the second one is removed doesn't make much sense.
>
> This is the fault of singsub() called from line 2670 of
> compgetmatch(), which eventually calls remnulargs() via prefork(),
> which deletes the Dnull representing the double-quote.  This is only a
> problem because an unbalanced quote was able to sneak through.

One possible change is for the first quote to be taken as part of the
pattern but the second quote to NOT be removed.  This results in

$ echo ${a/foo"/"bar}
braceparam dquote>

It's always been possible to do quoted substrings in the replacement
rather than in the pattern, so this is consistent.  But it also means
that

$ a='foo"'
$ x= bar
$ echo ${a/foo"/"$x"}
$ bar

Something similar would have to be done with single quotes, because
handling double quotes doesn't fix this oddity:

% a="foo'"
% x=bar
% echo ${a/foo'/$'${x}y}
$bary

That is, the single quote after the slash is simply ignored in the
replacement.  That happens in 5.9, 5.8, and earlier.

The second possible change is to report "bad substitution" for the
original example.  As mentioned in the previous post,

> This was actually caught at paramsubst() line 3118:
>                 haserr = parse_subst_string(s);
> This returned haserr == 1, but the only effect of that is that the
> string is retokenized at line 3124.

An option I haven't pursued so far is to allow double-quoted
substrings to appear in the pattern.  This would make some sense given
that we allow parameter expansions in the pattern and can protect them
with single quotes, but using single quotes in the pattern has other
unexpected side-effects on the replacement:

% echo ${a/'foo"'/"$x"y}
"bar"y

Incidentally, it's sort-of possible in 5.8 to use $'...' quoting in
patterns, but the same odd stuff is going on there as well.

Thoughts?


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: a='foo"'; echo ${a/foo"/"bar} outputs bar
  2023-08-07  2:36   ` Bart Schaefer
@ 2023-10-04  4:20     ` Bart Schaefer
  2023-10-17 16:44       ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2023-10-04  4:20 UTC (permalink / raw)
  To: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 2020 bytes --]

On Sun, Aug 6, 2023 at 7:36 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> Circling back to this ...

... and again ...

> With an increased understanding of lex.c gained by stepping through
> variations of ${|...} a thousand times, I think I have identified two
> possible ways to address this.

Turns out there's a fundamental problem with ${var/pat/repl}, namely
that upon reaching paramsubst() the only recognized quoting of the
slash character is to precede it with a backslash. Conversely,
gettokstr() recognizes '...', "...", and $'...' quoting, and may
return STRING tokens containing backslashes that paramsubst() won't be
able to deal with.

Other things that have never worked are e.g. ${a/${b/c/d}/e}, which
gives bad substitution, and ${a/"${b/c/d}"/e} which AFAICT always
substitutes ${a} with no replacements, and ${a/"$( /bin/echo d )"/e}
which is bad pattern: "$(.  Fixing this properly would require doing a
full-fledged parse inside paramsubst(), rather than just a search for
unquoted backslashes.

Several of my attempts so far have caused at least Test/Y01 to fail in
comptesteval, so there is evidence for dependency on the existing
behavior.

> Something similar would have to be done with single quotes

I had an idea here that seems to work: while reading single-quoted
strings in gettokstr(), recognize when we're in a pattern and replace
all / with \/.  That only fixes '...' and $'...' at best, though.

So, back to workers/51202, my original reply on this thread:  handle
double-quotes in paramsubst(), leaving single quotes to gettokstr().

This still isn't perfect (see above regarding ${a/${b/c/d}/e}) but it
fixes the two Xfail tests in D04parameter and handles most of the
straightforward cases, including

a='/x/y/z' zsh -fc 'echo "${a/"/y/"/+}"'

and even the ${a/"${b/c/d}"/e} example, although it'll break on any
more double quotes inside other double quotes.

Other existing tests still pass.  Let me know what you think.

[-- Attachment #2: subst-pat-quote.txt --]
[-- Type: text/plain, Size: 3337 bytes --]

diff --git a/Src/lex.c b/Src/lex.c
index 33b17cc95..31b130b07 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -938,7 +938,7 @@ gettokstr(int c, int sub)
 {
     int bct = 0, pct = 0, brct = 0, seen_brct = 0, fdpar = 0;
     int intpos = 1, in_brace_param = 0, cmdsubst = 0;
-    int inquote, unmatched = 0;
+    int inquote, unmatched = 0, in_pattern = 0;
     enum lextok peek;
 #ifdef DEBUG
     int ocmdsp = cmdsp;
@@ -1160,7 +1160,7 @@ gettokstr(int c, int sub)
 	    if (bct-- == in_brace_param) {
 		if (cmdsubst)
 		    cmdpop();
-		in_brace_param = cmdsubst = 0;
+		in_brace_param = cmdsubst = in_pattern = 0;
 	    }
 	    c = Outbrace;
 	    break;
@@ -1309,7 +1309,8 @@ gettokstr(int c, int sub)
 			    lexbuf.ptr--, lexbuf.len--;
 			else
 			    break;
-		    }
+		    } else if (in_pattern && c == '/')
+			add(Bnull);
 		    add(c);
 		}
 		ALLOWHIST
@@ -1397,26 +1398,36 @@ gettokstr(int c, int sub)
 	     */
 	    c = Dash;
            break;
-       case LX2_BANG:
-           /*
-            * Same logic as Dash, for ! to perform negation in range.
-            */
-           if (seen_brct)
-               c = Bang;
-           else
-               c = '!';
-       }
-       add(c);
-       c = hgetc();
-       if (intpos)
+	case LX2_BANG:
+	    /*
+	     * Same logic as Dash, for ! to perform negation in range.
+	     */
+	    if (seen_brct)
+		c = Bang;
+	    else
+		c = '!';
+	case LX2_OTHER:
+	    if (in_brace_param) {
+		if (c == '/') {
+		    if (in_pattern == 0)
+			in_pattern = 2;
+		    else
+			--in_pattern;
+		}
+	    }
+	}
+	add(c);
+	c = hgetc();
+	if (intpos)
 	    intpos--;
-       if (lexstop)
+	if (lexstop)
 	    break;
-       if (!cmdsubst && in_brace_param && act == LX2_STRING &&
-	   (c == '|' || c == Bar || inblank(c))) {
-	   cmdsubst = in_brace_param;
-	   cmdpush(CS_CURSH);
-       }
+	if (!cmdsubst && in_brace_param && act == LX2_STRING &&
+	    (c == '|' || c == Bar || inblank(c))) {
+	    cmdsubst = in_brace_param;
+	    cmdpush(CS_CURSH);
+	} else if (in_pattern == 2 && c != '/')
+	    in_pattern = 1;
     }
   brk:
     if (errflag) {
diff --git a/Src/subst.c b/Src/subst.c
index cdbfc138a..419069f5a 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -3088,6 +3088,13 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 			chuck(ptr);
 		    else
 			ptr++;
+		} else if (c == Dnull) {
+		    chuck(ptr);
+		    while (*ptr && *ptr != c)
+			ptr++;
+		    if (*ptr == Dnull)
+			chuck(ptr);
+		    ptr--;	/* Outer loop is about to increment */
 		}
 	    }
 	    replstr = (*ptr && ptr[1]) ? ptr+1 : "";
diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index c2008582c..69a4fd3ec 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -2762,13 +2762,13 @@ F:behavior, see http://austingroupbugs.net/view.php?id=888
 
  slash='/'
  print -r -- x${slash/'/'}y
--Df:(users/28784) substituting a single-quoted backslash, part #1: slash
+0:(users/28784) substituting a single-quoted backslash, part #1: slash
 >xy
 
  single_quote="'"
- print -r -- x${single_quote/'/'}y
--Df:(users/28784) substituting a single-quoted backslash, part #2: single quote
->x/y
+ print -r -- x${single_quote/$'/'}y
+0:(users/28784) substituting a single-quoted backslash, part #2: single quote
+>x'y
 
  control="foobar"
  print -r -- x${control/'bar'}y

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: a='foo"'; echo ${a/foo"/"bar} outputs bar
  2023-10-04  4:20     ` Bart Schaefer
@ 2023-10-17 16:44       ` Bart Schaefer
  0 siblings, 0 replies; 5+ messages in thread
From: Bart Schaefer @ 2023-10-17 16:44 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, Oct 3, 2023 at 9:20 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> I had an idea here that seems to work: while reading single-quoted
> strings in gettokstr(), recognize when we're in a pattern and replace
> all / with \/.  That only fixes '...' and $'...' at best, though.
>
> So, back to workers/51202, my original reply on this thread:  handle
> double-quotes in paramsubst(), leaving single quotes to gettokstr().

Any comments on this?  Worth pushing?


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-10-17 16:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-11 17:50 a='foo"'; echo ${a/foo"/"bar} outputs bar Stephane Chazelas
2022-12-13  6:21 ` Bart Schaefer
2023-08-07  2:36   ` Bart Schaefer
2023-10-04  4:20     ` Bart Schaefer
2023-10-17 16:44       ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).