* a='foo"'; echo ${a/foo"/"bar} outputs bar @ 2022-12-11 17:50 Stephane Chazelas 2022-12-13 6:21 ` Bart Schaefer 0 siblings, 1 reply; 5+ messages in thread From: Stephane Chazelas @ 2022-12-11 17:50 UTC (permalink / raw) To: Zsh hackers list $ a='foo"' $ echo ${a/foo"/"bar} bar Whether quotes should escape the "/" is not clearly documented, though the doc does tell us to use backslash for that. However the fact that the first " is taken as being part of the pattern while the second one is removed doesn't make much sense. In ksh93, bash and mksh, quotes can be used to escape the /, i.e. one can do: $ a='/x/y/z' bash -c 'echo "${a/"/y/"/+}"' /x+z $ a='/x/y/z' ksh -c 'echo "${a/"/y/"/+}"' /x+z $ a='/x/y/z' mksh -c 'echo "${a/"/y/"/+}"' /x+z Maybe zsh could align with them? The csh-style one looks better: $ a='foo"' zsh -c 'echo ${a:s/foo"/"bar}' bar" quotes don't escape the / either but at least the " is to taken as part of the pattern. Better than tcsh anyway: $ a='foo"' tcsh -c 'echo ${a:s/foo"/"bar}' ^C^C^C^Z zsh: suspended a='foo"' tcsh -c 'echo ${a:s/foo"/"bar}' (148)$ kill % (ended up using all my RAM). -- Stephane ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: a='foo"'; echo ${a/foo"/"bar} outputs bar 2022-12-11 17:50 a='foo"'; echo ${a/foo"/"bar} outputs bar Stephane Chazelas @ 2022-12-13 6:21 ` Bart Schaefer 2023-08-07 2:36 ` Bart Schaefer 0 siblings, 1 reply; 5+ messages in thread From: Bart Schaefer @ 2022-12-13 6:21 UTC (permalink / raw) To: Zsh hackers list On Sun, Dec 11, 2022 at 9:50 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > $ a='foo"' > $ echo ${a/foo"/"bar} > bar > > Whether quotes should escape the "/" is not clearly documented, > though the doc does tell us to use backslash for that. The whole expression only works because the quotes are balanced: % echo ${x/foo"/bar} braceparam dquote> Clearly something a little wacky is going on here. > However the fact that the first " is taken as being part of the > pattern while the second one is removed doesn't make much sense. This is the fault of singsub() called from line 2670 of compgetmatch(), which eventually calls remnulargs() via prefork(), which deletes the Dnull representing the double-quote. This is only a problem because an unbalanced quote was able to sneak through. This was actually caught at paramsubst() line 3118: haserr = parse_subst_string(s); This returned haserr == 1, but the only effect of that is that the string is retokenized at line 3124. If you use ${(X)x/foo"/"bar} you get zsh: unmatched " Next question is where this should be fixed. The following works for double-quotes, but not single because of special handling required for $'...' -- so should be considered only a proof of how the substitution could work with more conventional quoting. All tests still pass with the below. diff --git a/Src/subst.c b/Src/subst.c index 0f98e6ea3..86ee1dad8 100644 --- a/Src/subst.c +++ b/Src/subst.c @@ -2911,6 +2911,9 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags, else ptr++; } + if (c == Dnull) + while (ptr[1] && ptr[1] != c) + ptr++; } replstr = (*ptr && ptr[1]) ? ptr+1 : ""; *ptr = '\0'; ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: a='foo"'; echo ${a/foo"/"bar} outputs bar 2022-12-13 6:21 ` Bart Schaefer @ 2023-08-07 2:36 ` Bart Schaefer 2023-10-04 4:20 ` Bart Schaefer 0 siblings, 1 reply; 5+ messages in thread From: Bart Schaefer @ 2023-08-07 2:36 UTC (permalink / raw) To: Zsh hackers list Circling back to this ... On Mon, Dec 12, 2022 at 10:21 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > On Sun, Dec 11, 2022 at 9:50 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > > > $ a='foo"' > > $ echo ${a/foo"/"bar} > > bar > > > > Whether quotes should escape the "/" is not clearly documented, > > though the doc does tell us to use backslash for that. > > The whole expression only works because the quotes are balanced: > > % echo ${x/foo"/bar} > braceparam dquote> > > Clearly something a little wacky is going on here. With an increased understanding of lex.c gained by stepping through variations of ${|...} a thousand times, I think I have identified two possible ways to address this. > > However the fact that the first " is taken as being part of the > > pattern while the second one is removed doesn't make much sense. > > This is the fault of singsub() called from line 2670 of > compgetmatch(), which eventually calls remnulargs() via prefork(), > which deletes the Dnull representing the double-quote. This is only a > problem because an unbalanced quote was able to sneak through. One possible change is for the first quote to be taken as part of the pattern but the second quote to NOT be removed. This results in $ echo ${a/foo"/"bar} braceparam dquote> It's always been possible to do quoted substrings in the replacement rather than in the pattern, so this is consistent. But it also means that $ a='foo"' $ x= bar $ echo ${a/foo"/"$x"} $ bar Something similar would have to be done with single quotes, because handling double quotes doesn't fix this oddity: % a="foo'" % x=bar % echo ${a/foo'/$'${x}y} $bary That is, the single quote after the slash is simply ignored in the replacement. That happens in 5.9, 5.8, and earlier. The second possible change is to report "bad substitution" for the original example. As mentioned in the previous post, > This was actually caught at paramsubst() line 3118: > haserr = parse_subst_string(s); > This returned haserr == 1, but the only effect of that is that the > string is retokenized at line 3124. An option I haven't pursued so far is to allow double-quoted substrings to appear in the pattern. This would make some sense given that we allow parameter expansions in the pattern and can protect them with single quotes, but using single quotes in the pattern has other unexpected side-effects on the replacement: % echo ${a/'foo"'/"$x"y} "bar"y Incidentally, it's sort-of possible in 5.8 to use $'...' quoting in patterns, but the same odd stuff is going on there as well. Thoughts? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: a='foo"'; echo ${a/foo"/"bar} outputs bar 2023-08-07 2:36 ` Bart Schaefer @ 2023-10-04 4:20 ` Bart Schaefer 2023-10-17 16:44 ` Bart Schaefer 0 siblings, 1 reply; 5+ messages in thread From: Bart Schaefer @ 2023-10-04 4:20 UTC (permalink / raw) To: Zsh hackers list [-- Attachment #1: Type: text/plain, Size: 2020 bytes --] On Sun, Aug 6, 2023 at 7:36 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > Circling back to this ... ... and again ... > With an increased understanding of lex.c gained by stepping through > variations of ${|...} a thousand times, I think I have identified two > possible ways to address this. Turns out there's a fundamental problem with ${var/pat/repl}, namely that upon reaching paramsubst() the only recognized quoting of the slash character is to precede it with a backslash. Conversely, gettokstr() recognizes '...', "...", and $'...' quoting, and may return STRING tokens containing backslashes that paramsubst() won't be able to deal with. Other things that have never worked are e.g. ${a/${b/c/d}/e}, which gives bad substitution, and ${a/"${b/c/d}"/e} which AFAICT always substitutes ${a} with no replacements, and ${a/"$( /bin/echo d )"/e} which is bad pattern: "$(. Fixing this properly would require doing a full-fledged parse inside paramsubst(), rather than just a search for unquoted backslashes. Several of my attempts so far have caused at least Test/Y01 to fail in comptesteval, so there is evidence for dependency on the existing behavior. > Something similar would have to be done with single quotes I had an idea here that seems to work: while reading single-quoted strings in gettokstr(), recognize when we're in a pattern and replace all / with \/. That only fixes '...' and $'...' at best, though. So, back to workers/51202, my original reply on this thread: handle double-quotes in paramsubst(), leaving single quotes to gettokstr(). This still isn't perfect (see above regarding ${a/${b/c/d}/e}) but it fixes the two Xfail tests in D04parameter and handles most of the straightforward cases, including a='/x/y/z' zsh -fc 'echo "${a/"/y/"/+}"' and even the ${a/"${b/c/d}"/e} example, although it'll break on any more double quotes inside other double quotes. Other existing tests still pass. Let me know what you think. [-- Attachment #2: subst-pat-quote.txt --] [-- Type: text/plain, Size: 3337 bytes --] diff --git a/Src/lex.c b/Src/lex.c index 33b17cc95..31b130b07 100644 --- a/Src/lex.c +++ b/Src/lex.c @@ -938,7 +938,7 @@ gettokstr(int c, int sub) { int bct = 0, pct = 0, brct = 0, seen_brct = 0, fdpar = 0; int intpos = 1, in_brace_param = 0, cmdsubst = 0; - int inquote, unmatched = 0; + int inquote, unmatched = 0, in_pattern = 0; enum lextok peek; #ifdef DEBUG int ocmdsp = cmdsp; @@ -1160,7 +1160,7 @@ gettokstr(int c, int sub) if (bct-- == in_brace_param) { if (cmdsubst) cmdpop(); - in_brace_param = cmdsubst = 0; + in_brace_param = cmdsubst = in_pattern = 0; } c = Outbrace; break; @@ -1309,7 +1309,8 @@ gettokstr(int c, int sub) lexbuf.ptr--, lexbuf.len--; else break; - } + } else if (in_pattern && c == '/') + add(Bnull); add(c); } ALLOWHIST @@ -1397,26 +1398,36 @@ gettokstr(int c, int sub) */ c = Dash; break; - case LX2_BANG: - /* - * Same logic as Dash, for ! to perform negation in range. - */ - if (seen_brct) - c = Bang; - else - c = '!'; - } - add(c); - c = hgetc(); - if (intpos) + case LX2_BANG: + /* + * Same logic as Dash, for ! to perform negation in range. + */ + if (seen_brct) + c = Bang; + else + c = '!'; + case LX2_OTHER: + if (in_brace_param) { + if (c == '/') { + if (in_pattern == 0) + in_pattern = 2; + else + --in_pattern; + } + } + } + add(c); + c = hgetc(); + if (intpos) intpos--; - if (lexstop) + if (lexstop) break; - if (!cmdsubst && in_brace_param && act == LX2_STRING && - (c == '|' || c == Bar || inblank(c))) { - cmdsubst = in_brace_param; - cmdpush(CS_CURSH); - } + if (!cmdsubst && in_brace_param && act == LX2_STRING && + (c == '|' || c == Bar || inblank(c))) { + cmdsubst = in_brace_param; + cmdpush(CS_CURSH); + } else if (in_pattern == 2 && c != '/') + in_pattern = 1; } brk: if (errflag) { diff --git a/Src/subst.c b/Src/subst.c index cdbfc138a..419069f5a 100644 --- a/Src/subst.c +++ b/Src/subst.c @@ -3088,6 +3088,13 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags, chuck(ptr); else ptr++; + } else if (c == Dnull) { + chuck(ptr); + while (*ptr && *ptr != c) + ptr++; + if (*ptr == Dnull) + chuck(ptr); + ptr--; /* Outer loop is about to increment */ } } replstr = (*ptr && ptr[1]) ? ptr+1 : ""; diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst index c2008582c..69a4fd3ec 100644 --- a/Test/D04parameter.ztst +++ b/Test/D04parameter.ztst @@ -2762,13 +2762,13 @@ F:behavior, see http://austingroupbugs.net/view.php?id=888 slash='/' print -r -- x${slash/'/'}y --Df:(users/28784) substituting a single-quoted backslash, part #1: slash +0:(users/28784) substituting a single-quoted backslash, part #1: slash >xy single_quote="'" - print -r -- x${single_quote/'/'}y --Df:(users/28784) substituting a single-quoted backslash, part #2: single quote ->x/y + print -r -- x${single_quote/$'/'}y +0:(users/28784) substituting a single-quoted backslash, part #2: single quote +>x'y control="foobar" print -r -- x${control/'bar'}y ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: a='foo"'; echo ${a/foo"/"bar} outputs bar 2023-10-04 4:20 ` Bart Schaefer @ 2023-10-17 16:44 ` Bart Schaefer 0 siblings, 0 replies; 5+ messages in thread From: Bart Schaefer @ 2023-10-17 16:44 UTC (permalink / raw) To: Zsh hackers list On Tue, Oct 3, 2023 at 9:20 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > I had an idea here that seems to work: while reading single-quoted > strings in gettokstr(), recognize when we're in a pattern and replace > all / with \/. That only fixes '...' and $'...' at best, though. > > So, back to workers/51202, my original reply on this thread: handle > double-quotes in paramsubst(), leaving single quotes to gettokstr(). Any comments on this? Worth pushing? ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-10-17 16:45 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-12-11 17:50 a='foo"'; echo ${a/foo"/"bar} outputs bar Stephane Chazelas 2022-12-13 6:21 ` Bart Schaefer 2023-08-07 2:36 ` Bart Schaefer 2023-10-04 4:20 ` Bart Schaefer 2023-10-17 16:44 ` Bart Schaefer
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).