* regexp-replace and ^, word boundary or look-behind operators @ 2019-12-16 21:10 Stephane Chazelas 2019-12-16 21:27 ` Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2019-12-16 21:10 UTC (permalink / raw) To: Zsh hackers list The way regexp-replace works means that these things: $ a='aaab'; regexp-replace a '^a' x; echo "$a" xxxb $ a='abab'; regexp-replace a '\<ab' '<$MATCH>'; echo $a <ab><ab> $ set -o rematchpcre $ a=xxx; regexp-replace a '(?<!x)x' y; echo $a yyy don't work properly as after the first substitution, the regex is no longer matched on the full subject, but on the part of subject after the last match. I don't think that can be fixed without exposing more of the regex/pcre C API. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: regexp-replace and ^, word boundary or look-behind operators 2019-12-16 21:10 regexp-replace and ^, word boundary or look-behind operators Stephane Chazelas @ 2019-12-16 21:27 ` Stephane Chazelas 2019-12-17 7:38 ` Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2019-12-16 21:27 UTC (permalink / raw) To: Zsh hackers list 2019-12-16 21:10:13 +0000, Stephane Chazelas: > The way regexp-replace works means that these things: > > $ a='aaab'; regexp-replace a '^a' x; echo "$a" > xxxb > $ a='abab'; regexp-replace a '\<ab' '<$MATCH>'; echo $a > <ab><ab> > $ set -o rematchpcre > $ a=xxx; regexp-replace a '(?<!x)x' y; echo $a > yyy [...] FWIW, looks like some sed implementations (like that of the heirloom toolchest or busybox) or ksh93 have the same problem: $ echo xxx | busybox sed 's/\<x/y/g' yyy $ a=xxx ksh -c 'echo ${a//~(E:^x)/y}' yyy $ a=xxx ksh -c 'echo ${a//[[:<:]]x/y}' yyy It may be that the POSIX regex API doesn't have a way to fix that (REG_NOTBOL addresses the ^ case, but there's nothing about \< / \b / [[:<]] which are non-POSIX extensions anyway). PCRE should be OK, so it could be just a matter of exposing it via the pcre_match builtin and document the limitation otherwise for EREs (PCRE is the new de-facto standard anyway). -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: regexp-replace and ^, word boundary or look-behind operators 2019-12-16 21:27 ` Stephane Chazelas @ 2019-12-17 7:38 ` Stephane Chazelas 2019-12-17 11:11 ` [PATCH] " Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2019-12-17 7:38 UTC (permalink / raw) To: Zsh hackers list 2019-12-16 21:27:06 +0000, Stephane Chazelas: [...] > PCRE should be OK, so it could be just a matter of > exposing it via the pcre_match builtin [...] D'oh, it's there already with the -b, -n options. I'll try and suggest a regexp-replace improvement using that. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH] Re: regexp-replace and ^, word boundary or look-behind operators 2019-12-17 7:38 ` Stephane Chazelas @ 2019-12-17 11:11 ` Stephane Chazelas 2019-12-18 0:22 ` Daniel Shahaf 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2019-12-17 11:11 UTC (permalink / raw) To: zsh-workers 2019-12-17 07:38:46 +0000, Stephane Chazelas: > 2019-12-16 21:27:06 +0000, Stephane Chazelas: > [...] > > PCRE should be OK, so it could be just a matter of > > exposing it via the pcre_match builtin > [...] > > D'oh, it's there already with the -b, -n options. > > I'll try and suggest a regexp-replace improvement using that. [...] There's another issue in that the zero-width matches cause infinite loops. Here's my first attempt at fixing those issues (also fixing a few issues in the zpgrep example functions while I'm at it): diff --git a/Doc/Zsh/contrib.yo b/Doc/Zsh/contrib.yo index d32ba018d..61e6a434f 100644 --- a/Doc/Zsh/contrib.yo +++ b/Doc/Zsh/contrib.yo @@ -4301,6 +4301,9 @@ and arithmetic expressions which will be replaced: in particular, a reference to tt($MATCH) will be replaced by the text matched by the pattern. The return status is 0 if at least one match was performed, else 1. + +Note that if not using PCRE, using the tt(^) or word boundary operators +(where available) may not work properly. ) findex(run-help) item(tt(run-help) var(cmd))( diff --git a/Functions/Example/zpgrep b/Functions/Example/zpgrep index 8b1edaa1c..556e58cd6 100644 --- a/Functions/Example/zpgrep +++ b/Functions/Example/zpgrep @@ -2,24 +2,31 @@ # zpgrep() { -local file pattern +local file pattern ret pattern=$1 shift +ret=1 if ((! ARGC)) then set -- - fi -pcre_compile $pattern +zmodload zsh/pcre || return +pcre_compile -- "$pattern" pcre_study for file do if [[ "$file" == - ]] then - while read -u0 buf; do pcre_match $buf && print $buf; done + while IFS= read -ru0 buf; do + pcre_match -- "$buf" && ret=0 && print -r -- "$buf" + done else - while read -u0 buf; do pcre_match $buf && print $buf; done < "$file" + while IFS= read -ru0 buf; do + pcre_match -- "$buf" && ret=0 && print -r -- "$buf" + done < "$file" fi done +return "$ret" } diff --git a/Functions/Misc/regexp-replace b/Functions/Misc/regexp-replace index dec105524..41ea9d79e 100644 --- a/Functions/Misc/regexp-replace +++ b/Functions/Misc/regexp-replace @@ -8,36 +8,79 @@ # $ and backtick substitutions; in particular, $MATCH will be replaced # by the portion of the string matched by the regular expression. -integer pcre +# we use positional parameters instead of variables to avoid +# clashing with the user's variable. Make sure we start with 3 and only +# 3 elements: +argv=("$1" "$2" "$3") -[[ -o re_match_pcre ]] && pcre=1 +# $4 records whether pcre is enabled as that information would otherwise +# be lost after emulate -L zsh +4=0 +[[ -o re_match_pcre ]] && 4=1 emulate -L zsh -(( pcre )) && setopt re_match_pcre - -# $4 is the string to be matched -4=${(P)1} -# $5 is the final string -5= -# 6 indicates if we made a change -6= + + local MATCH MBEGIN MEND local -a match mbegin mend -while [[ -n $4 ]]; do - if [[ $4 =~ $2 ]]; then - # append initial part and subsituted match - 5+=${4[1,MBEGIN-1]}${(e)3} - # truncate remaining string - 4=${4[MEND+1,-1]} - # indicate we did something - 6=1 - else - break - fi -done -5+=$4 - -eval ${1}=${(q)5} -# status 0 if we did something, else 1. -[[ -n $6 ]] +if (( $4 )); then + # if using pcre, we're using pcre_match and a running offset + # That's needed for ^, \A, \b, and look-behind operators to work + # properly. + + zmodload zsh/pcre || return 2 + pcre_compile -- "$2" && pcre_study || return 2 + + # $4 is the current *byte* offset, $5, $6 reserved for later + 4=0 5= 6=1 + + local ZPCRE_OP IFS=' ' + while pcre_match -b -n $4 -- "${(P)1}"; do + # append offsets and computed replacement to the array + argv+=($=ZPCRE_OP ${(e)3}) + + # for 0-width matches, increase offset by 1 to avoid + # infinite loop + 4=$((argv[-2] + (argv[-3] == argv[-2]))) + done + + (($# > 6)) || return # no match + + set +o multibyte + + # $5 contains the result, $6 the current offset + for 2 3 4 in "$@[7,-1]"; do + 5+=${(P)1[$6,$2]}$4 + 6=$(($3 + 1)) + done + 5+=${(P)1[$6,-1]} +else + # in ERE, we can't use an offset so ^, (and \<, \b, \B, [[:<:]] where + # available) won't work properly. + + # $4 is the string to be matched + 4=${(P)1} + + while [[ -n $4 ]]; do + if [[ $4 =~ $2 ]]; then + # append initial part and substituted match + 5+=${4[1,MBEGIN-1]}${(e)3} + # truncate remaining string + if ((MEND < MBEGIN)); then + # zero-width match, skip one character for the next match + ((MEND++)) + 5+=${4[1]} + fi + 4=${4[MEND+1,-1]} + # indicate we did something + 6=1 + else + break + fi + done + [[ -n $6 ]] || return # no match + 5+=$4 +fi + +eval $1=\$5 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] Re: regexp-replace and ^, word boundary or look-behind operators 2019-12-17 11:11 ` [PATCH] " Stephane Chazelas @ 2019-12-18 0:22 ` Daniel Shahaf 2019-12-18 8:31 ` Stephane Chazelas 2020-01-01 14:03 ` [PATCH v2] " Stephane Chazelas 0 siblings, 2 replies; 58+ messages in thread From: Daniel Shahaf @ 2019-12-18 0:22 UTC (permalink / raw) To: Stephane Chazelas, zsh-workers Stephane Chazelas wrote on Tue, 17 Dec 2019 11:11 +00:00: > +++ b/Doc/Zsh/contrib.yo > @@ -4301,6 +4301,9 @@ and arithmetic expressions which will be > replaced: in particular, a > reference to tt($MATCH) will be replaced by the text matched by the > pattern. > > The return status is 0 if at least one match was performed, else 1. > + > +Note that if not using PCRE, using the tt(^) or word boundary operators > +(where available) may not work properly. Suggest to avoid the double negative: 1. s/not using PCRE/using POSIX ERE's/ 2. Add "(ERE's)" after "POSIX extended regular expressions" in the first paragraph I'll push a minor change to that first paragraph in a moment. > ) > +++ b/Functions/Example/zpgrep > @@ -2,24 +2,31 @@ > +eval $1=\$5 How about «: ${(P)1::="$5"}» to avoid eval? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] Re: regexp-replace and ^, word boundary or look-behind operators 2019-12-18 0:22 ` Daniel Shahaf @ 2019-12-18 8:31 ` Stephane Chazelas 2020-01-01 14:03 ` [PATCH v2] " Stephane Chazelas 1 sibling, 0 replies; 58+ messages in thread From: Stephane Chazelas @ 2019-12-18 8:31 UTC (permalink / raw) To: Daniel Shahaf; +Cc: zsh-workers 2019-12-18 00:22:53 +0000, Daniel Shahaf: [...] > > +eval $1=\$5 > > How about «: ${(P)1::="$5"}» to avoid eval? I suppose that would work but would not prevent code injection vulnerabilities if $1 was not guaranteed to contain a valid variable name: $ 1='a[`uname>&2`]' $ : ${(P)1::="$5"} Linux zsh: bad math expression: empty string Linux zsh: bad math expression: empty string Note that uname was run twice suggesting it's potentially less efficient than using eval (IIRC, that was already discussed here. possibly that was fixed in a newer version). Here, I'd say it's the caller's responsibility to make sure they pass a valid lvalue as first argument. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v2] regexp-replace and ^, word boundary or look-behind operators 2019-12-18 0:22 ` Daniel Shahaf 2019-12-18 8:31 ` Stephane Chazelas @ 2020-01-01 14:03 ` Stephane Chazelas 2021-04-30 6:11 ` Stephane Chazelas 1 sibling, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2020-01-01 14:03 UTC (permalink / raw) To: zsh-workers 2019-12-18 00:22:53 +0000, Daniel Shahaf: [...] > > + > > +Note that if not using PCRE, using the tt(^) or word boundary operators > > +(where available) may not work properly. > > Suggest to avoid the double negative: > > 1. s/not using PCRE/using POSIX ERE's/ > > 2. Add "(ERE's)" after "POSIX extended regular expressions" in the first paragraph > > I'll push a minor change to that first paragraph in a moment. [...] Thanks, I've incorporated that suggesting and fixed an issue with PCRE when the replacement was empty or generated more than one element. diff --git a/Doc/Zsh/contrib.yo b/Doc/Zsh/contrib.yo index 558342711..9a804fc11 100644 --- a/Doc/Zsh/contrib.yo +++ b/Doc/Zsh/contrib.yo @@ -4284,7 +4284,7 @@ See also the tt(pager), tt(prompt) and tt(rprompt) styles below. findex(regexp-replace) item(tt(regexp-replace) var(var) var(regexp) var(replace))( Use regular expressions to perform a global search and replace operation -on a variable. POSIX extended regular expressions are used, +on a variable. POSIX extended regular expressions (ERE) are used, unless the option tt(RE_MATCH_PCRE) has been set, in which case Perl-compatible regular expressions are used (this requires the shell to be linked against the tt(pcre) @@ -4302,6 +4302,9 @@ and arithmetic expressions which will be replaced: in particular, a reference to tt($MATCH) will be replaced by the text matched by the pattern. The return status is 0 if at least one match was performed, else 1. + +Note that if using POSIX EREs, the tt(^) or word boundary operators +(where available) may not work properly. ) findex(run-help) item(tt(run-help) var(cmd))( diff --git a/Functions/Example/zpgrep b/Functions/Example/zpgrep index 8b1edaa1c..556e58cd6 100644 --- a/Functions/Example/zpgrep +++ b/Functions/Example/zpgrep @@ -2,24 +2,31 @@ # zpgrep() { -local file pattern +local file pattern ret pattern=$1 shift +ret=1 if ((! ARGC)) then set -- - fi -pcre_compile $pattern +zmodload zsh/pcre || return +pcre_compile -- "$pattern" pcre_study for file do if [[ "$file" == - ]] then - while read -u0 buf; do pcre_match $buf && print $buf; done + while IFS= read -ru0 buf; do + pcre_match -- "$buf" && ret=0 && print -r -- "$buf" + done else - while read -u0 buf; do pcre_match $buf && print $buf; done < "$file" + while IFS= read -ru0 buf; do + pcre_match -- "$buf" && ret=0 && print -r -- "$buf" + done < "$file" fi done +return "$ret" } diff --git a/Functions/Misc/regexp-replace b/Functions/Misc/regexp-replace index dec105524..0d5948075 100644 --- a/Functions/Misc/regexp-replace +++ b/Functions/Misc/regexp-replace @@ -8,36 +8,84 @@ # $ and backtick substitutions; in particular, $MATCH will be replaced # by the portion of the string matched by the regular expression. -integer pcre +# we use positional parameters instead of variables to avoid +# clashing with the user's variable. Make sure we start with 3 and only +# 3 elements: +argv=("$1" "$2" "$3") -[[ -o re_match_pcre ]] && pcre=1 +# $4 records whether pcre is enabled as that information would otherwise +# be lost after emulate -L zsh +4=0 +[[ -o re_match_pcre ]] && 4=1 emulate -L zsh -(( pcre )) && setopt re_match_pcre - -# $4 is the string to be matched -4=${(P)1} -# $5 is the final string -5= -# 6 indicates if we made a change -6= + + local MATCH MBEGIN MEND local -a match mbegin mend -while [[ -n $4 ]]; do - if [[ $4 =~ $2 ]]; then - # append initial part and subsituted match - 5+=${4[1,MBEGIN-1]}${(e)3} - # truncate remaining string - 4=${4[MEND+1,-1]} - # indicate we did something - 6=1 - else - break - fi -done -5+=$4 - -eval ${1}=${(q)5} -# status 0 if we did something, else 1. -[[ -n $6 ]] +if (( $4 )); then + # if using pcre, we're using pcre_match and a running offset + # That's needed for ^, \A, \b, and look-behind operators to work + # properly. + + zmodload zsh/pcre || return 2 + pcre_compile -- "$2" && pcre_study || return 2 + + # $4 is the current *byte* offset, $5, $6 reserved for later use + 4=0 6= + + local ZPCRE_OP + while pcre_match -b -n $4 -- "${(P)1}"; do + # append offsets and computed replacement to the array + # we need to perform the evaluation in a scalar assignment so that if + # it generates an array, the elements are converted to string (by + # joining with the first chararacter of $IFS as usual) + 5=${(e)3} + argv+=(${(s: :)ZPCRE_OP} "$5") + + # for 0-width matches, increase offset by 1 to avoid + # infinite loop + 4=$((argv[-2] + (argv[-3] == argv[-2]))) + done + + (($# > 6)) || return # no match + + set +o multibyte + + # $5 contains the result, $6 the current offset + 5= 6=1 + for 2 3 4 in "$@[7,-1]"; do + 5+=${(P)1[$6,$2]}$4 + 6=$(($3 + 1)) + done + 5+=${(P)1[$6,-1]} +else + # in ERE, we can't use an offset so ^, (and \<, \b, \B, [[:<:]] where + # available) won't work properly. + + # $4 is the string to be matched + 4=${(P)1} + + while [[ -n $4 ]]; do + if [[ $4 =~ $2 ]]; then + # append initial part and substituted match + 5+=${4[1,MBEGIN-1]}${(e)3} + # truncate remaining string + if ((MEND < MBEGIN)); then + # zero-width match, skip one character for the next match + ((MEND++)) + 5+=${4[1]} + fi + 4=${4[MEND+1,-1]} + # indicate we did something + 6=1 + else + break + fi + done + [[ -n $6 ]] || return # no match + 5+=$4 +fi + +eval $1=\$5 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2] regexp-replace and ^, word boundary or look-behind operators 2020-01-01 14:03 ` [PATCH v2] " Stephane Chazelas @ 2021-04-30 6:11 ` Stephane Chazelas 2021-04-30 23:13 ` Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2021-04-30 6:11 UTC (permalink / raw) To: Zsh hackers list ping. 2020-01-01 14:03:43 +0000, Stephane Chazelas: 2019-12-18 00:22:53 +0000, Daniel Shahaf: [...] > > + > > +Note that if not using PCRE, using the tt(^) or word boundary operators > > +(where available) may not work properly. > > Suggest to avoid the double negative: > > 1. s/not using PCRE/using POSIX ERE's/ > > 2. Add "(ERE's)" after "POSIX extended regular expressions" in the first paragraph > > I'll push a minor change to that first paragraph in a moment. [...] Thanks, I've incorporated that suggesting and fixed an issue with PCRE when the replacement was empty or generated more than one element. diff --git a/Doc/Zsh/contrib.yo b/Doc/Zsh/contrib.yo index 558342711..9a804fc11 100644 --- a/Doc/Zsh/contrib.yo +++ b/Doc/Zsh/contrib.yo @@ -4284,7 +4284,7 @@ See also the tt(pager), tt(prompt) and tt(rprompt) styles below. findex(regexp-replace) item(tt(regexp-replace) var(var) var(regexp) var(replace))( Use regular expressions to perform a global search and replace operation -on a variable. POSIX extended regular expressions are used, +on a variable. POSIX extended regular expressions (ERE) are used, unless the option tt(RE_MATCH_PCRE) has been set, in which case Perl-compatible regular expressions are used (this requires the shell to be linked against the tt(pcre) @@ -4302,6 +4302,9 @@ and arithmetic expressions which will be replaced: in particular, a reference to tt($MATCH) will be replaced by the text matched by the pattern. The return status is 0 if at least one match was performed, else 1. + +Note that if using POSIX EREs, the tt(^) or word boundary operators +(where available) may not work properly. ) findex(run-help) item(tt(run-help) var(cmd))( diff --git a/Functions/Example/zpgrep b/Functions/Example/zpgrep index 8b1edaa1c..556e58cd6 100644 --- a/Functions/Example/zpgrep +++ b/Functions/Example/zpgrep @@ -2,24 +2,31 @@ # zpgrep() { -local file pattern +local file pattern ret pattern=$1 shift +ret=1 if ((! ARGC)) then set -- - fi -pcre_compile $pattern +zmodload zsh/pcre || return +pcre_compile -- "$pattern" pcre_study for file do if [[ "$file" == - ]] then - while read -u0 buf; do pcre_match $buf && print $buf; done + while IFS= read -ru0 buf; do + pcre_match -- "$buf" && ret=0 && print -r -- "$buf" + done else - while read -u0 buf; do pcre_match $buf && print $buf; done < "$file" + while IFS= read -ru0 buf; do + pcre_match -- "$buf" && ret=0 && print -r -- "$buf" + done < "$file" fi done +return "$ret" } diff --git a/Functions/Misc/regexp-replace b/Functions/Misc/regexp-replace index dec105524..0d5948075 100644 --- a/Functions/Misc/regexp-replace +++ b/Functions/Misc/regexp-replace @@ -8,36 +8,84 @@ # $ and backtick substitutions; in particular, $MATCH will be replaced # by the portion of the string matched by the regular expression. -integer pcre +# we use positional parameters instead of variables to avoid +# clashing with the user's variable. Make sure we start with 3 and only +# 3 elements: +argv=("$1" "$2" "$3") -[[ -o re_match_pcre ]] && pcre=1 +# $4 records whether pcre is enabled as that information would otherwise +# be lost after emulate -L zsh +4=0 +[[ -o re_match_pcre ]] && 4=1 emulate -L zsh -(( pcre )) && setopt re_match_pcre - -# $4 is the string to be matched -4=${(P)1} -# $5 is the final string -5= -# 6 indicates if we made a change -6= + + local MATCH MBEGIN MEND local -a match mbegin mend -while [[ -n $4 ]]; do - if [[ $4 =~ $2 ]]; then - # append initial part and subsituted match - 5+=${4[1,MBEGIN-1]}${(e)3} - # truncate remaining string - 4=${4[MEND+1,-1]} - # indicate we did something - 6=1 - else - break - fi -done -5+=$4 - -eval ${1}=${(q)5} -# status 0 if we did something, else 1. -[[ -n $6 ]] +if (( $4 )); then + # if using pcre, we're using pcre_match and a running offset + # That's needed for ^, \A, \b, and look-behind operators to work + # properly. + + zmodload zsh/pcre || return 2 + pcre_compile -- "$2" && pcre_study || return 2 + + # $4 is the current *byte* offset, $5, $6 reserved for later use + 4=0 6= + + local ZPCRE_OP + while pcre_match -b -n $4 -- "${(P)1}"; do + # append offsets and computed replacement to the array + # we need to perform the evaluation in a scalar assignment so that if + # it generates an array, the elements are converted to string (by + # joining with the first chararacter of $IFS as usual) + 5=${(e)3} + argv+=(${(s: :)ZPCRE_OP} "$5") + + # for 0-width matches, increase offset by 1 to avoid + # infinite loop + 4=$((argv[-2] + (argv[-3] == argv[-2]))) + done + + (($# > 6)) || return # no match + + set +o multibyte + + # $5 contains the result, $6 the current offset + 5= 6=1 + for 2 3 4 in "$@[7,-1]"; do + 5+=${(P)1[$6,$2]}$4 + 6=$(($3 + 1)) + done + 5+=${(P)1[$6,-1]} +else + # in ERE, we can't use an offset so ^, (and \<, \b, \B, [[:<:]] where + # available) won't work properly. + + # $4 is the string to be matched + 4=${(P)1} + + while [[ -n $4 ]]; do + if [[ $4 =~ $2 ]]; then + # append initial part and substituted match + 5+=${4[1,MBEGIN-1]}${(e)3} + # truncate remaining string + if ((MEND < MBEGIN)); then + # zero-width match, skip one character for the next match + ((MEND++)) + 5+=${4[1]} + fi + 4=${4[MEND+1,-1]} + # indicate we did something + 6=1 + else + break + fi + done + [[ -n $6 ]] || return # no match + 5+=$4 +fi + +eval $1=\$5 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v2] regexp-replace and ^, word boundary or look-behind operators 2021-04-30 6:11 ` Stephane Chazelas @ 2021-04-30 23:13 ` Bart Schaefer 2021-05-05 11:45 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-04-30 23:13 UTC (permalink / raw) To: Zsh hackers list On Thu, Apr 29, 2021 at 11:12 PM Stephane Chazelas <stephane.chazelas@gmail.com> wrote: > > ping. I went back and looked at the patch again. Tangential question: "pgrep" commonly refers to grepping the process list, and is linked to "pkill". I know "zpgrep" precedes this patch, but I'm wondering if we should rename it. More directly about regexp-replace: If $argv[4,-1] are going to be ignored/discarded, perhaps there should be a warning? (Another thing that predates the patch, I know) What do you think about replacing the final eval with typeset -g, as mentioned in workers/48760 ? I don't see any reason to doubt the accuracy of the implementation, so except possibly for that last question I think the patch could be applied and the other two items addressed elsewhere. Thoughts? ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more). 2021-04-30 23:13 ` Bart Schaefer @ 2021-05-05 11:45 ` Stephane Chazelas 2021-05-31 0:58 ` Lawrence Velázquez ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Stephane Chazelas @ 2021-05-05 11:45 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list 2021-04-30 16:13:34 -0700, Bart Schaefer: [...] > I went back and looked at the patch again. Thanks. Here's a third version with further improvements addressing some of the comments here. > Tangential question: "pgrep" commonly refers to grepping the process > list, and is linked to "pkill". I know "zpgrep" precedes this patch, > but I'm wondering if we should rename it. I agree, zshpcregrep or zpmatch may be better names. There exists a pcregrep command, zpcregrep would be likely interpreted as zip-pcregrep. I'll leave it out for now. IMO, that zpgrep serves more as example code than a command people would actually use, so it probably doesn't matter much. > More directly about regexp-replace: > > If $argv[4,-1] are going to be ignored/discarded, perhaps there should > be a warning? (Another thing that predates the patch, I know) Agreed. I've addressed that. > What do you think about replacing the final eval with typeset -g, as > mentioned in workers/48760 ? I've compared: (1) eval -- $lvalue=\$value (2) Bart's typeset -g -- $lvalue=$value (3) Daniel's (zsh-workers 45073) : ${(P)lvalue::="$value"} (1) to me is the most legible but if $lvalue is not a valid lvalue, it doesn't necessarily return a useful error message to the user (like when lvalue='reboot;var'...) (2) is also very legible. It has the benefit (or inconvenience depending on PoV) of returning an error if the lvalue is not a scalar. It reports an error (and exits the shell process) upon incorrect lvalues (except ones such as "var=foo"). A major drawback though is that if chokes on lvalue='array[n=1]' or lvalue='assoc[keywith=characters]' (3) is the least legible. It also causes the lvalue to be dereferenced twice. For instance with lvalue='a[++n]', n is incremented twice. However, it does report an error upon invalid lvalue (even though ${(P)lvalue} alone doesn't), and as we use ${(P)lvalue} above already, that has the benefit of that lvalue being interpreted consistently. Non-scalar variables are converted to scalar (like with (1)). It works OK for lvalue='assoc[$key]' and lvalue='assoc[=]' or lvalue='assoc[\]]' for instance. Performance wise, for usual cases (lvalue being a simple variable name and value short enough), (1) seems to be the worst in my tests and (3) best, (2) very close. But that's reversed for the less usual cases. So, I've gone for (3), changed the code to limit the number of times the lvalue is dereferenced. I've also addressed an issue whereby regexp-replace empty '^' x would not insert x in ERE mode. (note that it is affected by the (e) failure exit code issue I've just raised separately; I'm not attempting to work around it here; though I've added the X flag for error reporting to be more consistent) diff --git a/Doc/Zsh/contrib.yo b/Doc/Zsh/contrib.yo index 8bf1a208e..db06d7925 100644 --- a/Doc/Zsh/contrib.yo +++ b/Doc/Zsh/contrib.yo @@ -4328,7 +4328,7 @@ See also the tt(pager), tt(prompt) and tt(rprompt) styles below. findex(regexp-replace) item(tt(regexp-replace) var(var) var(regexp) var(replace))( Use regular expressions to perform a global search and replace operation -on a variable. POSIX extended regular expressions are used, +on a variable. POSIX extended regular expressions (ERE) are used, unless the option tt(RE_MATCH_PCRE) has been set, in which case Perl-compatible regular expressions are used (this requires the shell to be linked against the tt(pcre) @@ -4346,6 +4346,9 @@ and arithmetic expressions which will be replaced: in particular, a reference to tt($MATCH) will be replaced by the text matched by the pattern. The return status is 0 if at least one match was performed, else 1. + +Note that if using POSIX EREs, the tt(^) or word boundary operators +(where available) may not work properly. ) findex(run-help) item(tt(run-help) var(cmd))( diff --git a/Functions/Example/zpgrep b/Functions/Example/zpgrep index 8b1edaa1c..556e58cd6 100644 --- a/Functions/Example/zpgrep +++ b/Functions/Example/zpgrep @@ -2,24 +2,31 @@ # zpgrep() { -local file pattern +local file pattern ret pattern=$1 shift +ret=1 if ((! ARGC)) then set -- - fi -pcre_compile $pattern +zmodload zsh/pcre || return +pcre_compile -- "$pattern" pcre_study for file do if [[ "$file" == - ]] then - while read -u0 buf; do pcre_match $buf && print $buf; done + while IFS= read -ru0 buf; do + pcre_match -- "$buf" && ret=0 && print -r -- "$buf" + done else - while read -u0 buf; do pcre_match $buf && print $buf; done < "$file" + while IFS= read -ru0 buf; do + pcre_match -- "$buf" && ret=0 && print -r -- "$buf" + done < "$file" fi done +return "$ret" } diff --git a/Functions/Misc/regexp-replace b/Functions/Misc/regexp-replace index dec105524..c947a2043 100644 --- a/Functions/Misc/regexp-replace +++ b/Functions/Misc/regexp-replace @@ -1,43 +1,109 @@ -# Replace all occurrences of a regular expression in a variable. The -# variable is modified directly. Respects the setting of the -# option RE_MATCH_PCRE. +# Replace all occurrences of a regular expression in a scalar variable. +# The variable is modified directly. Respects the setting of the option +# RE_MATCH_PCRE, but otherwise sets the zsh emulation mode. # -# First argument: *name* (not contents) of variable. -# Second argument: regular expression -# Third argument: replacement string. This can contain all forms of -# $ and backtick substitutions; in particular, $MATCH will be replaced -# by the portion of the string matched by the regular expression. - -integer pcre +# Arguments: +# +# 1. *name* (not contents) of variable or more generally any lvalue, +# expected to be scalar. That lvalue will be evaluated once to +# retrieve the current value, and two more times (not just one as a +# side effect of using ${(P)varname::=$value}; FIXME) for the +# assignment of the new value if a substitution was made. So lvalues +# such as array[++n] where the subscript is dynamic should be +# avoided. +# +# 2. regular expression +# +# 3. replacement string. This can contain all forms of +# $ and backtick substitutions; in particular, $MATCH will be +# replaced by the portion of the string matched by the regular +# expression. Parsing errors are fatal to the shell process. +# +# we use positional parameters instead of variables to avoid +# clashing with the user's variable. -[[ -o re_match_pcre ]] && pcre=1 +if (( $# < 2 || $# > 3 )); then + setopt localoptions functionargzero + print -ru2 "Usage: $0 <varname> <regexp> [<replacement>]" + return 2 +fi +# $4 records whether pcre is enabled as that information would otherwise +# be lost after emulate -L zsh +4=0 +[[ -o re_match_pcre ]] && 4=1 emulate -L zsh -(( pcre )) && setopt re_match_pcre - -# $4 is the string to be matched -4=${(P)1} -# $5 is the final string -5= -# 6 indicates if we made a change -6= -local MATCH MBEGIN MEND + +# $5 is the string to be matched +5=${(P)1} + +local MATCH MBEGIN MEND local -a match mbegin mend -while [[ -n $4 ]]; do - if [[ $4 =~ $2 ]]; then - # append initial part and subsituted match - 5+=${4[1,MBEGIN-1]}${(e)3} - # truncate remaining string - 4=${4[MEND+1,-1]} - # indicate we did something - 6=1 - else - break - fi -done -5+=$4 - -eval ${1}=${(q)5} -# status 0 if we did something, else 1. -[[ -n $6 ]] +if (( $4 )); then + # if using pcre, we're using pcre_match and a running offset + # That's needed for ^, \A, \b, and look-behind operators to work + # properly. + + zmodload zsh/pcre || return 2 + pcre_compile -- "$2" && pcre_study || return 2 + + # $4 is the current *byte* offset, $6, $7 reserved for later use + 4=0 7= + + local ZPCRE_OP + while pcre_match -b -n $4 -- "$5"; do + # append offsets and computed replacement to the array + # we need to perform the evaluation in a scalar assignment so that if + # it generates an array, the elements are converted to string (by + # joining with the first chararacter of $IFS as usual) + 6=${(Xe)3} + argv+=(${(s: :)ZPCRE_OP} "$6") + + # for 0-width matches, increase offset by 1 to avoid + # infinite loop + 4=$(( argv[-2] + (argv[-3] == argv[-2]) )) + done + + (( $# > 7 )) || return # no match + + set +o multibyte + + # $6 contains the result, $7 the current offset + 6= 7=1 + for 2 3 4 in "$@[8,-1]"; do + 6+=${5[$7,$2]}$4 + 7=$(( $3 + 1 )) + done + 6+=${5[$7,-1]} +else + # in ERE, we can't use an offset so ^, (and \<, \b, \B, [[:<:]] where + # available) won't work properly. + while + if [[ $5 =~ $2 ]]; then + # append initial part and substituted match + 6+=${5[1,MBEGIN-1]}${(Xe)3} + # truncate remaining string + if (( MEND < MBEGIN )); then + # zero-width match, skip one character for the next match + (( MEND++ )) + 6+=${5[1]} + fi + 5=${5[MEND+1,-1]} + # indicate we did something + 7=1 + fi + [[ -n $5 ]] + do + continue + done + [[ -n $7 ]] || return # no match + 6+=$5 +fi + +# assign result to target variable if at least one substitution was +# made. At this point, if the variable was originally array or assoc, it +# is converted to scalar. If $1 doesn't contain a valid lvalue +# specification, an exception is raised (exits the shell process if +# non-interactive). +: ${(P)1::="$6"} ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more). 2021-05-05 11:45 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas @ 2021-05-31 0:58 ` Lawrence Velázquez 2021-05-31 18:18 ` Bart Schaefer 2024-03-08 15:30 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 2 siblings, 0 replies; 58+ messages in thread From: Lawrence Velázquez @ 2021-05-31 0:58 UTC (permalink / raw) To: zsh-workers; +Cc: Stephane Chazelas On Wed, May 5, 2021, at 7:45 AM, Stephane Chazelas wrote: > 2021-04-30 16:13:34 -0700, Bart Schaefer: > [...] > > I went back and looked at the patch again. > > Thanks. Here's a third version with further improvements > addressing some of the comments here. Any further feedback on this? -- vq ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more). 2021-05-05 11:45 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 2021-05-31 0:58 ` Lawrence Velázquez @ 2021-05-31 18:18 ` Bart Schaefer 2021-05-31 21:37 ` [PATCH] (?) typeset array[position=index]=value Bart Schaefer 2024-03-08 15:30 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 2 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-05-31 18:18 UTC (permalink / raw) To: Bart Schaefer, Zsh hackers list On Wed, May 5, 2021 at 4:45 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > > What do you think about replacing the final eval with typeset -g, as > > mentioned in workers/48760 ? > > [...] very legible. It has the benefit (or inconvenience > depending on PoV) of returning an error if the lvalue is not a > scalar. It reports an error (and exits the shell process) upon > incorrect lvalues (except ones such as "var=foo"). A major > drawback though is that if chokes on lvalue='array[n=1]' or > lvalue='assoc[keywith=characters]' Hmm, I wonder if that should be considered a bug. It seems the parsing in builtin.c:getasg() is not equivalent to what's done for the same input syntax elsewhere. ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH] (?) typeset array[position=index]=value 2021-05-31 18:18 ` Bart Schaefer @ 2021-05-31 21:37 ` Bart Schaefer 2021-06-01 5:32 ` Stephane Chazelas 2021-06-05 4:29 ` Mikael Magnusson 0 siblings, 2 replies; 58+ messages in thread From: Bart Schaefer @ 2021-05-31 21:37 UTC (permalink / raw) To: Zsh hackers list On Mon, May 31, 2021 at 11:18 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > On Wed, May 5, 2021 at 4:45 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > [typeset] chokes on lvalue='array[n=1]' or > > lvalue='assoc[keywith=characters]' > > Hmm, I wonder if that should be considered a bug. This copies (tweaked for context) the code from parse.c at line 2006 or thereabouts. All tests still pass, but as you can see from the comment this is not yet handling x+=y which there doesn't seem to be any reason for typeset NOT to support; I think it would require only another flag in struct asgment, but I haven't attempted that. Commentary? diff --git a/Src/builtin.c b/Src/builtin.c index a16fddcb7..148c56952 100644 --- a/Src/builtin.c +++ b/Src/builtin.c @@ -1933,10 +1933,11 @@ getasg(char ***argvp, LinkList assigns) asg.flags = 0; /* search for `=' */ - for (; *s && *s != '='; s++); + for (; *s && *s != '[' && *s != '=' /* && *s != '+' */; s++); + if (s > asg.name && *s == '[') skipparens('[', ']', &s); /* found `=', so return with a value */ - if (*s) { + if (*s && *s == '=') { *s = '\0'; asg.value.scalar = s + 1; } else { ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-05-31 21:37 ` [PATCH] (?) typeset array[position=index]=value Bart Schaefer @ 2021-06-01 5:32 ` Stephane Chazelas 2021-06-01 16:05 ` Bart Schaefer 2021-06-05 4:29 ` Mikael Magnusson 1 sibling, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2021-06-01 5:32 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list While we're at it, could we fix: assoc[]=x unset 'assoc[]' etc. The former can be worked around with assoc[${-+}]=x or assoc+=('' x), the latter not AFAICT. See also worker:43269 https://unix.stackexchange.com/questions/626393/in-zsh-how-do-i-unset-an-arbitrary-associative-array-element/626529#626529 unset -k "$key" assoc_or_array or: assoc[$key]=() Would probably make sense to fix the issue with escaping \ and ] bytes in there unless we're happy to break backward compatibility and make unset "assoc[$key]" work whatever the value of $key (unset 'assoc[f\]oo]]' for unset the element of key 'f\]oo]' for instance) -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-01 5:32 ` Stephane Chazelas @ 2021-06-01 16:05 ` Bart Schaefer 2021-06-02 2:51 ` [PATCH] (take two?) typeset array[position=index]=value / unset hash[$stuff] Bart Schaefer ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: Bart Schaefer @ 2021-06-01 16:05 UTC (permalink / raw) To: Bart Schaefer, Zsh hackers list On Mon, May 31, 2021 at 10:32 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > While we're at it, could we fix: Wow, that thread from 2016 sort of died out in the middle, at workers/37933. Thanks for reminding me of that thread, though, I should at least examine whether getasg() ought to be using parse_subscript() even though the corresponding parse.c block is using skipparens(). > assoc[]=x > unset 'assoc[]' > etc. > > The former can be worked around with assoc[${-+}]=x or > assoc+=('' x), the latter not AFAICT. The former can also be done with assoc[(e)]=x Whereas most subscript flags are meaningless for unset, we might consider supporting (e) there. I'm not sure whether that would fully address the problem with the other characters, though. The issue with the empty key seems merely to be that the subscript validity test for associative arrays never changed from the one for plain arrays. > [...] unless we're happy to break backward > compatibility and make unset "assoc[$key]" work whatever the > value of $key (unset 'assoc[f\]oo]]' for unset the element of > key 'f\]oo]' for instance) Hm, I'm not sure what backward-compatibility would be broken? Do you mean that scripts that already have work-arounds for the current issue might stop working? That's sort of why I'm thinking about (e). If that flag had to be present in order to get the "whatever the value of $key" behavior, old workarounds would be unchanged. ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH] (take two?) typeset array[position=index]=value / unset hash[$stuff] 2021-06-01 16:05 ` Bart Schaefer @ 2021-06-02 2:51 ` Bart Schaefer 2021-06-02 10:06 ` Stephane Chazelas 2021-06-02 9:11 ` [PATCH] (?) typeset array[position=index]=value Stephane Chazelas 2021-06-10 0:21 ` [PATCH] (?) typeset array[position=index]=value Bart Schaefer 2 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-02 2:51 UTC (permalink / raw) To: Zsh hackers list [-- Attachment #1: Type: text/plain, Size: 1944 bytes --] On Tue, Jun 1, 2021 at 9:05 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > [...] I should at least examine whether getasg() ought to be using > parse_subscript() even though the corresponding parse.c block is using > skipparens(). I've decided parse_subscript() is appropriate there, so that's included below. If anyone sees any problem with that, please point it out. > Whereas most subscript flags are meaningless for unset, we might > consider supporting (e) there. I'm not sure whether that would fully > address the problem with the other characters, though. Turns out it does seem to fix that, so that's the direction I've gone. More discussion below. > The issue with the empty key seems merely to be that the subscript > validity test for associative arrays never changed from the one for > plain arrays. To maintain error-equivalent backward compatibility I didn't "fix" this, instead, hash[(e)] (or hash[(e)''] if you think that more readable) is required in order to unset the element with the empty key. > [...] I'm not sure what backward-compatibility would be broken? Do you > mean that scripts that already have work-arounds for the current issue > might stop working? The one compatibility issue with the foregoing is this: Current pre-patch zsh: % typeset -A zz % bad='(e)bang' % zz[$bad]=x t% typeset -p zz typeset -A zz=( ['(e)bang']=x ) % unset zz\["$bad"\] % typeset -p zz typeset -A zz=( ) With the patch, the "(e)" appearing in the value of $bad becomes a subscript flag, because $bad is expanded before "unset" parses: % zz[$bad]=x % typeset -p zz typeset -A zz=( ['(e)bang']=x ) % unset zz\["$bad"\] % typeset -p zz typeset -A zz=( ['(e)bang']=x ) You have to double the flag: % unset zz\["(e)$bad"\] % typeset -p zz typeset -A zz=( ) Is that a small enough incompatibility for this to be acceptable? I've included doc updates, but not tests. If anyone else would like to take a stab at those? [-- Attachment #2: hash_subscripts.txt --] [-- Type: text/plain, Size: 2913 bytes --] diff --git a/Doc/Zsh/builtins.yo b/Doc/Zsh/builtins.yo index f8e11d79d..07d8f6ac9 100644 --- a/Doc/Zsh/builtins.yo +++ b/Doc/Zsh/builtins.yo @@ -2411,6 +2411,8 @@ but the previous value will still reappear when the scope ends. Individual elements of associative array parameters may be unset by using subscript syntax on var(name), which should be quoted (or the entire command prefixed with tt(noglob)) to protect the subscript from filename generation. +The `tt(LPAR())tt(e)tt(RPAR())' subscript flag may be used to require strict +interpretation of characters in the key. If the tt(-m) flag is specified the arguments are taken as patterns (should be quoted) and all parameters with matching names are unset. Note that this diff --git a/Doc/Zsh/params.yo b/Doc/Zsh/params.yo index dc28a45ae..03547ef3f 100644 --- a/Doc/Zsh/params.yo +++ b/Doc/Zsh/params.yo @@ -443,7 +443,9 @@ not inhibited. This flag can also be used to force tt(*) or tt(@) to be interpreted as a single key rather than as a reference to all values. It may be used -for either purpose on the left side of an assignment. +for either purpose on the left side of an assignment. This flag may +also be used in the subscript of an argument to tt(unset) to disable +interpretation of special characters and quoting. ) enditem() diff --git a/Src/builtin.c b/Src/builtin.c index a16fddcb7..7aff354cc 100644 --- a/Src/builtin.c +++ b/Src/builtin.c @@ -1933,10 +1933,14 @@ getasg(char ***argvp, LinkList assigns) asg.flags = 0; /* search for `=' */ - for (; *s && *s != '='; s++); + for (; *s && *s != '[' && *s != '=' /* && *s != '+' */; s++); + if (s > asg.name && *s == '[') { + char *se = parse_subscript(s + 1, 1, ']'); + if (se && *se == ']') s = se + 1; + } /* found `=', so return with a value */ - if (*s) { + if (*s && *s == '=') { *s = '\0'; asg.value.scalar = s + 1; } else { @@ -3726,11 +3730,16 @@ bin_unset(char *name, char **argv, Options ops, int func) if (ss) { char *sse; *ss = 0; + /* parse_subscript() doesn't handle empty brackets - should it? */ if ((sse = parse_subscript(ss+1, 1, ']'))) { *sse = 0; - subscript = dupstring(ss+1); + if (ss[1] == '(' && ss[2] == 'e' && ss[3] == ')') { + subscript = dupstring(ss+4); + } else { + subscript = dupstring(ss+1); + remnulargs(subscript); + } *sse = ']'; - remnulargs(subscript); untokenize(subscript); } } @@ -3763,6 +3772,12 @@ bin_unset(char *name, char **argv, Options ops, int func) } else if (PM_TYPE(pm->node.flags) == PM_SCALAR || PM_TYPE(pm->node.flags) == PM_ARRAY) { struct value vbuf; + if (!*subscript) { + *ss = '['; + zerrnam(name, "%s: invalid parameter name", s); + returnval = 1; + continue; + } vbuf.isarr = (PM_TYPE(pm->node.flags) == PM_ARRAY ? SCANPM_ARRONLY : 0); vbuf.pm = pm; ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (take two?) typeset array[position=index]=value / unset hash[$stuff] 2021-06-02 2:51 ` [PATCH] (take two?) typeset array[position=index]=value / unset hash[$stuff] Bart Schaefer @ 2021-06-02 10:06 ` Stephane Chazelas 2021-06-02 14:52 ` Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2021-06-02 10:06 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list 2021-06-01 19:51:32 -0700, Bart Schaefer: [...] > > The issue with the empty key seems merely to be that the subscript > > validity test for associative arrays never changed from the one for > > plain arrays. > > To maintain error-equivalent backward compatibility I didn't "fix" > this, instead, hash[(e)] (or hash[(e)''] if you think that more > readable) is required in order to unset the element with the empty > key. I have to admit I don't see the problem here. I would have thought allowing a[]=foo and unset 'a[]' would be no-brainers as there's no concern about backward compatibility as those currently return an error. Even for plain arrays, IMO, it would make sense to allow empty subscripts. In most contexts, an empty arithmetic expression is interpreted as 0: $ echo $(()) 0 $ printf '%d\n' '' 0 $ set -o ksharrays; a[empty]=1; typeset -p a typeset -a a=( 1 ) In ksh93: $ ksh -c '(( a[] = 1 )); typeset -p a' typeset -a a=(1) $ ksh -c 'a[]=1' ksh: syntax error at line 1: `[]' empty subscript (oddly enough) $ ksh -c 'a[""]=1; typeset -p a; unset "a[]"; typeset -p a' typeset -a a=(1) typeset -a a=() $ ksh -c 'typeset -A a; a[""]=1; typeset -p a; unset "a[]"; typeset -p a' typeset -A a=(['']=1) typeset -A a=() mksh: $ mksh -xc 'a[]=1; typeset -p a; unset "a[]"; typeset -p a' + a[]=1 + typeset -p a set -A a typeset a[0]=1 + unset 'a[]' + typeset -p a > The one compatibility issue with the foregoing is this: [...] > With the patch, the "(e)" appearing in the value of $bad becomes a > subscript flag, because $bad is expanded before "unset" parses: > % zz[$bad]=x > % typeset -p zz > typeset -A zz=( ['(e)bang']=x ) > % unset zz\["$bad"\] > % typeset -p zz > typeset -A zz=( ['(e)bang']=x ) > > You have to double the flag: > % unset zz\["(e)$bad"\] Or more legibly: unset "zz[(e)$bad]" > % typeset -p zz > typeset -A zz=( ) > > Is that a small enough incompatibility for this to be acceptable? [...] Well, currently, you already need to escape the (s and )s in general (except when they're matched): $ key='(' zsh -c 'typeset -A a; a[$key]=x; unset "a[$key]"' zsh:unset:1: a[(]: invalid parameter name So I'm not sure there's much of a compatibility problem. But while it allows unsetting the element with empty key with unset 'a[(e)]', it seems to make it even more difficult to unset elements with arbitrary keys. One still can't use: unset "a[$key]" nor unset "a[(e)$key]" That still chokes on ()[]`\ and that still can't be worked around with unset "a[${(b)key}]" as it inserts backslashes in too many places and not in front of backticks: $ key='?' ./Src/zsh -c 'typeset -A a; a[x]=y; a[$key]=x; typeset -p a; unset "a[${(b)key}]"; typeset -p a' typeset -A a=( ['?']=x [x]=y ) typeset -A a=( ['?']=x [x]=y ) And with (e), we can't use backslash to escape problematic characters: $ typeset -A a=('[' x) $ unset 'a[(e)[]' unset: a[(e)[]: invalid parameter name $ unset 'a[(e)\[]' $ typeset -p a typeset -A a=( ['[']=x ) So, you'd need something like: if [[ -n $key ]]; then () { set -o localoptions +o multibyte -o extendedglob unset "a[${key//[][()\`\\]/\\$MATCH}]" } else unset "a[(e)]" fi (untested) To unset an element with arbitrary key (granted, that's an improvement as you can now unset the element with empty key, but IMO not an acceptable solution). "e" for "exact" is also a bit misleading in that case as without it, wildcards and */@ are not treated specially. It's also a bit confusing that subscript flags would be seemingly parsed but later ignored (included in the value of the key) except for (e). The fact that (e) is recognised and (ee) is not also makes for a not very consistent API. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (take two?) typeset array[position=index]=value / unset hash[$stuff] 2021-06-02 10:06 ` Stephane Chazelas @ 2021-06-02 14:52 ` Bart Schaefer 2021-06-02 16:02 ` Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-02 14:52 UTC (permalink / raw) To: Bart Schaefer, Zsh hackers list On Wed, Jun 2, 2021 at 3:06 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > 2021-06-01 19:51:32 -0700, Bart Schaefer: > [...] > > > The issue with the empty key seems merely to be that the subscript > > > validity test for associative arrays never changed from the one for > > > plain arrays. > > > > To maintain error-equivalent backward compatibility I didn't "fix" > > this, instead, hash[(e)] (or hash[(e)''] if you think that more > > readable) is required in order to unset the element with the empty > > key. > > I have to admit I don't see the problem here. I would have > thought allowing a[]=foo and unset 'a[]' would be no-brainers Mostly I was thinking about key=$(some derived value) unset "hash[$key]" In existing code that would fail on [[ -z $key ]], but you can't see that by examination. > as there's no concern about backward compatibility as those > currently return an error. That's not our usual criteria for backward compatibility. Usually we only change things if the new construct was previously a syntax error, something that would prevent an old script from even being properly parsed. > Even for plain arrays, IMO, it would make sense to allow empty > subscripts. In most contexts, an empty arithmetic expression is > interpreted as 0: But ... there's no such thing as array position 0 in native zsh. > unset "a[(e)$key]" > > That still chokes on ()[]`\ and that still can't be worked around This is why I asked for test cases ... combinations that I tried DID work with a[(e)$key]. So, that may be a deal-breaker regardless of the rest of this discussion. > It's also a bit confusing that subscript flags would be > seemingly parsed but later ignored (included in the value of the > key) except for (e). Well, they're parsed and ignored currently, and we already allow some subscripts but not others in various assignment contexts, so this doesn't seem all that weird in comparison. > The fact that (e) is recognised and (ee) is > not also makes for a not very consistent API. What would (ee) mean? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (take two?) typeset array[position=index]=value / unset hash[$stuff] 2021-06-02 14:52 ` Bart Schaefer @ 2021-06-02 16:02 ` Stephane Chazelas 0 siblings, 0 replies; 58+ messages in thread From: Stephane Chazelas @ 2021-06-02 16:02 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list 2021-06-02 07:52:25 -0700, Bart Schaefer: [...] > > I have to admit I don't see the problem here. I would have > > thought allowing a[]=foo and unset 'a[]' would be no-brainers > > Mostly I was thinking about > > key=$(some derived value) > unset "hash[$key]" > > In existing code that would fail on [[ -z $key ]], but you can't see > that by examination. Not sure I follow. That script doesn't work properly atm as it fails to unset the corresponding hash element and would be fixed once we allow unset 'hash[]' I can't think of real life scenarios where one would *rely* on unset 'hash[]' Aborting the shell with a zsh:unset:1: hash[]: invalid parameter name error. > > as there's no concern about backward compatibility as those > > currently return an error. > > That's not our usual criteria for backward compatibility. Usually we > only change things if the new construct was previously a syntax error, > something that would prevent an old script from even being properly > parsed. By that logic, we could never add features like new options to builtins or new flags. For instance, we couldn't add a -k option to unset because unset -k key arr currently is not a zsh syntax error, that script is parsed OK, and that command returns with: unset: bad option: -k [...] > > Even for plain arrays, IMO, it would make sense to allow empty > > subscripts. In most contexts, an empty arithmetic expression is > > interpreted as 0: > > But ... there's no such thing as array position 0 in native zsh. But it would make the API more consistent if array subscripts could be any arithmetic expressions or comma-separated pair of arithmetic expressions. And when ksharrays is not enabled a[] to return the same error as for a[0] or a[empty] ("assignment to invalid subscript range"). Note that 0 position is allowed in second place: a[2,0]=x for instance (or a[2,empty]=x, but not a[2,]=x atm) to insert a x in second position. I'm not saying that's something we should do or would be terribly useful, just that it would make the interface more consistent, and that array[] being rejected should not be a justification for rejecting assoc[]. [...] > > The fact that (e) is recognised and (ee) is > > not also makes for a not very consistent API. > > What would (ee) mean? The e flag passed twice. echo $a[(e)*], $a[(ee)*], $a[(eee)*] All expand to element of key "*". -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-01 16:05 ` Bart Schaefer 2021-06-02 2:51 ` [PATCH] (take two?) typeset array[position=index]=value / unset hash[$stuff] Bart Schaefer @ 2021-06-02 9:11 ` Stephane Chazelas 2021-06-02 13:34 ` Daniel Shahaf 2021-06-10 0:21 ` [PATCH] (?) typeset array[position=index]=value Bart Schaefer 2 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2021-06-02 9:11 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list 2021-06-01 09:05:13 -0700, Bart Schaefer: [...] > > [...] unless we're happy to break backward > > compatibility and make unset "assoc[$key]" work whatever the > > value of $key (unset 'assoc[f\]oo]]' for unset the element of > > key 'f\]oo]' for instance) > > Hm, I'm not sure what backward-compatibility would be broken? Do you > mean that scripts that already have work-arounds for the current issue > might stop working? Yes, like that guy's unset_keys at https://unix.stackexchange.com/questions/626393/in-zsh-how-do-i-unset-an-arbitrary-associative-array-element/626529#626529 mentioned earlier. ATM, you need to do: unset 'hash[\]]' To unset the element of key "]" for instance. Having said that, in practice, it's rare for values to be passed literally. What you generally want to do is: unset "hash[$key]" With $key being any arbitrary string. And that's where the problem is. Maybe the best approach would be to make unset a dual keyword/builtin like typeset/export... so one can do: unset hash[$key] And that hash[$key] being interpreted the same as when you do: hash[$key]=value Being able to do: unset hash[(R)pattern] unset hash[(I)pattern] would also be useful. We don't want however subscript flags to be interpreted in: unset "hash[$key]" as that would introduce command injection vulnerabilities, like is already the case in things like: $ key='(n:evil:)' evil='psvar[$(uname>&2)1]' zsh -c 'typeset -A a; (( a[$key]++ ))' Linux Linux (though in that case, that's not limited to subscript flags, key='x]+psvar[$(uname>&2)1' works as well, see https://unix.stackexchange.com/questions/627474/how-to-use-associative-arrays-safely-inside-arithmetic-expressions/627475#627475 ) That's also why I was suggesting allowing: hash[$key]=() to unset an element, where you don't have the ambiguity of whether the contents of $key is going to be interpreted specially. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-02 9:11 ` [PATCH] (?) typeset array[position=index]=value Stephane Chazelas @ 2021-06-02 13:34 ` Daniel Shahaf 2021-06-02 14:20 ` Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Daniel Shahaf @ 2021-06-02 13:34 UTC (permalink / raw) To: zsh-workers Stephane Chazelas wrote on Wed, 02 Jun 2021 09:11 +00:00: > Maybe the best approach would be to make unset a dual > keyword/builtin like typeset/export... so one can do: > > unset hash[$key] > > And that hash[$key] being interpreted the same as when you do: > > hash[$key]=value Haven't read the whole thread, so apologies if I'm missing something, but: Please let's not invent a reserved word that uses different variable expansion rules. The sequence «hash[$key]» should mean the same thing everywhere. Instead, we could have a builtin that takes two separate arguments, as in «foo hash $key» (and «foo hash ''» to unset the element whose key is the empty string). Makes sense? Cheers, Daniel ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-02 13:34 ` Daniel Shahaf @ 2021-06-02 14:20 ` Stephane Chazelas 2021-06-02 15:59 ` Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2021-06-02 14:20 UTC (permalink / raw) To: Daniel Shahaf; +Cc: zsh-workers 2021-06-02 13:34:57 +0000, Daniel Shahaf: [...] > Haven't read the whole thread, so apologies if I'm missing something, but: > > Please let's not invent a reserved word that uses different variable > expansion rules. The sequence «hash[$key]» should mean the same thing > everywhere. That's the problem here. It's already different in hash[$key]=1 typeset hash[$key]=1 # same as in assignment in recent versions # of zsh where here typeset is a keyword (( hash[$key] = 1 )) unset hash[$key] # globbing performed. read hash[$key] $dryrun typeset hash[$key]=1 # here typeset not recognised as # keyword $dryrun typeset 'hash[$key]=1' read 'hash[$key]' let 'hash[$key] = 1' # yes, it works despite the single quotes # and is actually safe it seems unset 'hash[$key]' # unsets the element of key $key (literally) (see also https://unix.stackexchange.com/questions/627474/how-to-use-associative-arrays-safely-inside-arithmetic-expressions/627475#627475) Parsing rules are different because we are in different contexts. For unset hash[$key] or unset hash[(e)*] to treat those as associative array lvalue, and not globs, it would need to be a reserved word, like typeset above. But, yes I agree it's all very messy. > Instead, we could have a builtin that takes two separate > arguments, as in «foo hash $key» (and «foo hash ''» to unset the element > whose key is the empty string). > > Makes sense? [...] I suggested: unset -k "$key" hash for that. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-02 14:20 ` Stephane Chazelas @ 2021-06-02 15:59 ` Bart Schaefer 2021-06-03 2:04 ` [PATCH (not final)] (take three?) unset "array[$anything]" Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-02 15:59 UTC (permalink / raw) To: Daniel Shahaf, Zsh hackers list On Wed, Jun 2, 2021 at 7:20 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > $dryrun typeset hash[$key]=1 # here typeset not recognised as > # keyword Same for "builtin typeset ..." of course. > let 'hash[$key] = 1' # yes, it works despite the single quotes I'm surprised that solves the \`()[] problem. I suspect it's accidental and has to do with a different meaning of [...] in math context. > Parsing rules are different because we are in different > contexts. For unset hash[$key] or unset hash[(e)*] to treat > those as associative array lvalue, and not globs, it would need > to be a reserved word, like typeset above. That's true regarding globs, but I've just had a hand-slaps-forehead moment ... take 3 to follow in another thread. ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-02 15:59 ` Bart Schaefer @ 2021-06-03 2:04 ` Bart Schaefer 2021-06-03 2:42 ` Bart Schaefer 2021-06-03 6:05 ` [PATCH (not final)] (take three?) unset "array[$anything]" Stephane Chazelas 0 siblings, 2 replies; 58+ messages in thread From: Bart Schaefer @ 2021-06-03 2:04 UTC (permalink / raw) To: Zsh hackers list [-- Attachment #1: Type: text/plain, Size: 2349 bytes --] On Wed, Jun 2, 2021 at 8:59 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > I've just had a hand-slaps-forehead > moment ... take 3 to follow in another thread. What I realized is that for any unset of an array element, the closing bracket must always be the last character of the argument. There's no reason to parse the subscript or skip over matching brackets; if a '[' is found, just make sure the last character is ']' and the subscript must be everything in between. % typeset -A ax % for k in '' '\' '`' '(' '[' ')' ']'; do for> ax[$k]=$k for> done % typeset -p ax typeset -A ax=( ['']='' ['(']='(' [')']=')' ['[']='[' ['\']='\' [']']=']' ['`']='`' ) % for k in ${(k)ax}; do unset "ax[$k]" done % typeset -p ax typeset -A ax=( ) Given this realization, it's easy to make { unset "hash[$key]" } work "like it always should have". The trouble comes in with (not) breaking past workarounds. Because the current (and theoretically experimental, though we forgot about that for 5 years) code uses parse_subscript(), we get a partly-tokenized (as if double-quoted, actually) string in the cases where backslashes are used to force the closing bracket to be found. If those backslashes aren't needed any more, there's no clean way to ignore them upon untokenize, to get back to something that actually matches the intended hash key. The attached not-ready-for-push patch has 4 variations that can be chosen by #define, currently set up as follows: #define unset_workers_37914 0 #define unset_hashelem_empty_only 0 #define unset_hashelem_literal 1 #define unset_hashelem_stripquote 0 The first one is just the current code. The second one allows { unset "hash[]" } (and gives "invalid subscript" for array[] instead of "invalid parameter name"). The third one, which I have defined by default, uses the subscript literally, so if you can do { hash[$k]=v } you can also do { unset "hash[$k]" } (at least for all cases I tested). The fourth one requires { unset "hash[${(q)k}]" } instead, but I think it otherwise works for all cases. Both of those also work for "hash[]". Therefore I think the best option is to choose one of the latter two, possibly depending on which one induces the least damage to any workarounds for the current behavior that are known in the wild, though aesthetically I'd rather use the literal version. [-- Attachment #2: unset_subscripts.txt --] [-- Type: text/plain, Size: 2174 bytes --] diff --git a/Src/builtin.c b/Src/builtin.c index a16fddcb7..ec0376367 100644 --- a/Src/builtin.c +++ b/Src/builtin.c @@ -1933,10 +1933,14 @@ getasg(char ***argvp, LinkList assigns) asg.flags = 0; /* search for `=' */ - for (; *s && *s != '='; s++); + for (; *s && *s != '[' && *s != '=' /* && *s != '+' */; s++); + if (s > asg.name && *s == '[') { + char *se = parse_subscript(s + 1, 1, ']'); + if (se && *se == ']') s = se + 1; + } /* found `=', so return with a value */ - if (*s) { + if (*s && *s == '=') { *s = '\0'; asg.value.scalar = s + 1; } else { @@ -3724,6 +3728,11 @@ bin_unset(char *name, char **argv, Options ops, int func) while ((s = *argv++)) { char *ss = strchr(s, '['), *subscript = 0; if (ss) { +#define unset_workers_37914 0 +#define unset_hashelem_empty_only 0 +#define unset_hashelem_literal 1 +#define unset_hashelem_stripquote 0 +#if unset_workers_37914 char *sse; *ss = 0; if ((sse = parse_subscript(ss+1, 1, ']'))) { @@ -3733,6 +3742,47 @@ bin_unset(char *name, char **argv, Options ops, int func) remnulargs(subscript); untokenize(subscript); } +#elif unset_hashelem_empty_only + char *sse; + *ss = 0; + if (ss[1] == ']' && !ss[2] ? (sse = ss+1) : + (sse = parse_subscript(ss+1, 1, ']'))) { + *sse = 0; + subscript = dupstring(ss+1); + *sse = ']'; + remnulargs(subscript); + untokenize(subscript); + } +#else + char *sse = ss + strlen(ss)-1; + *ss = 0; + if (*sse == ']') { +# if unset_hashelem_literal + *sse = 0; + subscript = dupstring(ss+1); + *sse = ']'; +# elif unset_hashelem_stripquote + int ne = noerrs; + noerrs = 2; + *ss = 0; + *sse = 0; + subscript = dupstring(ss+1); + *sse = ']'; + /* + * parse_subst_string() removes one level of quoting. + * If it returns nonzero, substring is unchanged, else + * it has been re-tokenized in place, so clean it up. + */ + if (!parse_subst_string(subscript)) { + remnulargs(subscript); + untokenize(subscript); + } + noerrs = ne; +# else + XXX parse error XXX +# endif + } +#endif } if ((ss && !subscript) || !isident(s)) { if (ss) ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 2:04 ` [PATCH (not final)] (take three?) unset "array[$anything]" Bart Schaefer @ 2021-06-03 2:42 ` Bart Schaefer 2021-06-03 6:12 ` Bart Schaefer 2021-06-03 6:05 ` [PATCH (not final)] (take three?) unset "array[$anything]" Stephane Chazelas 1 sibling, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-03 2:42 UTC (permalink / raw) To: Zsh hackers list On Wed, Jun 2, 2021 at 7:04 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > #define unset_hashelem_literal 1 > #define unset_hashelem_stripquote 0 > > The fourth one requires { unset "hash[${(q)k}]" } instead, > but I think it otherwise works for all cases. Another point in favor of "stripquote" is that it makes ${(b)k} function in a lot of cases. The failing cases for (b) in that branch are backticks and a leading '\='. Also, if you know of good test cases, please try them / send them along. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 2:42 ` Bart Schaefer @ 2021-06-03 6:12 ` Bart Schaefer 2021-06-03 8:54 ` Peter Stephenson 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-03 6:12 UTC (permalink / raw) To: Zsh hackers list On Wed, Jun 2, 2021 at 7:42 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > Also, if you know of good test cases, please try them / send them along. I've now tried all the hash keys from the test harness in https://unix.stackexchange.com/questions/626393/in-zsh-how-do-i-unset-an-arbitrary-associative-array-element/626529#626529 It found a few additional misfires when using the (b) flag with the stripquote variation, but both literal and stripquote worked in the respectively expected ways for all those cases plus some others I had already tried. Furthermore, both of them do approximately as well with the $MATCH pattern from workers/43269 as does the current variation -- the original shell unsets 281 out of 291 keys, the literal variation 280, and the stripquote variation 284. The keys not unset using that pattern are different in each of the three cases, except that they all fail to unset key='\`' ... several of the failures with the literal variation have multiple backslashes, though not always consecutive, and it does fail with key=']' and key='\' which could be important. So, we're probably in pretty safe territory with either choice but in terms of backwards compatibility the stripquote version is slightly ahead. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 6:12 ` Bart Schaefer @ 2021-06-03 8:54 ` Peter Stephenson 2021-06-03 13:13 ` Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Peter Stephenson @ 2021-06-03 8:54 UTC (permalink / raw) To: Zsh hackers list > On 03 June 2021 at 07:12 Bart Schaefer <schaefer@brasslantern.com> wrote: > So, we're probably in pretty safe territory with either choice but in > terms of backwards compatibility the stripquote version is slightly > ahead. I'd be in favour of this. I agree with the consensus that the current code is not really usable for difficult cases, so some degree of incompatibility is warranted. pws ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 8:54 ` Peter Stephenson @ 2021-06-03 13:13 ` Stephane Chazelas 2021-06-03 14:41 ` Peter Stephenson 2021-06-03 18:12 ` Bart Schaefer 0 siblings, 2 replies; 58+ messages in thread From: Stephane Chazelas @ 2021-06-03 13:13 UTC (permalink / raw) To: Peter Stephenson; +Cc: Zsh hackers list 2021-06-03 09:54:41 +0100, Peter Stephenson: > > On 03 June 2021 at 07:12 Bart Schaefer <schaefer@brasslantern.com> wrote: > > So, we're probably in pretty safe territory with either choice but in > > terms of backwards compatibility the stripquote version is slightly > > ahead. > > I'd be in favour of this. I agree with the consensus that the current code > is not really usable for difficult cases, so some degree of incompatibility > is warranted. [...] If I understand correctly, the "stripquote" variant is the one where the user can do: unset 'hash["foo"]' unset "hash['']" unset "hash[${(qq)key}]" That is where quotes are parsed and removed but otherwise serve no purpose. IMO, that just confuses things even more. That hardly helps with backward compatibility as users who did work around the previous behaviour will still have to adapt their work around, and those who didn't (who did unset "hash[$key]") will have their code choke on even more characters (", $, ' in addition to the previous ones, and likely more confusing behaviour when there are unmatched quotes or ()). It also deviates even more from the other places that take lvalues causing potential confustion. hash["foo"]=x or read 'hash["foo"]' set the element of key "foo" (quotes included) not foo. To me, acceptable options would be the "literal" one, or add another quoting flag (maybe "bb"?) that quotes in the manner currently expected by unset so people can use unset "hash[${(bb)key}]" (but unfortunately not read "hash[${(bb)key}]" for which $, ` also need to be escaped with backslash so that bb flag would only be useful for unset), or unset -k "$key" hash, or a new option à la bash. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 13:13 ` Stephane Chazelas @ 2021-06-03 14:41 ` Peter Stephenson 2021-06-04 19:25 ` Bart Schaefer 2021-06-03 18:12 ` Bart Schaefer 1 sibling, 1 reply; 58+ messages in thread From: Peter Stephenson @ 2021-06-03 14:41 UTC (permalink / raw) To: Zsh hackers list > On 03 June 2021 at 14:13 Stephane Chazelas <stephane@chazelas.org> wrote: > If I understand correctly, the "stripquote" variant is the one > where the user can do: > > unset 'hash["foo"]' > > unset "hash['']" > > unset "hash[${(qq)key}]" > > That is where quotes are parsed and removed but otherwise > serve no purpose. > > IMO, that just confuses things even more. It would probably be useful to make a list of difficult cases --- both how the original is handled and how anything that looks like a likely workaround is treated --- and then assemble a table of how they're handled now and with the two most promising proposed fixes. pws ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 14:41 ` Peter Stephenson @ 2021-06-04 19:25 ` Bart Schaefer 2021-06-05 18:18 ` Peter Stephenson 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-04 19:25 UTC (permalink / raw) To: Peter Stephenson; +Cc: Zsh hackers list On Thu, Jun 3, 2021 at 7:42 AM Peter Stephenson <p.w.stephenson@ntlworld.com> wrote: > > It would probably be useful to make a list of difficult cases --- both > how the original is handled and how anything that looks like a likely workaround > is treated --- and then assemble a table of how they're handled now and with the > two most promising proposed fixes. Are the samples I posted in workers/49005 adequate? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-04 19:25 ` Bart Schaefer @ 2021-06-05 18:18 ` Peter Stephenson 2021-06-09 23:31 ` Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Peter Stephenson @ 2021-06-05 18:18 UTC (permalink / raw) To: Zsh hackers list On Fri, 2021-06-04 at 12:25 -0700, Bart Schaefer wrote: > On Thu, Jun 3, 2021 at 7:42 AM Peter Stephenson > <p.w.stephenson@ntlworld.com> wrote: > > > > It would probably be useful to make a list of difficult cases --- both > > how the original is handled and how anything that looks like a likely workaround > > is treated --- and then assemble a table of how they're handled now and with the > > two most promising proposed fixes. > > Are the samples I posted in workers/49005 adequate? Looks very much like what's needed. Thanks. pws ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-05 18:18 ` Peter Stephenson @ 2021-06-09 23:31 ` Bart Schaefer 2021-06-13 16:51 ` Peter Stephenson 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-09 23:31 UTC (permalink / raw) To: Peter Stephenson; +Cc: Zsh hackers list On Sat, Jun 5, 2021 at 11:18 AM Peter Stephenson <p.w.stephenson@ntlworld.com> wrote: > > On Fri, 2021-06-04 at 12:25 -0700, Bart Schaefer wrote: > > On Thu, Jun 3, 2021 at 7:42 AM Peter Stephenson > > <p.w.stephenson@ntlworld.com> wrote: > > > > > > assemble a table of how they're handled now and with the > > > two most promising proposed fixes. > > > > Are the samples I posted in workers/49005 adequate? > > Looks very much like what's needed. Thanks. How do we make the call on what to do, then? There are a total of 29 test cases where at least one combination (bare current shell, current shell with Stephane's workaround, and stripquote or literal with the workaround) fails on the test key. (31 appear in 49005 but two of the test cases are redundant, which I hadn't noticed before.) Considering only the 3 variants with the workaround, literal misses 5 of the same cases as the current shell, but 9 additional cases. Stripquote misses only one of the same cases as the current shell but also only 5 additional cases. The current shell (w/workaround) misses 4 cases that neither of the other two miss. The cases literal fails on are, as might be expected, those in which the workaround adds extra backslashes. Stripquote works for most of those because it peels the extra backslash off again ... but that's exactly the behavior which Stephane points out "otherwise serves no purpose". I still feel literal is the best option despite potentially breaking more workarounds, because it means the workarounds can just be removed rather than modified. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-09 23:31 ` Bart Schaefer @ 2021-06-13 16:51 ` Peter Stephenson 2021-06-13 18:04 ` Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Peter Stephenson @ 2021-06-13 16:51 UTC (permalink / raw) To: zsh-workers On Wed, 2021-06-09 at 16:31 -0700, Bart Schaefer wrote: > On Sat, Jun 5, 2021 at 11:18 AM Peter Stephenson > <p.w.stephenson@ntlworld.com> wrote: > > > > On Fri, 2021-06-04 at 12:25 -0700, Bart Schaefer wrote: > > > On Thu, Jun 3, 2021 at 7:42 AM Peter Stephenson > > > <p.w.stephenson@ntlworld.com> wrote: > > > > > > > > assemble a table of how they're handled now and with the > > > > two most promising proposed fixes. > > > > > > Are the samples I posted in workers/49005 adequate? > > > > Looks very much like what's needed. Thanks. > > How do we make the call on what to do, then? > > The cases literal fails on are, as might be expected, those in which > the workaround adds extra backslashes. Stripquote works for most of > those because it peels the extra backslash off again ... but that's > exactly the behavior which Stephane points out "otherwise serves no > purpose". > > I still feel literal is the best option despite potentially breaking > more workarounds, because it means the workarounds can just be removed > rather than modified. I don't have a killer argument, but what you just said sounds like the best we've got, so I'd certainly be happy to vote for that, in the absence of a magic wand. I'm wondering about something we could do that would ease the transition, but I think I've convinced myeelf that just adds yet more unwanted complexity. pws ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-13 16:51 ` Peter Stephenson @ 2021-06-13 18:04 ` Bart Schaefer 2021-06-13 19:48 ` Peter Stephenson 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-13 18:04 UTC (permalink / raw) To: Peter Stephenson; +Cc: Zsh hackers list On Sun, Jun 13, 2021 at 9:54 AM Peter Stephenson <p.w.stephenson@ntlworld.com> wrote: > > I'm wondering about something we could do that would ease the > transition, but I think I've convinced myself that just adds yet more > unwanted complexity. The only thing I could come up with is an option, but then ... in which state does it start, and what do we do with it later? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-13 18:04 ` Bart Schaefer @ 2021-06-13 19:48 ` Peter Stephenson 2021-06-13 21:44 ` Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Peter Stephenson @ 2021-06-13 19:48 UTC (permalink / raw) To: zsh-workers On Sun, 2021-06-13 at 11:04 -0700, Bart Schaefer wrote: > On Sun, Jun 13, 2021 at 9:54 AM Peter Stephenson > <p.w.stephenson@ntlworld.com> wrote: > > > > I'm wondering about something we could do that would ease the > > transition, but I think I've convinced myself that just adds yet more > > unwanted complexity. > > The only thing I could come up with is an option, but then ... in > which state does it start, and what do we do with it later? Yes, in fact those are the same thoughts that flitted through my mind. I think the answer to the first is "we'd have it on the new setting to alert people the change is imminent", but I don't have an answer to the second: it just seems to be stuck there, useless. pws ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-13 19:48 ` Peter Stephenson @ 2021-06-13 21:44 ` Bart Schaefer 2021-06-14 7:19 ` Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-13 21:44 UTC (permalink / raw) To: Peter Stephenson; +Cc: Zsh hackers list On Sun, Jun 13, 2021 at 12:50 PM Peter Stephenson <p.w.stephenson@ntlworld.com> wrote: > > Yes, in fact those are the same thoughts that flitted through my mind. One other idea ... NO_UNSET could warn if you unset something that's not set, as well as when you $deref something that's not set. That would at least blat at you if your hash key was misinterpreted. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-13 21:44 ` Bart Schaefer @ 2021-06-14 7:19 ` Stephane Chazelas 0 siblings, 0 replies; 58+ messages in thread From: Stephane Chazelas @ 2021-06-14 7:19 UTC (permalink / raw) To: Bart Schaefer; +Cc: Peter Stephenson, Zsh hackers list 2021-06-13 14:44:03 -0700, Bart Schaefer: > On Sun, Jun 13, 2021 at 12:50 PM Peter Stephenson > <p.w.stephenson@ntlworld.com> wrote: > > > > Yes, in fact those are the same thoughts that flitted through my mind. > > One other idea ... NO_UNSET could warn if you unset something that's > not set, as well as when you $deref something that's not set. That > would at least blat at you if your hash key was misinterpreted. Note that nounset is a POSIX option. And for the unset special utility: "Unsetting a variable or function that was not previously set shall not be considered an error and does not cause the shell to abort." POSIX has no jurisdiction over arrays/hashes, but that does mean we can't have unset fail on unset variables when in sh emulation at least. A shell that would fail upon: unset var When var was not previously set would not be compliant. A script that would do unset "var[foo]" Would not be a POSIX script as var[foo] is not a valid variable name, so implementations are free to do whatever they want for them. But if zsh returned failure for when var[foo] is not set and not in unset var, that would make it quite inconsistent. Changing the behaviour would likely break some scripts that use errexit as well. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 13:13 ` Stephane Chazelas 2021-06-03 14:41 ` Peter Stephenson @ 2021-06-03 18:12 ` Bart Schaefer 2021-06-04 8:02 ` Stephane Chazelas 1 sibling, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-03 18:12 UTC (permalink / raw) To: Peter Stephenson, Zsh hackers list [-- Attachment #1: Type: text/plain, Size: 3153 bytes --] On Thu, Jun 3, 2021 at 6:14 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > If I understand correctly, the "stripquote" variant is the one > where the user can do: > > unset 'hash["foo"]' > > unset "hash['']" > > unset "hash[${(qq)key}]" > > That is where quotes are parsed and removed but otherwise > serve no purpose. I guess you can describe it that way, yes. > That hardly helps with backward compatibility as users who did > work around the previous behaviour will still have to adapt > their work around The point of my previous analysis is that they probably will not have to adapt. Your $MATCH workaround operates correctly with the stripquote version in nearly as many cases as it currently does (it is not foolproof for the current shell implementation). However, that doesn't mean I advocate for this over the "literal" variant. > and those who didn't (who did unset > "hash[$key]") will have their code choke on even more characters > (", $, ' in addition to the previous ones, and likely more > confusing behaviour when there are unmatched quotes or ()). I extended my test set to include a number of variations on those characters as well. I've attached the test script. Here are my test cases that fail if you just blindly do for k in ${(k)ax}; do unset "ax[$k]"; done typeset -p ax with appropriate trapping for the fatal errors. The literal variant gets all of these, leaving the test hash empty. Neither literal nor stripquote throws any errors, so the trap is only needed for the current implementation. Starting with that: typeset -A ax=( [$'\M-\C-@\\']=$'\M-\C-@\\' [$'\M-\C-@`']=$'\M-\C-@`' ['"${(@w)oops"']='"${(@w)oops"' ['(a']='(a' ['\$']='\$' ['\(']='\(' ['\)']='\)' ['\[']='\[' ['\\\']='\\\' ['\\\\']='\\\\' ['\]']='\]' ['\`']='\`' ) The cases that fail with the stripquote variation: typeset -A ax=( [$'\M-\C-@\\']=$'\M-\C-@\\' ['""']='""' ['"$oops"']='"$oops"' ['"safe"']='"safe"' ['"set"']='"set"' [\'\']=\'\' [\''safe'\']=\''safe'\' [\''set'\']=\''set'\' ['\!']='\!' ['\$']='\$' ['\(']='\(' ['\)']='\)' ['\*']='\*' ['\=']='\=' ['\@']='\@' ['\[']='\[' ['\\\']='\\\' ['\\\\']='\\\\' ['\]']='\]' ['\`']='\`' ['\s\a\f\e']='\s\a\f\e' ['\s\e\t']='\s\e\t' ) Current shell using $MATCH workaround: typeset -A ax=( [$'\M-\C-@`']=$'\M-\C-@`' ['"${(@w)oops"']='"${(@w)oops"' ['(']='(' ['(a']='(a' [')']=')' ['[']='[' ['\(']='\(' ['\)']='\)' ['\[']='\[' ['\`']='\`' ['`']='`' ) And stripquote with $MATCH workaround: typeset -A ax=( ['""']='""' ['"$oops"']='"$oops"' ['"safe"']='"safe"' ['"set"']='"set"' [\'\']=\'\' [\''safe'\']=\''safe'\' [\''set'\']=\''set'\' ['\`']='\`' ) Finally, literal with $MATCH workaround: typeset -A ax=( [$'\M-\C-@\\']=$'\M-\C-@\\' ['\']='\' ['\!']='\!' ['\$']='\$' ['\(']='\(' ['\)']='\)' ['\*']='\*' ['\=']='\=' ['\@']='\@' ['\[']='\[' ['\\\']='\\\' ['\`']='\`' ['\s\a\f\e']='\s\a\f\e' ['\s\e\t']='\s\e\t' [']']=']' ) I agree that the literal variation is probably the one most useful to re-apply in cases like read, print -v, etc. [-- Attachment #2: test_unset.zsh --] [-- Type: application/octet-stream, Size: 2033 bytes --] exec 2>&1 # to see fatal error messages in proper sequence unset ax bx oops oops='this is a problem' typeset -A ax bx typeset -a kx=( '' '\' '`' '(' '[' ')' ']' '=' '"' "'" '!' '@' '$' '*' $'\0' '\\' '\`' '\(' '\[' '\)' '\]' '\=' '\!' '\@' '\$' '\*' '()safe' '(r)set' '(R)set' '(k)safe' # look like valid subscript flags '(a' '(a)' '(n:foo:)a' '(n:1)a' # look like invalid subscript flags 'set' '"set"' \'set\' '\s\e\t' 'safe' '"safe"' \'safe\' '\s\a\f\e' '\\' '\\\' '\\\\' '""' \'\' 'two words' 'two spaces' ' initial space' 'trailing space ' $'\x80\\' $'\x80\`' $'\x80\~' # broken UTF-8 '?~>#' '$oops' '${oops}' '"$oops"' '"${(@w)oops"' '${(qqq)oops}' '$(echo oops)' ) for ((i=0; i<255; i++)); do printf -v n '\\x%02x' $i eval "kx+=(\$'$n')" done for k in "$kx[@]"; do ax[$k]=$k bx[$k]=$k done typeset -p 1 ax print COUNT: $#ax print $'\nWith (b)' for k in "$kx[@]"; do ( unset "ax[${(b)k}]" ) && unset "ax[${(b)k}]" && print -r -- ${ax[$k]+Missed} ${(V)ax[$k]-Got $k} done typeset -p 1 ax ax=(${(kv)bx}) print $'\nWith (q)' for k in "$kx[@]"; do ( unset "ax[${(q)k}]" ) && unset "ax[${(q)k}]" && print -r -- ${ax[$k]+Missed} ${(V)ax[$k]-Got $k} done typeset -p 1 ax ax=(${(kv)bx}) print $'\nWith (qq)' for k in "$kx[@]"; do ( unset "ax[${(qq)k}]" ) && unset "ax[${(qq)k}]" && print -r -- ${ax[$k]+Missed} ${(V)ax[$k]-Got $k} done typeset -p 1 ax ax=(${(kv)bx}) print $'\nWith (qqq)' for k in "$kx[@]"; do ( unset "ax[${(qqq)k}]" ) && unset "ax[${(qqq)k}]" && print -r -- ${ax[$k]+Missed} ${(V)ax[$k]-Got $k} done typeset -p 1 ax ax=(${(kv)bx}) print $'\nWithout backslashes' for k in "$kx[@]"; do ( unset "ax[$k]" ) && unset "ax[$k]" && print -r -- ${ax[$k]+Missed} ${(V)ax[$k]-Got $k} done typeset -p 1 ax ax=(${(kv)bx}) print $'\nWith pattern from workers/43269' for k in "$kx[@]"; do ( unset "ax[${(*)k//(#m)[]\\]/\\$MATCH}]" ) && unset "ax[${(*)k//(#m)[]\\]/\\$MATCH}]" && print -r -- ${ax[$k]+Missed} ${(V)ax[$k]-Got $k} done typeset -p 1 ax print COUNT: $#ax ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 18:12 ` Bart Schaefer @ 2021-06-04 8:02 ` Stephane Chazelas 2021-06-04 18:36 ` Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2021-06-04 8:02 UTC (permalink / raw) To: Bart Schaefer; +Cc: Peter Stephenson, Zsh hackers list 2021-06-03 11:12:19 -0700, Bart Schaefer: [...] > > If I understand correctly, the "stripquote" variant is the one > > where the user can do: [...] > > That is where quotes are parsed and removed but otherwise > > serve no purpose. > > I guess you can describe it that way, yes. It also means the lexer, a large and complex bit of code whose behaviour also depends on a the setting of a number of options will end up being exposed to user-supplied data (of the users of the zsh scripts, not just the users of zsh), which adds some level of risk. > > That hardly helps with backward compatibility as users who did > > work around the previous behaviour will still have to adapt > > their work around > > The point of my previous analysis is that they probably will not have > to adapt. Your $MATCH workaround operates correctly with the > stripquote version in nearly as many cases as it currently does (it is > not foolproof for the current shell implementation). If I understand correctly, that quote stripping removes one level of quotes from tokens identified as string tokens provided the lexer succeeded in tokenising the input, so any workaround would have to escape/quote any shell special character to guarantee the lexer succeeds. [...] > I agree that the literal variation is probably the one most useful to > re-apply in cases like read, print -v, etc. Note that read/print -v, etc. is radically different. Currently we can do (reliably): read 'hash[$key]' and can't do (reliably) read "hash[$key]" If we align with whatever solution we pick for unset, we're going to break a lot more scripts. I don't think we can touch those at least in the default mode operation. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-04 8:02 ` Stephane Chazelas @ 2021-06-04 18:36 ` Bart Schaefer 2021-06-04 20:21 ` Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-04 18:36 UTC (permalink / raw) To: Bart Schaefer, Peter Stephenson, Zsh hackers list On Fri, Jun 4, 2021 at 1:02 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > It also means the lexer, a large and complex bit of code whose > behaviour also depends on a the setting of a number of options > will end up being exposed to user-supplied data (of the users of > the zsh scripts, not just the users of zsh), which adds some > level of risk. This remark baffles me. The exact same code is used for ${v/pat/repl} among other things, I haven't added any new entry point to the lexer. Still, we're continuing to banter about what I think is the second-best solution anyway. > Currently we can do (reliably): > > read 'hash[$key]' It's unfortunate that this is "reliable" because I'm pretty sure it's totally unintentional and should be considered a bug. "read" should not be re-interpreting $key in its NAME parameter. > and can't do (reliably) > > read "hash[$key]" That one ought to be reliable if using hash elements here is permitted at all. > If we align with whatever solution we pick for unset, we're > going to break a lot more scripts. I don't think we can touch > those at least in the default mode operation. I don't know how many is "a lot" here, because frankly this is something I would never have thought of attempting in the first place (particularly, relying on the buggy behavior of the single-quoted case). But I really want to STOP talking about this in the same thread as the "unset" question. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-04 18:36 ` Bart Schaefer @ 2021-06-04 20:21 ` Stephane Chazelas 2021-06-05 0:20 ` Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2021-06-04 20:21 UTC (permalink / raw) To: Bart Schaefer; +Cc: Peter Stephenson, Zsh hackers list 2021-06-04 11:36:09 -0700, Bart Schaefer: > On Fri, Jun 4, 2021 at 1:02 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > > > It also means the lexer, a large and complex bit of code whose > > behaviour also depends on a the setting of a number of options > > will end up being exposed to user-supplied data (of the users of > > the zsh scripts, not just the users of zsh), which adds some > > level of risk. > > This remark baffles me. The exact same code is used for ${v/pat/repl} > among other things, I haven't added any new entry point to the lexer. But there (and I expect most other contexts where that is used), the repl is not data supplied by the user of the script, it will be typically fixed code like $replacement as supplied by the *author* of the script. While in: while IFS=, read -r a b c; do unset "hash[$b]" done < data And with the stripquotes patch, you'd have the *contents* of $b (from external data), not the literal $b string fed to the lexer as if it was meant to be zsh code. [...] > > Currently we can do (reliably): > > > > read 'hash[$key]' > > It's unfortunate that this is "reliable" because I'm pretty sure it's > totally unintentional and should be considered a bug. "read" should > not be re-interpreting $key in its NAME parameter. I'd tend to agree with that statement, and it's this kind of thing that makes things taking lvalues or arithmetic expressions very dangerous. As discussed several times (IIRC, Oliver Kiddle brought it up in 2014 in the aftermaths of shellshock). ((x == 1)) or printf %d x or read "array[x]" are also command injection vulnerabilities for the same kind of reason. Here for instance when x='psvar[$(reboot)1]' even though there's probably not a good reason why $(reboot) would be expanded when x is referenced in an arithmetic expression. > > > and can't do (reliably) > > > > read "hash[$key]" > > That one ought to be reliable if using hash elements here is permitted at all. > > > If we align with whatever solution we pick for unset, we're > > going to break a lot more scripts. I don't think we can touch > > those at least in the default mode operation. > > I don't know how many is "a lot" here I don't have an answer to that either. > because frankly this is > something I would never have thought of attempting in the first place > (particularly, relying on the buggy behavior of the single-quoted > case). But I really want to STOP talking about this in the same > thread as the "unset" question. Well, it would be good if all those things would be fixed once and for all and in a consistent way. There's also the fact that we can't do: hash['foo bar baz'$'\n'...]=value (one needs things like hash[${(pl[1][\n])}]=x to set elements of key $'\n') or array[1 + 1]=4 That is pretty annoying. Hard to fix without breaking backward portability, but could we introduce a better design under a new option (setopt bettersubscriptsandarithmetics), or a "use 6.0" à la perl? See also https://github.com/ksh93/ksh/issues/152 where Martijn Dekker has done some improvements on that front in ksh93. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-04 20:21 ` Stephane Chazelas @ 2021-06-05 0:20 ` Bart Schaefer 2021-06-05 17:05 ` Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-05 0:20 UTC (permalink / raw) To: Zsh hackers list On Fri, Jun 4, 2021 at 1:21 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > while IFS=, read -r a b c; do > unset "hash[$b]" > done < data > > And with the stripquotes patch, you'd have the *contents* of $b > (from external data), not the literal $b string fed to the lexer > as if it was meant to be zsh code. It's doing only pure lexical analysis of a single "string" token, there's no chance of it interpolating anything. > There's also the fact that we can't do: > > hash['foo bar > baz'$'\n'...]=value Hmm, you mean because subscripts (even hash subscripts) are parsed as if in $(( )) and $'...' does not expand there. > array[1 + 1]=4 > > That is pretty annoying. That's just shell whitespace rules, though. If you have nobadpattern set, you get zsh: command not found: array[1 And if the parser doesn't eventually find ]= then what? Re-parse the whole thing to word-split it? I don't think there's any way to retconn word-splitting. We could perhaps overload ={...} (or something) so that instead of : ${array[1 + 1]:=4} you could write ={array[1 + 1]}=4 but I don't think that's really that much prettier (and unlikely as it would be to appear in any existing script, that's obviously not backwards-compatible). ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-05 0:20 ` Bart Schaefer @ 2021-06-05 17:05 ` Stephane Chazelas 2021-06-10 0:14 ` Square brackets in command position Bart Schaefer 0 siblings, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2021-06-05 17:05 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list 2021-06-04 17:20:29 -0700, Bart Schaefer: [...] > > array[1 + 1]=4 > > > > That is pretty annoying. > > That's just shell whitespace rules, though. If you have nobadpattern > set, you get > > zsh: command not found: array[1 > > And if the parser doesn't eventually find ]= then what? Re-parse the > whole thing to word-split it? [...] Note that all other shells that support a[1]=value (ksh, pdksh, bash) do it. In effect that makes them non-POSIX as there's nothing in the spec POSIX that says a[1 + 1]=2 May not run the a[1 with +] and =2 as arguments. I did bring that up a few years ago at: https://www.mail-archive.com/austin-group-l@opengroup.org/msg04563.html (see also https://austingroupbugs.net/view.php?id=1279) There is variation in behaviour between shells if the first word starts with name[ and either a matching ] or matching ] followed by = is not found, and how that matching is done or what kind of quoting are supported inside the [...]. But all in all, even if it may not be pretty when you look closely, that does work fine there as long as you're not trying to fool the parser. In zsh, that would be even less problematic as (at least when not in sh emulation), zsh complains about unmatched [s or ]s (except for [ alone) Even: $ /bin/[ a = a ] zsh: bad pattern: /bin/[ So we wouldn't be breaking anything if we started to accept: hash[foo bar; baz]=x or array[1 + 1]=x We would if we started to accept. hash['x]y'-$'\n']=x like ksh93/bash do though. For literal hash[key]=value assignment, I never remember what needs to be quoted and how. I generally resort to doing var=key; hash[$var]=x or hash+=(key x) which I know are fine. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Square brackets in command position 2021-06-05 17:05 ` Stephane Chazelas @ 2021-06-10 0:14 ` Bart Schaefer 0 siblings, 0 replies; 58+ messages in thread From: Bart Schaefer @ 2021-06-10 0:14 UTC (permalink / raw) To: Zsh hackers list On Sat, Jun 5, 2021 at 10:05 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > 2021-06-04 17:20:29 -0700, Bart Schaefer: > [...] > > > array[1 + 1]=4 > > > > > > That is pretty annoying. > > > > That's just shell whitespace rules, though. > > Note that all other shells that support a[1]=value (ksh, pdksh, > bash) do it. Chip will probably correct me, but from some experiments with bash it appears that (with the exception of '[' all alone, to invoke "test") an (unquoted) '[' in a word in command position always begins an expression that must be bounded by a closing (unquoted) ']' (even if that means continuing to the PS2 prompt). Thus for example $ a[ 1 + 1 ]b globs to a file named "a b" in the current directory and does a path search to run that as a command. Note that $ [a ]b globs to "ab" or " b" (leading space there if that's hard to see), it's not what's to the left of '[' that triggers this. Thus it's not just assignments that have command-position magic, character classes can have embedded (unquoted) spaces there as well. This breaks if you are looking for a file named e.g. "a+=" because ']=' turns the whole thing into an array element assignment. > There is variation in behaviour between shells if the first word > starts with name[ and either a matching ] or matching ] followed > by = is not found, and how that matching is done or what kind of > quoting are supported inside the [...]. That doesn't give us any good ideas on where to start. Of course zsh's parsing hash keys using the same quoting rules as arithmetic was a quick way to slurp hash[string] through the existing array[math] parser at the time, and in retrospect probably should have been given its own rules. > In zsh, that would be even less problematic as (at least when > not in sh emulation), zsh complains about unmatched [s or ]s As mentioned before, that's the BAD_PATTERN setopt, which can be turned off independent of emulation, so it might be a stretch to say we wouldn't be breaking anything. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 2:04 ` [PATCH (not final)] (take three?) unset "array[$anything]" Bart Schaefer 2021-06-03 2:42 ` Bart Schaefer @ 2021-06-03 6:05 ` Stephane Chazelas 2021-06-03 6:43 ` Bart Schaefer 1 sibling, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2021-06-03 6:05 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list 2021-06-02 19:04:01 -0700, Bart Schaefer: > On Wed, Jun 2, 2021 at 8:59 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > I've just had a hand-slaps-forehead > > moment ... take 3 to follow in another thread. > > What I realized is that for any unset of an array element, the closing > bracket must always be the last character of the argument. There's no > reason to parse the subscript or skip over matching brackets; if a '[' > is found, just make sure the last character is ']' and the subscript > must be everything in between. D'oh, I had assumed that had been like that in the beginning but changed to allow backslash processing for some reason or other such as alignment with something else. [...] > Given this realization, it's easy to make { unset "hash[$key]" } work > "like it always should have". The trouble comes in with (not) > breaking past workarounds. I think I'd be in favour of breaking these workarounds on the ground that it would fix far more existing script (that just do unset "hash[$key]") than it would fix. There's also the question of what to do with read "hash[$key]" getopts "hash[$key]" and getln, print[f] -v, sysread, strftime -s... I've not tested all of them but at least for read and print -v, key=foo typeset -A a print -v 'a[$key]' x gives: typeset -A a=( [foo]=x ) and not typeset -A a=( ['$key']=x ) With those, one can't do: read "hash[$key]" (which would be a command injection vulnerability) And the (unintuitive) work around is: read 'hash[$key]' It's the same in bash unless you set the assoc_expand_once option there, not in ksh93. My suggestion of making unset a keyword so that unset hash[$key] works as expected while unset "hash[$key]" works as before won't fly as we can't possibly make all builtins that take lvalues keywords. [...] > Therefore I think the best option is to choose one of the latter two, > possibly depending on which one induces the least damage to any > workarounds for the current behavior that are known in the wild, > though aesthetically I'd rather use the literal version. [...] Well, the only one that doesn't break workarounds is to leave it asis (and potentially add support for unset 'hash[]'), or introduce a new syntax (like unset -k "$key" hash) or an option like bash's assoc_expand_once to switch to the new behaviour. We can't really expect people to carry on doing: () { set -o localoptions +o multibyte -o extendedglob unset "hash[${key//(#m)[][\\()\`]/\\$MATCH}]"; } to unset a hash key, so I don't thing it's reasonable to leave it asis. Another option would be to align unset with the other builtins, but that's even more problematic as codes that were doing: unset "hash[$key]" would change from being buggy (choking on []()`\ and other characters containing the encoding of those) to being command injection vulnerabilities (like with bash without assoc_expand_once): $ echo x | key='$(uname>&2)x' bash -c 'typeset -A a; a[$key]=1; unset "a[$key]"; typeset -p a' Linux declare -A a=(["\$(uname>&2)x"]="1" ) $ echo x | key='$(uname>&2)x' bash -O assoc_expand_once -c 'typeset -A a; a[$key]=1; unset "a[$key]"; typeset -p a' declare -A a=() (zsh has the same problem with read/print -v...) And again, there is also the problem of hashes used in arithmetic expressions, including: key='evil $(reboot)' print -v 'array[++hash[\$key]]' x (here you need both the single quotes and the \ to avoid evaluations of code in the contents of $key) I must admit I don't really have an idea of how to get out of this mess. bash, which must have gone through a similar exercise, hasn't fixed everything with its assoc_expand_once (see https://unix.stackexchange.com/questions/627474/how-to-use-associative-arrays-safely-inside-arithmetic-expressions/627475#627475 again). -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 6:05 ` [PATCH (not final)] (take three?) unset "array[$anything]" Stephane Chazelas @ 2021-06-03 6:43 ` Bart Schaefer 2021-06-03 7:31 ` Stephane Chazelas 0 siblings, 1 reply; 58+ messages in thread From: Bart Schaefer @ 2021-06-03 6:43 UTC (permalink / raw) To: Zsh hackers list On Wed, Jun 2, 2021 at 11:05 PM Stephane Chazelas <stephane@chazelas.org> wrote: > > 2021-06-02 19:04:01 -0700, Bart Schaefer: > > On Wed, Jun 2, 2021 at 8:59 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > What I realized is that for any unset of an array element, the closing > > bracket must always be the last character of the argument. > > D'oh, I had assumed that had been like that in the beginning No ... it used to do skipparens() which would always stop too soon if there was an embedded ']', and I changed it to parse_subscript() to make '\]' possible. > I think I'd be in favour of breaking these workarounds on the > ground that it would fix far more existing script (that just do > unset "hash[$key]") than it would [break]. See my follow-up message comparing variations. > There's also the question of what to do with ... a bunch of things that aren't documented to work with associative array elements but accidentally do ... > My suggestion of making unset a keyword so that unset hash[$key] > works as expected while unset "hash[$key]" works as before > won't fly as we can't possibly make all builtins that take > lvalues keywords. Strictly speaking they don't take lvalues, they take shell parameter names and use them as lvalues. > We can't really expect people to carry on doing: > > () { set -o localoptions +o multibyte -o extendedglob > unset "hash[${key//(#m)[][\\()\`]/\\$MATCH}]"; } There's already an unreleased change for the (*) parameter flag to locally enable extendedglob: unset "hash[${(*)key//(#m)[][\\()\`]/\\$MATCH}]"; } but still unreasonable. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH (not final)] (take three?) unset "array[$anything]" 2021-06-03 6:43 ` Bart Schaefer @ 2021-06-03 7:31 ` Stephane Chazelas 0 siblings, 0 replies; 58+ messages in thread From: Stephane Chazelas @ 2021-06-03 7:31 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list 2021-06-02 23:43:50 -0700, Bart Schaefer: [...] > > There's also the question of what to do with > > ... a bunch of things that aren't documented to work with associative > array elements but accidentally do ... [...] That's quite a natural thing to want to do though and is supported by all shells with arrays (except yash it seems). See for instance CSV processing such as: typeset -A field IFS=, { read -rA headers (($#headers)) && while read -r 'field['$^headers']'; do typeset -p field do-something-with "$field[firstname]" "$field[lastname]" done } < file.csv That code above currently fails if the header contains some special characters and has a code injection vulnerability (like if the first line of the csv is $(reboot)), and making read a keyword wouldn't help. It can currently be avoided with: typeset -A field IFS=, { read -rA headers (($#headers)) && while read -r 'field[$headers['{1..$#headers}']]'; do typeset -p field do-something-with "$field[firstname]" "$field[lastname]" done } < file.csv -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-01 16:05 ` Bart Schaefer 2021-06-02 2:51 ` [PATCH] (take two?) typeset array[position=index]=value / unset hash[$stuff] Bart Schaefer 2021-06-02 9:11 ` [PATCH] (?) typeset array[position=index]=value Stephane Chazelas @ 2021-06-10 0:21 ` Bart Schaefer 2 siblings, 0 replies; 58+ messages in thread From: Bart Schaefer @ 2021-06-10 0:21 UTC (permalink / raw) To: Zsh hackers list On Tue, Jun 1, 2021 at 9:05 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > I should at least examine whether getasg() ought to be using > parse_subscript() even though the corresponding parse.c block is using > skipparens(). This got lumped into the "unset" thread via workers/48993 ... should I split it off and commit it separately? ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-05-31 21:37 ` [PATCH] (?) typeset array[position=index]=value Bart Schaefer 2021-06-01 5:32 ` Stephane Chazelas @ 2021-06-05 4:29 ` Mikael Magnusson 2021-06-05 5:49 ` Bart Schaefer 1 sibling, 1 reply; 58+ messages in thread From: Mikael Magnusson @ 2021-06-05 4:29 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list On 5/31/21, Bart Schaefer <schaefer@brasslantern.com> wrote: > On Mon, May 31, 2021 at 11:18 AM Bart Schaefer > <schaefer@brasslantern.com> wrote: >> >> On Wed, May 5, 2021 at 4:45 AM Stephane Chazelas <stephane@chazelas.org> >> wrote: >> > [typeset] chokes on lvalue='array[n=1]' or >> > lvalue='assoc[keywith=characters]' >> >> Hmm, I wonder if that should be considered a bug. > > This copies (tweaked for context) the code from parse.c at line 2006 > or thereabouts. > > All tests still pass, but as you can see from the comment this is not > yet handling x+=y which there doesn't seem to be any reason for > typeset NOT to support; I think it would require only another flag in > struct asgment, but I haven't attempted that. > > Commentary? Is there some fundamental reason we couldn't just make the obvious thing work instead? % typeset -A foo % foo=(a b c d) % foo[a]=() # what actually happens zsh: foo: attempt to set slice of associative array # what seems reasonable to happen % typeset -p foo typeset -A foo=( [c]=d ) Relatedly, this also seems very inconsistent: % typeset -A foo % foo=(a b c d) % bar=(a b c d) % unset 'foo[a]' % unset 'bar[2]' % typeset -p foo bar typeset -A foo=( [c]=d ) typeset -a bar=( a '' c d ) So for regular arrays, unset will just set the element to the empty string, for assoc arrays it removes the key and the value. For regular arrays, assigning () to the element unsets(?) the element, and for assoc arrays it is an error. -- Mikael Magnusson ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-05 4:29 ` Mikael Magnusson @ 2021-06-05 5:49 ` Bart Schaefer 2021-06-05 11:06 ` Mikael Magnusson 2021-06-18 10:53 ` Mikael Magnusson 0 siblings, 2 replies; 58+ messages in thread From: Bart Schaefer @ 2021-06-05 5:49 UTC (permalink / raw) To: Mikael Magnusson; +Cc: Zsh hackers list On Fri, Jun 4, 2021 at 9:29 PM Mikael Magnusson <mikachu@gmail.com> wrote: > > On 5/31/21, Bart Schaefer <schaefer@brasslantern.com> wrote: > > All tests still pass, but as you can see from the comment this is not > > yet handling x+=y which there doesn't seem to be any reason for > > typeset NOT to support; I think it would require only another flag in > > struct asgment, but I haven't attempted that. > > > > Commentary? > > Is there some fundamental reason we couldn't just make the obvious > thing work instead? For one thing, because the above isn't about what happens to the value, it's about whether assignment can understand the key? > % typeset -A foo > % foo=(a b c d) > % foo[a]=() > # what actually happens > zsh: foo: attempt to set slice of associative array > # what seems reasonable to happen > % typeset -p foo > typeset -A foo=( [c]=d ) The problem is with the follow-up questions, namely ,what should happen with foo=( [a]=() ) foo[a]=( z ) foo[a]=( x y ) etc. > Relatedly, this also seems very inconsistent: [...] > So for regular arrays, unset will just set the element to the empty > string, for assoc arrays it removes the key and the value. That's because zsh doesn't support sparse arrays, and the NULL element indicates the end of the array, so you can't put a NULL element in the middle. If we'd thought about it long ago, it might be an error to unset a regular array element, but we're stuck now. > For regular arrays, assigning () to the element unsets(?) the element No. For regular arrays assigning an array to an element splices the rvalue array into the lvalue array. Splicing the empty array shortens the regular array, it doesn't cause elements to become unset (unless you consider that the former $#'th element is now unset because that position no longer exists). > and for assoc arrays it is an error. Because you can't perform a splice on a hash table. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-05 5:49 ` Bart Schaefer @ 2021-06-05 11:06 ` Mikael Magnusson 2021-06-05 16:22 ` Bart Schaefer 2021-06-18 10:53 ` Mikael Magnusson 1 sibling, 1 reply; 58+ messages in thread From: Mikael Magnusson @ 2021-06-05 11:06 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list On 6/5/21, Bart Schaefer <schaefer@brasslantern.com> wrote: > On Fri, Jun 4, 2021 at 9:29 PM Mikael Magnusson <mikachu@gmail.com> wrote: >> >> On 5/31/21, Bart Schaefer <schaefer@brasslantern.com> wrote: >> > All tests still pass, but as you can see from the comment this is not >> > yet handling x+=y which there doesn't seem to be any reason for >> > typeset NOT to support; I think it would require only another flag in >> > struct asgment, but I haven't attempted that. >> > >> > Commentary? >> >> Is there some fundamental reason we couldn't just make the obvious >> thing work instead? > > For one thing, because the above isn't about what happens to the > value, it's about whether assignment can understand the key? Well, there was 4 different threads about saying "unset foo[$bar]" and I didn't know exactly where to reply so I went to the source, maybe I went one thread too far back though. >> % typeset -A foo >> % foo=(a b c d) >> % foo[a]=() >> # what actually happens >> zsh: foo: attempt to set slice of associative array >> # what seems reasonable to happen >> % typeset -p foo >> typeset -A foo=( [c]=d ) > > The problem is with the follow-up questions, namely ,what should happen > with > > foo=( [a]=() ) remove foo[a] > foo[a]=( z ) set foo[a] to z > foo[a]=( x y ) error > etc. there probably aren't more cases >> Relatedly, this also seems very inconsistent: > [...] >> So for regular arrays, unset will just set the element to the empty >> string, for assoc arrays it removes the key and the value. > > That's because zsh doesn't support sparse arrays, and the NULL element > indicates the end of the array, so you can't put a NULL element in the > middle. If we'd thought about it long ago, it might be an error to > unset a regular array element, but we're stuck now. > >> For regular arrays, assigning () to the element unsets(?) the element > > No. For regular arrays assigning an array to an element splices the > rvalue array into the lvalue array. Splicing the empty array > shortens the regular array, it doesn't cause elements to become unset > (unless you consider that the former $#'th element is now unset > because that position no longer exists). By this logic, unset "foo[bar]" should have had the same effect as foo[bar]="" (also too late). >> and for assoc arrays it is an error. > > Because you can't perform a splice on a hash table. Well, with my naive end user hat, this seems much more logical to remove an element than fiddling with the unset keyword, which doesn't even do the same on a normal array. If you come from knowing that bar[5]=() removes the 5th element, then it's reasonable to assume that foo[baz]=() would work too. Since we've established that both unset and assignment to an element behaves in no way the same already for regular arrays and associative arrays, is there any reason to not make assignment to associative array slices (with the empty slice) just remove the element? The parsing already works and it wouldn't break any existing cases. -- Mikael Magnusson ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-05 11:06 ` Mikael Magnusson @ 2021-06-05 16:22 ` Bart Schaefer 0 siblings, 0 replies; 58+ messages in thread From: Bart Schaefer @ 2021-06-05 16:22 UTC (permalink / raw) To: Mikael Magnusson; +Cc: Zsh hackers list On Sat, Jun 5, 2021 at 4:06 AM Mikael Magnusson <mikachu@gmail.com> wrote: > > On 6/5/21, Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > foo=( [a]=() ) > remove foo[a] That doesn't make sense. foo=( [a]=z ) replaces the entire hash table with a single element. foo=( [a]=() [b]=z ) would just be a very verbose way of saying foo=( b z ) ?? > > foo[a]=( z ) > set foo[a] to z > > foo[a]=( x y ) > error How is that any more consistent than what we have now? > > etc. > there probably aren't more cases foo[(R)*a*]=( ... ) If we can assign one element to foo[a] with parens, why can't we assign a list of elements? > Since we've established that both unset and assignment to an element > behaves in no way the same already for regular arrays and associative > arrays, is there any reason to not make assignment to associative > array slices (with the empty slice) just remove the element? The > parsing already works and it wouldn't break any existing cases. We can continue to have that discussion, but IMO that's orthogonal to whether subscripts are parsed correctly by unset. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] (?) typeset array[position=index]=value 2021-06-05 5:49 ` Bart Schaefer 2021-06-05 11:06 ` Mikael Magnusson @ 2021-06-18 10:53 ` Mikael Magnusson 1 sibling, 0 replies; 58+ messages in thread From: Mikael Magnusson @ 2021-06-18 10:53 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list On 6/5/21, Bart Schaefer <schaefer@brasslantern.com> wrote: > On Fri, Jun 4, 2021 at 9:29 PM Mikael Magnusson <mikachu@gmail.com> wrote: >> >> Relatedly, this also seems very inconsistent: > [...] >> So for regular arrays, unset will just set the element to the empty >> string, for assoc arrays it removes the key and the value. > > That's because zsh doesn't support sparse arrays, and the NULL element > indicates the end of the array, so you can't put a NULL element in the > middle. If we'd thought about it long ago, it might be an error to > unset a regular array element, but we're stuck now. > >> For regular arrays, assigning () to the element unsets(?) the element > > No. For regular arrays assigning an array to an element splices the > rvalue array into the lvalue array. Splicing the empty array > shortens the regular array, it doesn't cause elements to become unset > (unless you consider that the former $#'th element is now unset > because that position no longer exists). I just happened to come across the following, and I'm struggling to fit the result into any sort of logic: % a=( {1..50} ) % unset 'a[5,45]' % typeset -p a typeset -a a=( 1 2 3 4 '' 46 47 48 49 50 ) So did we just remove the given elements, or set them to the empty string? Yes, we did both :). (note, this isn't intended as an argument for or against any previous points, i'm just mentioning it because it still seems inconsistent to me) -- Mikael Magnusson ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more). 2021-05-05 11:45 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 2021-05-31 0:58 ` Lawrence Velázquez 2021-05-31 18:18 ` Bart Schaefer @ 2024-03-08 15:30 ` Stephane Chazelas 2024-03-09 8:41 ` [PATCH v5] " Stephane Chazelas 2024-03-09 13:03 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 2 siblings, 2 replies; 58+ messages in thread From: Stephane Chazelas @ 2024-03-08 15:30 UTC (permalink / raw) To: Bart Schaefer, Zsh hackers list 2021-05-05 12:45:21 +0100, Stephane Chazelas: > 2021-04-30 16:13:34 -0700, Bart Schaefer: > [...] > > I went back and looked at the patch again. > > Thanks. Here's a third version with further improvements > addressing some of the comments here. [...] That v3 patch had (at least) a couple of bugs: - in ERE mode, replacement was not inserted properly when pattern matched an empty string not at the start of the subject (like in regexp-replace var '\>' new) - it would run in an infinite loop when there's no match in ERE mode. I see Bart ended up committing the v2 version of my patch (from 48747) a few months later in: commit bb61da36aaeeaa70413cdf5bc66d7a71194f93e5 Author: Stephane Chazelas <stephane.chazelas@gmail.com> AuthorDate: Mon Sep 6 14:43:01 2021 -0700 Commit: Bart Schaefer <schaefer@ipost.com> CommitDate: Mon Sep 6 14:43:01 2021 -0700 45180: clarify doc for POSIX EREs, fix an issue with PCRE when the replacement was empty or generated more than one element That one didn't have the second problem but had the first and also failed to add the replacement in: regexp-replace var '^' replacement for instance when $var is initially empty. So here's a v4 that should address that, some of the objections to v2 and uses namerefs to replace that illegible usage of positional parameters for local variables (that's diff against current HEAD, not pre-v2). I went for the: typeset -g -- "$1" typeset -nu -- var=$1 suggested by Bart to avoid possible clashes with local variable names. That might have side effects if called as regexp-replace 'a[2]' re? diff --git a/Functions/Misc/regexp-replace b/Functions/Misc/regexp-replace index d4408f0f7..0e3deed4f 100644 --- a/Functions/Misc/regexp-replace +++ b/Functions/Misc/regexp-replace @@ -1,91 +1,95 @@ -# Replace all occurrences of a regular expression in a variable. The -# variable is modified directly. Respects the setting of the -# option RE_MATCH_PCRE. +# Replace all occurrences of a regular expression in a scalar variable. +# The variable is modified directly. Respects the setting of the option +# RE_MATCH_PCRE, but otherwise sets the zsh emulation mode. # -# First argument: *name* (not contents) of variable. -# Second argument: regular expression -# Third argument: replacement string. This can contain all forms of -# $ and backtick substitutions; in particular, $MATCH will be replaced -# by the portion of the string matched by the regular expression. - -# we use positional parameters instead of variables to avoid -# clashing with the user's variable. Make sure we start with 3 and only -# 3 elements: -argv=("$1" "$2" "$3") - -# $4 records whether pcre is enabled as that information would otherwise -# be lost after emulate -L zsh -4=0 -[[ -o re_match_pcre ]] && 4=1 +# Arguments: +# +# 1. *name* (not contents) of variable or more generally any lvalue; +# expected to be scalar. +# +# 2. regular expression +# +# 3. replacement string. This can contain all forms of +# $ and backtick substitutions; in particular, $MATCH will be +# replaced by the portion of the string matched by the regular +# expression. Parsing errors are fatal to the shell process. + +if (( $# < 2 || $# > 3 )); then + setopt localoptions functionargzero + print -ru2 "Usage: $0 <varname> <regexp> [<replacement>]" + return 2 +fi -emulate -L zsh +# ensure variable exists in the caller's scope before referencing it +# to make sure we don't end up referencing one of our own. +typeset -g -- "$1" || return 2 +typeset -nu -- var=$1 || return 2 +local -i use_pcre=0 +[[ -o re_match_pcre ]] && use_pcre=1 -local MATCH MBEGIN MEND +emulate -L zsh + +local regexp=$2 replacement=$3 result MATCH MBEGIN MEND local -a match mbegin mend -if (( $4 )); then +if (( use_pcre )); then # if using pcre, we're using pcre_match and a running offset # That's needed for ^, \A, \b, and look-behind operators to work # properly. zmodload zsh/pcre || return 2 - pcre_compile -- "$2" && pcre_study || return 2 + pcre_compile -- "$regexp" && pcre_study || return 2 + + local -i offset=0 start stop + local new ZPCRE_OP + local -a finds - # $4 is the current *byte* offset, $5, $6 reserved for later use - 4=0 6= + while pcre_match -b -n $offset -- "$var"; do + # we need to perform the evaluation in a scalar assignment so that + # if it generates an array, the elements are converted to string (by + # joining with the first chararacter of $IFS as usual) + new=${(Xe)replacement} - local ZPCRE_OP - while pcre_match -b -n $4 -- "${(P)1}"; do - # append offsets and computed replacement to the array - # we need to perform the evaluation in a scalar assignment so that if - # it generates an array, the elements are converted to string (by - # joining with the first character of $IFS as usual) - 5=${(e)3} - argv+=(${(s: :)ZPCRE_OP} "$5") + finds+=( ${(s[ ])ZPCRE_OP} "$new" ) # for 0-width matches, increase offset by 1 to avoid # infinite loop - 4=$((argv[-2] + (argv[-3] == argv[-2]))) + (( offset = finds[-2] + (finds[-3] == finds[-2]) )) done - (($# > 6)) || return # no match + (( $#finds )) || return # no match - set +o multibyte + unsetopt multibyte - # $5 contains the result, $6 the current offset - 5= 6=1 - for 2 3 4 in "$@[7,-1]"; do - 5+=${(P)1[$6,$2]}$4 - 6=$(($3 + 1)) + offset=1 + for start stop new in "$finds[@]"; do + result+=${var[offset,start]}$new + (( offset = stop + 1 )) done - 5+=${(P)1[$6,-1]} -else + result+=${var[offset,-1]} + +else # no PCRE + # in ERE, we can't use an offset so ^, (and \<, \b, \B, [[:<:]] where # available) won't work properly. - - # $4 is the string to be matched - 4=${(P)1} - - while [[ -n $4 ]]; do - if [[ $4 =~ $2 ]]; then - # append initial part and substituted match - 5+=${4[1,MBEGIN-1]}${(e)3} - # truncate remaining string - if ((MEND < MBEGIN)); then - # zero-width match, skip one character for the next match - ((MEND++)) - 5+=${4[1]} - fi - 4=${4[MEND+1,-1]} - # indicate we did something - 6=1 - else - break + local subject=$var + local -i ok + while [[ $subject =~ $regexp ]]; do + # append initial part and substituted match + result+=$subject[1,MBEGIN-1]${(Xe)replacement} + # truncate remaining string + if (( MEND < MBEGIN )); then + # zero-width match, skip one character for the next match + (( MEND++ )) + result+=$subject[MBEGIN] fi + subject=$subject[MEND+1,-1] + ok=1 + [[ -n $subject ]] && break done - [[ -n $6 ]] || return # no match - 5+=$4 + (( ok )) || return + result+=$subject fi -eval $1=\$5 +var=$result ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v5] regexp-replace and ^, word boundary or look-behind operators (and more). 2024-03-08 15:30 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas @ 2024-03-09 8:41 ` Stephane Chazelas 2024-03-09 9:21 ` MBEGIN when =~ finds bytes inside characters (Was: [PATCH v5] regexp-replace and ^, word boundary or look-behind operators (and more).) Stephane Chazelas 2024-03-09 13:03 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 1 sibling, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2024-03-09 8:41 UTC (permalink / raw) To: Bart Schaefer, Zsh hackers list 2024-03-08 15:30:50 +0000, Stephane Chazelas: [...] > - it would run in an infinite loop when there's no match in ERE > mode. [...] > + [[ -n $subject ]] && break [...] D'oh, should be -z instead of -n or || instead of &&. So, here's a v5 patch (previous one should have been a v4, I forgot to update the subject): diff --git a/Functions/Misc/regexp-replace b/Functions/Misc/regexp-replace index d4408f0f7..db8f63404 100644 --- a/Functions/Misc/regexp-replace +++ b/Functions/Misc/regexp-replace @@ -1,91 +1,95 @@ -# Replace all occurrences of a regular expression in a variable. The -# variable is modified directly. Respects the setting of the -# option RE_MATCH_PCRE. +# Replace all occurrences of a regular expression in a scalar variable. +# The variable is modified directly. Respects the setting of the option +# RE_MATCH_PCRE, but otherwise sets the zsh emulation mode. # -# First argument: *name* (not contents) of variable. -# Second argument: regular expression -# Third argument: replacement string. This can contain all forms of -# $ and backtick substitutions; in particular, $MATCH will be replaced -# by the portion of the string matched by the regular expression. - -# we use positional parameters instead of variables to avoid -# clashing with the user's variable. Make sure we start with 3 and only -# 3 elements: -argv=("$1" "$2" "$3") - -# $4 records whether pcre is enabled as that information would otherwise -# be lost after emulate -L zsh -4=0 -[[ -o re_match_pcre ]] && 4=1 +# Arguments: +# +# 1. *name* (not contents) of variable or more generally any lvalue; +# expected to be scalar. +# +# 2. regular expression +# +# 3. replacement string. This can contain all forms of +# $ and backtick substitutions; in particular, $MATCH will be +# replaced by the portion of the string matched by the regular +# expression. Parsing errors are fatal to the shell process. + +if (( $# < 2 || $# > 3 )); then + setopt localoptions functionargzero + print -ru2 "Usage: $0 <varname> <regexp> [<replacement>]" + return 2 +fi -emulate -L zsh +# ensure variable exists in the caller's scope before referencing it +# to make sure we don't end up referencing one of our own. +typeset -g -- "$1" || return 2 +typeset -nu -- var=$1 || return 2 +local -i use_pcre=0 +[[ -o re_match_pcre ]] && use_pcre=1 -local MATCH MBEGIN MEND +emulate -L zsh + +local regexp=$2 replacement=$3 result MATCH MBEGIN MEND local -a match mbegin mend -if (( $4 )); then +if (( use_pcre )); then # if using pcre, we're using pcre_match and a running offset # That's needed for ^, \A, \b, and look-behind operators to work # properly. zmodload zsh/pcre || return 2 - pcre_compile -- "$2" && pcre_study || return 2 + pcre_compile -- "$regexp" && pcre_study || return 2 + + local -i offset=0 start stop + local new ZPCRE_OP + local -a finds - # $4 is the current *byte* offset, $5, $6 reserved for later use - 4=0 6= + while pcre_match -b -n $offset -- "$var"; do + # we need to perform the evaluation in a scalar assignment so that + # if it generates an array, the elements are converted to string (by + # joining with the first chararacter of $IFS as usual) + new=${(Xe)replacement} - local ZPCRE_OP - while pcre_match -b -n $4 -- "${(P)1}"; do - # append offsets and computed replacement to the array - # we need to perform the evaluation in a scalar assignment so that if - # it generates an array, the elements are converted to string (by - # joining with the first character of $IFS as usual) - 5=${(e)3} - argv+=(${(s: :)ZPCRE_OP} "$5") + finds+=( ${(s[ ])ZPCRE_OP} "$new" ) # for 0-width matches, increase offset by 1 to avoid # infinite loop - 4=$((argv[-2] + (argv[-3] == argv[-2]))) + (( offset = finds[-2] + (finds[-3] == finds[-2]) )) done - (($# > 6)) || return # no match + (( $#finds )) || return # no match - set +o multibyte + unsetopt multibyte - # $5 contains the result, $6 the current offset - 5= 6=1 - for 2 3 4 in "$@[7,-1]"; do - 5+=${(P)1[$6,$2]}$4 - 6=$(($3 + 1)) + offset=1 + for start stop new in "$finds[@]"; do + result+=${var[offset,start]}$new + (( offset = stop + 1 )) done - 5+=${(P)1[$6,-1]} -else + result+=${var[offset,-1]} + +else # no PCRE + # in ERE, we can't use an offset so ^, (and \<, \b, \B, [[:<:]] where # available) won't work properly. - - # $4 is the string to be matched - 4=${(P)1} - - while [[ -n $4 ]]; do - if [[ $4 =~ $2 ]]; then - # append initial part and substituted match - 5+=${4[1,MBEGIN-1]}${(e)3} - # truncate remaining string - if ((MEND < MBEGIN)); then - # zero-width match, skip one character for the next match - ((MEND++)) - 5+=${4[1]} - fi - 4=${4[MEND+1,-1]} - # indicate we did something - 6=1 - else - break + local subject=$var + local -i ok + while [[ $subject =~ $regexp ]]; do + # append initial part and substituted match + result+=$subject[1,MBEGIN-1]${(Xe)replacement} + # truncate remaining string + if (( MEND < MBEGIN )); then + # zero-width match, skip one character for the next match + (( MEND++ )) + result+=$subject[MBEGIN] fi + subject=$subject[MEND+1,-1] + ok=1 + [[ -z $subject ]] && break done - [[ -n $6 ]] || return # no match - 5+=$4 + (( ok )) || return + result+=$subject fi -eval $1=\$5 +var=$result ^ permalink raw reply [flat|nested] 58+ messages in thread
* MBEGIN when =~ finds bytes inside characters (Was: [PATCH v5] regexp-replace and ^, word boundary or look-behind operators (and more).) 2024-03-09 8:41 ` [PATCH v5] " Stephane Chazelas @ 2024-03-09 9:21 ` Stephane Chazelas 0 siblings, 0 replies; 58+ messages in thread From: Stephane Chazelas @ 2024-03-09 9:21 UTC (permalink / raw) To: Bart Schaefer, Zsh hackers list 2024-03-09 08:41:58 +0000, Stephane Chazelas: [...] > + while [[ $subject =~ $regexp ]]; do > + # append initial part and substituted match > + result+=$subject[1,MBEGIN-1]${(Xe)replacement} [...] BTW, likely not zsh's fault but here on Ubuntu 22.04 With: $ a=$'ABC/\U0010fffe/DEF' $ print -r - ${(q)a} ABC/$'\364\217\277\276'/DEF So with a string containing a 4-byte multibyte character. $ regexp-replace a $'\276' $'\277' $ print -r - ${(q)a} ABC/$'\364\217\277\276'/D$'\277'F See $'\277' not replacing $'\276' but E instead. It's my bad as a user to be doing that with multibyte enabled in a locale with a multibyte charset. $ a=$'ABC/\U0010fffe/DEF' $ set +o multibyte $ regexp-replace a $'\276' $'\277' $ print -r - ${(q+)a} $'ABC/\U0010ffff/DEF' $ set -o multibyte $ print -r - ${(q)a} ABC/$'\364\217\277\277'/DEF Is OK The problem here is: $ [[ $a =~ $'\276' ]] $ echo $MBEGIN $MEND 8 8 $ [[ $a =~ D ]] $ echo $MBEGIN $MEND 7 7 And could very well be caused by a bug in my regex library, maybe a variation of https://sourceware.org/bugzilla/show_bug.cgi?id=31075 for regex. If the problem is in the system's regexps, I can't think of anything zsh could do about it except maybe checking that subject and regexp decode as text properly, and error out if not like it does in pcre mode. zsh pattern matching seems to be handling it better. $ [[ $a = (#b)*($'\276')* ]] && echo match; typeset mbegin mend mbegin=( -1 -1 ) mend=( -1 -1 ) $ [[ $a = (#b)*(D)* ]] && echo match; typeset mbegin mend match mbegin=( 7 ) mend=( 7 ) I wonder if PCRE2_MATCH_INVALID_UTF/PCRE2_NO_UTF_CHECK could be used to improve matching with invalid UTF-8 for the pcre mode, at least for the pcre builtins where offsets are byte-wide rather than character-wise. -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more). 2024-03-08 15:30 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 2024-03-09 8:41 ` [PATCH v5] " Stephane Chazelas @ 2024-03-09 13:03 ` Stephane Chazelas 2024-03-10 19:52 ` [PATCH v6] " Stephane Chazelas 1 sibling, 1 reply; 58+ messages in thread From: Stephane Chazelas @ 2024-03-09 13:03 UTC (permalink / raw) To: Bart Schaefer, Zsh hackers list 2024-03-08 15:30:50 +0000, Stephane Chazelas: [...] > So here's a v4 that should address that, some of the objections > to v2 and uses namerefs to replace that illegible usage > of positional parameters for local variables [...] Scratch that, we do have to use positional parameters or namespacing as expansions are performed in the replacement so the users could do: regexp-replace var $regexp '$start[$#MATCH]$regexp' For instance, and expect the $start / $regexp in the replacement to be *their* variable, not the local variables of regexp-replace. I'll send a v6 likely using namespaced variables rather than going back to using positional parameters, once I understand the point of using .regexp_replace.myvar over _regexp_replace_myvar (TBH, ATM I don't). -- Stephane ^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH v6] regexp-replace and ^, word boundary or look-behind operators (and more). 2024-03-09 13:03 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas @ 2024-03-10 19:52 ` Stephane Chazelas 0 siblings, 0 replies; 58+ messages in thread From: Stephane Chazelas @ 2024-03-10 19:52 UTC (permalink / raw) To: Zsh hackers list 2024-03-09 13:03:10 +0000, Stephane Chazelas: [...] > I'll send a v6 likely using namespaced variables rather than > going back to using positional parameters, once I understand the > point of using .regexp_replace.myvar over _regexp_replace_myvar [...] So here it is. I ended up using none of the new features (nameref and namespace) as they were not overly useful in this instance and that means the code can be used as-is in older versions. I'm using $_regexp_replace_localvarname for namespacing. I compared performance with ${.regexp_replace.localvarname} and they were similar (the latter about 2-3% slower in my limited tests). diff --git a/Functions/Misc/regexp-replace b/Functions/Misc/regexp-replace index d4408f0f7..630a5ceab 100644 --- a/Functions/Misc/regexp-replace +++ b/Functions/Misc/regexp-replace @@ -1,91 +1,99 @@ -# Replace all occurrences of a regular expression in a variable. The -# variable is modified directly. Respects the setting of the -# option RE_MATCH_PCRE. +# Replace all occurrences of a regular expression in a scalar variable. +# The variable is modified directly. Respects the setting of the option +# RE_MATCH_PCRE, but otherwise sets the zsh emulation mode. # -# First argument: *name* (not contents) of variable. -# Second argument: regular expression -# Third argument: replacement string. This can contain all forms of -# $ and backtick substitutions; in particular, $MATCH will be replaced -# by the portion of the string matched by the regular expression. - -# we use positional parameters instead of variables to avoid -# clashing with the user's variable. Make sure we start with 3 and only -# 3 elements: -argv=("$1" "$2" "$3") - -# $4 records whether pcre is enabled as that information would otherwise -# be lost after emulate -L zsh -4=0 -[[ -o re_match_pcre ]] && 4=1 +# Arguments: +# +# 1. *name* (not contents) of variable or more generally any lvalue; +# expected to be scalar. +# +# 2. regular expression +# +# 3. replacement string. This can contain all forms of +# $ and backtick substitutions; in particular, $MATCH will be +# replaced by the portion of the string matched by the regular +# expression. Parsing errors are fatal to the shell process. + +if (( $# < 2 || $# > 3 )); then + setopt localoptions functionargzero + print -ru2 "Usage: $0 <varname> <regexp> [<replacement>]" + return 2 +fi + +local _regexp_replace_use_pcre=0 +[[ -o re_match_pcre ]] && _regexp_replace_use_pcre=1 emulate -L zsh +local _regexp_replace_subject=${(P)1} \ + _regexp_replace_regexp=$2 \ + _regexp_replace_replacement=$3 \ + _regexp_replace_result \ + MATCH MBEGIN MEND -local MATCH MBEGIN MEND local -a match mbegin mend -if (( $4 )); then +if (( _regexp_replace_use_pcre )); then # if using pcre, we're using pcre_match and a running offset # That's needed for ^, \A, \b, and look-behind operators to work # properly. zmodload zsh/pcre || return 2 - pcre_compile -- "$2" && pcre_study || return 2 + pcre_compile -- "$_regexp_replace_regexp" && pcre_study || return 2 + + local _regexp_replace_offset=0 _regexp_replace_start _regexp_replace_stop _regexp_replace_new ZPCRE_OP + local -a _regexp_replace_finds - # $4 is the current *byte* offset, $5, $6 reserved for later use - 4=0 6= + while pcre_match -b -n $_regexp_replace_offset -- "$_regexp_replace_subject"; do + # we need to perform the evaluation in a scalar assignment so that + # if it generates an array, the elements are converted to string (by + # joining with the first chararacter of $IFS as usual) + _regexp_replace_new=${(Xe)_regexp_replace_replacement} - local ZPCRE_OP - while pcre_match -b -n $4 -- "${(P)1}"; do - # append offsets and computed replacement to the array - # we need to perform the evaluation in a scalar assignment so that if - # it generates an array, the elements are converted to string (by - # joining with the first character of $IFS as usual) - 5=${(e)3} - argv+=(${(s: :)ZPCRE_OP} "$5") + _regexp_replace_finds+=( ${(s[ ])ZPCRE_OP} "$_regexp_replace_new" ) # for 0-width matches, increase offset by 1 to avoid # infinite loop - 4=$((argv[-2] + (argv[-3] == argv[-2]))) + (( _regexp_replace_offset = _regexp_replace_finds[-2] + (_regexp_replace_finds[-3] == _regexp_replace_finds[-2]) )) done - (($# > 6)) || return # no match + (( $#_regexp_replace_finds )) || return # no match - set +o multibyte + unsetopt multibyte - # $5 contains the result, $6 the current offset - 5= 6=1 - for 2 3 4 in "$@[7,-1]"; do - 5+=${(P)1[$6,$2]}$4 - 6=$(($3 + 1)) + _regexp_replace_offset=1 + for _regexp_replace_start _regexp_replace_stop _regexp_replace_new in "$_regexp_replace_finds[@]"; do + _regexp_replace_result+=${_regexp_replace_subject[_regexp_replace_offset,_regexp_replace_start]}$_regexp_replace_new + (( _regexp_replace_offset = _regexp_replace_stop + 1 )) done - 5+=${(P)1[$6,-1]} -else + _regexp_replace_result+=${_regexp_replace_subject[_regexp_replace_offset,-1]} + +else # no PCRE # in ERE, we can't use an offset so ^, (and \<, \b, \B, [[:<:]] where # available) won't work properly. - # $4 is the string to be matched - 4=${(P)1} - - while [[ -n $4 ]]; do - if [[ $4 =~ $2 ]]; then - # append initial part and substituted match - 5+=${4[1,MBEGIN-1]}${(e)3} - # truncate remaining string - if ((MEND < MBEGIN)); then - # zero-width match, skip one character for the next match - ((MEND++)) - 5+=${4[1]} - fi - 4=${4[MEND+1,-1]} - # indicate we did something - 6=1 - else - break + local _regexp_replace_ok=0 + while [[ $_regexp_replace_subject =~ $_regexp_replace_regexp ]]; do + # append initial part and substituted match + _regexp_replace_result+=$_regexp_replace_subject[1,MBEGIN-1]${(Xe)_regexp_replace_replacement} + # truncate remaining string + if (( MEND < MBEGIN )); then + # zero-width match, skip one character for the next match + (( MEND++ )) + _regexp_replace_result+=$_regexp_replace_subject[MBEGIN] fi + _regexp_replace_subject=$_regexp_replace_subject[MEND+1,-1] + _regexp_replace_ok=1 + [[ -z $_regexp_replace_subject ]] && break done - [[ -n $6 ]] || return # no match - 5+=$4 + (( _regexp_replace_ok )) || return + _regexp_replace_result+=$_regexp_replace_subject fi -eval $1=\$5 +# assign result to target variable if at least one substitution was +# made. At this point, if the variable was originally array or assoc, it +# is converted to scalar. If $1 doesn't contain a valid lvalue +# specification, an exception is raised (exits the shell process if +# non-interactive). +: ${(P)1::="$_regexp_replace_result"} + ^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2024-03-10 19:52 UTC | newest] Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-16 21:10 regexp-replace and ^, word boundary or look-behind operators Stephane Chazelas 2019-12-16 21:27 ` Stephane Chazelas 2019-12-17 7:38 ` Stephane Chazelas 2019-12-17 11:11 ` [PATCH] " Stephane Chazelas 2019-12-18 0:22 ` Daniel Shahaf 2019-12-18 8:31 ` Stephane Chazelas 2020-01-01 14:03 ` [PATCH v2] " Stephane Chazelas 2021-04-30 6:11 ` Stephane Chazelas 2021-04-30 23:13 ` Bart Schaefer 2021-05-05 11:45 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 2021-05-31 0:58 ` Lawrence Velázquez 2021-05-31 18:18 ` Bart Schaefer 2021-05-31 21:37 ` [PATCH] (?) typeset array[position=index]=value Bart Schaefer 2021-06-01 5:32 ` Stephane Chazelas 2021-06-01 16:05 ` Bart Schaefer 2021-06-02 2:51 ` [PATCH] (take two?) typeset array[position=index]=value / unset hash[$stuff] Bart Schaefer 2021-06-02 10:06 ` Stephane Chazelas 2021-06-02 14:52 ` Bart Schaefer 2021-06-02 16:02 ` Stephane Chazelas 2021-06-02 9:11 ` [PATCH] (?) typeset array[position=index]=value Stephane Chazelas 2021-06-02 13:34 ` Daniel Shahaf 2021-06-02 14:20 ` Stephane Chazelas 2021-06-02 15:59 ` Bart Schaefer 2021-06-03 2:04 ` [PATCH (not final)] (take three?) unset "array[$anything]" Bart Schaefer 2021-06-03 2:42 ` Bart Schaefer 2021-06-03 6:12 ` Bart Schaefer 2021-06-03 8:54 ` Peter Stephenson 2021-06-03 13:13 ` Stephane Chazelas 2021-06-03 14:41 ` Peter Stephenson 2021-06-04 19:25 ` Bart Schaefer 2021-06-05 18:18 ` Peter Stephenson 2021-06-09 23:31 ` Bart Schaefer 2021-06-13 16:51 ` Peter Stephenson 2021-06-13 18:04 ` Bart Schaefer 2021-06-13 19:48 ` Peter Stephenson 2021-06-13 21:44 ` Bart Schaefer 2021-06-14 7:19 ` Stephane Chazelas 2021-06-03 18:12 ` Bart Schaefer 2021-06-04 8:02 ` Stephane Chazelas 2021-06-04 18:36 ` Bart Schaefer 2021-06-04 20:21 ` Stephane Chazelas 2021-06-05 0:20 ` Bart Schaefer 2021-06-05 17:05 ` Stephane Chazelas 2021-06-10 0:14 ` Square brackets in command position Bart Schaefer 2021-06-03 6:05 ` [PATCH (not final)] (take three?) unset "array[$anything]" Stephane Chazelas 2021-06-03 6:43 ` Bart Schaefer 2021-06-03 7:31 ` Stephane Chazelas 2021-06-10 0:21 ` [PATCH] (?) typeset array[position=index]=value Bart Schaefer 2021-06-05 4:29 ` Mikael Magnusson 2021-06-05 5:49 ` Bart Schaefer 2021-06-05 11:06 ` Mikael Magnusson 2021-06-05 16:22 ` Bart Schaefer 2021-06-18 10:53 ` Mikael Magnusson 2024-03-08 15:30 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 2024-03-09 8:41 ` [PATCH v5] " Stephane Chazelas 2024-03-09 9:21 ` MBEGIN when =~ finds bytes inside characters (Was: [PATCH v5] regexp-replace and ^, word boundary or look-behind operators (and more).) Stephane Chazelas 2024-03-09 13:03 ` [PATCH v3] regexp-replace and ^, word boundary or look-behind operators (and more) Stephane Chazelas 2024-03-10 19:52 ` [PATCH v6] " Stephane Chazelas
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).