From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 9 Mar 2024 08:41:58 +0000
From: Stephane Chazelas
To: Bart Schaefer, Zsh hackers list
Subject: [PATCH v5] regexp-replace and ^, word boundary or look-behind operators (and more).
Message-ID: <20240309084158.jiyx2is3tbrwyzia@chazelas.org>
Mail-Followup-To: Bart Schaefer, Zsh hackers list
References: <20191216211013.6opkv5sy4wvp3yn2@chaz.gmail.com>
 <20191216212706.i3xvf6hn5h3jwkjh@chaz.gmail.com>
 <20191217073846.4usg2hnsk66bhqvl@chaz.gmail.com>
 <20191217111113.z242f4g6sx7xdwru@chaz.gmail.com>
 <2ea6feb3-a686-4d83-ab27-6a582424487c@www.fastmail.com>
 <20200101140343.qwfx2xaojumuds3d@chaz.gmail.com>
 <20210430061117.buyhdhky5crqjrf2@chazelas.org>
 <20210505114521.bemoiekpophssbug@chazelas.org>
 <20240308153050.u63fqtcjyr2yewye@chazelas.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20240308153050.u63fqtcjyr2yewye@chazelas.org>
X-Seq: 52718

2024-03-08 15:30:50 +0000, Stephane Chazelas:
[...]
> - it would run in an infinite loop when there's no match in ERE
>   mode.
[...]
> +    [[ -n $subject ]] && break
[...]

D'oh, should be -z instead of -n or || instead of &&.

So, here's a v5 patch (previous one should have been a v4, I
forgot to update the subject):

diff --git a/Functions/Misc/regexp-replace b/Functions/Misc/regexp-replace
index d4408f0f7..db8f63404 100644
--- a/Functions/Misc/regexp-replace
+++ b/Functions/Misc/regexp-replace
@@ -1,91 +1,95 @@
-# Replace all occurrences of a regular expression in a variable. The
-# variable is modified directly. Respects the setting of the
-# option RE_MATCH_PCRE.
+# Replace all occurrences of a regular expression in a scalar variable.
+# The variable is modified directly. Respects the setting of the option
+# RE_MATCH_PCRE, but otherwise sets the zsh emulation mode.
 #
-# First argument: *name* (not contents) of variable.
-# Second argument: regular expression
-# Third argument: replacement string. This can contain all forms of
-# $ and backtick substitutions; in particular, $MATCH will be replaced
-# by the portion of the string matched by the regular expression.
-
-# we use positional parameters instead of variables to avoid
-# clashing with the user's variable. Make sure we start with 3 and only
-# 3 elements:
-argv=("$1" "$2" "$3")
-
-# $4 records whether pcre is enabled as that information would otherwise
-# be lost after emulate -L zsh
-4=0
-[[ -o re_match_pcre ]] && 4=1
+# Arguments:
+#
+# 1. *name* (not contents) of variable or more generally any lvalue;
+#    expected to be scalar.
+#
+# 2. regular expression
+#
+# 3. replacement string. This can contain all forms of
+#    $ and backtick substitutions; in particular, $MATCH will be
+#    replaced by the portion of the string matched by the regular
+#    expression. Parsing errors are fatal to the shell process.
+
+if (( $# < 2 || $# > 3 )); then
+  setopt localoptions functionargzero
+  print -ru2 "Usage: $0 []"
+  return 2
+fi
 
-emulate -L zsh
+# ensure variable exists in the caller's scope before referencing it
+# to make sure we don't end up referencing one of our own.
+typeset -g -- "$1" || return 2
+typeset -nu -- var=$1 || return 2
 
+local -i use_pcre=0
+[[ -o re_match_pcre ]] && use_pcre=1
 
-local MATCH MBEGIN MEND
+emulate -L zsh
+
+local regexp=$2 replacement=$3 result MATCH MBEGIN MEND
 local -a match mbegin mend
 
-if (( $4 )); then
+if (( use_pcre )); then
   # if using pcre, we're using pcre_match and a running offset
   # That's needed for ^, \A, \b, and look-behind operators to work
   # properly.
 
   zmodload zsh/pcre || return 2
-  pcre_compile -- "$2" && pcre_study || return 2
+  pcre_compile -- "$regexp" && pcre_study || return 2
+
+  local -i offset=0 start stop
+  local new ZPCRE_OP
+  local -a finds
 
-  # $4 is the current *byte* offset, $5, $6 reserved for later use
-  4=0 6=
+  while pcre_match -b -n $offset -- "$var"; do
+    # we need to perform the evaluation in a scalar assignment so that
+    # if it generates an array, the elements are converted to string (by
+    # joining with the first character of $IFS as usual)
+    new=${(Xe)replacement}
 
-  local ZPCRE_OP
-  while pcre_match -b -n $4 -- "${(P)1}"; do
-    # append offsets and computed replacement to the array
-    # we need to perform the evaluation in a scalar assignment so that if
-    # it generates an array, the elements are converted to string (by
-    # joining with the first character of $IFS as usual)
-    5=${(e)3}
-    argv+=(${(s: :)ZPCRE_OP} "$5")
+    finds+=( ${(s[ ])ZPCRE_OP} "$new" )
 
     # for 0-width matches, increase offset by 1 to avoid
     # infinite loop
-    4=$((argv[-2] + (argv[-3] == argv[-2])))
+    (( offset = finds[-2] + (finds[-3] == finds[-2]) ))
   done
 
-  (($# > 6)) || return # no match
+  (( $#finds )) || return # no match
 
-  set +o multibyte
+  unsetopt multibyte
 
-  # $5 contains the result, $6 the current offset
-  5= 6=1
-  for 2 3 4 in "$@[7,-1]"; do
-    5+=${(P)1[$6,$2]}$4
-    6=$(($3 + 1))
+  offset=1
+  for start stop new in "$finds[@]"; do
+    result+=${var[offset,start]}$new
+    (( offset = stop + 1 ))
   done
-  5+=${(P)1[$6,-1]}
-else
+  result+=${var[offset,-1]}
+
+else # no PCRE
+
   # in ERE, we can't use an offset so ^, (and \<, \b, \B, [[:<:]] where
   # available) won't work properly.
-
-  # $4 is the string to be matched
-  4=${(P)1}
-
-  while [[ -n $4 ]]; do
-    if [[ $4 =~ $2 ]]; then
-      # append initial part and substituted match
-      5+=${4[1,MBEGIN-1]}${(e)3}
-      # truncate remaining string
-      if ((MEND < MBEGIN)); then
-        # zero-width match, skip one character for the next match
-        ((MEND++))
-        5+=${4[1]}
-      fi
-      4=${4[MEND+1,-1]}
-      # indicate we did something
-      6=1
-    else
-      break
+  local subject=$var
+  local -i ok
+  while [[ $subject =~ $regexp ]]; do
+    # append initial part and substituted match
+    result+=$subject[1,MBEGIN-1]${(Xe)replacement}
+    # truncate remaining string
+    if (( MEND < MBEGIN )); then
+      # zero-width match, skip one character for the next match
+      (( MEND++ ))
+      result+=$subject[MBEGIN]
     fi
+    subject=$subject[MEND+1,-1]
+    ok=1
+    [[ -z $subject ]] && break
   done
-  [[ -n $6 ]] || return # no match
-  5+=$4
+  (( ok )) || return
+  result+=$subject
 fi
 
-eval $1=\$5
+var=$result
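
To illustrate why the guard has to be -z rather than -n, here is a
minimal sketch. It is not part of the patch: insert_dashes is a made-up
helper, and the fixed regexp "^" is only an assumption chosen because it
always matches with zero width. It mimics the shape of the ERE loop and
should run the same under zsh and bash.

```sh
# Illustrative sketch only (hypothetical helper, not from the patch).
# Mimics the ERE loop for a regexp that always matches with zero width.
insert_dashes() {
  local subject=$1 result= re='^'
  while [[ $subject =~ $re ]]; do
    # zero-width match at the start: emit the replacement ("-"), then
    # copy one character across so the next iteration makes progress
    result+="-${subject:0:1}"
    subject=${subject:1}
    # the -z guard: stop once nothing is left; without it the empty
    # string would keep matching "^" and the loop would never end
    [[ -z $subject ]] && break
  done
  # append whatever was not consumed (empty here)
  result+=$subject
  printf '%s\n' "$result"
}

insert_dashes abc   # prints -a-b-c
```

With -n in place of -z, the loop would instead stop after the first
replacement whenever anything is left to process, so the same call
would give -abc.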