From: Sess
To: zsh-workers@zsh.org
Date: Sun, 7 Sep 2014 01:39:05 -0400
Subject: pcre_match() option "-n" broken under zsh 5.0.6
X-Seq: 33121

zsh 5.0.6 appears to have broken pcre_match() with respect to option
"-n". As a minimal length example, running a slightly embellished
variant of the "man zshmodules" example yields an infinite loop: e.g.,

    % string="The following zip codes: 78884 90210 99513"
    % pcre_compile -m "\d{5}"
    % accum=()
    % pcre_match -b -- $string
    % while [[ $? -eq 0 ]] do
    .     print "match: $MATCH; ZPCRE_OP: $ZPCRE_OP"
    .     b=($=ZPCRE_OP)
    .     accum+=$MATCH
    .     pcre_match -b -n $b[2] -- $string
    % done
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    .
    .
    .

The new behaviour seems rather... unproductive. Under zsh 5.0.5, the
same example successfully terminates with standard output:

    match: 78884; ZPCRE_OP: 25 30
    match: 90210; ZPCRE_OP: 31 36
    match: 99513; ZPCRE_OP: 37 42

A unit test might prove helpful here.

While on the topic, it might also be helpful to note that the manpage
documentation for pcre_match() is rather incorrect. It reads:

    For example, a ZPCRE_OP set to "32 45" indicates that the matched
    portion began on byte offset 32 and ended on byte offset 44. Here,
    byte offset position 45 is the position directly after the matched
    portion.

But that isn't the case. The first word of ZPCRE_OP is the offset of
the byte preceding the first byte of the matched substring, while the
second word of ZPCRE_OP is the offset of the last byte of the matched
substring -- the diametric opposite. Hence, such documentation should
read:

    For example, a ZPCRE_OP set to "32 45" indicates that the matched
    portion began on byte offset 33 and ended on byte offset 45. Here,
    byte offset position 32 is the position directly before the matched
    portion.

Given that, one would assume line "pcre_match -b -n $b[2] -- $string"
of the manpage example to also be incorrect. Specifically, since
"$b[2]" is the offset of the last byte of the prior match, passing such
offset to option "-n" should force pcre_match() to begin searching one
byte earlier than intended.

But that isn't the case. pcre_match() searches correctly, as verifiable
by replacing "\d{5}" by "\d{2}" in such example. This implies option
"-n" to begin searching at the byte following the passed byte offset
(rather than at such offset), implying such option to also be
incorrectly documented.
It reads:

    A -n option starts searching for a match from the byte offset
    position in string.

Correcting for clarity and grammar, that should read:

    If the -n option is given, a match will be searched for starting at
    the byte following the passed byte offset in the string.

In any case, thanks all for the continued grit, fortitude, and hard
shell work.

Humbly yours,
Cecil
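
The offset arithmetic discussed above can be checked without the pcre module. What follows is only a minimal sketch, under stated assumptions: ZPCRE_OP is hard-coded to the "25 30" pair printed for the first match, and the names start, end, match and rest are illustrative rather than anything zsh sets. It counts bytes from 1 with cut -b, so the matched portion is bytes start+1 through end, and the text after the match begins at byte end+1.

```shell
string="The following zip codes: 78884 90210 99513"
ZPCRE_OP="25 30"        # hard-coded assumption: value printed for the first match

start=${ZPCRE_OP% *}    # first word of ZPCRE_OP
end=${ZPCRE_OP#* }      # second word of ZPCRE_OP

# cut -b numbers bytes from 1 and includes both ends of the range,
# so the matched portion is bytes start+1 through end.
match=$(printf '%s' "$string" | cut -b "$((start + 1))-$end")

# Everything after the matched portion starts at byte end+1.
rest=$(printf '%s' "$string" | cut -b "$((end + 1))-")

echo "match: $match"
echo "rest:  '$rest'"
```

Only parameter expansion, arithmetic expansion and cut are used, so the sketch should behave the same under zsh, bash or a plain POSIX sh. It says nothing about how pcre_match itself treats the argument to "-n".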