From: Bart Schaefer <schaefer@brasslantern.com>
To: Moritz Bunkus <moritz@bunkus.org>, zsh-users@zsh.org
Subject: Several PCRE module oddities
Date: Sat, 19 Jul 2014 15:21:16 -0700
Message-ID: <140719152116.ZM13686@torch.brasslantern.com>
In-Reply-To: <20140719121937.GN12213@bunkus.org>
In-Reply-To: <20140719123158.GO12213@bunkus.org>
In-Reply-To: <20140719123620.GP12213@bunkus.org>
In-Reply-To: <20140719123645.GQ12213@bunkus.org>

Replying to all of these PCRE-related items together ...

Preliminary remark:  In none of your test scripts do I see you setting
the RE_MATCH_PCRE option.  You need to set that, in addition to loading
the zsh/pcre module, or [[ str =~ pat ]] continues to use zsh/regex.  I
think that explains your "zsh -f" behavior confusion (see below).
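
For reference, a minimal sketch of the setup I mean (zsh only; the test
string and pattern are made up for illustration).  The \d escape is PCRE
syntax, so it is a quick way to tell which engine =~ is using:

```shell
zmodload zsh/pcre       # loads pcre_compile / pcre_match and the PCRE backend
setopt RE_MATCH_PCRE    # without this, =~ keeps using zsh/regex

# \d is a PCRE escape; zsh/regex (POSIX ERE) does not define it.
if [[ foo123 =~ '\d+' ]]; then
  print "matched: $MATCH"
fi
```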

On Jul 19,  2:19pm, Moritz Bunkus wrote:
}
} today I stumbled across this paragraph in zsh's info documentation
} again. Citing "Description of Options":
} 
} > CASE_MATCH <D>
} >      Make regular expressions using the zsh/regex module (including
} >      matches with =~) sensitive to case.
} 
} It does not apply to =~ if the zsh/pcre module is loaded.

See the first and third hunks of the patch below, though I suppose we
should get general agreement on whether it should work this way, because
there's no way to turn it off on a per-pattern basis (unlike turning it
*on* with "pcre_compile -i").

Also of course if you pcre_compile with one setting of CASE_MATCH and then
change it before calling pcre_match, you get the behavior from compile
time, so that ought to be explicitly documented.  That doesn't apply to
the inline condition operator, which recompiles every time it's used.
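
A sketch of the compile-time effect, assuming a zsh build that includes
the patch below (the pattern and subject strings are only illustrative):

```shell
zmodload zsh/pcre

unsetopt CASE_MATCH     # case-insensitive from here on
pcre_compile 'foo'      # with the patch, compiled with PCRE_CASELESS

setopt CASE_MATCH       # changing the option now does not recompile
pcre_match FOO && print "pattern kept its compile-time setting"
```

To pick up the new setting, call pcre_compile again after changing the
option.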


On Jul 19,  2:31pm, Moritz Bunkus wrote:
}
} line=
} if [[ $line =~ '^$' ]] print is empty case 1
} 
} pcre_compile '^$'
} pcre_match "$line" && print is empty case 2
} pcre_match $line   && print is empty case 3
} ----------------------------------------
} 
} 1. =~ matches as expected
} 
} 2. pcre_match "$line" does NOT match and doesn't emit an error message
} 
} 3. pcre_match $line does NOT match either and emits an error message
} 
} This is not only inconsistent but also simply wrong. Both 2. and
} 3. should match, and 2. shouldn't emit an error message.

Actually only (2) is strange here; see the second hunk of the patch below
(there may be a better way to fix this).  As for examples (1) and (3):

[[ $line =~ ^$ ]] is a special syntactic construct which treats $line
(unquoted parameter reference) as a token before expanding the value.
You also don't need the single-quotes around ^$ for this reason.

The calls to pcre_match, on the other hand, are normal shell commands,
which means the parameter references are expanded, and unquoted references
that expand to nothing are removed from the argument list entirely, before
the command is even invoked.  So "not enough arguments" is exactly as
expected, and completely consistent with other shell commands.
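
The removal of empty unquoted words can be seen with any command, not
just pcre_match.  A small sketch (count_args is a hypothetical helper
that only reports how many arguments it received):

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

line=
count_args "$line"   # quoted: one empty argument is passed, prints 1
count_args $line     # unquoted and empty: the word is removed, prints 0
```

So pcre_match $line is invoked with no subject argument at all, while
pcre_match "$line" receives an empty string as its subject.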


On Jul 19,  2:36pm, Moritz Bunkus wrote:
}
} This case gets even weirder. The previous output I posted was gathered
} from zsh running with my normal RC files. The output actually differs if
} run with -f:
} 
} Meaning without any RCs case 2 matches, too!

This is because you haven't set the RE_MATCH_PCRE option, so zsh/regex
is being used for case 2.


Here's the patch.

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index cb9f8ef..2333438 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -87,6 +87,8 @@ bin_pcre_compile(char *nam, char **args, Options ops, UNUSED(int func))
     
     if (zpcre_utf8_enabled())
 	pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+	pcre_opts |= PCRE_CASELESS;
 
     pcre_hints = NULL;  /* Is this necessary? */
     
@@ -311,7 +313,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     unmetafy(plaintext, NULL);
     subject_len = (int)strlen(plaintext);
 
-    if (offset_start < 0 || offset_start >= subject_len)
+    if (offset_start < 0 ||
+	(subject_len ? offset_start >= subject_len : offset_start > 0))
 	ret = PCRE_ERROR_NOMATCH;
     else
 	ret = pcre_exec(pcre_pattern, pcre_hints, plaintext, subject_len, offset_start, 0, ovec, ovecsize);
@@ -345,6 +348,8 @@ cond_pcre_match(char **a, int id)
 
     if (zpcre_utf8_enabled())
 	pcre_opts |= PCRE_UTF8;
+    if (!isset(CASEMATCH))
+	pcre_opts |= PCRE_CASELESS;
 
     lhstr = cond_str(a,0,0);
     rhre = cond_str(a,1,0);

