From: Oliver Kiddle <opk@zsh.org>
To: Zsh workers <zsh-workers@zsh.org>
Subject: PATCH: pcre callouts
Date: Tue, 31 Oct 2023 01:04:19 +0100
Message-ID: <72311-1698710659.978677@cDMN.pAu_.Ex7V>

PCRE supports callouts similar to Perl's (?{ code }) but with different
syntax. There are string and numeric formats, and it seems logical
enough to evaluate the string forms as shell code.

So, for example, (?C{foo}) or (?C'foo') will call the foo function. In
Perl, $_ is set to the string being examined; I've used the parameter
.pcre.subject instead. Would some other name be better, and should the
callout perhaps start and end a new scope to make that parameter local?
As in Perl, a non-zero return status makes the match treat the callout
as not matching.
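
The return-status rule can be sketched in isolation. This is a minimal
model rather than the module's code: errflag, lastval and the ERRFLAG_*
values are stand-ins assumed to mirror zsh's globals, and struct
fake_code and run_fake_code are hypothetical helpers that take the place
of real shell evaluation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values mirroring zsh's errflag bits; for illustration only. */
enum { ERRFLAG_ERROR = 1, ERRFLAG_INT = 2 };

/* Hypothetical stand-ins for the shell's global state. */
static int errflag = 0;
static int lastval = 0;

/* Hypothetical callout body: pretends to run shell code that exits with
 * `status` and raises the error bits in `raised`. */
struct fake_code {
    int status;
    int raised;
};

static void
run_fake_code(const struct fake_code *code)
{
    lastval = code->status;
    errflag |= code->raised;
}

/* Returns 0 if matching should continue and non-zero if the callout
 * fails. The caller's lastval and errflag are restored afterwards,
 * except that an interrupt raised during the callout stays set. */
static int
callout_status(const struct fake_code *code)
{
    int ef = errflag, lv = lastval;
    int ret;

    assert(code != NULL);
    run_fake_code(code);
    ret = lastval | errflag;

    errflag = ef | (errflag & ERRFLAG_INT);
    lastval = lv;
    return ret;
}
```

Under these assumptions, a callout that exits non-zero or raises an
error is reported as failing, while the status the caller had before the
match is left untouched.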

The patch does nothing for numeric callouts, which look mostly useful
for debugging. They could perhaps call a standard function, passing the
number and string as parameters.
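
One way such routing might look, purely as a sketch: struct
callout_info, the two handler types, dispatch_callout and the sample
handlers are all hypothetical names, not part of the patch or of PCRE2.
String callouts go to one handler and everything else (including a bare
numeric callout 0, which carries no string) goes to another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced view of what a callout handler receives. */
struct callout_info {
    unsigned number;      /* 0 for string callouts such as (?C'foo') */
    const char *string;   /* callout text, NULL for numeric callouts */
    const char *subject;  /* string being matched */
};

typedef int (*string_handler)(const char *code, const char *subject);
typedef int (*numeric_handler)(unsigned number, const char *subject);

/* Route a callout to the matching handler. A missing handler means
 * "continue matching", i.e. status 0. */
static int
dispatch_callout(const struct callout_info *info,
                 string_handler on_string, numeric_handler on_number)
{
    assert(info != NULL);
    if (info->number == 0 && info->string != NULL)
        return on_string ? on_string(info->string, info->subject) : 0;
    return on_number ? on_number(info->number, info->subject) : 0;
}

/* Sample handlers, for illustration only. */
static unsigned last_number_seen;

/* Records the callout number, as a debugging hook might. */
static int
sample_on_number(unsigned number, const char *subject)
{
    (void) subject;
    last_number_seen = number;
    return 0;
}

/* Treats the callout text "false" as a failing callout. */
static int
sample_on_string(const char *code, const char *subject)
{
    (void) subject;
    return strcmp(code, "false") == 0;
}
```

Keeping the numeric path separate would let a debugging hook be
installed or omitted without touching the string evaluation.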

Oliver

diff --git a/Doc/Zsh/mod_pcre.yo b/Doc/Zsh/mod_pcre.yo
index da73ac85a..41fab4475 100644
--- a/Doc/Zsh/mod_pcre.yo
+++ b/Doc/Zsh/mod_pcre.yo
@@ -69,6 +69,11 @@ print -l $accum)
 )
 enditem()
 
+If the regular expression contains callouts, these are executed as shell code.
+During the execution of the callout, the string the regular expression is
+matching against is available in the parameter tt(.pcre.subject). If there is a
+non-zero return status from the shell code, the callout does not match.
+
 The option tt(-d) uses the alternative breadth-first DFA search algorithm of
 pcre. This sets tt(match), or the array given with tt(-a), to all the matches
 found from the same start point in the subject.
diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index f5cda6d38..e321f18a4 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -128,6 +128,30 @@ bin_pcre_study(char *nam, UNUSED(char **args), UNUSED(Options ops), UNUSED(int f
     return 0;
 }
 
+static int
+pcre_callout(pcre2_callout_block_8 *block, UNUSED(void *callout_data))
+{
+    Eprog prog;
+    int ret=0;
+
+    if (!block->callout_number &&
+	    ((prog = parse_string((char *) block->callout_string, 0))))
+    {
+	int ef = errflag, lv = lastval;
+
+	setsparam(".pcre.subject",
+		metafy((char *) block->subject, block->subject_length, META_DUP));
+	execode(prog, 1, 0, "pcre");
+	ret = lastval | errflag;
+
+	/* Restore any user interrupt error status */
+	errflag = ef | (errflag & ERRFLAG_INT);
+	lastval = lv;
+    }
+
+    return ret;
+}
+
 static int
 zpcre_get_substrings(pcre2_code *pat, char *arg, pcre2_match_data *mdata,
 	int captured_count, char *matchvar, char *substravar, char *namedassoc,
@@ -339,6 +363,9 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     plaintext = ztrdup(*args);
     unmetafy(plaintext, &subject_len);
 
+    pcre2_match_context_8 *mcontext = pcre2_match_context_create(NULL);
+    pcre2_set_callout(mcontext, &pcre_callout, 0);
+
     if (offset_start > 0 && offset_start >= subject_len)
 	ret = PCRE2_ERROR_NOMATCH;
     else if (use_dfa) {
@@ -347,7 +374,7 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
 	pcre_mdata = pcre2_match_data_create(capcount, NULL);
 	do {
 	    ret = pcre2_dfa_match(pcre_pattern, (PCRE2_SPTR) plaintext, subject_len,
-		offset_start, 0, pcre_mdata, NULL, (int *) workspace, wscount);
+		offset_start, 0, pcre_mdata, mcontext, (int *) workspace, wscount);
 	    if (ret == PCRE2_ERROR_DFA_WSSIZE) {
 		old = wscount;
 		wscount += wscount / 2;
@@ -362,7 +389,7 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     } else {
 	pcre_mdata = pcre2_match_data_create_from_pattern(pcre_pattern, NULL);
 	ret = pcre2_match(pcre_pattern, (PCRE2_SPTR) plaintext, subject_len,
-		offset_start, 0, pcre_mdata, NULL);
+		offset_start, 0, pcre_mdata, mcontext);
     }
 
     if (ret==0) return_value = 0;
@@ -380,6 +407,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     
     if (pcre_mdata)
 	pcre2_match_data_free(pcre_mdata);
+    if (mcontext)
+	pcre2_match_context_free(mcontext);
     zsfree(plaintext);
 
     return return_value;

