zsh-workers
* PATCH: pcre callouts
@ 2023-10-31  0:04 Oliver Kiddle
  2023-10-31  3:26 ` Bart Schaefer
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Kiddle @ 2023-10-31  0:04 UTC (permalink / raw)
  To: Zsh workers

PCRE supports callouts similar to Perl's (?{ code }) but with different
syntax. There are string and numeric formats, and it seems logical
enough to evaluate the string forms as shell code.

So, e.g. (?C{foo}) or (?C'foo') will call the foo function. In Perl,
$_ is set to the string being examined. I've used .pcre.subject. Would
something else be better and should it perhaps start and end a new scope
to make that local? As in Perl, the return status can be used to make it
treat the callout as not matching.

This won't do anything for numeric callouts. They look mostly useful for
debugging. They could perhaps call a standard function passing the
number and string as parameters.
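
One possible shape for "a standard function passing the number and string as parameters" is sketched below. The struct is a hypothetical stand-in that borrows the field names `callout_number` and `callout_string` from PCRE2's `pcre2_callout_block`, where a numeric callout has a NULL string and a string callout has number 0. `build_callout_args()` and the handler name used in the example are illustrative only; the string is passed only when `callout_string` is non-NULL, so a numeric callout 0 is never confused with a string callout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two pcre2_callout_block fields used. */
struct toy_callout_block {
    unsigned int callout_number;   /* 0 for string callouts */
    const char *callout_string;    /* NULL for numeric callouts */
};

/* Build the words of a command "handler number [string]".  The number is
 * formatted into buf; argv must have room for 4 entries.  Returns the
 * word count, or -1 if buf is too small. */
static int build_callout_args(const struct toy_callout_block *block,
                              const char *handler,
                              char *buf, size_t bufsize, const char *argv[4])
{
    int n = snprintf(buf, bufsize, "%u", block->callout_number);
    if (n < 0 || (size_t) n >= bufsize)
        return -1;

    int argc = 0;
    argv[argc++] = handler;
    argv[argc++] = buf;
    if (block->callout_string != NULL)   /* string callout: pass the text */
        argv[argc++] = block->callout_string;
    argv[argc] = NULL;
    return argc;
}
```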

Oliver

diff --git a/Doc/Zsh/mod_pcre.yo b/Doc/Zsh/mod_pcre.yo
index da73ac85a..41fab4475 100644
--- a/Doc/Zsh/mod_pcre.yo
+++ b/Doc/Zsh/mod_pcre.yo
@@ -69,6 +69,11 @@ print -l $accum)
 )
 enditem()
 
+If the regular expression contains callouts, these are executed as shell code.
+During the execution of the callout, the string the regular expression is
+matching against is available in the parameter tt(.pcre.subject). If there is a
+non-zero return status from the shell code, the callout does not match.
+
 The option tt(-d) uses the alternative breadth-first DFA search algorithm of
 pcre. This sets tt(match), or the array given with tt(-a), to all the matches
 found from the same start point in the subject.
diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index f5cda6d38..e321f18a4 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -128,6 +128,30 @@ bin_pcre_study(char *nam, UNUSED(char **args), UNUSED(Options ops), UNUSED(int f
     return 0;
 }
 
+static int
+pcre_callout(pcre2_callout_block_8 *block, void *)
+{
+    Eprog prog;
+    int ret=0;
+
+    if (!block->callout_number &&
+	    ((prog = parse_string((char *) block->callout_string, 0))))
+    {
+	int ef = errflag, lv = lastval;
+
+	setsparam(".pcre.subject",
+		metafy((char *) block->subject, block->subject_length, META_DUP));
+	execode(prog, 1, 0, "pcre");
+	ret = lastval | errflag;
+
+	/* Restore any user interrupt error status */
+	errflag = ef | (errflag & ERRFLAG_INT);
+	lastval = lv;
+    }
+
+    return ret;
+}
+
 static int
 zpcre_get_substrings(pcre2_code *pat, char *arg, pcre2_match_data *mdata,
 	int captured_count, char *matchvar, char *substravar, char *namedassoc,
@@ -339,6 +363,9 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     plaintext = ztrdup(*args);
     unmetafy(plaintext, &subject_len);
 
+    pcre2_match_context_8 *mcontext = pcre2_match_context_create(NULL);
+    pcre2_set_callout(mcontext, &pcre_callout, 0);
+
     if (offset_start > 0 && offset_start >= subject_len)
 	ret = PCRE2_ERROR_NOMATCH;
     else if (use_dfa) {
@@ -347,7 +374,7 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
 	pcre_mdata = pcre2_match_data_create(capcount, NULL);
 	do {
 	    ret = pcre2_dfa_match(pcre_pattern, (PCRE2_SPTR) plaintext, subject_len,
-		offset_start, 0, pcre_mdata, NULL, (int *) workspace, wscount);
+		offset_start, 0, pcre_mdata, mcontext, (int *) workspace, wscount);
 	    if (ret == PCRE2_ERROR_DFA_WSSIZE) {
 		old = wscount;
 		wscount += wscount / 2;
@@ -362,7 +389,7 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     } else {
 	pcre_mdata = pcre2_match_data_create_from_pattern(pcre_pattern, NULL);
 	ret = pcre2_match(pcre_pattern, (PCRE2_SPTR) plaintext, subject_len,
-		offset_start, 0, pcre_mdata, NULL);
+		offset_start, 0, pcre_mdata, mcontext);
     }
 
     if (ret==0) return_value = 0;
@@ -380,6 +407,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
     
     if (pcre_mdata)
 	pcre2_match_data_free(pcre_mdata);
+    if (mcontext)
+	pcre2_match_context_free(mcontext);
     zsfree(plaintext);
 
     return return_value;

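The save/restore at the end of `pcre_callout()` can be modelled with plain integers. Whatever status and error bits the callout code leaves behind are folded into the return value, then the caller's `lastval` and `errflag` are put back, except that an interrupt bit survives so a Ctrl-C during the callout is not lost. The flag values, `fake_exec()` and `run_callout()` below are assumptions for illustration, not zsh's definitions.

```c
#include <assert.h>

/* Assumed bit values, for illustration only. */
enum { TOY_ERRFLAG_ERROR = 1, TOY_ERRFLAG_INT = 2 };

/* Toy globals standing in for zsh's errflag and lastval. */
static int errflag, lastval;

/* Hypothetical executor: leaves behind the status and error bits that
 * running the callout's shell code would produce. */
static void fake_exec(int status, int errbits)
{
    lastval = status;
    errflag |= errbits;
}

/* Same structure as pcre_callout(): run, compute ret, restore state. */
static int run_callout(int status, int errbits)
{
    int ef = errflag, lv = lastval;

    fake_exec(status, errbits);
    int ret = lastval | errflag;

    /* Keep only a user interrupt from the callout's run. */
    errflag = ef | (errflag & TOY_ERRFLAG_INT);
    lastval = lv;
    return ret;
}
```

Because both values are non-negative, the result is never negative, so in PCRE2 terms a failing callout only fails the match at that point; it never abandons the match outright.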

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PATCH: pcre callouts
  2023-10-31  0:04 PATCH: pcre callouts Oliver Kiddle
@ 2023-10-31  3:26 ` Bart Schaefer
  2023-10-31  3:40   ` Bart Schaefer
  2023-11-01  2:04   ` Oliver Kiddle
  0 siblings, 2 replies; 9+ messages in thread
From: Bart Schaefer @ 2023-10-31  3:26 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh workers

On Mon, Oct 30, 2023 at 5:04 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> So, e.g. (?C{foo}) or (?C'foo') will call the foo function. In Perl,
> $_ is set to the string being examined. I've used .pcre.subject. Would
> something else be better

You could actually use ${.pcre._} I suppose.  I'm undecided on whether
that's better.

> and should it perhaps start and end a new scope
> to make that local?

We do have a precedent for that now with ${|...} creating a scope.

> This won't do anything for numeric callouts. They look mostly useful for
> debugging. They could perhaps call a standard function passing the
> number and string as parameters.

What's an example of using a number callout outside of zsh?

I see you're calling parse_string() here:

> +    if (!block->callout_number &&
> +           ((prog = parse_string((char *) block->callout_string, 0))))

How are you solving the problem of finding the end of the callout?
That is, (?C{code}) looks like it would have the same parsing problems
I wrestled with for ${|code}.  Is it just that you can skip everything
from "(?" to the matching ")" without having to worry about
(un)balanced braces inside, etc.?

* Re: PATCH: pcre callouts
  2023-10-31  3:26 ` Bart Schaefer
@ 2023-10-31  3:40   ` Bart Schaefer
  2023-10-31 13:31     ` Mikael Magnusson
  2023-11-01  2:04   ` Oliver Kiddle
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2023-10-31  3:40 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh workers

On Mon, Oct 30, 2023 at 8:26 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> You could actually use ${.pcre._} I suppose.  I'm undecided on whether
> that's better.
>
> We do have a precedent for that now with ${|...} creating a scope.

It occurs to me that if you create a scope, you could also create $_
in that scope with PM_HIDE set, and then use $_ directly with no
namespace needed.

* Re: PATCH: pcre callouts
  2023-10-31  3:40   ` Bart Schaefer
@ 2023-10-31 13:31     ` Mikael Magnusson
  2023-10-31 15:57       ` Bart Schaefer
  0 siblings, 1 reply; 9+ messages in thread
From: Mikael Magnusson @ 2023-10-31 13:31 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Oliver Kiddle, Zsh workers

On 10/31/23, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Mon, Oct 30, 2023 at 8:26 PM Bart Schaefer <schaefer@brasslantern.com>
> wrote:
>>
>> You could actually use ${.pcre._} I suppose.  I'm undecided on whether
>> that's better.
>>
>> We do have a precedent for that now with ${|...} creating a scope.
>
> It occurs to me that if you create a scope, you could also create $_
> in that scope with PM_HIDE set, and then use $_ directly with no
> namespace needed.

Wouldn't any code inside need to immediately store the value of $_ in
another parameter to avoid overwriting it after the first command?
Seems like an unnecessary extra step if that's the case.

-- 
Mikael Magnusson

* Re: PATCH: pcre callouts
  2023-10-31 13:31     ` Mikael Magnusson
@ 2023-10-31 15:57       ` Bart Schaefer
  0 siblings, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2023-10-31 15:57 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: Oliver Kiddle, Zsh workers

On Tue, Oct 31, 2023 at 6:31 AM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> On 10/31/23, Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > It occurs to me that if you create a scope, you could also create $_
> > in that scope with PM_HIDE set, and then use $_ directly with no
> > namespace needed.
>
> Wouldn't any code inside need to immediately store the value of $_ in
> another parameter to avoid overwriting it after the first command?

PM_HIDE is "local -h" ... if that's set the global value of $_ will
change after every command but the local scope won't be able to see
it.  Seems to work in a quick function example I tried.
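
The behaviour described here can be pictured with a toy model: the global `$_` is a special parameter whose value comes from shell state rewritten after every command, while a hidden local is an ordinary value in an inner scope that shadows it. The names below (`struct toy_param`, `lookup_underscore()` and friends) are hypothetical, and this is not how zsh's parameter table is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value the shell rewrites after every command (the global $_). */
static const char *last_arg = "";

/* Toy parameter: a special with a getter, or a plain stored value. */
struct toy_param {
    const char *plain;              /* used when getter is NULL */
    const char *(*getter)(void);
};

static const char *underscore_getter(void) { return last_arg; }

/* Scope stack: slot 0 is the global special, higher slots are locals. */
static struct toy_param scopes[8] = { { NULL, underscore_getter } };
static int depth = 1;

/* The innermost definition wins; a hidden local is a plain value and so
 * does not follow later updates of last_arg. */
static const char *lookup_underscore(void)
{
    const struct toy_param *p = &scopes[depth - 1];
    return p->getter ? p->getter() : p->plain;
}

static void push_hidden_local(const char *value)
{
    assert(depth < 8);
    scopes[depth].plain = value;
    scopes[depth].getter = NULL;
    depth++;
}

static void pop_scope(void) { depth--; }

/* Simulate the shell finishing a command. */
static void run_command(const char *last_word) { last_arg = last_word; }
```

While the scope lasts, callout code keeps seeing the subject no matter how many commands it runs; once the scope ends, the continuously updated global value is visible again.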

* Re: PATCH: pcre callouts
  2023-10-31  3:26 ` Bart Schaefer
  2023-10-31  3:40   ` Bart Schaefer
@ 2023-11-01  2:04   ` Oliver Kiddle
  2023-11-03  3:47     ` Bart Schaefer
  1 sibling, 1 reply; 9+ messages in thread
From: Oliver Kiddle @ 2023-11-01  2:04 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh workers

Bart Schaefer wrote:
> You could actually use ${.pcre._} I suppose.  I'm undecided on whether
> that's better.

That still needs braces, which makes it harder to embed in a callout.
Perhaps callouts are unlikely to be used other than with functions. If
so, it might just as well be .pcre.subject.

> > and should it perhaps start and end a new scope
> > to make that local?
>
> We do have a precedent for that now with ${|...} creating a scope.

The patch below tries this approach along with your suggestion of using
PM_HIDE for $_. The shell's $_ is not a feature I use much so I may not
be the best person to decide on whether it can be used here.

> > This won't do anything for numeric callouts. They look mostly useful for
> > debugging. They could perhaps call a standard function passing the
> > number and string as parameters.
>
> What's an example of using a number callout outside of zsh?

I'm not finding a whole lot of uses, but it isn't easy to search for.
It's fairly common for projects to include the pcre sources directly
within their own source tree, so source code searches just find copies
of the pcre test code. AutoHotkey, a Windows scripting language, uses
them and has a pcre_callout function which you need to define. I found
one other use, which also looked like a language and also used a
function.

> I see you're calling parse_string() here:
>
> > +    if (!block->callout_number &&
> > +           ((prog = parse_string((char *) block->callout_string, 0))))
>
> How are you solving the problem of finding the end of the callout?

That's pcre's problem as it parses the regular expressions. It has
always needed to parse callouts but without the patch, they are ignored.
You can't really use the feature without using shell quoting for the
whole regex. The [[ str =~ regex ]] form can be used without quoting to
a limited extent, but there the parser appears to look for the matching
")", so that seems sane enough.
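
PCRE2's documented syntax for string callouts is what makes this tractable: the argument after `(?C` is enclosed either by one of `` ` ' " ^ % # $ `` (the same character at both ends) or by `{` and `}`, a closing delimiter needed inside the text is written twice, and the callout ends at the `)` immediately after the closing delimiter. The function below is a minimal sketch of that rule, not PCRE2's parser; `find_string_callout()` is a hypothetical name, and it only locates the text and the end of the callout without undoing doubled delimiters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If pat starts with a string callout, store the offset and length of the
 * text between the delimiters and return the number of pattern bytes the
 * callout occupies (including the final ')').  Return 0 otherwise. */
static size_t find_string_callout(const char *pat,
                                  size_t *text_start, size_t *text_len)
{
    static const char same_delims[] = "`'\"^%#$";
    char close;
    size_t i = 3;

    if (strncmp(pat, "(?C", 3) != 0)
        return 0;
    if (pat[i] == '{')
        close = '}';
    else if (pat[i] != '\0' && strchr(same_delims, pat[i]) != NULL)
        close = pat[i];
    else
        return 0;                      /* numeric callout or not a callout */

    *text_start = i + 1;
    for (i++; pat[i] != '\0'; i++) {
        if (pat[i] != close)
            continue;                  /* parentheses etc. are just text */
        if (pat[i + 1] == close) {     /* doubled delimiter stays in text */
            i++;
            continue;
        }
        if (pat[i + 1] != ')')         /* delimiter must be followed by ')' */
            return 0;
        *text_len = i - *text_start;
        return i + 2;
    }
    return 0;                          /* unterminated callout string */
}
```

Note that unbalanced parentheses inside the delimited text play no part in finding the end, which is why the shell code passed to `parse_string()` needs no special handling there.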

Oliver

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index e321f18a4..173ab4a69 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -138,11 +138,20 @@ pcre_callout(pcre2_callout_block_8 *block, void *)
 	    ((prog = parse_string((char *) block->callout_string, 0))))
     {
 	int ef = errflag, lv = lastval;
+	Param pm;
+	char *subject = metafy((char *) block->subject,
+		block->subject_length, META_DUP);
 
-	setsparam(".pcre.subject",
-		metafy((char *) block->subject, block->subject_length, META_DUP));
+	startparamscope();
+	if ((pm = createparam("_", PM_LOCAL|PM_HIDE|PM_UNSET))) {
+	    pm->level = locallevel;
+	    setsparam("_", ztrdup(subject));
+	}
+	setsparam(".pcre.subject", subject);
+	setiparam(".pcre.pos", block->current_position + 1);
 	execode(prog, 1, 0, "pcre");
 	ret = lastval | errflag;
+	endparamscope();
 
 	/* Restore any user interrupt error status */
 	errflag = ef | (errflag & ERRFLAG_INT);

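The new `.pcre.pos` parameter adds one to `block->current_position` because PCRE2 reports a 0-based offset while zsh subscripts count from 1, so `${.pcre.subject[.pcre.pos,-1]}` is the part of the subject from the current matching position onwards. The helpers below (hypothetical names) show that correspondence for ASCII text only: with multibyte characters, zsh subscripts count characters while the 8-bit PCRE2 offset counts bytes, so the two can disagree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* PCRE2's current_position is 0-based; the patch stores it plus one. */
static long to_shell_index(size_t current_position)
{
    return (long) current_position + 1;
}

/* Equivalent of ${subject[index,-1]} for ASCII text: the suffix starting
 * at the 1-based index, or NULL if the index is out of range. */
static const char *shell_suffix(const char *subject, long index)
{
    size_t len = strlen(subject);

    if (index < 1 || (size_t) index > len + 1)
        return NULL;
    return subject + (index - 1);
}
```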

* Re: PATCH: pcre callouts
  2023-11-01  2:04   ` Oliver Kiddle
@ 2023-11-03  3:47     ` Bart Schaefer
  2023-11-03  9:50       ` Oliver Kiddle
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2023-11-03  3:47 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh workers

On Tue, Oct 31, 2023 at 7:05 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> @@ -138,11 +138,20 @@ pcre_callout(pcre2_callout_block_8 *block, void *)

I didn't try applying the previous patch, but now that this one has
been pushed to sourceforge:

pcre.c: In function ‘pcre_callout’:
pcre.c:132:44: error: parameter name omitted
  132 | pcre_callout(pcre2_callout_block_8 *block, void *)
      |                                            ^~~~~~

* Re: PATCH: pcre callouts
  2023-11-03  3:47     ` Bart Schaefer
@ 2023-11-03  9:50       ` Oliver Kiddle
  2023-11-04 20:57         ` Bart Schaefer
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Kiddle @ 2023-11-03  9:50 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh workers

Bart Schaefer wrote:
> I didn't try applying the previous patch, but now that this one has
> been pushed to sourceforge:

I'm less sure about whether to push the later one for setting a scope,
and perhaps whether to add a -f option to pcre_match for a fixed callout
function that can avoid shell evaluation and handle numeric callouts.

> pcre.c: In function ‘pcre_callout’:
> pcre.c:132:44: error: parameter name omitted
>   132 | pcre_callout(pcre2_callout_block_8 *block, void *)
>       |                                            ^~~~~~

Sorry, I think the following form should be right.

Oliver

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index e6b59831f..e48ae3ae5 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -129,7 +129,7 @@ bin_pcre_study(char *nam, UNUSED(char **args), UNUSED(Options ops), UNUSED(int f
 }
 
 static int
-pcre_callout(pcre2_callout_block_8 *block, void *)
+pcre_callout(pcre2_callout_block_8 *block, UNUSED(void *callout_data))
 {
     Eprog prog;
     int ret=0;

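The error arises because, before C23, every parameter in a C function definition must be named; unnamed parameters are allowed in C++ and only from C23 onwards in C. Naming the parameter and wrapping it in a marker macro keeps older compilers happy without an unused-parameter warning. The macro below is an assumption modelled on the common GCC idiom rather than a copy of zsh's own `UNUSED()`, and `toy_callout` and `struct toy_block` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Name the parameter (valid in every C standard) and, on GCC/Clang,
 * mark it as intentionally unused. */
#if defined(__GNUC__)
# define TOY_UNUSED(x) x __attribute__((__unused__))
#else
# define TOY_UNUSED(x) x
#endif

struct toy_block {
    unsigned int callout_number;
    int fail;                  /* hypothetical stand-in for a shell status */
};

/* Same shape as the fixed pcre_callout(): callout_data is ignored. */
static int toy_callout(struct toy_block *block, TOY_UNUSED(void *callout_data))
{
    int ret = 0;               /* 0 lets matching continue */

    if (block->callout_number == 0)
        ret = block->fail;
    return ret;
}
```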

* Re: PATCH: pcre callouts
  2023-11-03  9:50       ` Oliver Kiddle
@ 2023-11-04 20:57         ` Bart Schaefer
  0 siblings, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2023-11-04 20:57 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh workers

On Fri, Nov 3, 2023 at 2:50 AM Oliver Kiddle <opk@zsh.org> wrote:
>
> -pcre_callout(pcre2_callout_block_8 *block, void *)
> +pcre_callout(pcre2_callout_block_8 *block, UNUSED(void *callout_data))

Yes, that fixes it.

Thread overview: 9+ messages
2023-10-31  0:04 PATCH: pcre callouts Oliver Kiddle
2023-10-31  3:26 ` Bart Schaefer
2023-10-31  3:40   ` Bart Schaefer
2023-10-31 13:31     ` Mikael Magnusson
2023-10-31 15:57       ` Bart Schaefer
2023-11-01  2:04   ` Oliver Kiddle
2023-11-03  3:47     ` Bart Schaefer
2023-11-03  9:50       ` Oliver Kiddle
2023-11-04 20:57         ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
