zsh-workers
 help / color / mirror / code / Atom feed
* Feature request (#b)...(...)<not empty>
@ 2017-02-25  8:37 Sebastian Gniazdowski
  2017-02-25 15:13 ` Daniel Shahaf
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-25  8:37 UTC (permalink / raw)
  To: zsh-workers

Hello,
found a limitation in regex-like specification, maybe it's easy to add
extension?

A pattern: (a|)(b|)(c|). Order of a, b, c matters, but any combination
of them will be accepted. The problem: |) in all three will also be
accepted. That's not a combination of a, b, c. This is onset of
something higher-level: "combination of".

So maybe: ((a|)(b|)(c|))(#n). #n is for not-empty, resembles [[ -n.

It looks like typical pattern situation: try something, decide if to
reject it. So maybe it's easy to add?

-- 
  Sebastian Gniazdowski
  psprint2@fastmail.com


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Feature request (#b)...(...)<not empty>
  2017-02-25  8:37 Feature request (#b)...(...)<not empty> Sebastian Gniazdowski
@ 2017-02-25 15:13 ` Daniel Shahaf
  2017-02-25 15:36   ` Sebastian Gniazdowski
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Shahaf @ 2017-02-25 15:13 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: zsh-workers

Sebastian Gniazdowski wrote on Sat, Feb 25, 2017 at 00:37:10 -0800:
> Hello,
> found a limitation in regex-like specification, maybe it's easy to add
> extension?
> 
> A pattern: (a|)(b|)(c|). Order of a, b, c matters, but any combination
> of them will be accepted. The problem: |) in all three will also be
> accepted. That's not a combination of a, b, c. This is onset of
> something higher-level: "combination of".
> 
> So maybe: ((a|)(b|)(c|))(#n). #n is for not-empty, resembles [[ -n.
> 
> It looks like typical pattern situation: try something, decide if to
> reject it. So maybe it's easy to add?

Some regexp flavours support zero-width lookahead, which would allow
this.  For example, in Perl,

     % print -l {a,}{b,}{c,} | perl -lnE 'print if /(?=.)(a|)(b|)(c|)/' | wc -l
     7

However, I have no opinion as to whether that should be added: I don't
know how much effort would be involved, and for the given example there
are alternative solutions that require no syntax changes.

Cheers,

Daniel


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Feature request (#b)...(...)<not empty>
  2017-02-25 15:13 ` Daniel Shahaf
@ 2017-02-25 15:36   ` Sebastian Gniazdowski
  2017-02-25 16:47     ` Bart Schaefer
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-25 15:36 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: zsh-workers

On Sat, Feb 25, 2017, at 07:13 AM, Daniel Shahaf wrote:
> However, I have no opinion as to whether that should be added: I don't
> know how much effort would be involved, and for the given example there
> are alternative solutions that require no syntax changes.

I have format codes: bold, fg color, bg color. Accept any combination
preserving order. 20 colors in total. I currently do:

[range spanning all codes]([fg range]|)([bg range]|)

This way first position cannot be null and things work. However, this
will accept fgfg fgfgbg bgfg bgfgbg. But works and is fast. How would
you do it?

I think your neutral comment tends to hide fact how powerful (#n) and
(#z) would be, but I might not convey this message anyway.

-- 
  Sebastian Gniazdowski
  psprint2@fastmail.com


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Feature request (#b)...(...)<not empty>
  2017-02-25 15:36   ` Sebastian Gniazdowski
@ 2017-02-25 16:47     ` Bart Schaefer
  2017-02-25 17:24       ` Sebastian Gniazdowski
  2017-02-25 17:54       ` Sebastian Gniazdowski
  0 siblings, 2 replies; 10+ messages in thread
From: Bart Schaefer @ 2017-02-25 16:47 UTC (permalink / raw)
  To: zsh-workers

On Feb 25, 12:37am, Sebastian Gniazdowski wrote:
}
} So maybe: ((a|)(b|)(c|))(#n). #n is for not-empty, resembles [[ -n.
} 
} It looks like typical pattern situation: try something, decide if to
} reject it. So maybe it's easy to add?

It's not typical -- as Daniel described, it requires doing either
look-ahead or look-behind, which a simple regular expression scanner
does not do.  That is, when scanning a string such as "fgfgbg" the
scanner has already accepted and moved past the first "fg" before it
examines the second "f".  In order to restrict what was matched
before, the code must either stop at that point and try all possible
matches for the "fgbg" remainder of the string -- and then remember
what NOT to do when it backs up to try again -- or it must remember
everything it did up to that point and re-evaluate that after the
second "fg" is scanned.

Since this could be nested, the tree of "extra looks" has arbitrary
branching.  This is different from simply failing to match what is
coming up next and backtracking the whole process.  This is also why
perl lookahead/behind patterns have several restrictions, e.g. that
the "look" be a fixed length string, so that most possible branches
can be pruned off that tree at expression compile time.

I've greatly simplified that explanation; I'm guessing you haven't yet
studied the computer science concept of "finite automata".  You may
find it interesting.


On Feb 25,  7:36am, Sebastian Gniazdowski wrote:
}
} [range spanning all codes]([fg range]|)([bg range]|)
} 
} This way first position cannot be null and things work. However, this
} will accept fgfg fgfgbg bgfg bgfgbg. But works and is fast. How would
} you do it?

In this case since there are only two patterns at the tail of the string
I'd probably just brute-force it:

   ([all-except-fg-or-bg]|[fg]|[bg])#([fgbg]|[bgfg]|[fg]|[bg])

Zsh's scanner will always take the left branch of an alternation if both
branches will succeed, so arranging this with longest matches first will
accomplish what you need, I think; but I suspect everything might be a bit
oversimplified here.

} I think your neutral comment tends to hide fact how powerful (#n) and
} (#z) would be, but I might not convey this message anyway.

Yes, they're so powerful that they require an entirely different sort
of expression parser to make them work.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Feature request (#b)...(...)<not empty>
  2017-02-25 16:47     ` Bart Schaefer
@ 2017-02-25 17:24       ` Sebastian Gniazdowski
  2017-02-25 17:54         ` Daniel Shahaf
  2017-02-25 20:38         ` Bart Schaefer
  2017-02-25 17:54       ` Sebastian Gniazdowski
  1 sibling, 2 replies; 10+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-25 17:24 UTC (permalink / raw)
  To: zsh-workers

On Sat, Feb 25, 2017, at 08:47 AM, Bart Schaefer wrote:
> On Feb 25, 12:37am, Sebastian Gniazdowski wrote:
> (...) or it must remember
> everything it did up to that point and re-evaluate that after the
> second "fg" is scanned.

I thought that ((a|)(b|)(c|)) could be left as it is. So after it scans
"bold" (a), it normally proceeds to scan "fg" (i.e. b), etc. until it
finishes the outer parenthesis. No change here. Different order, e.g.
ba, isn't accepted, i.e. first "fg" then "bold" – the say API is
restricted, order is: bold, foreground, background (lacking any of them
but of course not all).

>  (...) This is different from simply failing to match what is
> coming up next and backtracking the whole process. 

This matches what (#n) would do. Reject empty result of parentheses, so
indeed... I was thinking in terms of parentheses.
((a|)(b|)(c|))<HERE>(#n) <- in this place we're somewhat at "current"
node. First recently finished. Expected some nodes to be there, grouping
elements of parentheses. And typical rejection of nodes to be already in
place.
 
>    ([all-except-fg-or-bg]|[fg]|[bg])#([fgbg]|[bgfg]|[fg]|[bg])
> 
> Zsh's scanner will always take the left branch of an alternation if both
> branches will succeed, so arranging this with longest matches first will
> accomplish what you need, I think; but I suspect everything might be a
> bit oversimplified here.

I'm in good position because it's API. User is required not to do
restricted things, it's like e.g. not accepting NULLs in C, so I can
leave it as it is.

-- 
  Sebastian Gniazdowski
  psprint2@fastmail.com


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Feature request (#b)...(...)<not empty>
  2017-02-25 17:24       ` Sebastian Gniazdowski
@ 2017-02-25 17:54         ` Daniel Shahaf
  2017-02-25 20:38         ` Bart Schaefer
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Shahaf @ 2017-02-25 17:54 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: zsh-workers

Sebastian Gniazdowski wrote on Sat, Feb 25, 2017 at 09:24:40 -0800:
> On Sat, Feb 25, 2017, at 08:47 AM, Bart Schaefer wrote:
> > On Feb 25, 12:37am, Sebastian Gniazdowski wrote:
> > (...) or it must remember
> > everything it did up to that point and re-evaluate that after the
> > second "fg" is scanned.
> 
> I thought that ((a|)(b|)(c|)) could be left as it is. So after it scans
> "bold" (a), it normally proceeds to scan "fg" (i.e. b), etc. until it
> finishes the outer parenthesis. No change here. Different order, e.g.
> ba, isn't accepted, i.e. first "fg" then "bold" – the say API is
> restricted, order is: bold, foreground, background (lacking any of them
> but of course not all).

As an aside, in xterm, it's possible to omit all of bold, foreground,
background: $'\e[31m' is accepted, but so is $'\e[m'.

I assume that's a feature.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Feature request (#b)...(...)<not empty>
  2017-02-25 16:47     ` Bart Schaefer
  2017-02-25 17:24       ` Sebastian Gniazdowski
@ 2017-02-25 17:54       ` Sebastian Gniazdowski
  2017-02-25 21:05         ` Bart Schaefer
  1 sibling, 1 reply; 10+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-25 17:54 UTC (permalink / raw)
  To: zsh-workers

On Sat, Feb 25, 2017, at 08:47 AM, Bart Schaefer wrote:
>    ([all-except-fg-or-bg]|[fg]|[bg])#([fgbg]|[bgfg]|[fg]|[bg])
> 
> Zsh's scanner will always take the left branch of an alternation if both
> branches will succeed, so arranging this with longest matches first will

Yes and this is interesting, I think I'm about something similar:

% a="abc"; echo ${a//(#b)ab(c)(#c0,1)/x} ${a//(#b)ab(c|)/x}
${a//(#b)ab(|c)/x} "<- last one shows | isn't greedy"
x x xc <- last one shows | isn't greedy
% [[ abc = ab(|c) ]] && echo "I've tried multiple options"
I've tried multiple options

I'm not sure if regexes work this way.. Maybe I'm loosing it for a
moment, but:

[[ abc =~ ab(|c) ]] && echo "I've tried multiple options"
zsh: failed to compile regex: empty (sub)expression

not sure how to test regexes. My point is:
- having | in parentheses behave this way looks like something
implementation-driven, not finite automata driven,
- allowing principle of greediness to be deliberately not followed
creates more rich behavior

Maybe (#n) can be implemented this way, by looking at code and not
following model. To in result extend number of states (number of
behaviors) of the system.

-- 
Sebastian Gniazdowski
psprint2@gmail.com


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Feature request (#b)...(...)<not empty>
  2017-02-25 17:24       ` Sebastian Gniazdowski
  2017-02-25 17:54         ` Daniel Shahaf
@ 2017-02-25 20:38         ` Bart Schaefer
  1 sibling, 0 replies; 10+ messages in thread
From: Bart Schaefer @ 2017-02-25 20:38 UTC (permalink / raw)
  To: zsh-workers

On Feb 25,  9:24am, Sebastian Gniazdowski wrote:
}
} I thought that ((a|)(b|)(c|)) could be left as it is. So after it scans
} "bold" (a), it normally proceeds to scan "fg" (i.e. b), etc. until it
} finishes the outer parenthesis. No change here.

The problem isn't just what's inside the parens, it's what's outside.
Consider what happens if you have a pattern like this and then make it
part of a larger pattern that repeats zero or more times, or to be a
sub-pattern within another set of parens that also has (#n), etc.
Even if you can make it work by keeping more state than a simple FA,
it becomes easy to write pathologically complex expressions.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Feature request (#b)...(...)<not empty>
  2017-02-25 17:54       ` Sebastian Gniazdowski
@ 2017-02-25 21:05         ` Bart Schaefer
  2017-02-25 22:52           ` Daniel Shahaf
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Schaefer @ 2017-02-25 21:05 UTC (permalink / raw)
  To: zsh-workers

On Feb 25,  9:54am, Sebastian Gniazdowski wrote:
}
} ${a//(#b)ab(|c)/x} "<- last one shows | isn't greedy"
} x x xc <- last one shows | isn't greedy

Yes, you're correct.  Putting the shorter (in this case empty) alternate
first in the parens makes that particular subexpression non-greedy.  This
was a deliberate choice -- though I can't find that it's been documented
anywhere, on a brief search.

} I'm not sure if regexes work this way.. Maybe I'm loosing it for a
} moment, but:
} 
} [[ abc =~ ab(|c) ]] && echo "I've tried multiple options"
} zsh: failed to compile regex: empty (sub)expression

Yes, that's also correct.  Zsh patterns are not regexes and the default
regex library doesn't allow that syntax.  I believe you can use that
syntax if you load the zsh/pcre module and setopt REMATCH_PCRE.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Feature request (#b)...(...)<not empty>
  2017-02-25 21:05         ` Bart Schaefer
@ 2017-02-25 22:52           ` Daniel Shahaf
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Shahaf @ 2017-02-25 22:52 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote on Sat, Feb 25, 2017 at 13:05:25 -0800:
> On Feb 25,  9:54am, Sebastian Gniazdowski wrote:
> }
> } ${a//(#b)ab(|c)/x} "<- last one shows | isn't greedy"
> } x x xc <- last one shows | isn't greedy
> 
> Yes, you're correct.  Putting the shorter (in this case empty) alternate
> first in the parens makes that particular subexpression non-greedy.  This
> was a deliberate choice -- though I can't find that it's been documented
> anywhere, on a brief search.

diff --git a/Doc/Zsh/roadmap.yo b/Doc/Zsh/roadmap.yo
index bd064e2..94ef74d 100644
--- a/Doc/Zsh/roadmap.yo
+++ b/Doc/Zsh/roadmap.yo
@@ -139,6 +139,9 @@ startitem()
 item(tt(**))(
 for matching over multiple directories
 )
+item(tt(|))(
+for matching either of two alternatives
+)
 item(tt(~), tt(^))(
 the ability to exclude patterns from matching when the tt(EXTENDED_GLOB)
 option is set
diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 43ecd31..319409d 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -2100,6 +2100,7 @@ Matches either var(x) or var(y).
 This operator has lower precedence than any other.
 The `tt(|)' character
 must be within parentheses, to avoid interpretation as a pipeline.
+The alternatives are tried in order from left to right.
 )
 item(tt(^)var(x))(
 (Requires tt(EXTENDED_GLOB) to be set.)


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-02-25 22:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-25  8:37 Feature request (#b)...(...)<not empty> Sebastian Gniazdowski
2017-02-25 15:13 ` Daniel Shahaf
2017-02-25 15:36   ` Sebastian Gniazdowski
2017-02-25 16:47     ` Bart Schaefer
2017-02-25 17:24       ` Sebastian Gniazdowski
2017-02-25 17:54         ` Daniel Shahaf
2017-02-25 20:38         ` Bart Schaefer
2017-02-25 17:54       ` Sebastian Gniazdowski
2017-02-25 21:05         ` Bart Schaefer
2017-02-25 22:52           ` Daniel Shahaf

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).