* Feature request (#b)...(...)<not empty>
@ 2017-02-25 8:37 Sebastian Gniazdowski
2017-02-25 15:13 ` Daniel Shahaf
0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-25 8:37 UTC (permalink / raw)
To: zsh-workers
Hello,
found a limitation in regex-like specification, maybe it's easy to add
extension?
A pattern: (a|)(b|)(c|). Order of a, b, c matters, but any combination
of them will be accepted. The problem: |) in all three will also be
accepted. That's not a combination of a, b, c. This is onset of
something higher-level: "combination of".
So maybe: ((a|)(b|)(c|))(#n). #n is for not-empty, resembles [[ -n.
It looks like typical pattern situation: try something, decide if to
reject it. So maybe it's easy to add?
--
Sebastian Gniazdowski
psprint2@fastmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Feature request (#b)...(...)<not empty>
2017-02-25 8:37 Feature request (#b)...(...)<not empty> Sebastian Gniazdowski
@ 2017-02-25 15:13 ` Daniel Shahaf
2017-02-25 15:36 ` Sebastian Gniazdowski
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Shahaf @ 2017-02-25 15:13 UTC (permalink / raw)
To: Sebastian Gniazdowski; +Cc: zsh-workers
Sebastian Gniazdowski wrote on Sat, Feb 25, 2017 at 00:37:10 -0800:
> Hello,
> found a limitation in regex-like specification, maybe it's easy to add
> extension?
>
> A pattern: (a|)(b|)(c|). Order of a, b, c matters, but any combination
> of them will be accepted. The problem: |) in all three will also be
> accepted. That's not a combination of a, b, c. This is onset of
> something higher-level: "combination of".
>
> So maybe: ((a|)(b|)(c|))(#n). #n is for not-empty, resembles [[ -n.
>
> It looks like typical pattern situation: try something, decide if to
> reject it. So maybe it's easy to add?
Some regexp flavours support zero-width lookahead, which would allow
this. For example, in Perl,
% print -l {a,}{b,}{c,} | perl -lnE 'print if /(?=.)(a|)(b|)(c|)/' | wc -l
7
However, I have no opinion as to whether that should be added: I don't
know how much effort would be involved, and for the given example there
are alternative solutions that require no syntax changes.
Cheers,
Daniel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Feature request (#b)...(...)<not empty>
2017-02-25 15:13 ` Daniel Shahaf
@ 2017-02-25 15:36 ` Sebastian Gniazdowski
2017-02-25 16:47 ` Bart Schaefer
0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-25 15:36 UTC (permalink / raw)
To: Daniel Shahaf; +Cc: zsh-workers
On Sat, Feb 25, 2017, at 07:13 AM, Daniel Shahaf wrote:
> However, I have no opinion as to whether that should be added: I don't
> know how much effort would be involved, and for the given example there
> are alternative solutions that require no syntax changes.
I have format codes: bold, fg color, bg color. Accept any combination
preserving order. 20 colors in total. I currently do:
[range spanning all codes]([fg range]|)([bg range]|)
This way first position cannot be null and things work. However, this
will accept fgfg fgfgbg bgfg bgfgbg. But works and is fast. How would
you do it?
I think your neutral comment tends to hide fact how powerful (#n) and
(#z) would be, but I might not convey this message anyway.
--
Sebastian Gniazdowski
psprint2@fastmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Feature request (#b)...(...)<not empty>
2017-02-25 15:36 ` Sebastian Gniazdowski
@ 2017-02-25 16:47 ` Bart Schaefer
2017-02-25 17:24 ` Sebastian Gniazdowski
2017-02-25 17:54 ` Sebastian Gniazdowski
0 siblings, 2 replies; 10+ messages in thread
From: Bart Schaefer @ 2017-02-25 16:47 UTC (permalink / raw)
To: zsh-workers
On Feb 25, 12:37am, Sebastian Gniazdowski wrote:
}
} So maybe: ((a|)(b|)(c|))(#n). #n is for not-empty, resembles [[ -n.
}
} It looks like typical pattern situation: try something, decide if to
} reject it. So maybe it's easy to add?
It's not typical -- as Daniel described, it requires doing either
look-ahead or look-behind, which a simple regular expression scanner
does not do. That is, when scanning a string such as "fgfgbg" the
scanner has already accepted and moved past the first "fg" before it
examines the second "f". In order to restrict what was matched
before, the code must either stop at that point and try all possible
matches for the "fgbg" remainder of the string -- and then remember
what NOT to do when it backs up to try again -- or it must remember
everything it did up to that point and re-evaluate that after the
second "fg" is scanned.
Since this could be nested, the tree of "extra looks" has arbitrary
branching. This is different from simply failing to match what is
coming up next and backtracking the whole process. This is also why
perl lookahead/behind patterns have several restrictions, e.g. that
the "look" be a fixed length string, so that most possible branches
can be pruned off that tree at expression compile time.
I've greatly simplified that explanation; I'm guessing you haven't yet
studied the computer science concept of "finite automata". You may
find it interesting.
On Feb 25, 7:36am, Sebastian Gniazdowski wrote:
}
} [range spanning all codes]([fg range]|)([bg range]|)
}
} This way first position cannot be null and things work. However, this
} will accept fgfg fgfgbg bgfg bgfgbg. But works and is fast. How would
} you do it?
In this case since there are only two patterns at the tail of the string
I'd probably just brute-force it:
([all-except-fg-or-bg]|[fg]|[bg])#([fgbg]|[bgfg]|[fg]|[bg])
Zsh's scanner will always take the left branch of an alternation if both
branches will succeed, so arranging this with longest matches first will
accomplish what you need, I think; but I suspect everything might be a bit
oversimplified here.
} I think your neutral comment tends to hide fact how powerful (#n) and
} (#z) would be, but I might not convey this message anyway.
Yes, they're so powerful that they require an entirely different sort
of expression parser to make them work.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Feature request (#b)...(...)<not empty>
2017-02-25 16:47 ` Bart Schaefer
@ 2017-02-25 17:24 ` Sebastian Gniazdowski
2017-02-25 17:54 ` Daniel Shahaf
2017-02-25 20:38 ` Bart Schaefer
2017-02-25 17:54 ` Sebastian Gniazdowski
1 sibling, 2 replies; 10+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-25 17:24 UTC (permalink / raw)
To: zsh-workers
On Sat, Feb 25, 2017, at 08:47 AM, Bart Schaefer wrote:
> On Feb 25, 12:37am, Sebastian Gniazdowski wrote:
> (...) or it must remember
> everything it did up to that point and re-evaluate that after the
> second "fg" is scanned.
I thought that ((a|)(b|)(c|)) could be left as it is. So after it scans
"bold" (a), it normally proceeds to scan "fg" (i.e. b), etc. until it
finishes the outer parenthesis. No change here. Different order, e.g.
ba, isn't accepted, i.e. first "fg" then "bold" – the say API is
restricted, order is: bold, foreground, background (lacking any of them
but of course not all).
> (...) This is different from simply failing to match what is
> coming up next and backtracking the whole process.
This matches what (#n) would do. Reject empty result of parentheses, so
indeed... I was thinking in terms of parentheses.
((a|)(b|)(c|))<HERE>(#n) <- in this place we're somewhat at "current"
node. First recently finished. Expected some nodes to be there, grouping
elements of parentheses. And typical rejection of nodes to be already in
place.
> ([all-except-fg-or-bg]|[fg]|[bg])#([fgbg]|[bgfg]|[fg]|[bg])
>
> Zsh's scanner will always take the left branch of an alternation if both
> branches will succeed, so arranging this with longest matches first will
> accomplish what you need, I think; but I suspect everything might be a
> bit oversimplified here.
I'm in good position because it's API. User is required not to do
restricted things, it's like e.g. not accepting NULLs in C, so I can
leave it as it is.
--
Sebastian Gniazdowski
psprint2@fastmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Feature request (#b)...(...)<not empty>
2017-02-25 17:24 ` Sebastian Gniazdowski
@ 2017-02-25 17:54 ` Daniel Shahaf
2017-02-25 20:38 ` Bart Schaefer
1 sibling, 0 replies; 10+ messages in thread
From: Daniel Shahaf @ 2017-02-25 17:54 UTC (permalink / raw)
To: Sebastian Gniazdowski; +Cc: zsh-workers
Sebastian Gniazdowski wrote on Sat, Feb 25, 2017 at 09:24:40 -0800:
> On Sat, Feb 25, 2017, at 08:47 AM, Bart Schaefer wrote:
> > On Feb 25, 12:37am, Sebastian Gniazdowski wrote:
> > (...) or it must remember
> > everything it did up to that point and re-evaluate that after the
> > second "fg" is scanned.
>
> I thought that ((a|)(b|)(c|)) could be left as it is. So after it scans
> "bold" (a), it normally proceeds to scan "fg" (i.e. b), etc. until it
> finishes the outer parenthesis. No change here. Different order, e.g.
> ba, isn't accepted, i.e. first "fg" then "bold" – the say API is
> restricted, order is: bold, foreground, background (lacking any of them
> but of course not all).
As an aside, in xterm, it's possible to omit all of bold, foreground,
background: $'\e[31m' is accepted, but so is $'\e[m'.
I assume that's a feature.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Feature request (#b)...(...)<not empty>
2017-02-25 16:47 ` Bart Schaefer
2017-02-25 17:24 ` Sebastian Gniazdowski
@ 2017-02-25 17:54 ` Sebastian Gniazdowski
2017-02-25 21:05 ` Bart Schaefer
1 sibling, 1 reply; 10+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-25 17:54 UTC (permalink / raw)
To: zsh-workers
On Sat, Feb 25, 2017, at 08:47 AM, Bart Schaefer wrote:
> ([all-except-fg-or-bg]|[fg]|[bg])#([fgbg]|[bgfg]|[fg]|[bg])
>
> Zsh's scanner will always take the left branch of an alternation if both
> branches will succeed, so arranging this with longest matches first will
Yes and this is interesting, I think I'm about something similar:
% a="abc"; echo ${a//(#b)ab(c)(#c0,1)/x} ${a//(#b)ab(c|)/x}
${a//(#b)ab(|c)/x} "<- last one shows | isn't greedy"
x x xc <- last one shows | isn't greedy
% [[ abc = ab(|c) ]] && echo "I've tried multiple options"
I've tried multiple options
I'm not sure if regexes work this way.. Maybe I'm loosing it for a
moment, but:
[[ abc =~ ab(|c) ]] && echo "I've tried multiple options"
zsh: failed to compile regex: empty (sub)expression
not sure how to test regexes. My point is:
- having | in parentheses behave this way looks like something
implementation-driven, not finite automata driven,
- allowing principle of greediness to be deliberately not followed
creates more rich behavior
Maybe (#n) can be implemented this way, by looking at code and not
following model. To in result extend number of states (number of
behaviors) of the system.
--
Sebastian Gniazdowski
psprint2@gmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Feature request (#b)...(...)<not empty>
2017-02-25 17:24 ` Sebastian Gniazdowski
2017-02-25 17:54 ` Daniel Shahaf
@ 2017-02-25 20:38 ` Bart Schaefer
1 sibling, 0 replies; 10+ messages in thread
From: Bart Schaefer @ 2017-02-25 20:38 UTC (permalink / raw)
To: zsh-workers
On Feb 25, 9:24am, Sebastian Gniazdowski wrote:
}
} I thought that ((a|)(b|)(c|)) could be left as it is. So after it scans
} "bold" (a), it normally proceeds to scan "fg" (i.e. b), etc. until it
} finishes the outer parenthesis. No change here.
The problem isn't just what's inside the parens, it's what's outside.
Consider what happens if you have a pattern like this and then make it
part of a larger pattern that repeats zero or more times, or to be a
sub-pattern within another set of parens that also has (#n), etc.
Even if you can make it work by keeping more state than a simple FA,
it becomes easy to write pathologically complex expressions.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Feature request (#b)...(...)<not empty>
2017-02-25 17:54 ` Sebastian Gniazdowski
@ 2017-02-25 21:05 ` Bart Schaefer
2017-02-25 22:52 ` Daniel Shahaf
0 siblings, 1 reply; 10+ messages in thread
From: Bart Schaefer @ 2017-02-25 21:05 UTC (permalink / raw)
To: zsh-workers
On Feb 25, 9:54am, Sebastian Gniazdowski wrote:
}
} ${a//(#b)ab(|c)/x} "<- last one shows | isn't greedy"
} x x xc <- last one shows | isn't greedy
Yes, you're correct. Putting the shorter (in this case empty) alternate
first in the parens makes that particular subexpression non-greedy. This
was a deliberate choice -- though I can't find that it's been documented
anywhere, on a brief search.
} I'm not sure if regexes work this way.. Maybe I'm loosing it for a
} moment, but:
}
} [[ abc =~ ab(|c) ]] && echo "I've tried multiple options"
} zsh: failed to compile regex: empty (sub)expression
Yes, that's also correct. Zsh patterns are not regexes and the default
regex library doesn't allow that syntax. I believe you can use that
syntax if you load the zsh/pcre module and setopt REMATCH_PCRE.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Feature request (#b)...(...)<not empty>
2017-02-25 21:05 ` Bart Schaefer
@ 2017-02-25 22:52 ` Daniel Shahaf
0 siblings, 0 replies; 10+ messages in thread
From: Daniel Shahaf @ 2017-02-25 22:52 UTC (permalink / raw)
To: zsh-workers
Bart Schaefer wrote on Sat, Feb 25, 2017 at 13:05:25 -0800:
> On Feb 25, 9:54am, Sebastian Gniazdowski wrote:
> }
> } ${a//(#b)ab(|c)/x} "<- last one shows | isn't greedy"
> } x x xc <- last one shows | isn't greedy
>
> Yes, you're correct. Putting the shorter (in this case empty) alternate
> first in the parens makes that particular subexpression non-greedy. This
> was a deliberate choice -- though I can't find that it's been documented
> anywhere, on a brief search.
diff --git a/Doc/Zsh/roadmap.yo b/Doc/Zsh/roadmap.yo
index bd064e2..94ef74d 100644
--- a/Doc/Zsh/roadmap.yo
+++ b/Doc/Zsh/roadmap.yo
@@ -139,6 +139,9 @@ startitem()
item(tt(**))(
for matching over multiple directories
)
+item(tt(|))(
+for matching either of two alternatives
+)
item(tt(~), tt(^))(
the ability to exclude patterns from matching when the tt(EXTENDED_GLOB)
option is set
diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 43ecd31..319409d 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -2100,6 +2100,7 @@ Matches either var(x) or var(y).
This operator has lower precedence than any other.
The `tt(|)' character
must be within parentheses, to avoid interpretation as a pipeline.
+The alternatives are tried in order from left to right.
)
item(tt(^)var(x))(
(Requires tt(EXTENDED_GLOB) to be set.)
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-02-25 22:56 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-25 8:37 Feature request (#b)...(...)<not empty> Sebastian Gniazdowski
2017-02-25 15:13 ` Daniel Shahaf
2017-02-25 15:36 ` Sebastian Gniazdowski
2017-02-25 16:47 ` Bart Schaefer
2017-02-25 17:24 ` Sebastian Gniazdowski
2017-02-25 17:54 ` Daniel Shahaf
2017-02-25 20:38 ` Bart Schaefer
2017-02-25 17:54 ` Sebastian Gniazdowski
2017-02-25 21:05 ` Bart Schaefer
2017-02-25 22:52 ` Daniel Shahaf
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/zsh/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).