On Mon, Dec 21, 2020 at 2:36 PM Taco Hoekwater wrote:

> On 21 Dec 2020, at 14:08, Mojca Miklavec wrote:
> >
> > Dear Taco,
> >
> > On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
> >>> On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
> >>>
> >>> My only explanation would be that perhaps "^1" is so greedy that the
> >>> rest of the pattern doesn't get found. But I don't want to believe
> >>> that explanation.
> >>
> >> Which (of course) means that that is exactly what happens ;)
> >>
> >> The ones that match are
> >>
> >>   ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
> >>   abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
> >>
> >> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
> >>
> >>   ababbb fails the r5 at the end
> >>
> >>   abbbab fails the first r2 already (since the second r5 therein
> >>   never happens)
> >
> > Is this a deliberate choice, a limitation of the grammar
> > expressiveness, some misuse on my side that could/should/needs to be
> > implemented in a different way, or does it count as a "bug" on the
> > lpeg side?
> >
> > For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
> > because "b+" would eat all three "b"s at once (the regexp "b+b" in
> > fact finds "bbb", and I would expect a less-than-totally-greedy hit
> > with lpeg as well). Or is my reasoning wrong here?
>
> PEGs are greedy by design, which is a consequence of the fact that PEGs do
> not backtrack, which goes back to the underlying assumptive rule of PEGs
> that there is one (and only one!) ‘correct’ way to parse the input.
> Allowing backtracking destroys that assumption and by doing so would
> complicate the system to a level that would make it comparable to PCRE
> (with all the associated penalties on processing speed and a much greater
> codebase).

Greedy vs. non-greedy is one of the things I always try to keep in mind
when I start with lpeg, and I regularly fail to apply it, because I think
in the "Perl regex way".
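To make the difference concrete, here is a small sketch of the "b+b" case
from above. It assumes lpeg is installed and loadable with require; the
variable names are mine and not from the thread. The first pattern is the
direct translation of the regexp and fails, because the repetition never
gives a "b" back. The other two are possible rewrites, not the only ones:
one stops the loop early with a lookahead, the other uses ordered choice
inside a recursive grammar.

```lua
local lpeg = require "lpeg"
local P, V = lpeg.P, lpeg.V

-- Direct translation of the regexp "b+b": the repetition consumes every
-- "b" and never gives one back, so the trailing "b" has nothing to match.
local greedy = P"b"^1 * P"b"

-- Rewrite 1: only take a "b" inside the loop if another "b" follows
-- (#patt is lpeg's and-predicate: it looks ahead without consuming).
local lookahead = (P"b" * #P"b")^1 * P"b"

-- Rewrite 2: a recursive grammar; ordered choice first tries "more b's"
-- and falls back to "one final b" when the recursion fails.
local recursive = P{ P"b" * (V(1) + P"b") }

print(lpeg.match(greedy, "bbb"))     -- nil
print(lpeg.match(lookahead, "bbb"))  -- 4 (position after the match)
print(lpeg.match(recursive, "bbb"))  -- 4
print(lpeg.match(lookahead, "b"))    -- nil, just like the regexp "b+b"
```

The second rewrite has the same shape as the "anywhere" idiom: the choice is
made explicitly in the grammar instead of being left to backtracking.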
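Anyway, http://www.gammon.com.au/lpeg has some good lines, e.g. this one
(from the lpeg site), which finds the pattern anywhere in the line:

    local lpeg = require "lpeg"
    local target = "a dog barks"  -- sample subject string (my example, not from the page)

    -- try p at the current position; otherwise skip one character and retry
    function anywhere (p)
      return lpeg.P { p + 1 * lpeg.V(1) }
    end

    print (lpeg.match (anywhere ("dog"), target))

-- 
luigi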