Off-topic: Struggles with LPEG grammar

From: Mojca Miklavec <mojca.miklavec.lists@gmail.com>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Off-topic: Struggles with LPEG grammar
Date: Mon, 21 Dec 2020 13:16:31 +0100	[thread overview]
Message-ID: <CALBOmsbPGuSSRnmBkmEbnw9WYcYJAUkR-0rK99fK9YP3RsUN7A@mail.gmail.com> (raw)

Hi,

I'm sorry for being slightly off-topic here, but this list might still
be the best place to resolve lpeg-related questions :)

0.) Disclaimer: the challenge that triggered this curiosity came from
Advent of Code 2020. In case you are taking part and you wan't to
avoid spoilers, please stop reading here! (You have been warned.)
    https://adventofcode.com/2020/day/19

1.) My question: I don't understand why I cannot get ^1 to work "as
advertised". Isn't this supposed to mean "one or more occurences of
the pattern"? If I change "lpeg.P('b')" into "lpeg.P('b')^1" in the
example below, the strings that match the initial grammar no longer
match the modified grammar. (I would naively imagine that the secord
pattern would get more rather than less matches.)

2.) Background: Most definitely the task on that page is supposed to
be solved in a different way, but many people use Advent of Code as an
opportunity to learn a new programming language, and when I read the
task description, I wanted to figure out if I could solve it using the
cute little lpeg. My initial attempt worked correctly (at least to
solve the first puzzle), but then I realized that I cannot easily
change the pattern from "matches a letter b" into "matches any number
of b-s", and I fail to figure out why. Any hints would be greatly
appreciated.

Below is a not-so-minimal example. I can certainly try to reduce it
further, but I would first like to ask whether I'm doing something
obviously wrong by trying to replace
    r5 = lpeg.P('b')
by
    r5 = lpeg.P('b')^1
in order to allow more than one occurrences of the letter b?
My only explanation would be that perhaps "^1" is so greedy that the
rest of the pattern doesn't get found. But I don't want to believe
that explanation.

local lpeg = require "lpeg"

--[[
0: 4 1 5
1: 2 3 | 3 2
2: 4 4 | 5 5
3: 4 5 | 5 4
4: "a"
5: "b"
]]--

local parser = lpeg.P{
    "r0";
    r0 = lpeg.V"r4" * lpeg.V"r1" * lpeg.V"r5",
    r1 = lpeg.V"r2" * lpeg.V"r3" + lpeg.V"r3" * lpeg.V"r2",
    r2 = lpeg.V"r4" * lpeg.V"r4" + lpeg.V"r5" * lpeg.V"r5",
    r3 = lpeg.V"r4" * lpeg.V"r5" + lpeg.V"r5" * lpeg.V"r4",
    r4 = lpeg.P('a'),
    r5 = lpeg.P('b'),
} * -1

local parser1 = lpeg.P{
    "r0";
    r0 = lpeg.V"r4" * lpeg.V"r1" * lpeg.V"r5",
    r1 = lpeg.V"r2" * lpeg.V"r3" + lpeg.V"r3" * lpeg.V"r2",
    r2 = lpeg.V"r4" * lpeg.V"r4" + lpeg.V"r5" * lpeg.V"r5",
    r3 = lpeg.V"r4" * lpeg.V"r5" + lpeg.V"r5" * lpeg.V"r4",
    r4 = lpeg.P('a'),
    r5 = lpeg.P('b')^1, -- modified part that doesn't seem to work
  } * -1

strings = {
  "ababbb",
  "bababa",
  "abbbab",
  "aaabbb",
  "aaaabbb",
};

local total = 0
local total1 = 0
for _, s in ipairs(strings) do
    if lpeg.match(parser, s) then
        total = total + 1
    end
    if lpeg.match(parser1, s) then
        total = total + 1
    end
end
print('total:', total, total1)

In this example, total=2, total1=0.
What I don't understand is why total1 is zero.

Thank you,
    Mojca
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________