From: Taco Hoekwater <taco@elvenkind.com>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: Off-topic: Struggles with LPEG grammar
Date: Mon, 21 Dec 2020 14:36:35 +0100 [thread overview]
Message-ID: <2FDD073C-DC12-4235-BBDC-0A5F4EE6B1F7@elvenkind.com> (raw)
In-Reply-To: <CALBOmsYx45L+SQLbFz3gW6MxMQa4mu-gJWbdL2iMXWbRkpA7mw@mail.gmail.com>
> On 21 Dec 2020, at 14:08, Mojca Miklavec <mojca.miklavec.lists@gmail.com> wrote:
>
> Dear Taco,
>
> On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
>>> On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
>>>
>>> My only explanation would be that perhaps "^1" is so greedy that the
>>> rest of the pattern doesn't get found. But I don't want to believe
>>> that explanation.
>>
>> Which (of course) means that that is exactly what happens ;)
>>
>> The ones that match are
>>
>> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
>> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
>>
>> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
>>
>> ababbb fails the r5 at the end
>>
>> abbbab fails the first r2 already (since the second r5 therein never happens)
>
> Is this a deliberate choice, a limitation of the grammar
> expressiveness, some misuse on my side that could/should/needs to be
> implemented in a different way, or does it count as a "bug" on the
> lpeg side?
>
> For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
> because "b+" would eat all three "b"s at once (the regexp "b+b" in
> fact finds "bbb", and I would expect a less-than-totally-greedy hit
> with lpeg as well). Or is my reasoning wrong here?
PEGs are greedy by design, which is a consequence of the fact that PEGS do not backtrack, which goes back to the underlying assumptive rule of PEGs that there is one (and only one!) ‘correct’ way to parse the input. Allowing backtracking destroys that assumption and by doing so would complicate the system to a level that would make it comparable to PCRE (with all the associated penalties on processing speed and a much greater codebase).
Another way of thinking of it (perhaps that helps): PEGs are _declarative_ on the input, whereas REs are _interpretive_.
Yet another way of thinking about it: PEGs rigorously define the (programming) language of the input.
It takes a bit of rewiring your brain to get comfortable with PEGs when you are used to REs, and unpredictable input is much harder to tackle in PEGs. OTOH, using PEGs are much better if there is an explicit grammar.
In your example, the fact that you even considered ^1 means that you were still thinking too much in terms of regular expressions, but I know it is very hard to learn how to do something that is _only a little_ different from something you know well already.
Best wishes,
Taco
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://context.aanhet.net
archive : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___________________________________________________________________________________
next prev parent reply other threads:[~2020-12-21 13:36 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-21 12:16 Mojca Miklavec
2020-12-21 12:46 ` Taco Hoekwater
2020-12-21 12:50 ` Taco Hoekwater
2020-12-21 13:08 ` Mojca Miklavec
2020-12-21 13:36 ` Taco Hoekwater [this message]
2020-12-21 13:47 ` luigi scarso
2020-12-21 14:10 ` Mojca Miklavec
2020-12-21 14:18 ` Taco Hoekwater
2020-12-21 14:44 ` Hans Hagen
2020-12-21 13:38 ` Arthur Reutenauer
2020-12-21 13:52 ` Hans Hagen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2FDD073C-DC12-4235-BBDC-0A5F4EE6B1F7@elvenkind.com \
--to=taco@elvenkind.com \
--cc=ntg-context@ntg.nl \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).