ntg-context - mailing list for ConTeXt users
 help / color / mirror / Atom feed
From: Taco Hoekwater <taco@elvenkind.com>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: Off-topic: Struggles with LPEG grammar
Date: Mon, 21 Dec 2020 14:36:35 +0100	[thread overview]
Message-ID: <2FDD073C-DC12-4235-BBDC-0A5F4EE6B1F7@elvenkind.com> (raw)
In-Reply-To: <CALBOmsYx45L+SQLbFz3gW6MxMQa4mu-gJWbdL2iMXWbRkpA7mw@mail.gmail.com>



> On 21 Dec 2020, at 14:08, Mojca Miklavec <mojca.miklavec.lists@gmail.com> wrote:
> 
> Dear Taco,
> 
> On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
>>> On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
>>> 
>>> My only explanation would be that perhaps "^1" is so greedy that the
>>> rest of the pattern doesn't get found. But I don't want to believe
>>> that explanation.
>> 
>> Which (of course) means that that is exactly what happens ;)
>> 
>> The ones that match are
>> 
>> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
>> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
>> 
>> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
>> 
>> ababbb fails the r5 at the end
>> 
>> abbbab fails the first r2 already (since the second r5 therein never happens)
> 
> Is this a deliberate choice, a limitation of the grammar
> expressiveness, some misuse on my side that could/should/needs to be
> implemented in a different way, or does it count as a "bug" on the
> lpeg side?
> 
> For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
> because "b+" would eat all three "b"s at once (the regexp "b+b" in
> fact finds "bbb", and I would expect a less-than-totally-greedy hit
> with lpeg as well). Or is my reasoning wrong here?

PEGs are greedy by design, which is a consequence of the fact that PEGS do not backtrack, which goes back to the underlying assumptive rule of PEGs that there is one (and only one!) ‘correct’ way to parse the input. Allowing backtracking destroys that assumption and by doing so would complicate the system to a level that would make it comparable to PCRE (with all the associated penalties on processing speed and a much greater codebase).

Another way of thinking of it (perhaps that helps): PEGs are _declarative_ on the input, whereas REs are _interpretive_.

Yet another way of thinking about it: PEGs rigorously define the (programming) language of the input.

It takes a bit of rewiring your brain to get comfortable with PEGs when you are used to REs, and unpredictable input is much harder to tackle in PEGs. OTOH, using PEGs are much better if there is an explicit grammar.

In your example, the fact that you even considered ^1 means that you were still thinking too much in terms of regular expressions, but I know it is very hard to learn how to do something that is _only a little_ different from something you know well already.

Best wishes,
Taco




___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________

  reply	other threads:[~2020-12-21 13:36 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-21 12:16 Mojca Miklavec
2020-12-21 12:46 ` Taco Hoekwater
2020-12-21 12:50   ` Taco Hoekwater
2020-12-21 13:08   ` Mojca Miklavec
2020-12-21 13:36     ` Taco Hoekwater [this message]
2020-12-21 13:47       ` luigi scarso
2020-12-21 14:10       ` Mojca Miklavec
2020-12-21 14:18         ` Taco Hoekwater
2020-12-21 14:44         ` Hans Hagen
2020-12-21 13:38     ` Arthur Reutenauer
2020-12-21 13:52 ` Hans Hagen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2FDD073C-DC12-4235-BBDC-0A5F4EE6B1F7@elvenkind.com \
    --to=taco@elvenkind.com \
    --cc=ntg-context@ntg.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).