From: Peter Seibel <peter@javamonkey.com>
Cc: ding@gnus.org
Subject: Re: nnmail-split-header-length-limit is EVIL!
Date: 31 Mar 1999 18:54:48 -0800 [thread overview]
Message-ID: <m31zi5xd6v.fsf@ppp-asfm12--027.sirius.net> (raw)
In-Reply-To: Dale Hagglund's message of "30 Mar 1999 23:17:07 -0800"
>>>>> "Dale" == Dale Hagglund <rdh@best.com> writes:
Dale> Greg Stark <gsstark@mit.edu> writes:
>> Could anyone actually explain why emacs has any trouble running a
>> regexp over a few k of text?
Dale> As far as I know, this usually results from regexps that
Dale> have lots of potential backtracking. You end up with a
Dale> situation where the match-time can be exponential in the
Dale> length of the string being matched.
Dale> I haven't seen any examples of complete regular expressions
Dale> that cause the problems we're talking about here, but since
Dale> they included at least some user-specified sub-regexps, it
Dale> might be hard to address the problem from the regexp side.
Dale> Do we have any known regexp/string pairs that deonstrate the
Dale> problem?
Try eval'ing the expression below with the data line below it. It'll
run quickly as is. Then add a's on the end of one or two at a time and
watch it slow down. Seriously, don't add to many a's all at once or
you'll hang your emacs. C-g doesn't get it out--at least for me. But
the first few you add may not have any noticible effect--combinatorial
explosion never looks bad in the beginning.
(search-forward-regexp "\\(.*\\)*x")
xaaaaaa
The problem is that the * is greedy so after the . matches the 'x'
it'll match everything else in the line. Then the '.' doesn't match
the end of the line but eol isn't an 'x' either so the regex engine
backs up one from the end of the line and in the state of having
matched a bunch of stuff (everything except the last 'a' in the line)
with the '.*' inside the parens and having matched the whole
parenthesized pattern once, it then tries to finish matching by
matching the parenthesized expression twice which it can do by
matching the last 'a' with the second time through the parenthesized
expression. But it still fail to match an x (since it's now at the end
of the line) so it backtracks again, backing up one more character on
the first pass through and then matches the second time though,
matching two characters this time and then bombing out at when it hits
eol. That sort of backtracking and then matching forward and failing
goes on for quite a while until the whole thing succeeds by matching
zero characters zero times to satisfy the '\\(.*\\)*' and then matches
the 'x' satisfying the regex as a whole. The O'Reilly book, _Mastering
Regular Expressions_ by Jeffrey Friedl explains this sort of stuff
quite well. (He also wrote an article in The Perl Journal about this
particular problem of combinatorial explosions in regexps.
-Peter
--
Peter Seibel Perl/Java/English Hacker peter@javamonkey.com
There's no good culture without a dash of bad taste; a monopoly of
good taste suggests restraint -- you're not pushing the envelope.
-- Jean-Louis Gassee
next prev parent reply other threads:[~1999-04-01 2:54 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
1999-03-02 10:57 Hrvoje Niksic
1999-03-06 18:16 ` Lars Magne Ingebrigtsen
1999-03-07 13:27 ` Hrvoje Niksic
1999-03-14 15:53 ` Lars Magne Ingebrigtsen
1999-03-16 7:25 ` Hrvoje Niksic
1999-03-28 15:09 ` Lars Magne Ingebrigtsen
1999-03-29 20:20 ` Hans de Graaff
1999-04-02 14:03 ` Lars Magne Ingebrigtsen
1999-04-03 7:00 ` Hans de Graaff
1999-04-17 6:16 ` Lars Magne Ingebrigtsen
1999-04-13 7:04 ` Hrvoje Niksic
1999-04-17 6:15 ` Lars Magne Ingebrigtsen
1999-03-30 6:05 ` Dale Hagglund
1999-03-31 3:02 ` Greg Stark
1999-03-31 7:17 ` Dale Hagglund
1999-04-01 2:54 ` Peter Seibel [this message]
1999-04-01 7:02 ` Dale Hagglund
1999-04-01 7:13 ` Peter Seibel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m31zi5xd6v.fsf@ppp-asfm12--027.sirius.net \
--to=peter@javamonkey.com \
--cc=ding@gnus.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).