Gnus development mailing list
 help / color / mirror / Atom feed
From: Peter Seibel <peter@javamonkey.com>
Cc: ding@gnus.org
Subject: Re: nnmail-split-header-length-limit is EVIL!
Date: 31 Mar 1999 18:54:48 -0800	[thread overview]
Message-ID: <m31zi5xd6v.fsf@ppp-asfm12--027.sirius.net> (raw)
In-Reply-To: Dale Hagglund's message of "30 Mar 1999 23:17:07 -0800"

>>>>> "Dale" == Dale Hagglund <rdh@best.com> writes:

    Dale> Greg Stark <gsstark@mit.edu> writes:
    >> Could anyone actually explain why emacs has any trouble running a
    >> regexp over a few k of text? 

    Dale> As far as I know, this usually results from regexps that
    Dale> have lots of potential backtracking.  You end up with a
    Dale> situation where the match-time can be exponential in the
    Dale> length of the string being matched.

    Dale> I haven't seen any examples of complete regular expressions
    Dale> that cause the problems we're talking about here, but since
    Dale> they included at least some user-specified sub-regexps, it
    Dale> might be hard to address the problem from the regexp side.
    Dale> Do we have any known regexp/string pairs that deonstrate the
    Dale> problem?

Try eval'ing the expression below with the data line below it. It'll
run quickly as is. Then add a's on the end of one or two at a time and
watch it slow down. Seriously, don't add to many a's all at once or
you'll hang your emacs. C-g doesn't get it out--at least for me. But
the first few you add may not have any noticible effect--combinatorial
explosion never looks bad in the beginning.

(search-forward-regexp "\\(.*\\)*x")

xaaaaaa

The problem is that the * is greedy so after the . matches the 'x'
it'll match everything else in the line. Then the '.' doesn't match
the end of the line but eol isn't an 'x' either so the regex engine
backs up one from the end of the line and in the state of having
matched a bunch of stuff (everything except the last 'a' in the line)
with the '.*' inside the parens and having matched the whole
parenthesized pattern once, it then tries to finish matching by
matching the parenthesized expression twice which it can do by
matching the last 'a' with the second time through the parenthesized
expression. But it still fail to match an x (since it's now at the end
of the line) so it backtracks again, backing up one more character on
the first pass through and then matches the second time though,
matching two characters this time and then bombing out at when it hits
eol. That sort of backtracking and then matching forward and failing
goes on for quite a while until the whole thing succeeds by matching
zero characters zero times to satisfy the '\\(.*\\)*' and then matches
the 'x' satisfying the regex as a whole. The O'Reilly book, _Mastering
Regular Expressions_ by Jeffrey Friedl explains this sort of stuff
quite well. (He also wrote an article in The Perl Journal about this
particular problem of combinatorial explosions in regexps.

-Peter

-- 
Peter Seibel        Perl/Java/English Hacker        peter@javamonkey.com

  There's no good culture without a dash of bad taste; a monopoly of
  good taste suggests restraint -- you're not pushing the envelope.

                                        -- Jean-Louis Gassee


  reply	other threads:[~1999-04-01  2:54 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1999-03-02 10:57 Hrvoje Niksic
1999-03-06 18:16 ` Lars Magne Ingebrigtsen
1999-03-07 13:27   ` Hrvoje Niksic
1999-03-14 15:53     ` Lars Magne Ingebrigtsen
1999-03-16  7:25       ` Hrvoje Niksic
1999-03-28 15:09         ` Lars Magne Ingebrigtsen
1999-03-29 20:20           ` Hans de Graaff
1999-04-02 14:03             ` Lars Magne Ingebrigtsen
1999-04-03  7:00               ` Hans de Graaff
1999-04-17  6:16                 ` Lars Magne Ingebrigtsen
1999-04-13  7:04               ` Hrvoje Niksic
1999-04-17  6:15                 ` Lars Magne Ingebrigtsen
1999-03-30  6:05           ` Dale Hagglund
1999-03-31  3:02             ` Greg Stark
1999-03-31  7:17               ` Dale Hagglund
1999-04-01  2:54                 ` Peter Seibel [this message]
1999-04-01  7:02                   ` Dale Hagglund
1999-04-01  7:13                     ` Peter Seibel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m31zi5xd6v.fsf@ppp-asfm12--027.sirius.net \
    --to=peter@javamonkey.com \
    --cc=ding@gnus.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).