Gnus development mailing list
* nnmail-split-header-length-limit is EVIL!
@ 1999-03-02 10:57 Hrvoje Niksic
  1999-03-06 18:16 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 18+ messages in thread
From: Hrvoje Niksic @ 1999-03-02 10:57 UTC (permalink / raw)


My fancy mail splitting rules have a simple spam protection in the
form of:

(setq nnmail-split-fancy
      '(
        ... many rules here ...
        ("to\\|cc" "hniksic\\|niksic" "private")
        "spam"))

However, `nnmail-split-header-length-limit' feature does me the
"service" of removing the headers longer than 512 bytes, which in my
case means that mail from friends that happens to be Cc'ed to a bunch
of other friends gets thrown into spamhole.  Very nice, da?

The source explains that this is done because Gnus uses
"pathologically complex regexps" in the buffer.  So I thought, "Hey!
Why not write my own split rule that simply *searches* the buffer for
the appropriate header, and then searches for my name?  No regexps, no 
nothing!"

However, that *doesn't work* because Gnus *always* clips the header to
512 bytes, no matter what splitting rule is applied.  And that's evil,
evil, EVIL because it leaves me *no* way to match my headers
correctly.  And I found it only *after* I wrote this beautiful code:

(setq nnmail-split-fancy
      '(
        ... many rules here ...
        (: (lambda ()
             (save-restriction
               (let ((case-fold-search t))
                 (goto-char (point-min))
                 (re-search-forward "^$")
                 (narrow-to-region (point-min) (match-beginning 0))
                 (goto-char (point-min))
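                 (or (catch 'found
                       (let (limit)
                         ;; Loop so that every To/Cc header is checked,
                         ;; not only the first one found.
                         (while (re-search-forward "^\\(To\\|Cc\\):" nil t)
                           ;; Find the end of this header, including any
                           ;; continuation lines that start with whitespace.
                           (save-excursion
                             (while (progn (forward-line 1)
                                           (memq (char-after) '(?\  ?\t))))
                             (setq limit (point)))
                           (when (search-forward "niksic" limit t)
                             (throw 'found "private")))))
                     "spam")))))))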

Is there a way to disable length limits for user-provided functions?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-02 10:57 nnmail-split-header-length-limit is EVIL! Hrvoje Niksic
@ 1999-03-06 18:16 ` Lars Magne Ingebrigtsen
  1999-03-07 13:27   ` Hrvoje Niksic
  0 siblings, 1 reply; 18+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-03-06 18:16 UTC (permalink / raw)


Hrvoje Niksic <hniksic@srce.hr> writes:

> However, that *doesn't work* because Gnus *always* clips the header to
> 512 bytes, no matter what splitting rule is applied.  And that's evil,
> evil, EVIL because it leaves me *no* way to match my headers
> correctly.

Yes, but you can set `nnmail-split-header-length-limit' to a really
big number.  (And in 0.81 you can set it to nil to avoid the check
altogether.)
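
A minimal sketch of the two settings mentioned here, for a user's init
file.  The value 100000 is only an illustrative "really big number", and
the nil form assumes a Gnus release that accepts it (0.81 or later, per
this message):

```elisp
;; Raise the limit so that only extremely long header lines are clipped
;; before fancy splitting.  100000 is an arbitrary illustrative value.
(setq nnmail-split-header-length-limit 100000)

;; Alternatively (assuming Gnus 0.81 or later), turn the length check
;; off entirely:
;; (setq nnmail-split-header-length-limit nil)
```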

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-06 18:16 ` Lars Magne Ingebrigtsen
@ 1999-03-07 13:27   ` Hrvoje Niksic
  1999-03-14 15:53     ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 18+ messages in thread
From: Hrvoje Niksic @ 1999-03-07 13:27 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Hrvoje Niksic <hniksic@srce.hr> writes:
> 
> > However, that *doesn't work* because Gnus *always* clips the
> > header to 512 bytes, no matter what splitting rule is applied.
> > And that's evil, evil, EVIL because it leaves me *no* way to match
> > my headers correctly.
> 
> Yes, but you can set `nnmail-split-header-length-limit' to a really
> big number.

That won't do because it will unconditionally remove the safety
measure.  I want the long headers not to be removed for my own,
presumably non-regexp-based, functions.


* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-07 13:27   ` Hrvoje Niksic
@ 1999-03-14 15:53     ` Lars Magne Ingebrigtsen
  1999-03-16  7:25       ` Hrvoje Niksic
  0 siblings, 1 reply; 18+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-03-14 15:53 UTC (permalink / raw)


Hrvoje Niksic <hniksic@srce.hr> writes:

> That won't do because it will unconditionally remove the safety
> measure.  I want the long headers not to be removed for my own,
> presumably non-regexp-based, functions.

Uhm, yes.

One thing just occurred to me -- why do I remove the long lines before 
I know whether they are a problem or not?  I could just catch the
regexp error, and then remove the offending lines, and re-run the
match... 
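
The catch-and-retry idea could be sketched roughly as follows.  The
helper name `my-split-search-with-retry' is hypothetical (it is not part
of Gnus), and the sketch assumes that a regexp stack overflow is reported
as an ordinary `error' signal that `condition-case' can trap.  It only
covers that overflow case: a search that is merely very slow never
signals anything, so it would not be caught.

```elisp
(defun my-split-search-with-retry (regexp limit)
  "Search the current buffer for REGEXP, starting at the beginning.
If the search signals an error (assumed here to be a regexp stack
overflow), delete every line longer than LIMIT characters and retry
once.  Hypothetical helper, not part of Gnus."
  (goto-char (point-min))
  (condition-case nil
      (re-search-forward regexp nil t)
    (error
     ;; Remove only the offending long lines, then re-run the match.
     (goto-char (point-min))
     (while (not (eobp))
       (if (> (- (line-end-position) (line-beginning-position)) limit)
           ;; Delete the line together with its newline; point is left
           ;; at the start of the following line.
           (delete-region (line-beginning-position)
                          (min (point-max) (1+ (line-end-position))))
         (forward-line 1)))
     (goto-char (point-min))
     (re-search-forward regexp nil t))))
```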

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-14 15:53     ` Lars Magne Ingebrigtsen
@ 1999-03-16  7:25       ` Hrvoje Niksic
  1999-03-28 15:09         ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 18+ messages in thread
From: Hrvoje Niksic @ 1999-03-16  7:25 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> One thing just occurred to me -- why do I remove the long lines
> before I know whether they are a problem or not?  I could just catch
> the regexp error, and then remove the offending lines, and re-run
> the match...

Because there is no clearly defined "regexp error".  At least I recall
you mentioning regexps that take hours to match, or something like
that.


* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-16  7:25       ` Hrvoje Niksic
@ 1999-03-28 15:09         ` Lars Magne Ingebrigtsen
  1999-03-29 20:20           ` Hans de Graaff
  1999-03-30  6:05           ` Dale Hagglund
  0 siblings, 2 replies; 18+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-03-28 15:09 UTC (permalink / raw)


Hrvoje Niksic <hniksic@srce.hr> writes:

> Because there is no clearly defined "regexp error".  At least I recall
> you mentioning regexps that take hours to match, or something like
> that.

Oh, I had forgotten that.  Yes, there were two problems with the long
lines -- the regexp could either overflow, or it could just take
forever to run.  I've now reverted to removing the lines again.  :-(

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-28 15:09         ` Lars Magne Ingebrigtsen
@ 1999-03-29 20:20           ` Hans de Graaff
  1999-04-02 14:03             ` Lars Magne Ingebrigtsen
  1999-03-30  6:05           ` Dale Hagglund
  1 sibling, 1 reply; 18+ messages in thread
From: Hans de Graaff @ 1999-03-29 20:20 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Hrvoje Niksic <hniksic@srce.hr> writes:
> 
> > Because there is no clearly defined "regexp error".  At least I recall
> > you mentioning regexps that take hours to match, or something like
> > that.
> 
> Oh, I had forgotten that.  Yes, there were two problems with the long
> lines -- the regexp could either overflow, or it could just take
> forever to run.  I've now reverted to removing the lines again.  :-(

I agree with Hrvoje that the latter behavior is evil. Could there not 
be some kind of time-out mechanism which would abort a regexp after a
given period of time and cause a regexp error which could be trapped?

Setting this to 10 or 15 seconds would be quite acceptable, I think.

Hans


* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-28 15:09         ` Lars Magne Ingebrigtsen
  1999-03-29 20:20           ` Hans de Graaff
@ 1999-03-30  6:05           ` Dale Hagglund
  1999-03-31  3:02             ` Greg Stark
  1 sibling, 1 reply; 18+ messages in thread
From: Dale Hagglund @ 1999-03-30  6:05 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Oh, I had forgotten that.  Yes, there were two problems with the
> long lines -- the regexp could either overflow, or it could just
> take forever to run.  I've now reverted to removing the lines again.

This slightly crazy idea just occurred to me.  Say the max header
limit is 500, and some particular header line is, say, 1700 bytes.

What about matching the line against the split regexps in *overlapping
substrings* of at most 500 bytes each?  Each substring overlaps the
end of the previous chunk by, for example, 100 bytes.

You'd have to have some code to combine the split-results from each
substring, but that shouldn't be so hard.

I doubt that split regexps in practice depend on matching a
substring longer than 100 bytes, but the overlap amount could be
controlled by a variable as well.  Naturally, this whole slightly
weird behaviour should probably be controlled by yet another variable.

This might allow gnus to handle arbitrarily long header lines in a
more-or-less natural fashion without falling victim to the odd
pathological regular expression.
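
A rough sketch of the overlapping-window idea for a single regexp and a
single long header value.  The function name and the 500/100 defaults
are hypothetical, taken from the numbers above, and combining results
across several split rules is left out.  One behaviour does change:
anchors such as `^' and `$' also match at every window boundary, not
just at the ends of the header.

```elisp
(defun my-split-match-overlapping (regexp string &optional size overlap)
  "Return non-nil if REGEXP matches within some window of STRING.
Windows are at most SIZE characters long (default 500), and each one
overlaps the previous by OVERLAP characters (default 100), so a match
of up to OVERLAP characters that straddles a window boundary is still
seen whole.  Hypothetical helper illustrating the proposal."
  (let* ((size (or size 500))
         (overlap (or overlap 100))
         (step (- size overlap))
         (len (length string))
         (start 0)
         found done)
    (unless (> step 0)
      (error "OVERLAP must be smaller than SIZE"))
    (while (not (or found done))
      (let ((end (min len (+ start size))))
        ;; `string-match-p' leaves the caller's match data untouched.
        (setq found (string-match-p regexp (substring string start end))
              done (>= end len)
              start (+ start step))))
    (and found t)))
```

Because each window is at most SIZE characters, the work a single
match attempt can do is bounded by the window size rather than by the
full header length, which is the point of the proposal.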

Is there something stupid I'm missing here?  Does anyone use split
regexps that would fail to work properly in this scheme?

Dale.


* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-30  6:05           ` Dale Hagglund
@ 1999-03-31  3:02             ` Greg Stark
  1999-03-31  7:17               ` Dale Hagglund
  0 siblings, 1 reply; 18+ messages in thread
From: Greg Stark @ 1999-03-31  3:02 UTC (permalink / raw)


Dale Hagglund <rdh@best.com> writes:

> This slightly crazy idea just occurred to me.  Say the max header
> limit is 500, and some particular header line is, say, 1700 bytes.

Could anyone actually explain why emacs has any trouble running a regexp over
a few k of text? It seems like there's a real bug here and kludging Gnus to
work around it will only let the real bug go unfixed longer.

greg




* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-31  3:02             ` Greg Stark
@ 1999-03-31  7:17               ` Dale Hagglund
  1999-04-01  2:54                 ` Peter Seibel
  0 siblings, 1 reply; 18+ messages in thread
From: Dale Hagglund @ 1999-03-31  7:17 UTC (permalink / raw)


Greg Stark <gsstark@mit.edu> writes:

> Could anyone actually explain why emacs has any trouble running a
> regexp over a few k of text? 

As far as I know, this usually results from regexps that have lots of
potential backtracking.  You end up with a situation where the
match-time can be exponential in the length of the string being
matched.

I haven't seen any examples of complete regular expressions that cause
the problems we're talking about here, but since they included at
least some user-specified sub-regexps, it might be hard to address the
problem from the regexp side.  Do we have any known regexp/string
pairs that demonstrate the problem?

> . . . kludging Gnus to work around it will only let the real bug go
> unfixed longer.

I'm all for fixing it right.  My suggested approach is most definitely
a hack, or, in polite company, a heuristic.

Dale.


* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-31  7:17               ` Dale Hagglund
@ 1999-04-01  2:54                 ` Peter Seibel
  1999-04-01  7:02                   ` Dale Hagglund
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Seibel @ 1999-04-01  2:54 UTC (permalink / raw)
  Cc: ding

>>>>> "Dale" == Dale Hagglund <rdh@best.com> writes:

    Dale> Greg Stark <gsstark@mit.edu> writes:
    >> Could anyone actually explain why emacs has any trouble running a
    >> regexp over a few k of text? 

    Dale> As far as I know, this usually results from regexps that
    Dale> have lots of potential backtracking.  You end up with a
    Dale> situation where the match-time can be exponential in the
    Dale> length of the string being matched.

    Dale> I haven't seen any examples of complete regular expressions
    Dale> that cause the problems we're talking about here, but since
    Dale> they included at least some user-specified sub-regexps, it
    Dale> might be hard to address the problem from the regexp side.
    Dale> Do we have any known regexp/string pairs that demonstrate the
    Dale> problem?

Try eval'ing the expression below with the data line below it. It'll
run quickly as is. Then add a's to the end, one or two at a time, and
watch it slow down. Seriously, don't add too many a's all at once or
you'll hang your Emacs. C-g doesn't get it out--at least for me. But
the first few you add may not have any noticeable effect--combinatorial
explosion never looks bad in the beginning.

(search-forward-regexp "\\(.*\\)*x")

xaaaaaa

The problem is that the * is greedy, so after the '.' matches the 'x'
it'll match everything else in the line. Then the '.' doesn't match
the end of the line, but eol isn't an 'x' either, so the regex engine
backs up one from the end of the line. In the state of having
matched a bunch of stuff (everything except the last 'a' in the line)
with the '.*' inside the parens and having matched the whole
parenthesized pattern once, it then tries to finish matching by
matching the parenthesized expression twice, which it can do by
matching the last 'a' with the second time through the parenthesized
expression. But it still fails to match an x (since it's now at the end
of the line), so it backtracks again, backing up one more character on
the first pass through and then matching the second time through,
matching two characters this time and then bombing out when it hits
eol. That sort of backtracking and then matching forward and failing
goes on for quite a while until the whole thing succeeds by matching
the group zero times to satisfy the '\\(.*\\)*' and then matching
the 'x', satisfying the regex as a whole. The O'Reilly book _Mastering
Regular Expressions_ by Jeffrey Friedl explains this sort of stuff
quite well. (He also wrote an article in The Perl Journal about this
particular problem of combinatorial explosions in regexps.)
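
A small illustration of the usual cure, assuming the nested group serves
no other purpose: `\\(.*\\)*x' accepts exactly the same lines as the flat
`.*x', since repeating `.*' adds nothing, but the flat form leaves the
matcher only one way to split the text before the 'x', so its
backtracking should grow roughly linearly per starting position rather
than combinatorially.  The constant and function names are hypothetical,
and the test inputs are kept short so the nested pattern still finishes
quickly.

```elisp
(defconst my-nested-regexp "\\(.*\\)*x"
  "Pathological pattern from the example: a starred group around `.*'.")

(defconst my-flat-regexp ".*x"
  "Equivalent pattern without the nested quantifier.")

(defun my-compare-patterns (line)
  "Return the match positions of both patterns in LINE as a list.
Keep LINE short: when it contains no \"x\", the nested pattern's
backtracking grows combinatorially with the length of the line."
  (list (string-match-p my-nested-regexp line)
        (string-match-p my-flat-regexp line)))
```

The general lesson for split regexps would be to avoid a starred group
whose body can itself match an unbounded run of characters.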

-Peter

-- 
Peter Seibel        Perl/Java/English Hacker        peter@javamonkey.com

  There's no good culture without a dash of bad taste; a monopoly of
  good taste suggests restraint -- you're not pushing the envelope.

                                        -- Jean-Louis Gassee


* Re: nnmail-split-header-length-limit is EVIL!
  1999-04-01  2:54                 ` Peter Seibel
@ 1999-04-01  7:02                   ` Dale Hagglund
  1999-04-01  7:13                     ` Peter Seibel
  0 siblings, 1 reply; 18+ messages in thread
From: Dale Hagglund @ 1999-04-01  7:02 UTC (permalink / raw)


Peter Seibel <peter@javamonkey.com> writes:

> >>>>> "Dale" == Dale Hagglund <rdh@best.com> writes:
> 
>     Dale> I haven't seen any examples of complete regular
>     Dale> expressions that cause the problems we're talking about
>     Dale> here . . . .  Do we have any known regexp/string pairs
>     Dale> that deonstrate the problem?
> 
> Try eval'ing the expression below with the data line below it. It'll
> run quickly as is. Then add a's to the end, one or two at a time, and
> watch it slow down.

> (search-forward-regexp "\\(.*\\)*x")
> 
> xaaaaaa
> 
> [A good description of the exponential backtracking in the previous
> regexp and a pointer to Friedl's _Mastering Regular Expressions_
> deleted. --rdh] 

Umm, I've read Friedl's book (I agree it's excellent), and I already
understand the basic reason regexps show exponential time behaviour.

I guess I wasn't precise enough.  I was asking for known regexps from
gnus split processing and corresponding header lines that show
exponential behaviour, not a generic example that demonstrates how it
can happen on contrived input.

(I have no problems with your example as such; it nicely demonstrates
how easily exponential matching time can arise.  But, I'd like to see
a regexp/string pair with this problem that shows up in real use of
gnus.)

Dale.


* Re: nnmail-split-header-length-limit is EVIL!
  1999-04-01  7:02                   ` Dale Hagglund
@ 1999-04-01  7:13                     ` Peter Seibel
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Seibel @ 1999-04-01  7:13 UTC (permalink / raw)
  Cc: ding

>>>>> "Dale" == Dale Hagglund <rdh@best.com> writes:

[snip]

    Dale> I guess I wasn't precise enough.  I was asking for known
    Dale> regexps from gnus split processing and corresponding header
    Dale> lines that show exponential behaviour, not a generic example
    Dale> that demonstrates how it can happen on contrived input.

Ah, gotcha. Unfortunately I can't help you there.

-Peter


-- 
Peter Seibel        Perl/Java/English Hacker        peter@javamonkey.com

  There's no good culture without a dash of bad taste; a monopoly of
  good taste suggests restraint -- you're not pushing the envelope.

                                        -- Jean-Louis Gassee


* Re: nnmail-split-header-length-limit is EVIL!
  1999-03-29 20:20           ` Hans de Graaff
@ 1999-04-02 14:03             ` Lars Magne Ingebrigtsen
  1999-04-03  7:00               ` Hans de Graaff
  1999-04-13  7:04               ` Hrvoje Niksic
  0 siblings, 2 replies; 18+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-04-02 14:03 UTC (permalink / raw)


Hans de Graaff <graaff@xs4all.nl> writes:

> I agree with Hrvoje that the latter behavior is evil. Could there not 
> be some kind of time-out mechanism which would abort a regexp after a
> given period of time and cause a regexp error which could be trapped?

Hm.  Sounds too complicated, I think.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


* Re: nnmail-split-header-length-limit is EVIL!
  1999-04-02 14:03             ` Lars Magne Ingebrigtsen
@ 1999-04-03  7:00               ` Hans de Graaff
  1999-04-17  6:16                 ` Lars Magne Ingebrigtsen
  1999-04-13  7:04               ` Hrvoje Niksic
  1 sibling, 1 reply; 18+ messages in thread
From: Hans de Graaff @ 1999-04-03  7:00 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> > I agree with Hrvoje that the latter behavior is evil. Could there not 
> > be some kind of time-out mechanism which would abort a regexp after a
> > given period of time and cause a regexp error which could be trapped?
> 
> Hm.  Sounds too complicated, I think.

I know just enough lisp to get Gnus to do what I want, so I'm not sure 
how practical this would be, but could you not do the regexp matching
in an asynchronous subprocess, and then either use the result if not
too much time has passed, and otherwise delete the process and move on 
with an error?

Hans


* Re: nnmail-split-header-length-limit is EVIL!
  1999-04-02 14:03             ` Lars Magne Ingebrigtsen
  1999-04-03  7:00               ` Hans de Graaff
@ 1999-04-13  7:04               ` Hrvoje Niksic
  1999-04-17  6:15                 ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 18+ messages in thread
From: Hrvoje Niksic @ 1999-04-13  7:04 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Hans de Graaff <graaff@xs4all.nl> writes:
> 
> > I agree with Hrvoje that the latter behavior is evil. Could there not 
> > be some kind of time-out mechanism which would abort a regexp after a
> > given period of time and cause a regexp error which could be trapped?
> 
> Hm.  Sounds too complicated, I think.

"Too complicated" is not a problem for brave ones.  "It would not
work" is, however.  I don't think timeouts would work while you're in
the regexp code.  XEmacs has `add-async-timeout', but I don't think
it's healthy to signal errors from its callbacks.


* Re: nnmail-split-header-length-limit is EVIL!
  1999-04-13  7:04               ` Hrvoje Niksic
@ 1999-04-17  6:15                 ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 18+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-04-17  6:15 UTC (permalink / raw)


Hrvoje Niksic <hniksic@srce.hr> writes:

> "Too complicated" is not a problem for brave ones.  "It would not
> work" is, however.  I don't think timeouts would work while you're in
> the regexp code.

"Too complicated" is a euphemism for "it won't work".  :-)  In this
case, I think one would have to delve into the C layer to set up
something like that.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


* Re: nnmail-split-header-length-limit is EVIL!
  1999-04-03  7:00               ` Hans de Graaff
@ 1999-04-17  6:16                 ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 18+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-04-17  6:16 UTC (permalink / raw)


Hans de Graaff <graaff@xs4all.nl> writes:

> I know just enough lisp to get Gnus to do what I want, so I'm not sure 
> how practical this would be, but could you not do the regexp matching
> in an asynchronous subprocess, and then either use the result if not
> too much time has passed, and otherwise delete the process and move on 
> with an error?

Nope.  The matching is done by `re-search-*', and since Emacs is not a 
multithreaded operating system (yet), that can't be done.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


end of thread, other threads:[~1999-04-17  6:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
1999-03-02 10:57 nnmail-split-header-length-limit is EVIL! Hrvoje Niksic
1999-03-06 18:16 ` Lars Magne Ingebrigtsen
1999-03-07 13:27   ` Hrvoje Niksic
1999-03-14 15:53     ` Lars Magne Ingebrigtsen
1999-03-16  7:25       ` Hrvoje Niksic
1999-03-28 15:09         ` Lars Magne Ingebrigtsen
1999-03-29 20:20           ` Hans de Graaff
1999-04-02 14:03             ` Lars Magne Ingebrigtsen
1999-04-03  7:00               ` Hans de Graaff
1999-04-17  6:16                 ` Lars Magne Ingebrigtsen
1999-04-13  7:04               ` Hrvoje Niksic
1999-04-17  6:15                 ` Lars Magne Ingebrigtsen
1999-03-30  6:05           ` Dale Hagglund
1999-03-31  3:02             ` Greg Stark
1999-03-31  7:17               ` Dale Hagglund
1999-04-01  2:54                 ` Peter Seibel
1999-04-01  7:02                   ` Dale Hagglund
1999-04-01  7:13                     ` Peter Seibel
