Gnus development mailing list
* Article size limit for emphasis and buttonization
@ 1999-11-16 10:15 Hrvoje Niksic
  1999-11-16 10:49 ` Jan Vroonhof
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Hrvoje Niksic @ 1999-11-16 10:15 UTC


I've just received a huge email on xemacs-patches, and it took several
seconds to display -- that on my brand new Pentium 3, in an XEmacs
built with all optimizations and no debugging!  I can only imagine how
long it must take on older hardware.  The mail was very large, but not
terribly so; it was some 600K.  However it was all text/plain so
virtually all of it was displayed.

Does it have to be that slow?  I turned on profiling and started
playing.  Here are the results I got:

    Function Name                  Ticks    %/Total   Call Count
    ===========================    =====    =======   ==========
    re-search-forward              609      86.383    303
    (in garbage collection)        38       5.390     
    re-search-backward             30       4.255     6
    mm-decode-coding-region        11       1.560     2
    insert-buffer-substring        5        0.709     55

    (One tick is supposed to be one millisecond, but all of it took
    *much* more than its sum of milliseconds.  Call it gestalt.:-) )

A little investigation showed that most of the time lost in
re-search-forward is due to emphasis analysis and its beautiful
regexp[1].

When I set gnus-treat-emphasize to nil, I got this:

    Function Name                  Ticks    %/Total   Call Count
    ===========================    =====    =======   ==========
    re-search-forward              115      55.556    255
    (in garbage collection)        39       18.841    
    re-search-backward             31       14.976    4
    mm-decode-coding-region        11       5.314     2

Needless to say, this ran several times faster in real time.  Still a
bit too slow for my taste, though.  The rest of the re-search-forwards
are due to buttonization.  When I set gnus-treat-buttonize to nil, I
got this:

    Function Name               Ticks    %/Total   Call Count
    ========================    =====    =======   ==========
    (in garbage collection)     39       32.500    
    re-search-backward          30       25.000    4
    re-search-forward           25       20.833    193
    mm-decode-coding-region     13       10.833    2
    insert-buffer-substring     5        4.167     55

I didn't bother with further optimization (like temporarily turning
off gc or researching where the remaining 193 re-search-forward calls
come from) because I found this timing perfectly acceptable for an
article of that size.
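
For reference, the total tick count behind each of the three tables
can be recovered from the re-search-forward row alone, since
total = ticks / (percent / 100).  The sketch below does that
arithmetic; `my-implied-total' is a hypothetical helper name used only
for this illustration, not part of Gnus or the profiler.

```elisp
;; Hypothetical helper: recover the total number of profiler ticks
;; from one row of a profile table (its ticks and its %/Total).
(defun my-implied-total (ticks percent)
  "Return the total tick count implied by TICKS being PERCENT of it."
  (round (/ (* 100.0 ticks) percent)))

;; re-search-forward rows of the three tables above:
(my-implied-total 609 86.383)   ; all treatments on        => 705
(my-implied-total 115 55.556)   ; gnus-treat-emphasize nil => 207
(my-implied-total 25 20.833)    ; gnus-treat-buttonize nil => 120
```

So the sampled totals drop from about 705 ticks to 207 and then to
120, which matches the "several times faster" impression, keeping in
mind the caveat above that ticks understate the real elapsed time.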

To get to my point.  Now that emphasis is on by default, users will
perceive that Gnus has become much slower, in fact extremely slow, at
processing large articles, for no (apparent) cause.

Now that we have the infrastructure to specify the part size limit on
a per-wash-function basis, we should definitely do so by default.  If
the part is larger than 50K or so, our regexp-based emphasis becomes
really slow.  Buttonization is the next source of slowness, but it can
be safely limited to 100K or so.
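
As a configuration sketch of the limits suggested above: the two
treatment variables named in this message accept an integer, in which
case the treatment is applied only to body parts shorter than that
number.  The exact values 50000 and 100000 are an assumption standing
in for "50K or so" and "100K or so"; they are not necessarily what any
Gnus release ships as defaults.

```elisp
;; Assumed values, taken from the rough 50K / 100K figures above.
;; Emphasize only parts shorter than about 50K characters.
(setq gnus-treat-emphasize 50000)
;; Buttonize only parts shorter than about 100K characters.
(setq gnus-treat-buttonize 100000)
```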

Comments?



[1]
The regexp in question is in fact a list of beautiful regexps, that
looks like this:

(("\\(\\s-\\|^\\)\\(_\\(\\(\\w\\|_[^_]\\)+\\)_\\)\\(\\s-\\|[?!.,;]\\)" 2 3 gnus-emphasis-underline)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(_\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)_\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-underline)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(/\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)/\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-italic)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(\\*\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)\\*\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-bold)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(_/\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)/_\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-underline-italic)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(_\\*\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)\\*_\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-underline-bold)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(\\*/\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)/\\*\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-bold-italic)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(_\\*/\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)/\\*_\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-underline-bold-italic))



* Re: Article size limit for emphasis and buttonization
  1999-11-16 10:15 Article size limit for emphasis and buttonization Hrvoje Niksic
@ 1999-11-16 10:49 ` Jan Vroonhof
  1999-11-16 10:59   ` Hrvoje Niksic
  1999-11-16 11:32 ` Tibor Simko
  1999-11-16 17:02 ` Lars Magne Ingebrigtsen
  2 siblings, 1 reply; 7+ messages in thread
From: Jan Vroonhof @ 1999-11-16 10:49 UTC


Hrvoje Niksic <hniksic@iskon.hr> writes:

> I've just received a huge email on xemacs-patches, and it took several
> seconds to display -- that on my brand new Pentium 3, in an XEmacs
> built with all optimizations and no debugging!

If you are talking about Martin's "man & Make" patch:

34 seconds (Ultra 1, with 64 MB)

> The regexp in question is in fact a list of beautiful regexps, that
> looks like this:
> 
>
> (("\\(\\s-\\|^\\)\\(_\\(\\(\\w\\|_[^_]\\)+\\)_\\)\\(\\s-\\|[?!.,;]\\)" 2 3 gnus-emphasis-underline)

Is there any chance this stuff can be sped up?  For instance with
non-greedy regexps, optimized regexps, or by not using regexps at all?
Doesn't Gnus have to scan over the buffer anyway? So why are these so
much slower?

Since this is all user eye candy: It really should do stuff like this
lazily (yes, I know you hate that).

Jan



* Re: Article size limit for emphasis and buttonization
  1999-11-16 10:49 ` Jan Vroonhof
@ 1999-11-16 10:59   ` Hrvoje Niksic
  0 siblings, 0 replies; 7+ messages in thread
From: Hrvoje Niksic @ 1999-11-16 10:59 UTC


Jan Vroonhof <vroonhof@math.ethz.ch> writes:

> Hrvoje Niksic <hniksic@iskon.hr> writes:
> 
> > I've just received a huge email on xemacs-patches, and it took several
> > seconds to display -- that on my brand new Pentium 3, in an XEmacs
> > built with all optimizations and no debugging!
> 
> If you are talking about Martin's "man & Make" patch:

Yes.

> 34 seconds (Ultra 1, with 64 MB)

How long does it take after turning off the emphasis and buttonization
treatments?

> > The regexp in question is in fact a list of beautiful regexps, that
> > looks like this:
> > 
> >
> > (("\\(\\s-\\|^\\)\\(_\\(\\(\\w\\|_[^_]\\)+\\)_\\)\\(\\s-\\|[?!.,;]\\)" 2 3 gnus-emphasis-underline)
> 
> Is there any chance this stuff can be sped up?  For instance with
> non-greedy regexps, optimized regexps, or by not using regexps at all?

I thought about that some time ago, but I concluded it would be too
much work.  I imagined playing with non-regexp specifications of
faces using sexp, e.g. (?* sentence ?*), so that you can optimize by
searching for asterisks with `search-forward', and then seeing what
can be done, etc.

But after thinking more about it, I concluded that:

a) Elisp is slow as hell, which made me wonder if doing it would be
   any faster than the current regexp approach (but OTOH the regexps
   were *less* ugly at the time.)

b) It's too much work.  If it were to be useful, it would have to
   support the equivalent of regexp * operator, and such.  There is no
   way it can be fast in Elisp.
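
For illustration only, here is a minimal sketch of the
`search-forward' idea described above: find each literal asterisk
cheaply, then test just that spot with a small anchored regexp.  It
handles only single-word *bold* spans, skips the leading and trailing
context checks that the real Gnus regexps perform, and makes no claim
about being faster.  `my-find-starred-words' is a hypothetical name,
not a Gnus function.

```elisp
;; Hypothetical sketch, not Gnus code: locate *word* spans by
;; searching for a literal "*" and checking only at that position.
(defun my-find-starred-words ()
  "Return a list of (START . END) for the words in *word* spans."
  (let (spans)
    (save-excursion
      (goto-char (point-min))
      (while (search-forward "*" nil t)
        ;; Point is now just after an asterisk.
        (when (looking-at "\\(\\w+\\)\\*")
          (push (cons (match-beginning 1) (match-end 1)) spans)
          ;; Skip past the closing asterisk so it is not reused
          ;; as the opening asterisk of another span.
          (goto-char (match-end 0)))))
    (nreverse spans)))

;; Usage: evaluates to ((4 . 8)), the buffer positions of "bold".
(with-temp-buffer
  (insert "a *bold* b")
  (my-find-starred-words))
```

Supporting multi-word spans, punctuation, and the other markers would
need the equivalent of the regexp * operator, which is exactly the
extra work point b) refers to.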

> Doesn't Gnus have to scan over the buffer anyway? So why are these
> so much slower?

I don't understand this.  Much slower than what?

> Since this is all user eye candy: It really should do stuff like
> this lazily (yes, I know you hate that).

I don't hate it if it's done right.  I haven't seen it done right,
yet.



* Re: Article size limit for emphasis and buttonization
  1999-11-16 10:15 Article size limit for emphasis and buttonization Hrvoje Niksic
  1999-11-16 10:49 ` Jan Vroonhof
@ 1999-11-16 11:32 ` Tibor Simko
  1999-11-16 11:39   ` Hrvoje Niksic
  1999-11-16 17:02 ` Lars Magne Ingebrigtsen
  2 siblings, 1 reply; 7+ messages in thread
From: Tibor Simko @ 1999-11-16 11:32 UTC
  Cc: ding

>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@iskon.hr> writes:

    Hrvoje> Does it have to be that slow?

No.

    Hrvoje> set gnus-treat-emphasize [...] gnus-treat-buttonize to nil

Have you tried to set them to an integer like 5000?  The two variables
accept integer arguments (``Do this treatment on all body parts that
have a length less than this number'' says the manual).

cheers,

-TS



* Re: Article size limit for emphasis and buttonization
  1999-11-16 11:32 ` Tibor Simko
@ 1999-11-16 11:39   ` Hrvoje Niksic
  1999-11-16 12:38     ` Tibor Simko
  0 siblings, 1 reply; 7+ messages in thread
From: Hrvoje Niksic @ 1999-11-16 11:39 UTC


Tibor Simko <tibor.simko@cern.ch> writes:

> >>>>> "Hrvoje" == Hrvoje Niksic <hniksic@iskon.hr> writes:
> 
>     Hrvoje> Does it have to be that slow?
> 
> No.

Really?  I assume you have an implementation that makes it faster
without turning it off.  Could you show it?

>     Hrvoje> set gnus-treat-emphasize [...] gnus-treat-buttonize to nil
> 
> Have you tried to set them to an integer like 5000?

I know that.  :-)

Have you *read* my message?



* Re: Article size limit for emphasis and buttonization
  1999-11-16 11:39   ` Hrvoje Niksic
@ 1999-11-16 12:38     ` Tibor Simko
  0 siblings, 0 replies; 7+ messages in thread
From: Tibor Simko @ 1999-11-16 12:38 UTC
  Cc: ding

>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@iskon.hr> writes:

    Hrvoje> I assume you have an implementation that makes it faster
    Hrvoje> without turning it off.  

Nope :) I was just "answering" your hypothetical question that came
after reporting the default user point of view (i.e. when using `t').

    Hrvoje> Have you *read* my message?

Yes, I have: you were afraid that people would find pgnus slow with
default settings and this is why I was sort of seconding your
suggestion to set these gnus-treat-* variables to some reasonable
defaults like the 5000 I'm using.  But since you have not reported any
tests for intermediate values between t and nil, I supposed that you
had not actually tried that! :).  Sorry.

cheers,

-TS



* Re: Article size limit for emphasis and buttonization
  1999-11-16 10:15 Article size limit for emphasis and buttonization Hrvoje Niksic
  1999-11-16 10:49 ` Jan Vroonhof
  1999-11-16 11:32 ` Tibor Simko
@ 1999-11-16 17:02 ` Lars Magne Ingebrigtsen
  2 siblings, 0 replies; 7+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-11-16 17:02 UTC


Hrvoje Niksic <hniksic@iskon.hr> writes:

> Now that we have the infrastructure to specify the part size limit on
> a per-wash-function basis, we should definitely do so by default.  If
> the part is larger than 50K or so, our regexp-based emphasis becomes
> really slow.  Buttonization is the next source of slowness, but it can
> be safely limited to 100K or so.

Ok; I've now changed the defaults for these two treatments.

-- 
(domestic pets only, the antidote for overdose, milk.)
   larsi@gnus.org * Lars Magne Ingebrigtsen


