* Article size limit for emphasis and buttonization
@ 1999-11-16 10:15 Hrvoje Niksic
1999-11-16 10:49 ` Jan Vroonhof
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Hrvoje Niksic @ 1999-11-16 10:15 UTC (permalink / raw)
I've just received a huge email on xemacs-patches, and it took several
seconds to display -- that on my brand new Pentium 3, in an XEmacs
built with all optimizations and no debugging! I can only imagine how
long it must take on older hardware. The mail was very large, but not
terribly so; it was some 600K. However it was all text/plain so
virtually all of it was displayed.
Does it have to be that slow? I turned on profiling and started
playing. Here are the results I got:
Function Name                Ticks    %/Total    Call Count
===========================  =====    =======    ==========
re-search-forward              609     86.383           303
(in garbage collection)         38      5.390
re-search-backward              30      4.255             6
mm-decode-coding-region         11      1.560             2
insert-buffer-substring          5      0.709            55
(One tick is supposed to be one millisecond, but all of it took
*much* more than its sum of milliseconds. Call it gestalt.:-) )
A little investigation showed that most of the time lost in
re-search-forward is due to emphasis analysis and its beautiful
regexp[1].
When I set gnus-treat-emphasize to nil, I got this:
Function Name                Ticks    %/Total    Call Count
===========================  =====    =======    ==========
re-search-forward              115     55.556           255
(in garbage collection)         39     18.841
re-search-backward              31     14.976             4
mm-decode-coding-region         11      5.314             2
Needless to say, this ran several times faster in real-time. Still a
bit too slow for my taste, though. The rest of the re-search-forwards
are due to buttonization. When I set gnus-treat-buttonize to nil, I
got this:
Function Name             Ticks    %/Total    Call Count
========================  =====    =======    ==========
(in garbage collection)      39     32.500
re-search-backward           30     25.000             4
re-search-forward            25     20.833           193
mm-decode-coding-region      13     10.833             2
insert-buffer-substring       5      4.167            55
I didn't bother with further optimization (like temporarily turning
off gc or researching where the remaining 193 re-search-forward calls
come from) because I found this timing perfectly acceptable for an
article of that size.
To get to my point. Now that emphasis is on by default, users will
perceive that Gnus has become much slower, in fact extremely slow, at
processing large articles, for no (apparent) cause.
Now that we have the infrastructure to specify the part size limit on
a per-wash-function-basis, we should definitely do so by default. If
the part is larger than 50K or so, our regexp-based emphasis becomes
really slow. Buttonization is the next source of slowness, but it can
be safely limited to 100K or so.
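As a sketch of what such limits could look like in a user's init file,
assuming that "K" means roughly a thousand characters and that an
integer value restricts a treatment to parts shorter than that number
(the exact default values are not settled in this message):

```elisp
;; Assumed thresholds taken from the figures above; adjust to taste.
(setq gnus-treat-emphasize 50000)   ; emphasize only parts shorter than ~50K
(setq gnus-treat-buttonize 100000)  ; buttonize only parts shorter than ~100K
```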
Comments?
[1]
The regexp in question is in fact a list of beautiful regexps, that
looks like this:
(("\\(\\s-\\|^\\)\\(_\\(\\(\\w\\|_[^_]\\)+\\)_\\)\\(\\s-\\|[?!.,;]\\)" 2 3 gnus-emphasis-underline)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(_\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)_\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-underline)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(/\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)/\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-italic)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(\\*\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)\\*\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-bold)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(_/\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)/_\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-underline-italic)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(_\\*\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)\\*_\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-underline-bold)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(\\*/\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)/\\*\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-bold-italic)
 ("\\(\\s-\\|^\\|[-\"]\\|\\s(\\|\\s)\\)\\(_\\*/\\(\\w+\\(\\s-+\\w+\\)*[.,]?\\)/\\*_\\)\\(\\s-\\|[-?!.,;:\"]\\|\\s(\\|\\s)\\)" 2 3 gnus-emphasis-underline-bold-italic))
* Re: Article size limit for emphasis and buttonization
1999-11-16 10:15 Article size limit for emphasis and buttonization Hrvoje Niksic
@ 1999-11-16 10:49 ` Jan Vroonhof
1999-11-16 10:59 ` Hrvoje Niksic
1999-11-16 11:32 ` Tibor Simko
1999-11-16 17:02 ` Lars Magne Ingebrigtsen
2 siblings, 1 reply; 7+ messages in thread
From: Jan Vroonhof @ 1999-11-16 10:49 UTC (permalink / raw)
Hrvoje Niksic <hniksic@iskon.hr> writes:
> I've just received a huge email on xemacs-patches, and it took several
> seconds to display -- that on my brand new Pentium 3, in an XEmacs
> built with all optimizations and no debugging!
If you are talking about Martin's "man & Make" patch
34 seconds (Ultra 1, with 64 MB)
> The regexp in question is in fact a list of beautiful regexps, that
> looks like this:
>
>
> (("\\(\\s-\\|^\\)\\(_\\(\\(\\w\\|_[^_]\\)+\\)_\\)\\(\\s-\\|[?!.,;]\\)" 2 3 gnus-emphasis-underline)
Is there any chance this stuff can be sped up? For instance with
non-greedy regexps, optimized regexps, or not using regexps at all?
Doesn't Gnus have to scan over the buffer anyway? So why are these so
much slower?
Since this is all user eye candy: It really should do stuff like this
lazily (yes, I know you hate that).
Jan
* Re: Article size limit for emphasis and buttonization
1999-11-16 10:49 ` Jan Vroonhof
@ 1999-11-16 10:59 ` Hrvoje Niksic
0 siblings, 0 replies; 7+ messages in thread
From: Hrvoje Niksic @ 1999-11-16 10:59 UTC (permalink / raw)
Jan Vroonhof <vroonhof@math.ethz.ch> writes:
> Hrvoje Niksic <hniksic@iskon.hr> writes:
>
> > I've just received a huge email on xemacs-patches, and it took several
> > seconds to display -- that on my brand new Pentium 3, in an XEmacs
> > built with all optimizations and no debugging!
>
> If you are talking about Martin's "man & Make" patch
Yes.
> 34 seconds (Ultra 1, with 64 MB)
How long does it take after turning off the emphasis and buttonization
treatments?
> > The regexp in question is in fact a list of beautiful regexps, that
> > looks like this:
> >
> >
> > (("\\(\\s-\\|^\\)\\(_\\(\\(\\w\\|_[^_]\\)+\\)_\\)\\(\\s-\\|[?!.,;]\\)" 2 3 gnus-emphasis-underline)
>
> Is there any chance this stuff can be sped up? For instance with
> non-greedy regexps, optimized regexps, or not using regexps at all?
I thought about that some time ago, but I concluded it would be too
much work. I imagined playing with non-regexp specifications of
faces using sexp, e.g. (?* sentence ?*), so that you can optimize by
searching for asterisks with `search-forward', and then seeing what
can be done, etc.
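A minimal sketch of that idea, handling only single-word *bold* spans
and ignoring the surrounding-context checks the real regexps perform.
The function name `my-find-starred-words' is hypothetical and not part
of Gnus:

```elisp
(defun my-find-starred-words ()
  "Return a list of (START . END) pairs for *word* spans in the buffer.
Hypothetical helper: only a single run of word characters between two
asterisks is recognized."
  (let ((spans nil))
    (save-excursion
      (goto-char (point-min))
      ;; Plain string search for the opening asterisk, no regexp involved.
      (while (search-forward "*" nil t)
        (let ((start (1- (point))))
          ;; Accept only a non-empty run of word-syntax characters that
          ;; is immediately followed by a closing asterisk.
          (if (and (> (skip-syntax-forward "w") 0)
                   (eq (char-after) ?*))
              (progn
                (forward-char 1)
                (setq spans (cons (cons start (point)) spans)))))))
    (nreverse spans)))
```

Whether this would beat the regexp approach on a large article is
exactly the open question; the sketch only shows the shape of the scan.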
But after thinking more about it, I concluded that:
a) Elisp is slow as hell, which made me wonder if doing it would be
any faster than the current regexp approach (but OTOH the regexps
were *less* ugly at the time.)
b) It's too much work. If it was to be useful, it would have to
support the equivalent of regexp * operator, and such. There is no
way it can be fast in Elisp.
> Doesn't Gnus have to scan over the buffer anyway? So why are these
> so much slower?
I don't understand this. Much slower than what?
> Since this is all user eye candy: It really should do stuff like
> this lazily (yes, I know you hate that).
I don't hate it if it's done right. I haven't seen it done right,
yet.
* Re: Article size limit for emphasis and buttonization
1999-11-16 10:15 Article size limit for emphasis and buttonization Hrvoje Niksic
1999-11-16 10:49 ` Jan Vroonhof
@ 1999-11-16 11:32 ` Tibor Simko
1999-11-16 11:39 ` Hrvoje Niksic
1999-11-16 17:02 ` Lars Magne Ingebrigtsen
2 siblings, 1 reply; 7+ messages in thread
From: Tibor Simko @ 1999-11-16 11:32 UTC (permalink / raw)
Cc: ding
>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@iskon.hr> writes:
Hrvoje> Does it have to be that slow?
No.
Hrvoje> set gnus-treat-emphasize [...] gnus-treat-buttonize to nil
Have you tried to set them to an integer like 5000? The two variables
accept integer arguments (``Do this treatment on all body parts that
have a length less than this number'' says the manual).
cheers,
-TS
* Re: Article size limit for emphasis and buttonization
1999-11-16 11:32 ` Tibor Simko
@ 1999-11-16 11:39 ` Hrvoje Niksic
1999-11-16 12:38 ` Tibor Simko
0 siblings, 1 reply; 7+ messages in thread
From: Hrvoje Niksic @ 1999-11-16 11:39 UTC (permalink / raw)
Tibor Simko <tibor.simko@cern.ch> writes:
> >>>>> "Hrvoje" == Hrvoje Niksic <hniksic@iskon.hr> writes:
>
> Hrvoje> Does it have to be that slow?
>
> No.
Really? I assume you have an implementation that makes it faster
without turning it off. Could you show it?
> Hrvoje> set gnus-treat-emphasize [...] gnus-treat-buttonize to nil
>
> Have you tried to set them to an integer like 5000?
I know that. :-)
Have you *read* my message?
* Re: Article size limit for emphasis and buttonization
1999-11-16 11:39 ` Hrvoje Niksic
@ 1999-11-16 12:38 ` Tibor Simko
0 siblings, 0 replies; 7+ messages in thread
From: Tibor Simko @ 1999-11-16 12:38 UTC (permalink / raw)
Cc: ding
>>>>> "Hrvoje" == Hrvoje Niksic <hniksic@iskon.hr> writes:
Hrvoje> I assume you have an implementation that makes it faster
Hrvoje> without turning it off.
Nope :) I was just "answering" your hypothetical question that came
after reporting the default user point of view (i.e. when using `t').
Hrvoje> Have you *read* my message?
Yes, I have: you were afraid that people would find pgnus slow with
default settings, and this is why I was sort of seconding your
suggestion to set these gnus-treat-* variables to some reasonable
defaults, like the 5000 I'm using. But since you have not reported any
tests for intermediate values between t and nil, I supposed that you
had not actually tried that! :) Sorry.
cheers,
-TS
* Re: Article size limit for emphasis and buttonization
1999-11-16 10:15 Article size limit for emphasis and buttonization Hrvoje Niksic
1999-11-16 10:49 ` Jan Vroonhof
1999-11-16 11:32 ` Tibor Simko
@ 1999-11-16 17:02 ` Lars Magne Ingebrigtsen
2 siblings, 0 replies; 7+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-11-16 17:02 UTC (permalink / raw)
Hrvoje Niksic <hniksic@iskon.hr> writes:
> Now that we have the infrastructure to specify the part size limit on
> a per-wash-function-basis, we should definitely do so by default. If
> the part is larger than 50K or so, our regexp-based emphasis becomes
> really slow. Buttonization is the next source of slowness, but it can
> be safely limited to 100K or so.
Ok; I've now changed the defaults for these two treatments.
--
(domestic pets only, the antidote for overdose, milk.)
larsi@gnus.org * Lars Magne Ingebrigtsen