Gnus development mailing list
* overview file access when spooling and nnml/nnimap performance
@ 1999-09-22 13:48 Hannu Koivisto
  1999-09-22 14:13 ` Kai Großjohann
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Hannu Koivisto @ 1999-09-22 13:48 UTC (permalink / raw)


Greetings,

I'm looking for ways to handle my mail (and news, but that's not
relevant now) faster without getting new hardware.  I have almost
always used nnml because it has scaled better than anything else,
but nowadays I'm not quite satisfied with it either.  Getting new
mail from the spool, even just a few (5--20) articles, is much slower
than it should be, IMHO, on my PII/266 Linux machine with the spools
(I split mail externally with procmail) and the nnml folders both on
a UW-SCSI disk.  Entering groups with a large number of articles is
also quite slow, but I believe there is really nothing that can be
done about that, as Elisp is probably simply too slow for reading the
relevant information from disk and generating the summary buffer.

So, I started to think about what happens when Gnus gets new mail
from the spool.  What seems like real overkill to me is that, for
each group that receives mail, it seems to read the whole overview
file into memory, do some magic (which I didn't quite understand,
and which may well be why I think this is overkill), add the new
lines and then write the file back to disk.  (Please correct me if
I'm wrong about this.)  I.e. if I get even just a few new mails to
several groups, Gnus ends up reading something like 5+ MB of data
from disk, fiddling with it and writing it back.  Perhaps this can't
be avoided, in which case, I thought, some sort of database solution
for storing articles would perhaps be the right direction.  This
brings us to nnimap.  How's its performance with some good IMAP
server (do such things exist)?  Somehow I doubt it would be fast, but
as someone suggested going for IMAP when someone else was asking for
some sort of database backend, I'm a bit curious.

-- 
Hannu



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 13:48 overview file access when spooling and nnml/nnimap performance Hannu Koivisto
@ 1999-09-22 14:13 ` Kai Großjohann
  1999-09-22 15:32   ` Hannu Koivisto
  1999-09-22 14:54 ` Simon Josefsson
  1999-10-04 10:27 ` Robert Bihlmeyer
  2 siblings, 1 reply; 12+ messages in thread
From: Kai Großjohann @ 1999-09-22 14:13 UTC (permalink / raw)


I think nnimap is even slower than nnml because the NOV parsing done
by Gnus is so fast.  In fact, nnimap now includes a NOV cache feature
which fetches overview info from the IMAP server only once.  After
that, the overview information is stored in a file in ~/News and
subsequently fetched from there.

I use a Cyrus server and I do server-side splitting with Sieve.  So
getting new mail is real fast -- O(n) where n is the number of
subscribed groups.  The splitting happens in the background.  That's a
nice feature of the Gnus/nnimap combination.

You're right, it seems strange that operations on large groups are so
slow.  Recently, I moved all articles from one 3500 message nnml group
to an 8000 message group, and that was real slow.  Towards the end,
moving got a lot faster.

I'm not sure how Gnus' speed could be improved here.  One could try to
read ever larger chunks from the end of the file until the beginning
of the last line is found.  And writing new lines could happen by
appending to the existing file.  Yes, that might be faster for large
overview files, but what's the size where it starts winning?
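
The two ideas above (find the last line by reading growing chunks from
the end, and add new lines by appending) could be sketched roughly as
follows.  This is only an illustration, not what nnml does: the
function names `my-nov-last-line' and `my-nov-append-line' and the
512-byte starting chunk are assumptions made up for the example, and
the text is read as raw bytes.

```elisp
;; Hypothetical helpers for illustration; not part of Gnus or nnml.

(defun my-nov-last-line (file)
  "Return the last line of FILE, reading ever larger chunks from its end.
The result is a raw (unibyte) string without the trailing newline."
  (let ((size (nth 7 (file-attributes file))) ; file size in bytes
        (chunk 512)                           ; assumed starting chunk size
        line)
    (while (not line)
      (let ((start (max 0 (- size chunk))))
        (with-temp-buffer
          ;; Read only bytes START..SIZE instead of the whole file.
          (insert-file-contents-literally file nil start size)
          (goto-char (point-max))
          ;; Ignore the newline(s) that terminate the last line.
          (skip-chars-backward "\n")
          (let ((end (point)))
            (cond
             ;; A newline before the last line is inside this chunk.
             ((search-backward "\n" nil t)
              (setq line (buffer-substring (1+ (point)) end)))
             ;; The chunk already covers the whole file.
             ((= start 0)
              (setq line (buffer-substring (point-min) end)))
             ;; Otherwise the last line is longer than the chunk: grow it.
             (t (setq chunk (* 2 chunk))))))))
    line))

(defun my-nov-append-line (file line)
  "Append LINE plus a newline to FILE without reading the rest of FILE."
  (write-region (concat line "\n") nil file t 'silent))
```

With helpers like these, the cost of adding one overview line depends
on the length of the last line rather than on the size of the whole
file, which is the point of the suggestion.  Whether that wins in
practice would still have to be measured.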

As to displaying large summary buffers, I find that I normally get by
with just displaying a few messages.  I switched from total-expire to
auto-expire, and most old messages are marked as read.  So when
entering a group, I normally see just a few ticked messages, plus the
new ones, plus the old ones that are parents of the other shown ones.

Rather than displaying thousands of messages to search for an old one,
I use nnir to search for it.  (Should use nnir much more often,
though.)

kai
-- 
I like BOTH kinds of music.



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 13:48 overview file access when spooling and nnml/nnimap performance Hannu Koivisto
  1999-09-22 14:13 ` Kai Großjohann
@ 1999-09-22 14:54 ` Simon Josefsson
  1999-09-22 15:58   ` Hannu Koivisto
  1999-10-04 10:27 ` Robert Bihlmeyer
  2 siblings, 1 reply; 12+ messages in thread
From: Simon Josefsson @ 1999-09-22 14:54 UTC (permalink / raw)
  Cc: ding

Hannu Koivisto <azure@iki.fi> writes:

> I thought, some sort of database solution for storing articles would
> perhaps be the right direction.

Yes. nnml is already a step in that direction compared to nnmh.

> This brings us to nnimap.  How's its performance with some good IMAP
> server (do such things exist)?

Nnimap can't be a fast IMAP client compared to other IMAP clients,
because Gnus behaves orthogonally to several concepts used in IMAP to
speed things up. This does not mean that Gnus can't be modified to
work more smoothly together with nnimap, which I'm sure it will be.

If you want numbers, entering a large group (~1000 articles) with
nnimap spends 20 % of the time in nnimap and 80 % in Gnus. For nnml
these numbers are 1 % / 99 %.

This indicates that if you want better performance, you should spend
your time optimizing Gnus instead of working on a backend. The
bottleneck isn't I/O. A simple step, like moving the "range"
calculations to C (which is on the todo list, I believe) would speed
up things tremendously for very large groups (>10000 articles).
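
For readers unfamiliar with the term: a "range" here is a compact
representation of a set of article numbers, where runs of consecutive
numbers are stored as (LOW . HIGH) pairs.  A minimal sketch of the
compaction step, assuming a sorted input list without duplicates, is
shown below.  The function name `my-compress-sequence' is made up for
this example and is not the code Gnus uses.

```elisp
;; Hypothetical illustration of range compaction; not Gnus' own code.
(defun my-compress-sequence (numbers)
  "Turn sorted NUMBERS into a list of single numbers and (LOW . HIGH) pairs."
  (let (ranges)
    (dolist (n numbers)
      (let ((prev (car ranges)))
        (cond
         ;; N extends the pair we are currently building.
         ((and (consp prev) (= n (1+ (cdr prev))))
          (setcdr prev n))
         ;; N follows a single number: turn that number into a pair.
         ((and (integerp prev) (= n (1+ prev)))
          (setcar ranges (cons prev n)))
         ;; Otherwise N starts a new entry.
         (t (push n ranges)))))
    (nreverse ranges)))

(my-compress-sequence '(1 2 3 5 7 8))
;; => ((1 . 3) 5 (7 . 8))
```

This sketch makes one pass over the list, but operations that expand
ranges back into full lists, or that merge and subtract them
repeatedly, touch every article number, which is where very large
groups start to hurt.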



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 14:13 ` Kai Großjohann
@ 1999-09-22 15:32   ` Hannu Koivisto
  1999-09-22 15:50     ` Kai Großjohann
  1999-09-22 16:52     ` Simon Josefsson
  0 siblings, 2 replies; 12+ messages in thread
From: Hannu Koivisto @ 1999-09-22 15:32 UTC (permalink / raw)


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

| I think nnimap is even slower than nnml because the NOV parsing done
| by Gnus is so fast.  In fact, nnimap now includes a NOV cache feature
| which fetches overview info from the IMAP server only once.  After
| that, the overview information is stored in a file in ~/News and
| subsequently fetched from there.

Hm, I don't quite follow why this gives a reason why nnimap is even
slower than nnml.  After all, Gnus parses NOV stuff less in
nnimap's case as it doesn't have to touch overview info when mail
is getting spooled to folders -- IMAP server does that.  It has to
handle overview info when entering a group, but that's the same for
nnml and nnimap, isn't it?  Of course, without caching, that overview
file would be read over the network, but after all, I was more
concerned about optimizing the spooling phase, which in IMAP's case
is handled in the background, like you say below.  Perhaps there
would be some way to get asynchronous external spooling directly to
nnml folders too, but, unlike with nnmh, that would require locking
here and there and is probably far from easy to do.

| I use a Cyrus server and I do server-side splitting with Sieve.  So

Thanks, I'll check those out.  I hope that splitting doesn't
require anything special and could be done with procmail instead of
that Sieve thingy.  I'd rather avoid converting my 9k .procmailrc
into another system.

| getting new mail is real fast -- O(n) where n is the number of
| subscribed groups.  The splitting happens in the background.  That's a
| nice feature of the Gnus/nnimap combination.

Indeed.

| slow.  Recently, I moved all articles from one 3500 message nnml group
| to an 8000 message group, and that was real slow.  Towards the end,

yeah, and when you have 80000 articles instead of 8000, the real
fun begins :)

| of the last line is found.  And writing new lines could happen by
| appending to the existing file.  Yes, that might be faster for large
| overview files, but what's the size where it starts winning?

Yes, that's what I thought too (and as far as I can see, it should
start winning immediately or something is broken), but the fact
that Gnus possibly removes some lines from those overview files or
whatever (this is the part I didn't quite grok when reading the
source) shoots down the idea of appending.

| As to displaying large summary buffers, I find that I normally get by
| with just displaying a few messages.  I switched from total-expire to

So do I, normally, but there are a few folders that I don't touch for
a long time and then read all the accumulated messages.  This would be
fine (it wouldn't hit me often) if I didn't peek into those folders
for something important every now and then.

| Rather than displaying thousands of messages to search for an old one,
| I use nnir to search for it.  (Should use nnir much more often,
| though.)

Hmm, I'll have to check that out too.  I assume it handles
full-text regexp searches, yes?

-- 
Hannu



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 15:32   ` Hannu Koivisto
@ 1999-09-22 15:50     ` Kai Großjohann
  1999-09-22 17:00       ` Simon Josefsson
  1999-09-22 16:52     ` Simon Josefsson
  1 sibling, 1 reply; 12+ messages in thread
From: Kai Großjohann @ 1999-09-22 15:50 UTC (permalink / raw)


Hannu Koivisto <azure@iki.fi> writes:

> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
> 
> | of the last line is found.  And writing new lines could happen by
> | appending to the existing file.  Yes, that might be faster for large
> | overview files, but what's the size where it starts winning?
> 
> Yes, that's what I thought too (and as far as I can see, it should
> start winning immediately or something is broken), but the fact
> that Gnus possibly removes some lines from those overview files or
> whatever (this is the part I didn't quite grok when reading the
> source) shoots down the idea of appending.

I would be surprised if Gnus did anything but appending _during
splitting_.

> | Rather than displaying thousands of messages to search for an old one,
> | I use nnir to search for it.  (Should use nnir much more often,
> | though.)
> 
> Hmm, I'll have to check that out too.  I assume it handles
> full-text regexp searches, yes?

Depends.  It can use several backends.  Since the IMAP protocol
provides searching, nnir provides whatever IMAP provides.  Which is
full-text regexp searches plus Boolean connectives, if I'm not
mistaken. 

But for nnml, say, you can use a different search engine which might
provide other nifty things like `sounds similar' searches or stemming
or ranking.  Or all of them, as is the case with freeWAIS-sf :-)

I hope that we will be able to integrate a real IR engine with the
Cyrus server, someday.  That would really be KEWL!

kai
-- 
I like BOTH kinds of music.



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 14:54 ` Simon Josefsson
@ 1999-09-22 15:58   ` Hannu Koivisto
  1999-09-22 16:17     ` Kai Großjohann
  1999-09-22 17:08     ` Simon Josefsson
  0 siblings, 2 replies; 12+ messages in thread
From: Hannu Koivisto @ 1999-09-22 15:58 UTC (permalink / raw)


Simon Josefsson <jas@pdc.kth.se> writes:

| If you want numbers, entering a large group (~1000 articles) with
| nnimap spends 20 % of the time in nnimap and 80 % in Gnus. For nnml
| these numbers are 1 % / 99 %.
| 
| This indicates that if you want better performance, you should spend
| your time optimizing Gnus instead of working on a backend. The

Hm, right, thanks for the numbers.  However, perhaps I was a bit
unclear, but I never meant that a backend would improve group access,
at least not dramatically; I do agree with you that it's Gnus' problem
(I think I said it was Elisp's problem, but Gnus _and_ Elisp is
actually what I was thinking).  The primary reason why I was
thinking about a backend was the spooling phase.  That's the primary
problem I'm seeing now -- it has slowed down a lot while my folders
have grown larger and I don't think Gnus needs to do anything that
would prevent scaling in this case.  Perhaps I'm wrong, but this is
what I'm trying to find out.

-- 
Hannu



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 15:58   ` Hannu Koivisto
@ 1999-09-22 16:17     ` Kai Großjohann
  1999-09-22 17:08     ` Simon Josefsson
  1 sibling, 0 replies; 12+ messages in thread
From: Kai Großjohann @ 1999-09-22 16:17 UTC (permalink / raw)


Hannu Koivisto <azure@iki.fi> writes:

> [...] The primary reason why I was
> thinking about a backend was the spooling phase.  That's the primary
> problem I'm seeing now -- it has slowed down a lot while my folders
> have grown larger and I don't think Gnus needs to do anything that
> would prevent scaling in this case. [...]

Maybe the filesystem prevents scaling?  Aren't huge directories
hideously slow in Unix?

kai
-- 
I like BOTH kinds of music.



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 15:32   ` Hannu Koivisto
  1999-09-22 15:50     ` Kai Großjohann
@ 1999-09-22 16:52     ` Simon Josefsson
  1 sibling, 0 replies; 12+ messages in thread
From: Simon Josefsson @ 1999-09-22 16:52 UTC (permalink / raw)
  Cc: ding

Hannu Koivisto <azure@iki.fi> writes:

> Thanks, I'll check those out.  I hope that splitting doesn't
> require anything special and could be done with procmail instead of
> that Sieve thingy.  I'd rather avoid converting my 9k .procmailrc
> into another system.

There's a web page describing how to use procmail with Cyrus, so it
can be done even though it's not the intended solution.

> | slow.  Recently, I moved all articles from one 3500 message nnml group
> | to an 8000 message group, and that was real slow.  Towards the end,
> 
> yeah, and when you have 80000 articles instead of 8000, the real
> fun begins :)

Ouch. You might be running into filesystem scalability issues then. I
can't tell if it would be a solution for you, but I use something
along the lines of

       (list (format-time-string "INBOX.private.%Y-%m")   ....

for all mailboxes that gather lots of mail that isn't expired. I
began doing this when I used nnml, but, as you can see, kept the
configuration with nnimap.
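
To make the naming scheme concrete, here is a small sketch of how such
a per-month group name can be computed.  The helper name
`my-monthly-group' is made up for this example; how the resulting name
is plugged into the split rules is left out, as in the fragment above.

```elisp
;; Hypothetical helper; only illustrates the per-month naming scheme.
(defun my-monthly-group (prefix &optional time)
  "Return PREFIX followed by \".YYYY-MM\" for TIME (default: now)."
  (concat prefix "." (format-time-string "%Y-%m" time)))

;; Mail filed on 22 September 1999 would go to this group:
(my-monthly-group "INBOX.private" (encode-time 0 0 12 22 9 1999))
;; => "INBOX.private.1999-09"
```

Each month then gets its own mailbox, so no single directory or
overview file keeps growing without bound.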



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 15:50     ` Kai Großjohann
@ 1999-09-22 17:00       ` Simon Josefsson
  0 siblings, 0 replies; 12+ messages in thread
From: Simon Josefsson @ 1999-09-22 17:00 UTC (permalink / raw)
  Cc: ding

Kai Großjohann <Kai.Grossjohann@CS.Uni-Dortmund.DE> writes:

> > Hmm, I'll have to check that out too.  I assume it handles
> > full-text regexp searches, yes?
> 
> Depends.  It can use several backends.  Since the IMAP protocol
> provides searching, nnir provides whatever IMAP provides.  Which is
> full-text regexp searches plus Boolean connectives, if I'm not
> mistaken. 

Sorry, no regexp searches in IMAP. There are boolean connectives
though, but you can't use them via nnir (yet). A regexp extension of
IMAP's SEARCH would be nice.



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 15:58   ` Hannu Koivisto
  1999-09-22 16:17     ` Kai Großjohann
@ 1999-09-22 17:08     ` Simon Josefsson
  1999-09-22 18:15       ` Hannu Koivisto
  1 sibling, 1 reply; 12+ messages in thread
From: Simon Josefsson @ 1999-09-22 17:08 UTC (permalink / raw)
  Cc: ding

Hannu Koivisto <azure@iki.fi> writes:

> The primary reason why I was thinking about a backend was the spooling
> phase.  That's the primary problem I'm seeing now -- it has slowed
> down a lot while my folders have grown larger and I don't think Gnus
> needs to do anything that would prevent scaling in this case.
> Perhaps I'm wrong, but this is what I'm trying to find out.

I don't know much about how the spooling phase works, but I would
assume it involves calculating (especially expanding/compacting) gnus
"ranges". And the gnus-range-* functions do very poorly when more
than, say, 10,000 articles are involved.

Try M-x elp-instrument-package RET gnus RET, fetch new mail, and look
at M-x elp-results RET, if gnus-range-* comes up at the top you've
confirmed my guess. If not, you should be able to figure out where all
the time is spent, no?



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 17:08     ` Simon Josefsson
@ 1999-09-22 18:15       ` Hannu Koivisto
  0 siblings, 0 replies; 12+ messages in thread
From: Hannu Koivisto @ 1999-09-22 18:15 UTC (permalink / raw)


Simon Josefsson <jas@pdc.kth.se> writes:

| Try M-x elp-instrument-package RET gnus RET, fetch new mail, and look
| at M-x elp-results RET, if gnus-range-* comes up at the top you've
| confirmed my guess. If not, you should be able to figure out where all
| the time is spent, no?

Thanks, I'll read more source and experiment with these to learn
more.

-- 
Hannu



* Re: overview file access when spooling and nnml/nnimap performance
  1999-09-22 13:48 overview file access when spooling and nnml/nnimap performance Hannu Koivisto
  1999-09-22 14:13 ` Kai Großjohann
  1999-09-22 14:54 ` Simon Josefsson
@ 1999-10-04 10:27 ` Robert Bihlmeyer
  2 siblings, 0 replies; 12+ messages in thread
From: Robert Bihlmeyer @ 1999-10-04 10:27 UTC (permalink / raw)


Hi,

>>>>> On 22 Sep 1999 16:48:54 +0300
>>>>> Hannu Koivisto <azure@iki.fi> said:

 Hannu> I thought, some sort of database solution for storing articles
 Hannu> would perhaps be the right direction.

FWIW, newer xemacsen support DB/DBM libraries. See (lispref)Databases.

	Robbe

-- 
Robert Bihlmeyer	reads: Deutsch, English, MIME, Latin-1, NO SPAM!
<robbe@orcus.priv.at>	<http://stud2.tuwien.ac.at/~e9426626/sig.html>

