Gnus development mailing list
 help / color / mirror / Atom feed
* Holes in article sequence
@ 1999-07-07  1:13 François Pinard
  1999-07-07  2:27 ` Hrvoje Niksic
  0 siblings, 1 reply; 16+ messages in thread
From: François Pinard @ 1999-07-07  1:13 UTC (permalink / raw)


Hi, people.

I've often seen complaints about inefficiencies induced by big holes in
the sequence of read article numbers, but I do not know the deep reason
behind such inefficiencies.  Would someone accept to enlighten me?

Is it backend dependent, or not?  Do those inefficiencies exist for `nnml'
in particular?  Is this an I/O related problem?  Do the particular holes at
both end generate inefficiencies?  For example, if I am revisiting a group
I did not see for a long while, will there be an extra inefficiency for
the size of the hole, besides the fact I have many articles to catchup?
Would the problem be merely related to the fact that Gnus expands lists
in memory instead of keeping them in interval form?

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-07  1:13 Holes in article sequence François Pinard
@ 1999-07-07  2:27 ` Hrvoje Niksic
  1999-07-07 17:49   ` Rob Browning
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Hrvoje Niksic @ 1999-07-07  2:27 UTC (permalink / raw)


François Pinard <pinard@iro.umontreal.ca> writes:

> I've often seen complaints about inefficiencies induced by big holes
> in the sequence of read article numbers, but I do not know the deep
> reason behind such inefficiencies.  Would someone accept to
> enlighten me?

The one inefficiency I know of is when you use total-expiry and nnml.
Because of the way total expiry works, Gnus has to stat all the files
in the article range, and delete the ones that should be deleted
(usually those older than N days.)

Now, if you have a big hole in your article range -- e.g. your article
range is 128 - 12300 because you ticked article no. 128 -- total
expiry will stat tens of thousands of non-existant files every time 
you leave the group.  That tends to get a bit slow.

The correct solution would be to expand Gnus' concept of "active
range" to be a list of ranges instead of a single range.  Then
total-expiry could work sanely.  David Moore had interesting ideas
about this, but he disappeared before doing anything.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-07  2:27 ` Hrvoje Niksic
@ 1999-07-07 17:49   ` Rob Browning
  1999-07-09 17:06   ` Lars Magne Ingebrigtsen
  1999-07-11 14:13   ` Hallvard B Furuseth
  2 siblings, 0 replies; 16+ messages in thread
From: Rob Browning @ 1999-07-07 17:49 UTC (permalink / raw)


Hrvoje Niksic <hniksic@srce.hr> writes:

> The correct solution would be to expand Gnus' concept of "active
> range" to be a list of ranges instead of a single range.  Then
> total-expiry could work sanely.  David Moore had interesting ideas
> about this, but he disappeared before doing anything.

True, though for now, you can (as I think Kai suggested to me) use
daemon-expiry:

  ;;; expiry handling
  (remove-hook 'gnus-summary-prepare-exit-hook 'gnus-summary-expire-articles)
  (setq gnus-use-demon t)
  (gnus-demon-add-handler 'gnus-group-expire-all-groups 30 30)

I've started toying with total-expire for a few groups, and this made
things quite acceptable, performance wise.

-- 
Rob Browning <rlb@cs.utexas.edu> PGP=E80E0D04F521A094 532B97F5D64E3930


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-07  2:27 ` Hrvoje Niksic
  1999-07-07 17:49   ` Rob Browning
@ 1999-07-09 17:06   ` Lars Magne Ingebrigtsen
  1999-07-09 17:11     ` Kai.Grossjohann
  1999-07-09 17:25     ` François Pinard
  1999-07-11 14:13   ` Hallvard B Furuseth
  2 siblings, 2 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-07-09 17:06 UTC (permalink / raw)


Hrvoje Niksic <hniksic@srce.hr> writes:

> Now, if you have a big hole in your article range -- e.g. your article
> range is 128 - 12300 because you ticked article no. 128 -- total
> expiry will stat tens of thousands of non-existant files every time 
> you leave the group.  That tends to get a bit slow.

One could speed things up by having the backend look at what articles
exist in the group, and then iterate over that instead of the other
way around.  That would be much, much quicker for groups with big
"holes" in them.  It might be slower in other situations, though.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-09 17:06   ` Lars Magne Ingebrigtsen
@ 1999-07-09 17:11     ` Kai.Grossjohann
  1999-07-09 18:20       ` Lars Magne Ingebrigtsen
  1999-07-09 17:25     ` François Pinard
  1 sibling, 1 reply; 16+ messages in thread
From: Kai.Grossjohann @ 1999-07-09 17:11 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Hrvoje Niksic <hniksic@srce.hr> writes:
> 
> > Now, if you have a big hole in your article range -- e.g. your article
> > range is 128 - 12300 because you ticked article no. 128 -- total
> > expiry will stat tens of thousands of non-existant files every time 
> > you leave the group.  That tends to get a bit slow.
> 
> One could speed things up by having the backend look at what articles
> exist in the group, and then iterate over that instead of the other
> way around.  That would be much, much quicker for groups with big
> "holes" in them.  It might be slower in other situations, though.

What about having the backend look for the articles actually in the
group, and then intersecting this set with the set of
potentially-to-be-expired articles, then only iterating over that
intersection?

kai
-- 
Life is hard and then you die.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-09 17:06   ` Lars Magne Ingebrigtsen
  1999-07-09 17:11     ` Kai.Grossjohann
@ 1999-07-09 17:25     ` François Pinard
  1999-07-09 18:26       ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 16+ messages in thread
From: François Pinard @ 1999-07-09 17:25 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Hrvoje Niksic <hniksic@srce.hr> writes:

> > Now, if you have a big hole in your article range -- e.g. your article
> > range is 128 - 12300 because you ticked article no. 128 -- total
> > expiry will stat tens of thousands of non-existant files every time 
> > you leave the group.  That tends to get a bit slow.

> One could speed things up by having the backend look at what articles
> exist in the group, and then iterate over that instead of the other
> way around.  That would be much, much quicker for groups with big
> "holes" in them.  It might be slower in other situations, though.

If this is the only problem with big holes, it would be worth moving
iterating the other way around, and forget hole problems forever :-).

If you fear that it could a bit slow, but are able to keep the old code as
well as the new, maybe you could let the number of articles decide which
method to use, or else, have some collaboration from the backend to tell
you when it is better to change methods.  Or maybe both: if the backend has
nothing to say, use some heuristic on the number of articles.  Of course, if
the other-way-around is not supported by the backend, just use the old way.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-09 17:11     ` Kai.Grossjohann
@ 1999-07-09 18:20       ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-07-09 18:20 UTC (permalink / raw)


Kai.Grossjohann@CS.Uni-Dortmund.DE writes:

> What about having the backend look for the articles actually in the
> group, and then intersecting this set with the set of
> potentially-to-be-expired articles, then only iterating over that
> intersection?

Yes.  But the potentially costly operation is the "find all the
articles in the current group" thing.  For instance, nnfolder will
have to search through the folder, tallying all the messages.  If this
is in response to a total-expiry thing, then that is ok, since
nnfolder will have to do that anyway, but if it's a different expiry
policy, then the list that nnfolder is handed is much, much smaller.

So what is the best way kinda depends on the number of articles the
backend is asked to expire.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-09 17:25     ` François Pinard
@ 1999-07-09 18:26       ` Lars Magne Ingebrigtsen
  1999-07-09 22:34         ` Kai.Grossjohann
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-07-09 18:26 UTC (permalink / raw)


François Pinard <pinard@iro.umontreal.ca> writes:

> If you fear that it could a bit slow, but are able to keep the old code as
> well as the new, maybe you could let the number of articles decide which
> method to use,

Hey!  Great minds think alike!  :-)

> or else, have some collaboration from the backend to tell
> you when it is better to change methods.  Or maybe both: if the backend has
> nothing to say, use some heuristic on the number of articles.  Of course, if
> the other-way-around is not supported by the backend, just use the old way.

Yes; giving the backend more info might resolve the issue.

(*Time passes while Lars looks at some code.*)

Aha!  nnml already does it the other way around -- it looks at the
actual articles in the group, and then works on the intersection
between the articles it's asked to expire and the ones it knows
exists.  nnfolder, on the other hand, just iterates over the entire
list of expirable articles, which results in a call to
`nnfolder-goto-article' for each article in the list.  That must be
slooow...

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-09 18:26       ` Lars Magne Ingebrigtsen
@ 1999-07-09 22:34         ` Kai.Grossjohann
  1999-07-11  9:03           ` Lars Magne Ingebrigtsen
  1999-07-12  3:02         ` Hrvoje Niksic
  1999-07-12 21:16         ` jonkv
  2 siblings, 1 reply; 16+ messages in thread
From: Kai.Grossjohann @ 1999-07-09 22:34 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Aha!  nnml already does it the other way around -- it looks at the
> actual articles in the group, and then works on the intersection
> between the articles it's asked to expire and the ones it knows
> exists.

Does that mean that total-expire is now fast in nnml groups, even with
holes?

kai
-- 
Life is hard and then you die.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-09 22:34         ` Kai.Grossjohann
@ 1999-07-11  9:03           ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-07-11  9:03 UTC (permalink / raw)


Kai.Grossjohann@CS.Uni-Dortmund.DE writes:

> Does that mean that total-expire is now fast in nnml groups, even with
> holes?

Well, "fast" is probably not true, but it's not as hopelessly slow as
in nnfolder groups.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-07  2:27 ` Hrvoje Niksic
  1999-07-07 17:49   ` Rob Browning
  1999-07-09 17:06   ` Lars Magne Ingebrigtsen
@ 1999-07-11 14:13   ` Hallvard B Furuseth
  2 siblings, 0 replies; 16+ messages in thread
From: Hallvard B Furuseth @ 1999-07-11 14:13 UTC (permalink / raw)


Hrvoje Niksic writes:
>> I've often seen complaints about inefficiencies induced by big holes
>> in the sequence of read article numbers, (...)
> 
> The one inefficiency I know of is when you use total-expiry and nnml.

nncache is slow with big holes.  It takes half a minute to fetch this
group in buffer *Gnus Browse Server*:
    K  11589: comp.emacs
There are 8 actual articles.

-- 
Hallvard


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-09 18:26       ` Lars Magne Ingebrigtsen
  1999-07-09 22:34         ` Kai.Grossjohann
@ 1999-07-12  3:02         ` Hrvoje Niksic
  1999-08-27 17:21           ` Lars Magne Ingebrigtsen
                             ` (2 more replies)
  1999-07-12 21:16         ` jonkv
  2 siblings, 3 replies; 16+ messages in thread
From: Hrvoje Niksic @ 1999-07-12  3:02 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Aha!  nnml already does it the other way around -- it looks at the
> actual articles in the group, and then works on the intersection
> between the articles it's asked to expire and the ones it knows
> exists.

Huh?  Is this something new?  As far as I remember, I've experienced
the "hole" problem with nnml.  Ticking a message in a high-volume
group made exiting the group excruciatingly slow.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-09 18:26       ` Lars Magne Ingebrigtsen
  1999-07-09 22:34         ` Kai.Grossjohann
  1999-07-12  3:02         ` Hrvoje Niksic
@ 1999-07-12 21:16         ` jonkv
  2 siblings, 0 replies; 16+ messages in thread
From: jonkv @ 1999-07-12 21:16 UTC (permalink / raw)
  Cc: jonkv

[-- Attachment #1: Type: text/plain, Size: 1938 bytes --]

(Apparently, I should write more followups to mailing list articles.
Maybe then I'd finally learn to use f/F instead of r/R...  I did NOT
intend to send the first reply privately.)

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Aha!  nnml already does it the other way around -- it looks at the
> actual articles in the group, and then works on the intersection
> between the articles it's asked to expire and the ones it knows
> exists.  nnfolder, on the other hand, just iterates over the entire
> list of expirable articles, which results in a call to
> `nnfolder-goto-article' for each article in the list.  That must be
> slooow...

I looked at this about a year ago, since I was using nnfolder for some 
large groups (for example, for the linux-kernel mailing list) and it
was taking forever to expire those groups.  I came to the same
conclusion you just did.  

AFAICT, nnfolder-request-expire-articles is often called with a huge
list of articles, because of holes or whatever.  Then, it iterates
through that entire list, and for each article, it searches the entire
buffer until it finds the article.  But most of the article numbers
don't exist, and for them the *entire* buffer is searched each time...

I rewrote that part so that it first finds all article numbers (which
shouldn't take that much more time than searching for a single
non-existing article, since in each case the entire buffer has to be
scanned), then intersects that with the articles it should expire.
Back then, I had no idea that nnml did it that way.

Anyway, I wrote that code a year ago and promptly forgot all about
sending in a patch or anything.  But here it is, ported to pgnus 0.95,
in case anyone is interested.  It seems to work on my system, but I
can't guarantee that it will work on anyone else's or that I haven't
made stupid assumptions that won't hold all the time -- I'm not
exactly familiar with the internal workings of Gnus.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Faster expiry in nnfolder --]
[-- Type: text/x-patch, Size: 2878 bytes --]

--- nnfolder-original.el	Mon Jul 12 17:06:00 1999
+++ nnfolder.el	Mon Jul 12 17:33:28 1999
@@ -295,39 +295,64 @@
     (let ((nnmail-file-coding-system nnfolder-file-coding-system))
       (nnmail-find-file nnfolder-newsgroups-file))))
 
+;; Return a list consisting of all article numbers existing in the
+;; current folder.
+
+(defun nnfolder-existing-articles ()
+  (save-excursion
+    (when nnfolder-current-buffer
+      (set-buffer nnfolder-current-buffer)
+      (goto-char (point-min))
+      (let ((marker (concat "\n" nnfolder-article-marker))
+	    (number "[0-9]+")
+	    numbers)
+      
+	(while (and (search-forward marker nil t)
+		    (re-search-forward number nil t))
+	  (let ((newnum (string-to-number (match-string 0))))
+	    (if (nnmail-within-headers-p)
+		(push newnum numbers))))
+	numbers))))
+
 (deffoo nnfolder-request-expire-articles
   (articles newsgroup &optional server force)
   (nnfolder-possibly-change-group newsgroup server)
   (let* ((is-old t)
-	 rest)
+	 ;; The articles we have deleted so far.
+	 (deleted-articles nil)
+	 ;; The articles that really exist and will be expired if they are old enough.
+	 (maybe-expirable (gnus-intersection articles (nnfolder-existing-articles))))
     (nnmail-activate 'nnfolder)
 
     (save-excursion
       (set-buffer nnfolder-current-buffer)
-      (while (and articles is-old)
+      ;; Since messages are sorted in arrival order and expired in the
+      ;; same order, we can stop as soon as we find a message that is
+      ;; too old.
+      (while (and maybe-expirable is-old)
 	(goto-char (point-min))
-	(when (and (nnfolder-goto-article (car articles))
+	(when (and (nnfolder-goto-article (car maybe-expirable))
 		   (search-forward (concat "\n" nnfolder-article-marker)
 				   nil t))
 	  (forward-sexp)
-	  (if (setq is-old
+	  (when (setq is-old
 		    (nnmail-expired-article-p
 		     newsgroup
 		     (buffer-substring
 		      (point) (progn (end-of-line) (point)))
 		     force nnfolder-inhibit-expiry))
-	      (progn
 		(nnheader-message 5 "Deleting article %d..."
-				  (car articles) newsgroup)
-		(nnfolder-delete-mail))
-	    (push (car articles) rest)))
-	(setq articles (cdr articles)))
+			      (car maybe-expirable) newsgroup)
+	    (nnfolder-delete-mail)
+	    ;; Must remember which articles were actually deleted
+	    (push (car maybe-expirable) deleted-articles)))
+	(setq maybe-expirable (cdr maybe-expirable)))
       (unless nnfolder-inhibit-expiry
 	(nnheader-message 5 "Deleting articles...done"))
       (nnfolder-save-buffer)
       (nnfolder-adjust-min-active newsgroup)
       (nnfolder-save-active nnfolder-group-alist nnfolder-active-file)
-      (nconc rest articles))))
+      (gnus-sorted-complement articles (nreverse deleted-articles)))))
 
 (deffoo nnfolder-request-move-article (article group server
 					       accept-form &optional last)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-12  3:02         ` Hrvoje Niksic
@ 1999-08-27 17:21           ` Lars Magne Ingebrigtsen
  1999-08-28 17:55           ` Jan Vroonhof
       [not found]           ` <m3yaexdtre.fsf@quimb <byemgnx00v.fsf@bolzano.math.ethz.ch>
  2 siblings, 0 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-08-27 17:21 UTC (permalink / raw)


Hrvoje Niksic <hniksic@srce.hr> writes:

> Huh?  Is this something new?  As far as I remember, I've experienced
> the "hole" problem with nnml.  Ticking a message in a high-volume
> group made exiting the group excruciatingly slow.

Well, there's excruciatingly slow (nnml) and horribly, unworkably slow 
(nnfolder).  The slowness in nnml with holes comes not from statting a 
gazillion files, but just from expanding (9 . 783739) to a looong list 
of numbers.  

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
  1999-07-12  3:02         ` Hrvoje Niksic
  1999-08-27 17:21           ` Lars Magne Ingebrigtsen
@ 1999-08-28 17:55           ` Jan Vroonhof
       [not found]           ` <m3yaexdtre.fsf@quimb <byemgnx00v.fsf@bolzano.math.ethz.ch>
  2 siblings, 0 replies; 16+ messages in thread
From: Jan Vroonhof @ 1999-08-28 17:55 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Well, there's excruciatingly slow (nnml) and horribly, unworkably slow 
> (nnfolder).  The slowness in nnml with holes comes not from statting a 
> gazillion files, but just from expanding (9 . 783739) to a looong list 
> of numbers.  

Maybe it is time to support real ranges in the overview file?

Jan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Holes in article sequence
       [not found]           ` <m3yaexdtre.fsf@quimb <byemgnx00v.fsf@bolzano.math.ethz.ch>
@ 1999-09-25  6:45             ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 16+ messages in thread
From: Lars Magne Ingebrigtsen @ 1999-09-25  6:45 UTC (permalink / raw)


Jan Vroonhof <vroonhof@math.ethz.ch> writes:

> > Well, there's excruciatingly slow (nnml) and horribly, unworkably slow 
> > (nnfolder).  The slowness in nnml with holes comes not from statting a 
> > gazillion files, but just from expanding (9 . 783739) to a looong list 
> > of numbers.  
> 
> Maybe it is time to support real ranges in the overview file?

Supporting ranges everywhere would be very nice (and that was my
original plan), but it would be Major Work.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~1999-09-25  6:45 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-07-07  1:13 Holes in article sequence François Pinard
1999-07-07  2:27 ` Hrvoje Niksic
1999-07-07 17:49   ` Rob Browning
1999-07-09 17:06   ` Lars Magne Ingebrigtsen
1999-07-09 17:11     ` Kai.Grossjohann
1999-07-09 18:20       ` Lars Magne Ingebrigtsen
1999-07-09 17:25     ` François Pinard
1999-07-09 18:26       ` Lars Magne Ingebrigtsen
1999-07-09 22:34         ` Kai.Grossjohann
1999-07-11  9:03           ` Lars Magne Ingebrigtsen
1999-07-12  3:02         ` Hrvoje Niksic
1999-08-27 17:21           ` Lars Magne Ingebrigtsen
1999-08-28 17:55           ` Jan Vroonhof
     [not found]           ` <m3yaexdtre.fsf@quimb <byemgnx00v.fsf@bolzano.math.ethz.ch>
1999-09-25  6:45             ` Lars Magne Ingebrigtsen
1999-07-12 21:16         ` jonkv
1999-07-11 14:13   ` Hallvard B Furuseth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).