Gnus development mailing list
 help / color / mirror / Atom feed
* performance respooling articles in nnmbox or nnbabyl
@ 1996-03-16  1:30 Greg Stark
  1996-03-16  9:58 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 8+ messages in thread
From: Greg Stark @ 1996-03-16  1:30 UTC (permalink / raw)



Hypothetically speaking, i was helping  set up a new user who wanted to use
nnmbox as his backend.  He had everything set up fine and it came time to set
up a nnmail-split-method.  After setting up an nnmail-split method he would
just bave to use Br to respool all the articles into groups, right?

The problem is that nnmbox and nnbabyl use a single file for all the groups
and use the Gnus-* headers to indicate which group an article is in.  To
refile an article, all that's necessary is to change the header, in theory. In
practice Gnus deletes the old message inserts the message in a new buffer,
edits the text there, then inserts that text at the other end of the buffer.

Think about how the buffer gap has to move to accomplish this when respooling
the entire nnmbox; the behaviour is absolutely pessimized.  The buffer gap has
to move back and forth across the entire nnmbox file for each delete and
insert, meaning emacs has to do the equivalent of an memmove of the entire
nnmbox file in memory twice for each respool.  Since each movement has to
cover N articles, and it has to do 2N movemements, this respooling algorithm
should run in N^2 time.

I'm not 100% emacs 19.30 still has to do this, there are a lot of obvious
optimizations emacs could do for operations at the end of buffers, but i'm
pretty sure the multiple gap code never made it into the mainline source.
But there's no real reason such a simple operation doing so much work anyways.

If it just went through and editted the headers for each article it would only
have to move the gap over each article once in sequence, the equivalent of a
single movement over the entire buffer.  In other words it should run in order
N time.  If there's no clean way to get this behaviour without breaking the
abstraction barriers (it looks like it shouldn't be too hard to me) then
another order N algorithm would be to process all the deletions first then
insert all of them at the end. I think this algorithm would still be doing
more work than the header editting one, though i can't be sure.
 
I hope the optimization splurge continues even after this release, and i hope
you pay attention to algorithmic optimizations and not just minor tweaks like
choosing which primitives to use.  I suspect there are still a lot of N^2
algorithms lying around even after that patch in gnu.emacs.gnus that changed a
lot of them.  In particular, killing lots newsgroups still seems slow.

greg


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: performance respooling articles in nnmbox or nnbabyl
  1996-03-16  1:30 performance respooling articles in nnmbox or nnbabyl Greg Stark
@ 1996-03-16  9:58 ` Lars Magne Ingebrigtsen
  1996-03-17 18:27   ` Greg Stark
  1996-03-17 18:39   ` Greg Stark
  0 siblings, 2 replies; 8+ messages in thread
From: Lars Magne Ingebrigtsen @ 1996-03-16  9:58 UTC (permalink / raw)


gsstark@MIT.EDU (Greg Stark) writes:

> The problem is that nnmbox and nnbabyl use a single file for all the
> groups and use the Gnus-* headers to indicate which group an article
> is in.  To refile an article, all that's necessary is to change the
> header, in theory.

Some header munging also has to be done to remove all those babyleze
headers.

> If it just went through and editted the headers for each article it
> would only have to move the gap over each article once in sequence,
> the equivalent of a single movement over the entire buffer.  In
> other words it should run in order N time.  If there's no clean way
> to get this behaviour without breaking the abstraction barriers (it
> looks like it shouldn't be too hard to me) then another order N
> algorithm would be to process all the deletions first then insert
> all of them at the end.

But you're only considering the particular case of respooling between
nnbabyl and nnmbox (and possibly nnfolder).  You're probably right
that Gnus does far too much work when moving messages between those
backends, but it's hard to see how to do what you suggest between,
say, nntp and nnml cleanly without adding a gazillion new functions.
(Each backend now only has to provide two functions to move, accept,
copy and respool articles -- `nn*-request-move-article' and
`nn*-request-accept-article'.)  The added complexity does not justify
the speed increases, which I'm not convinced would happen, actually.

(I don't know anything about those buffer gaps and stuff, though.  Why
do the gaps have to move?  Point stays at the end of the buffer being
respooled to, I think...)

Oh.  Uhm.  I misunderstood.  You meant respooling using the same
backend?  Those backends assume that the messages are stored using
ascending article numbers, so just changing the Gnus-* headers won't
work reliably.  In any case, that's not something most people would do
very often.

-- 
  "Yes.  The journey through the human heart 
     would have to wait until some other time."


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: performance respooling articles in nnmbox or nnbabyl
  1996-03-16  9:58 ` Lars Magne Ingebrigtsen
@ 1996-03-17 18:27   ` Greg Stark
  1996-03-18 16:18     ` Lars Magne Ingebrigtsen
  1996-03-17 18:39   ` Greg Stark
  1 sibling, 1 reply; 8+ messages in thread
From: Greg Stark @ 1996-03-17 18:27 UTC (permalink / raw)
  Cc: ding


Yes, the problem only occurs when respooling from nnmbox to nnmbox
or from nnbabyl to nnbabyl.  Except i think this is a fairly common
thing to need to do. Because every time you add a new split-method 
entry you'll want to re-spool things.

A bit of background on Emacs's internal representation of buffers; 
if i get any of this wrong someone should correct me.  A buffer is
stored in a contiguous piece of memory with a gap.  the gap is always
moved to be where you are editting (with some exceptions).  Whenever an
insertion or deletion is done, the gap has to move there to provide
or take up the unused buffer space.  The only way to move the gap is
essentially equivalent to an memcopy or memmove, and is fairly CPU-
intensive.

When you request-move an article from the beginning of an mbox group
the nnmbox backend deletes the article and inserts it at the end. 
The deletion needs the gap at the beginning of the buffer and the 
insertion needs it at the end.  

I think a reasonable solution to this would be to have an optional 
nnchoke-request-move-articles that took an assoc list of article 
numbers and groups to move them to.  Only backends for which this 
allowed a more efficient implementation would provide this function.
For nnmbox and nnbabyl -request-move-articles could store all the
moved articles in a temporary buffer and insert them all at the end
of the operation.

greg


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: performance respooling articles in nnmbox or nnbabyl
  1996-03-16  9:58 ` Lars Magne Ingebrigtsen
  1996-03-17 18:27   ` Greg Stark
@ 1996-03-17 18:39   ` Greg Stark
  1996-03-18 16:18     ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 8+ messages in thread
From: Greg Stark @ 1996-03-17 18:39 UTC (permalink / raw)
  Cc: ding


Hmm, there is another problem here.  Respooling into the same method is the
only way to make a split-method change affect existing mail messages, but it
doesn't really do what i want.

Commonly i add a new split-method when i realize that the traffic from some
list or whatever is getting high.  Then i want to move all the messages from
my misc group into the new group.  If I simply respool my entire misc
directory, all my ticks and marks are lost, all the articles show up as
unread, and a lot of extra work is done.

What i really want to happen is for the split-method functions to be
reapplied, and if the result doesn't match the existing Xref header, only
then should the article be moved.  And any marks should move with the article.

This doesn't seem too hard to get right, and i think it's pretty important 
for making split-methods more usable.

greg



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: performance respooling articles in nnmbox or nnbabyl
  1996-03-17 18:27   ` Greg Stark
@ 1996-03-18 16:18     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 8+ messages in thread
From: Lars Magne Ingebrigtsen @ 1996-03-18 16:18 UTC (permalink / raw)


gsstark@mit.edu (Greg Stark) writes:

> I think a reasonable solution to this would be to have an optional 
> nnchoke-request-move-articles that took an assoc list of article 
> numbers and groups to move them to.  Only backends for which this 
> allowed a more efficient implementation would provide this function.
> For nnmbox and nnbabyl -request-move-articles could store all the
> moved articles in a temporary buffer and insert them all at the end
> of the operation.

That wouldn't be too much work, actually.  It could even be done
without adding an extra backend interface function.

For each article:

1)  Request the article; put it in a temp buffer; clean it up to
remove all old Gnusey headers.

2)  Delete the article.

Then we'd have one temp buffer with all the "removed" articles, and
one could just point `nnchoke-request-scan' to that buffer/file to
pretend that it's an incoming spool and run it through the split
methods.

Hmn.  This would lose all article marks, though.

-- 
  "Yes.  The journey through the human heart 
     would have to wait until some other time."


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: performance respooling articles in nnmbox or nnbabyl
  1996-03-17 18:39   ` Greg Stark
@ 1996-03-18 16:18     ` Lars Magne Ingebrigtsen
  1996-03-18 17:40       ` Greg Stark
  0 siblings, 1 reply; 8+ messages in thread
From: Lars Magne Ingebrigtsen @ 1996-03-18 16:18 UTC (permalink / raw)


gsstark@mit.edu (Greg Stark) writes:

> Commonly i add a new split-method when i realize that the traffic from some
> list or whatever is getting high.  Then i want to move all the messages from
> my misc group into the new group.  If I simply respool my entire misc
> directory, all my ticks and marks are lost, all the articles show up as
> unread, and a lot of extra work is done.

Well, that's a bug.  Fix in 0.55.

> What i really want to happen is for the split-method functions to be
> reapplied, and if the result doesn't match the existing Xref header,
> only then should the article be moved.

Hm.  It seems perfectly valid to me to respool articles to the same
group, really.  This is handy if one wants to "clean up" the article
numbers, for instance.

-- 
  "Yes.  The journey through the human heart 
     would have to wait until some other time."


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: performance respooling articles in nnmbox or nnbabyl
  1996-03-18 16:18     ` Lars Magne Ingebrigtsen
@ 1996-03-18 17:40       ` Greg Stark
  1996-03-18 22:10         ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 8+ messages in thread
From: Greg Stark @ 1996-03-18 17:40 UTC (permalink / raw)
  Cc: ding


>>>>> Lars Magne Ingebrigtsen writes:

> gsstark@mit.edu (Greg Stark) writes:
>> What i really want to happen is for the split-method functions to be
>> reapplied, and if the result doesn't match the existing Xref header, only
>> then should the article be moved.

> Hm. It seems perfectly valid to me to respool articles to the same group,
> really. This is handy if one wants to "clean up" the article numbers, for
> instance.

I agree, that's why i think there are two different operations being
considered here. There are times when losing existing information and treating
marked articles as completely new articles is intended; but when i add a new
split method i want a little effect as possible, so i don't want the
unaffected articles moved, and i don't want information about the moved
articles lost.

greg


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: performance respooling articles in nnmbox or nnbabyl
  1996-03-18 17:40       ` Greg Stark
@ 1996-03-18 22:10         ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 8+ messages in thread
From: Lars Magne Ingebrigtsen @ 1996-03-18 22:10 UTC (permalink / raw)


gsstark@mit.edu (Greg Stark) writes:

> I agree, that's why i think there are two different operations being
> considered here. There are times when losing existing information
> and treating marked articles as completely new articles is intended;
> but when i add a new split method i want a little effect as
> possible, so i don't want the unaffected articles moved, and i don't
> want information about the moved articles lost.

Ok, I've added this to the Red Gnus todo list.

-- 
  "Yes.  The journey through the human heart 
     would have to wait until some other time."


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~1996-03-18 22:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1996-03-16  1:30 performance respooling articles in nnmbox or nnbabyl Greg Stark
1996-03-16  9:58 ` Lars Magne Ingebrigtsen
1996-03-17 18:27   ` Greg Stark
1996-03-18 16:18     ` Lars Magne Ingebrigtsen
1996-03-17 18:39   ` Greg Stark
1996-03-18 16:18     ` Lars Magne Ingebrigtsen
1996-03-18 17:40       ` Greg Stark
1996-03-18 22:10         ` Lars Magne Ingebrigtsen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).