Gnus development mailing list
* Agent downloads too many headers
@ 2002-10-22  6:20 Kai Großjohann
  2002-10-23  6:42 ` Kai Großjohann
  2002-10-23 15:54 ` Kai Großjohann
  0 siblings, 2 replies; 13+ messages in thread
From: Kai Großjohann @ 2002-10-22  6:20 UTC


The situation is as follows: in gnus-agent-fetch-headers, if
gnus-agent-consider-all-articles is non-nil, the list of articles is
set to the active range of the group.  The active range could be
something like (1 . 4711), so the list of articles would be (1 2 ...
4710 4711).  Then we remove from that list the list of
already-downloaded articles.  But probably the user has started
reading the group when the article numbers were greater than 1
already.  So most probably there are a lot of articles in the
low-number range which are not in the group at all.

Then the agent fetches the headers from that group for this list of
articles.

So the agent ends up fetching (almost) all headers for all groups.
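To make the cost concrete, here is a minimal Python sketch of the behaviour just described. The real code is Emacs Lisp; the function name and the article numbers are made up for illustration. It expands an active range into individual numbers and subtracts what was already downloaded, so every never-downloaded low number is requested again on each run.

```python
# Illustrative Python model of the header-selection step described above.
# Not Gnus code: `articles_to_fetch' and the numbers are hypothetical.

def articles_to_fetch(active, downloaded):
    """Expand an active range (low, high) into every article number in it,
    then drop the numbers the agent has already downloaded."""
    low, high = active
    return [n for n in range(low, high + 1) if n not in downloaded]


# The server reports (1 . 4711), but the user only started reading at 4000,
# so nothing below 4000 was ever downloaded.  Most of those numbers no longer
# exist, yet every one of them is requested again.
downloaded = set(range(4000, 4701))
todo = articles_to_fetch((1, 4711), downloaded)
print(len(todo))   # 4010
print(todo[:3])    # [1, 2, 3]
```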

What can we do?

We want to avoid fetching headers for the low-numbered articles where
we already learned yesterday that these articles don't exist.

One possibility would be to tell the agent to never fetch articles
with numbers less than what we've already fetched.  This would be (fairly)
easy to implement, but it would lead to a problem for people who start
using the Agent, download the (unread) message 4711, and then decide
they want to download old articles, too.  For them, the agent would
never download articles with numbers lower than 4711 because that's
the lowest number fetched already.  One workaround would be to enter
the group and type `C-u J u', which would fetch even those articles.
But I think it is not nice to require them to do that; they might
have lots of groups.
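A hypothetical Python sketch of this first option, assuming the agent knows the set of article numbers it already holds for a group. The `force' flag is only an illustrative stand-in for `C-u J u', not a real Gnus parameter; the sketch shows that without it, older articles are simply never offered again.

```python
# Hypothetical model of "never fetch below what we already fetched".
# `force' plays the role `C-u J u' plays in the message above.

def articles_to_fetch(active, downloaded, force=False):
    low, high = active
    if downloaded and not force:
        # Skip everything below the lowest article already in the agent.
        low = max(low, min(downloaded))
    return [n for n in range(low, high + 1) if n not in downloaded]


downloaded = {4711}                    # the user's first fetch got only 4711
todo = articles_to_fetch((1, 4720), downloaded)
print(todo[0], todo[-1], len(todo))    # 4712 4720 9

# Older articles only come back when the user forces it.
old = articles_to_fetch((1, 4720), downloaded, force=True)
print(len(old))                        # 4719
```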

Another possibility is to keep an "unactive list (of ranges)".  This
would be a range of articles known not to exist.  This would require
storing more data in the agent.  But it would also be precise.  I see
two problems with this, a minor one and a major one.  The minor
problem is that I don't know how to store additional data in the
agent.  The major problem is that I don't know if it will be
efficient: the unactive ranges might grow quite large.  For example,
if you start reading a group starting with article 1 and the agent
fetches every tenth article, then the unactive ranges will be ((2 .
10) (12 . 20) ...) and after a few tens of thousands of articles
there will be information about those long-gone articles 1, 11, 21
that have presumably long been expired from the agent.
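The growth problem can be checked with a short Python sketch. It computes the gaps (the "unactive" ranges) between fetched articles as (start, end) pairs, loosely modelled on Gnus range lists, assuming the agent fetched every tenth article of a 30000-article group. `complement_ranges' is an illustrative name, not a Gnus function.

```python
# Illustrative only: how many "unactive" ranges does sparse fetching produce?

def complement_ranges(present, low, high):
    """Return the gaps in [low, high] not covered by `present' as a list of
    (start, end) pairs, similar in spirit to a Gnus range list."""
    gaps, start = [], None
    for n in range(low, high + 1):
        if n in present:
            if start is not None:          # a gap just ended
                gaps.append((start, n - 1))
                start = None
        elif start is None:                # a gap just began
            start = n
    if start is not None:                  # trailing gap up to `high'
        gaps.append((start, high))
    return gaps


fetched = set(range(1, 30001, 10))         # articles 1, 11, 21, ...
gaps = complement_ranges(fetched, 1, 30000)
print(gaps[:2])    # [(2, 10), (12, 20)]
print(len(gaps))   # 3000: one range per fetched article, forever
```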

What do you think?

Please help!

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)




* Re: Agent downloads too many headers
  2002-10-22  6:20 Agent downloads too many headers Kai Großjohann
@ 2002-10-23  6:42 ` Kai Großjohann
  2002-10-23 13:55   ` Wes Hardaker
  2002-10-23 15:54 ` Kai Großjohann
  1 sibling, 1 reply; 13+ messages in thread
From: Kai Großjohann @ 2002-10-23  6:42 UTC


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> The situation is as follows: in gnus-agent-fetch-headers, if
> gnus-agent-consider-all-articles is non-nil, the list of articles is
> set to the active range of the group.  The active range could be
> something like (1 . 4711), so the list of articles would be (1 2 ...
> 4710 4711).  Then we remove from that list the list of
> already-downloaded articles.  But probably the user has started
> reading the group when the article numbers were greater than 1
> already.  So most probably there are a lot of articles in the
> low-number range which are not in the group at all.
>
> Then the agent fetches the headers from that group for this list of
> articles.
>
> So the agent ends up fetching (almost) all headers for all groups.
>
> What can we do?

The more I think about it, the more confused I get.

My current idea is to store ranges of articles which the agent has
already tried to fetch (it might have failed for nonexisting
articles).  So if the current group starts at article 4712, then the
range will also contain (1 . 4711) because at the first try Gnus will
try to fetch those headers, but they don't exist.

Then at subsequent fetching attempts, we can remove those articles
from the list of articles to fetch.
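A minimal Python sketch of this idea, assuming the tried articles are kept as a sorted list of disjoint, non-adjacent (start, end) ranges. `add_range' and `untried' are hypothetical helpers, not Gnus functions; Gnus has its own range functions in Emacs Lisp.

```python
# Hypothetical model: remember which article numbers were already tried.

def add_range(ranges, new):
    """Merge the closed interval `new' into a sorted list of disjoint,
    non-adjacent (start, end) ranges and return the new list."""
    out = []
    lo, hi = new
    for a, b in sorted(ranges):
        if b < lo - 1 or a > hi + 1:
            out.append((a, b))             # no overlap, keep as is
        else:                              # overlapping or adjacent: absorb
            lo, hi = min(lo, a), max(hi, b)
    out.append((lo, hi))
    return sorted(out)


def untried(active, tried):
    """Article numbers in the active range that were never tried."""
    low, high = active
    return [n for n in range(low, high + 1)
            if not any(a <= n <= b for a, b in tried)]


tried = add_range([], (1, 4711))       # first run tries all of (1 . 4711)
todo = untried((1, 4720), tried)       # the active range has grown to 4720
print(todo[0], todo[-1])               # 4712 4720
tried = add_range(tried, (todo[0], todo[-1]))
print(tried)                           # [(1, 4720)]
```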

But the problem is this: suppose the user uses the agent for a while
in the default setting, then changes gnus-agent-consider-all-articles
to true later on.  This means that lots of articles need to be
fetched whose headers have already been fetched.  So at some point
headers from the cache need to be added to the list of articles to
check.  How do I do that?

Another thing is that we probably don't want to iterate over all
those long-fetched headers again to see if we need to fetch the
corresponding article now.  So maybe along with the range of articles
fetched we should store the value of gnus-agent-consider-all-articles
and the predicate that was used.  Then when either one changes we can
consider all those long-fetched headers again and when they stay the
same we only need to check the new headers.
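One way to picture this bookkeeping is a per-group record that holds the tried ranges together with the settings that produced them. In this hypothetical Python sketch the predicate is represented by its printed form (a string) purely so it can be compared, and `GroupAgentState' and `refresh_state' are invented names.

```python
# Hypothetical model: invalidate remembered ranges when the settings change.
from dataclasses import dataclass, field


@dataclass
class GroupAgentState:
    consider_all: bool                 # value of gnus-agent-consider-all-articles
    predicate: str                     # printed category predicate, e.g. "true"
    tried: list = field(default_factory=list)   # (start, end) ranges


def refresh_state(state, consider_all, predicate):
    """Return a state whose `tried' ranges are valid for the current settings.

    If either setting changed, the old ranges say nothing about what the new
    settings would select, so start over and let the next fetch reconsider
    every header (including the ones already cached)."""
    if (state.consider_all, state.predicate) != (consider_all, predicate):
        return GroupAgentState(consider_all, predicate, [])
    return state


state = GroupAgentState(False, "true", [(1, 4711)])
print(refresh_state(state, False, "true").tried)   # [(1, 4711)]
print(refresh_state(state, True, "true").tried)    # []
```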

It's getting quite complicated for my feeble mind, and every time I
try to implement something, I get confused.  So it would be really
nice if somebody could help.  Whatever you know, just let me know
about it.  So if you know an algorithm that might work, tell me.  If
you know how to implement a small part of what needs to be done, tell
me.

Thanks,
kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)




* Re: Agent downloads too many headers
  2002-10-23  6:42 ` Kai Großjohann
@ 2002-10-23 13:55   ` Wes Hardaker
  2002-10-23 14:30     ` Kai Großjohann
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Wes Hardaker @ 2002-10-23 13:55 UTC
  Cc: ding

>>>>> On Wed, 23 Oct 2002 08:42:28 +0200, kai.grossjohann@uni-duisburg.de (Kai Großjohann) said:

Kai> But the problem is this: suppose the user uses the agent for a while
Kai> in the default setting, then changes gnus-agent-consider-all-articles
Kai> to true later on.  This means that lots of articles need to be
Kai> fetched whose headers have already been fetched.  So at some point
Kai> headers from the cache need to be added to the list of articles to
Kai> check.  How do I do that?

Kai> Another thing is that we probably don't want to iterate over all
Kai> those long-fetched headers again to see if we need to fetch the
Kai> corresponding article now.  So maybe along with the range of
Kai> articles fetched we should store the value of
Kai> gnus-agent-consider-all-articles and the predicate that was used.
Kai> Then when either one changes we can consider all those
Kai> long-fetched headers again and when they stay the same we only
Kai> need to check the new headers.

First off Kai, I thank you tremendous amounts for looking into
implementing all the features I've been craving for quite some time
with respect to the agent.  I've never had time or brain capacity to
look into it myself (and every time I think I know elisp, all I have
to do is open the gnus code to prove to myself that I don't.)  (What
the heck do ',(' and '`(' and '`[,(' mean anyway???)

Anyway, the other option to the above problems is to merely document
it in the variable documentation.  I.e., "if you change this value from
nil to t, you must also run gnus-forget-all-my-agent-knowledge or
something to remove the ranges you're worried about."  It might be
easier than implementing the idea in the second paragraph above
(though that would certainly be easier for the end-user).

I keep meaning to post the nnsf (sourceforge
tracker/bugs/patches/... data viewing) backend I've written (it works...
almost), but I need to look into agent interactions with it still, and
every time I think about it my head starts spinning.

-- 
"The trouble with having an open mind, of course, is that people will
 insist on coming along and trying to put things in it."   -- Terry Pratchett




* Re: Agent downloads too many headers
  2002-10-23 13:55   ` Wes Hardaker
@ 2002-10-23 14:30     ` Kai Großjohann
  2002-10-23 14:31     ` Kai Großjohann
  2002-10-23 19:14     ` Josh Huber
  2 siblings, 0 replies; 13+ messages in thread
From: Kai Großjohann @ 2002-10-23 14:30 UTC


Wes Hardaker <wes@hardakers.net> writes:

> I keep meaning to post the nnsf (sourceforge
> tracker/bugs/patches/... data viewing) backend I've written (it works...
> almost), but I need to look into agent interactions with it still, and
> every time I think about it my head starts spinning.

Do not wait because of the agent.  It will be very useful even without
the agent.

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)




* Re: Agent downloads too many headers
  2002-10-23 13:55   ` Wes Hardaker
  2002-10-23 14:30     ` Kai Großjohann
@ 2002-10-23 14:31     ` Kai Großjohann
  2002-10-23 19:14     ` Josh Huber
  2 siblings, 0 replies; 13+ messages in thread
From: Kai Großjohann @ 2002-10-23 14:31 UTC


Wes Hardaker <wes@hardakers.net> writes:

> Anyway, the other option to the above problems is to merely document
> it in the variable documentation.  I.e., "if you change this value from
> nil to t, you must also run gnus-forget-all-my-agent-knowledge or
> something to remove the ranges you're worried about."  It might be
> easier than implementing the idea in the second paragraph above
> (though that would certainly be easier for the end-user).

This is easy to do, of course.  So it remains to be seen whether the
implementation is also easy :-)

People can enter a group and hit `C-u J u' to fetch all those old
articles.

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)




* Re: Agent downloads too many headers
  2002-10-22  6:20 Agent downloads too many headers Kai Großjohann
  2002-10-23  6:42 ` Kai Großjohann
@ 2002-10-23 15:54 ` Kai Großjohann
  2002-10-23 20:33   ` Kai Großjohann
  1 sibling, 1 reply; 13+ messages in thread
From: Kai Großjohann @ 2002-10-23 15:54 UTC


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> The situation is as follows: in gnus-agent-fetch-headers, if
> gnus-agent-consider-all-articles is non-nil, the list of articles is
> set to the active range of the group.  The active range could be
> something like (1 . 4711), so the list of articles would be (1 2 ...
> 4710 4711).  Then we remove from that list the list of
> already-downloaded articles.  But probably the user has started
> reading the group when the article numbers were greater than 1
> already.  So most probably there are a lot of articles in the
> low-number range which are not in the group at all.
>
> Then the agent fetches the headers from that group for this list of
> articles.
>
> So the agent ends up fetching (almost) all headers for all groups.
>
> What can we do?

I now think the solution is to take a step backwards and to think
about the NOV cache, too (the gnus-agent one), and about the behavior
that we want the agent to have.

Suppose I tell the agent to fetch articles.  Then I read things
offline and compose things and later on I send the queue.  Then I'm
back online and don't use the agent for a while.  I read some
messages while I'm online.  Then I tell the agent to fetch articles
in order to go offline.  Here is the crucial point.  I want the
articles to be fetched that I've read while I was online.

The question is now how to achieve this.

For each group, the agent should store a list (range) of articles
considered already.  This list (range) should be updated whenever the
user does `J s' or `J u' in the group buffer.  So it should be
updated from gnus-agent-fetch-group-1.  That function calls
gnus-agent-fetch-headers to get the articles to consider.  So that's
where we need to store our data.
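A hypothetical Python sketch of one fetch pass under this scheme. For brevity it stores plain sets of article numbers rather than ranges, and `fetch_group' only stands in for the work done around gnus-agent-fetch-group-1 and gnus-agent-fetch-headers; it is not their actual logic. Read marks are never consulted when choosing candidates, which is what makes articles read online eligible for fetching later.

```python
# Hypothetical model of one `J s' / `J u' pass over a single group.

def fetch_group(group, active, considered, predicate):
    """Return the articles to download and record every candidate as
    considered, whether or not the predicate selected it."""
    low, high = active
    seen = considered.setdefault(group, set())
    candidates = [n for n in range(low, high + 1) if n not in seen]
    seen.update(candidates)            # never consider these again
    return [n for n in candidates if predicate(n)]


def always(n):
    return True                        # like a category predicate of `true'


considered = {}
print(fetch_group("gnu.emacs.gnus", (100, 105), considered, always))
# [100, 101, 102, 103, 104, 105]

# The user reads 106-108 online; read marks do not touch `considered',
# so the next pass still fetches them.
print(fetch_group("gnu.emacs.gnus", (100, 110), considered, always))
# [106, 107, 108, 109, 110]
```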

I think we're getting closer.  Do you follow my logic?  Is it right?

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)




* Re: Agent downloads too many headers
  2002-10-23 13:55   ` Wes Hardaker
  2002-10-23 14:30     ` Kai Großjohann
  2002-10-23 14:31     ` Kai Großjohann
@ 2002-10-23 19:14     ` Josh Huber
  2 siblings, 0 replies; 13+ messages in thread
From: Josh Huber @ 2002-10-23 19:14 UTC


Wes Hardaker <wes@hardakers.net> writes:

> (What the heck do ',(' and '`(' and '`[,(' mean anyway???)

Look at the help for the backquote macro.

Basically, it's a macro which lets you have a quoted expression, but
it lets you selectively insert non-quoted values/results as well.  It
also has the handy ,@ splice operator, which inserts the elements of a
list as individual items.

(let ((foo '(a b c d e))
      (one 1)
      (two 2)
      (three 3))
  `(,one ,two ,three four five ,@foo))

=> (1 2 3 four five a b c d e)

Get it?

-- 
Josh Huber




* Re: Agent downloads too many headers
  2002-10-23 15:54 ` Kai Großjohann
@ 2002-10-23 20:33   ` Kai Großjohann
  2002-10-23 21:42     ` Henrik Enberg
  2002-10-25  7:32     ` Danny Siu
  0 siblings, 2 replies; 13+ messages in thread
From: Kai Großjohann @ 2002-10-23 20:33 UTC


kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> Suppose I tell the agent to fetch articles.  Then I read things
> offline and compose things and later on I send the queue.  Then I'm
> back online and don't use the agent for a while.  I read some
> messages while I'm online.  Then I tell the agent to fetch articles
> in order to go offline.  Here is the crucial point.  I want the
> articles to be fetched that I've read while I was online.
>
> The question is now how to achieve this.
>
> For each group, the agent should store a list (range) of articles
> considered already.  This list (range) should be updated whenever the
> user does `J s' or `J u' in the group buffer.  So it should be
> updated from gnus-agent-fetch-group-1.  That function calls
> gnus-agent-fetch-headers to get the articles to consider.  So that's
> where we need to store our data.
>
> I think we're getting closer.  Do you follow my logic?  Is it right?

I couldn't wait.  I tried to implement this, and now, finally, I
think it at least does _something_.

Henrik, is it fetching less now?

I've committed this so that others can hack on it while I sleep :-)
Another reason is that tomorrow morning before I got to rush off to
catch the train, the agent fetching will be quicker :-)

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)




* Re: Agent downloads too many headers
  2002-10-23 20:33   ` Kai Großjohann
@ 2002-10-23 21:42     ` Henrik Enberg
  2002-10-24  7:14       ` Kai Großjohann
  2002-10-25  7:32     ` Danny Siu
  1 sibling, 1 reply; 13+ messages in thread
From: Henrik Enberg @ 2002-10-23 21:42 UTC
  Cc: ding

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> I couldn't wait.  I tried to implement this, and now, finally, I
> think it at least does _something_.
>
> Henrik, is it fetching less now?

Yes.  After the initial round of fetching all headers, subsequent
fetching DTRT.  Great work.

This has also magically fixed the problem I had with the agent
downloading articles again and again.  So now everything works for me
regardless of the value of `gnus-agent-consider-all-articles'.

> I've committed this so that others can hack on it while I sleep :-)
> Another reason is that tomorrow morning before I got to rush off to
> catch the train, the agent fetching will be quicker :-)

WIBNI `gnus-agent-consider-all-articles' could be made to respect
`gnus-agent-expire-days' so it only tries to keep all articles from
the last n days in the agent?  My server has some really complex expiry
rules: in some groups the articles stay around for months, but in some
groups they expire after just a week or so.

It seems like `gnus-agent-consider-all-articles' tries to keep all
articles the server knows about in the agent.  This can lead to cases
where the agent first expires a bunch of articles and then fetches them
again when you do a new fetch.  

Or maybe I'm misunderstanding something?

-- 
Booting... /vmemacs.el




* Re: Agent downloads too many headers
  2002-10-23 21:42     ` Henrik Enberg
@ 2002-10-24  7:14       ` Kai Großjohann
  2002-10-24 20:51         ` Henrik Enberg
  0 siblings, 1 reply; 13+ messages in thread
From: Kai Großjohann @ 2002-10-24  7:14 UTC


Henrik Enberg <henrik@enberg.org> writes:

> kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:
>
>> I couldn't wait.  I tried to implement this, and now, finally, I
>> think it at least does _something_.
>>
>> Henrik, is it fetching less now?
>
> Yes.  After the initial round of fetching all headers, subsequent
> fetching DTRT.  Great work.

Thanks.  I notice, however, that it keeps fetching some articles that
it shouldn't.  I'm guessing that it is due to the strange code that I
don't understand, for example the marks code that Paul and I are
talking about.

> This has also magically fixed the problem I had with the agent
> downloading articles again and again.  So now everything works for me
> regardless of the value of `gnus-agent-consider-all-articles'.

Good.

>> I've committed this so that others can hack on it while I sleep :-)
>> Another reason is that tomorrow morning before I got to rush off to
>> catch the train, the agent fetching will be quicker :-)
>
> WIBNI `gnus-agent-consider-all-articles' could be made to respect
> `gnus-agent-expire-days' so it only tries to keep all articles from
> the last n days in the agent?  My server has some really complex expiry
> rules: in some groups the articles stay around for months, but in some
> groups they expire after just a week or so.
>
> It seems like `gnus-agent-consider-all-articles' tries to keep all
> articles the server knows about in the agent.  This can lead to cases
> where the agent first expires a bunch of articles and then fetches them
> again when you do a new fetch.  
>
> Or maybe I'm misunderstanding something?

I think you are right.  I did not consider expiry when I made my
change.  It should, however, be fairly simple to add it: add a
predicate to the list of agent predicates and then you can use
`unexpired' or `(not expired)' or whatever in the predicate for the
category.
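A sketch of the kind of predicate Kai suggests, written in Python rather than Emacs Lisp. It assumes each header carries a date and that the agent's expiry age (gnus-agent-expire-days) is a single number of days. `make_unexpired_predicate' is a hypothetical name, and no claim is made that Gnus already offers an `unexpired' predicate.

```python
# Hypothetical age-based predicate: only keep articles younger than the
# agent's expiry age, so the agent does not refetch what it just expired.
from datetime import datetime, timedelta


def make_unexpired_predicate(expire_days, now):
    """Return a predicate that accepts article dates newer than the cutoff."""
    cutoff = now - timedelta(days=expire_days)

    def unexpired(article_date):
        return article_date >= cutoff

    return unexpired


now = datetime(2002, 10, 24)
unexpired = make_unexpired_predicate(7, now)       # cutoff: 2002-10-17
headers = [(4711, datetime(2002, 10, 1)), (4712, datetime(2002, 10, 20))]
print([num for num, date in headers if unexpired(date)])   # [4712]
```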

Can you do this yourself?

My long-term goal is to have the agent reflect the server status.  So
when a message disappears on the server, it should be removed from
the agent, too.  I don't know how to do that, though.

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)




* Re: Agent downloads too many headers
  2002-10-24  7:14       ` Kai Großjohann
@ 2002-10-24 20:51         ` Henrik Enberg
  2002-10-25  8:36           ` Kai Großjohann
  0 siblings, 1 reply; 13+ messages in thread
From: Henrik Enberg @ 2002-10-24 20:51 UTC
  Cc: ding

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> Henrik Enberg <henrik@enberg.org> writes:
>
>> WIBNI `gnus-agent-consider-all-articles' could be made to respect
>> `gnus-agent-expire-days' so it only tries to keep all articles from
>> the last n days in the agent?  My server has some really complex expiry
>> rules: in some groups the articles stay around for months, but in some
>> groups they expire after just a week or so.
>>
>> It seems like `gnus-agent-consider-all-articles' tries to keep all
>> articles the server knows about in the agent.  This can lead to cases
>> where the agent first expires a bunch of articles and then fetches them
>> again when you do a new fetch.  
>>
>> Or maybe I'm misunderstanding something?
>
> I think you are right.  I did not consider expiry when I made my
> change.  It should, however, be fairly simple to add it: add a
> predicate to the list of agent predicates and then you can use
> `unexpired' or `(not expired)' or whatever in the predicate for the
> category.
>
> Can you do this yourself?

Yes, there's even an example in the manual.  I really need to sit down
and re-read it some day :)

> My long-term goal is to have the agent reflect the server status.  So
> when a message disappears on the server, it should be removed from
> the agent, too.  I don't know how to do that, though.

I'm not all that familiar with the news protocol, but if there's a way
to ask the server for the expiry time of each group, then maybe
gnus-agent-expire-days could have the special value 'server which tells
the agent to expire stuff at the same time the server does.

-- 
Booting... /vmemacs.el




* Re: Agent downloads too many headers
  2002-10-23 20:33   ` Kai Großjohann
  2002-10-23 21:42     ` Henrik Enberg
@ 2002-10-25  7:32     ` Danny Siu
  1 sibling, 0 replies; 13+ messages in thread
From: Danny Siu @ 2002-10-25  7:32 UTC


Kai Großjohann writes:

  Kai> kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:
  >> Suppose I tell the agent to fetch articles.  Then I read things offline
  >> and compose things and later on I send the queue.  Then I'm back online
  >> and don't use the agent for a while.  I read some messages while I'm
  >> online.  Then I tell the agent to fetch articles in order to go
  >> offline.  Here is the crucial point.  I want the articles to be fetched
  >> that I've read while I was online.

That's what I would expect the Agent to do for me.  Before all of Kai's
recent changes to the agent code, 'J u' and 'J s' hadn't been working as
expected; some articles were downloaded while others weren't, even though
all my groups belong to a category whose predicate is true.

  Kai> Henrik, is it fetching less now?

  Kai> I've committed this so that others can hack on it while I sleep :-)
  Kai> Another reason is that tomorrow morning before I got to rush off to
  Kai> catch the train, the agent fetching will be quicker :-)

Yes.  The agent is fetching much less than it did a couple of days ago.
Thanks, Kai.

-- 
Danny Siu





* Re: Agent downloads too many headers
  2002-10-24 20:51         ` Henrik Enberg
@ 2002-10-25  8:36           ` Kai Großjohann
  0 siblings, 0 replies; 13+ messages in thread
From: Kai Großjohann @ 2002-10-25  8:36 UTC
  Cc: ding

Henrik Enberg <henrik@enberg.org> writes:

> I'm not all that familiar with the news protocol, but if there's a way
> to ask the server for the expiry time of each group, then maybe
> gnus-agent-expire-days could have the special value 'server which tells
> the agent to expire stuff at the same time the server does.

This is a possibility.  However, this doesn't catch canceled articles
and such stuff.  My idea was to just ask the server which articles it
has.  The problem I saw was that fetching all headers takes a long
time.  Hm.  But maybe the active data can be used.
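A minimal Python sketch of using the active data this way, assuming the agent can list the article numbers it holds for a group. `prune_below_active' is a hypothetical name. As noted above, this only catches articles below the server's low-water mark, not cancelled articles inside the active range.

```python
# Hypothetical model: drop agent articles the server's active range
# says are gone (i.e. below the server's lowest article number).

def prune_below_active(agent_articles, active):
    """Split agent-held article numbers into (keep, drop) lists using the
    server's active range (low, high).  Cancelled articles inside the
    range are not detected by this check."""
    low, _high = active
    keep = sorted(n for n in agent_articles if n >= low)
    drop = sorted(n for n in agent_articles if n < low)
    return keep, drop


keep, drop = prune_below_active({5, 90, 120, 130}, (100, 200))
print(keep, drop)   # [120, 130] [5, 90]
```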

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)




Thread overview: 13+ messages
2002-10-22  6:20 Agent downloads too many headers Kai Großjohann
2002-10-23  6:42 ` Kai Großjohann
2002-10-23 13:55   ` Wes Hardaker
2002-10-23 14:30     ` Kai Großjohann
2002-10-23 14:31     ` Kai Großjohann
2002-10-23 19:14     ` Josh Huber
2002-10-23 15:54 ` Kai Großjohann
2002-10-23 20:33   ` Kai Großjohann
2002-10-23 21:42     ` Henrik Enberg
2002-10-24  7:14       ` Kai Großjohann
2002-10-24 20:51         ` Henrik Enberg
2002-10-25  8:36           ` Kai Großjohann
2002-10-25  7:32     ` Danny Siu
