Gnus development mailing list
* Question about article identification and backends.
@ 2003-05-17 19:56 Rob Browning
  2003-05-17 21:29 ` Andreas Fuchs
  0 siblings, 1 reply; 16+ messages in thread
From: Rob Browning @ 2003-05-17 19:56 UTC (permalink / raw)



Right now Gnus appears very dependent on the "group and number" pair
to uniquely identify an article.  This has the unfortunate effect of
making it hard to track an article across moves, or within several
groups, at least without relying on Message-IDs (which I presume
can't be trusted and aren't likely to be fast).

In addition, one of the few things that bothers me about Gnus is the
fact that if I need to file a message from my inbox (or wherever) into
another group which already has a bunch of newer messages, there's no
way to have it inserted in its "arrival sequence" without reordering
all the other messages, possibly destroying the "group and number"
consistency for all the group articles and probably breaking other
Gnus features.

I believe I recall some talk about overhauling the backend interfaces,
and I was just wondering if there was any likelihood that Gnus might
be able to switch to an approach where each message is assigned a
guaranteed unique ID (integer or other) when it's first seen, and
where (at least within a given backend) group membership and
group-sequence information is treated as mutable data.  In part I'm
asking because I don't have a good idea whether or not the assumption
of article "group and number" immutability is just too entrenched to
change.

I had been thinking about this for a while, but this paper (posted by
Andreas) talks about something similar.  Although it uses
hash/timestamp for the unique ids (I think you'd also want to include
the headers in the hash, at least for the UID), for most purposes, I
suspect even just a simple monotonically increasing integer serial
number would be sufficient.

  http://www.informatik.uni-freiburg.de/~thiemann/papers/mailstore.pdf

Thanks

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592  F9A0 25C8 D377 8C7E 73A4




* Re: Question about article identification and backends.
  2003-05-17 19:56 Question about article identification and backends Rob Browning
@ 2003-05-17 21:29 ` Andreas Fuchs
  2003-05-18  2:57   ` Rob Browning
  2003-05-18  3:19   ` Rob Browning
  0 siblings, 2 replies; 16+ messages in thread
From: Andreas Fuchs @ 2003-05-17 21:29 UTC (permalink / raw)


Today, Rob Browning <rlb@defaultvalue.org> wrote:
> I had been thinking about this for a while, but this paper (posted by
> Andreas) talks about something similar.  Although it uses
> hash/timestamp for the unique ids (I think you'd also want to include
> the headers in the hash, at least for the UID), for most purposes, I
> suspect even just a simple monotonically increasing integer serial
> number would be sufficient.

I fear not - concurrency issues would make the whole thing fall apart.

Message A arrives at host x; gets next free number, 23.
Message B arrives at host y; gets next free number, 23.
You rsync messages from x -> y; Disaster.

The basic idea behind this is that you don't need to assign
pseudo-unique identities to messages because they are unique by
themselves (more or less, but MD5 should give a pretty good
approximation of unique for the purpose of mail storage).
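
The collision described above, and the way content-derived IDs avoid it,
can be sketched in a few lines of Python.  This is only an illustration:
the `Host` class and `sync` function are hypothetical and come from
neither Gnus nor the RMS paper, and MD5 over the raw message bytes
stands in for whatever hash a real store would use.

```python
import hashlib


class Host:
    """Hypothetical mail host that hands out per-host serial numbers."""

    def __init__(self, name):
        self.name = name
        self.next_serial = 23  # both hosts happen to be at the same counter
        self.by_serial = {}    # serial number -> message bytes
        self.by_hash = {}      # MD5 hex digest -> message bytes

    def deliver(self, message):
        serial = self.next_serial
        self.next_serial += 1
        self.by_serial[serial] = message
        digest = hashlib.md5(message).hexdigest()
        self.by_hash[digest] = message
        return serial, digest


def sync(src, dst):
    """Copy src into dst (rsync-like) and return the serials that clash."""
    clashes = [s for s, m in src.by_serial.items()
               if s in dst.by_serial and dst.by_serial[s] != m]
    dst.by_hash.update(src.by_hash)  # content-derived keys merge cleanly
    return clashes


x, y = Host("x"), Host("y")
serial_a, hash_a = x.deliver(b"Subject: A\n\nbody A\n")
serial_b, hash_b = y.deliver(b"Subject: B\n\nbody B\n")
clashes = sync(x, y)  # serial 23 names two different messages
```

Both messages receive serial 23, so the serial-keyed stores cannot be
merged, while the hash-keyed stores combine without conflict.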

The only problem I see with this approach is that you don't get a clean
number<->article-id mapping; nnmaildir's solution to this problem might
also apply for integrating the RMS into gnus.

Any takers? (-;
-- 
Andreas Fuchs, <asf@acm.org>, asf@jabber.at, antifuchs
irc.freenode.net's #emacs - online emacs advice from IRC addicts





* Re: Question about article identification and backends.
  2003-05-17 21:29 ` Andreas Fuchs
@ 2003-05-18  2:57   ` Rob Browning
  2003-05-18  3:19   ` Rob Browning
  1 sibling, 0 replies; 16+ messages in thread
From: Rob Browning @ 2003-05-18  2:57 UTC (permalink / raw)


Andreas Fuchs <asf@void.at> writes:

>> the headers in the hash, at least for the UID), for most purposes, I
>> suspect even just a simple monotonically increasing integer serial
>> number would be sufficient.
>
> I fear not - concurrency issues would make the whole thing fall apart.

Certainly, but when I mentioned the monotonically increasing serial
number approach, I wasn't thinking about the RMS system.  I was
thinking about a case where Gnus is the only one handling incoming
messages.  i.e. an approach where there is no "Gnus unique number"
until Gnus assigns one.  More specifically, I was thinking in terms of
an SQL serial column where the DB handles the concurrency issues.  I
mentioned the RMS system primarily because it also discussed the
values of a permanent message ID (and because it sounds like an
interesting system).  Though if implementing/using RMS, I probably
would want to include the headers in the hash (as they discuss
speculatively).
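
A minimal sketch of the serial-column idea, using Python's built-in
sqlite3 module as a stand-in for whatever database a backend might use.
The table names, column names, and helper functions are all
hypothetical; the point is only that the database assigns the permanent
ID while group membership stays ordinary, mutable data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- permanent, never reused
        raw TEXT NOT NULL
    );
    CREATE TABLE membership (
        article_id INTEGER NOT NULL REFERENCES article(id),
        grp TEXT NOT NULL,
        PRIMARY KEY (article_id, grp)
    );
""")


def store(raw, group):
    """Assign a permanent ID on first sight and file the article in a group."""
    cur = conn.execute("INSERT INTO article (raw) VALUES (?)", (raw,))
    conn.execute("INSERT INTO membership VALUES (?, ?)", (cur.lastrowid, group))
    return cur.lastrowid


def move(article_id, src, dst):
    """Group membership is mutable; the article ID does not change."""
    conn.execute(
        "UPDATE membership SET grp = ? WHERE article_id = ? AND grp = ?",
        (dst, article_id, src))


first = store("Subject: one", "inbox")
second = store("Subject: two", "inbox")
move(first, "inbox", "archive")
groups_of_first = [g for (g,) in conn.execute(
    "SELECT grp FROM membership WHERE article_id = ?", (first,))]
```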

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592  F9A0 25C8 D377 8C7E 73A4




* Re: Question about article identification and backends.
  2003-05-17 21:29 ` Andreas Fuchs
  2003-05-18  2:57   ` Rob Browning
@ 2003-05-18  3:19   ` Rob Browning
  2003-05-18  9:52     ` Kai Großjohann
  1 sibling, 1 reply; 16+ messages in thread
From: Rob Browning @ 2003-05-18  3:19 UTC (permalink / raw)


Andreas Fuchs <asf@void.at> writes:

> The only problem I see with this approach is that you don't get a
> clean number<->article-id mapping; nnmaildir's solution to this
> problem might also apply for integrating the RMS into gnus.

I think you may be referring to the main thing I'm wondering about --
how difficult would it be for Gnus to change such that it doesn't
require all of the following:

  A few remarks about these article numbers might be useful.  First of
  all, the numbers are positive integers.  Secondly, it is normally
  not possible for later articles to `re-use' older article numbers
  without confusing Gnus.  That is, if a group has ever contained a
  message numbered 42, then no other message may get that number, or
  Gnus will get mightily confused.(1) Third, article numbers must be
  assigned in order of arrival in the group; this is not necessarily
  the same as the date of the message.

i.e. how reasonable might it be, and how difficult a task, to change
Gnus to rely on stable unique article ids (global ones -- or at least
global within a backend) rather than stable unique per-group integers?
And given such a change, could Gnus perhaps then just expect an
ordered list of IDs from the backend when it wants the articles in a
group, and not mind if this order changes from backend query to
backend query?

Among other things, I presume this would require Gnus to reorient to
key marks by a group-id/unique-id pair rather than the group/position
pair it uses now.  The sequence position of an article within a group
would either have to be stored independently, or perhaps it could even
be dropped entirely if people were happy with just having by-date (or
maybe by arrival time) sorting -- not a big deal to me either way.

Thanks

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592  F9A0 25C8 D377 8C7E 73A4




* Re: Question about article identification and backends.
  2003-05-18  3:19   ` Rob Browning
@ 2003-05-18  9:52     ` Kai Großjohann
  2003-05-18 18:38       ` Rob Browning
  2003-05-19 16:47       ` Paul Jarc
  0 siblings, 2 replies; 16+ messages in thread
From: Kai Großjohann @ 2003-05-18  9:52 UTC (permalink / raw)


Rob Browning <rlb@defaultvalue.org> writes:

> I think you may be referring to the main thing I'm wondering about --
> how difficult would it be for Gnus to change such that it doesn't
> require all of the following:
>
>   A few remarks about these article numbers might be useful.  First of
>   all, the numbers are positive integers.  Secondly, it is normally
>   not possible for later articles to `re-use' older article numbers
>   without confusing Gnus.  That is, if a group has ever contained a
>   message numbered 42, then no other message may get that number, or
>   Gnus will get mightily confused.(1) Third, article numbers must be
>   assigned in order of arrival in the group; this is not necessarily
>   the same as the date of the message.
>
> i.e. how reasonable might it be, and how difficult a task, to change
> Gnus to rely on stable unique article ids (global ones -- or at least
> global within a backend) rather than stable unique per-group integers?

I guess it might not be that difficult.  It seems that the current
problem is: say you have read all of the articles in a certain group,
that is, articles 1 through 100.  Now you move an article numbered 42
into that group.  It is unread.  Then Gnus will think it is read.

But it ought to be possible to tweak Gnus so that it just splits the
read sequence from 1-100 into 1-41,43-100.

Maybe it is even possible using the backend interface: the backend
could tell Gnus the new list 1-41,43-100 of read articles.
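
The range split can be sketched as follows, assuming the read list is
kept as inclusive (low, high) ranges, as the 1-41,43-100 notation
suggests.  The function name `mark_unread` is hypothetical and is not a
Gnus function.

```python
def mark_unread(read_ranges, number):
    """Remove one article number from a list of inclusive (low, high) ranges."""
    result = []
    for low, high in read_ranges:
        if low <= number <= high:
            # Keep whatever lies on either side of the removed number.
            if low <= number - 1:
                result.append((low, number - 1))
            if number + 1 <= high:
                result.append((number + 1, high))
        else:
            result.append((low, high))
    return result


new_read = mark_unread([(1, 100)], 42)  # the moved-in article 42 stays unread
```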
-- 
This line is not blank.




* Re: Question about article identification and backends.
  2003-05-18  9:52     ` Kai Großjohann
@ 2003-05-18 18:38       ` Rob Browning
  2003-05-18 18:50         ` Rob Browning
                           ` (2 more replies)
  2003-05-19 16:47       ` Paul Jarc
  1 sibling, 3 replies; 16+ messages in thread
From: Rob Browning @ 2003-05-18 18:38 UTC (permalink / raw)


kai.grossjohann@gmx.net (Kai Großjohann) writes:

> I guess it might not be that difficult.  It seems that the current
> problem is: say you have read all of the articles in a certain group,
> that is, articles 1 through 100.  Now you move an article numbered 42
> into that group.  It is unread.  Then Gnus will think it is read.

True, though for my own use, now that I think about it, I'd probably
want at least some of the marks to follow the article, though I
realize there are potentially sticky issues here.  Actually, I might
just prefer to have gnus-summary-move-marking-as and
gnus-summary-copy-marking-as.  With each you would invoke the
function, it would expect a keypress to indicate the mark (?, u, U, d,
etc.) and then it would process the article(s).  You could possibly
also have a gnus-default-group-added-article-mark that would be
initially set to " ".

It'd also be nice to have the ability to say "delete this article from
the current group" *and* "delete this article from all groups".  Come
to think of it we may already have that, but I'd prefer it were keyed
on a unique ID we control, rather than one provided from the outside.

> But it ought to be possible to tweak Gnus so that it just splits the
> read sequence from 1-100 into 1-41,43-100.
>
> Maybe it is even possible using the backend interface: the backend
> could tell Gnus the new list 1-41,43-100 of read articles.

I thought one of the problems would still be that if you renumber
articles, some Gnus features (the agent, perhaps?) get very unhappy
(i.e. break).  They'd have to be keyed on the global article unique ID
(integer or whatever) to fix that.

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592  F9A0 25C8 D377 8C7E 73A4




* Re: Question about article identification and backends.
  2003-05-18 18:38       ` Rob Browning
@ 2003-05-18 18:50         ` Rob Browning
  2003-05-19 13:02           ` Kai Großjohann
  2003-05-19 13:02         ` Kai Großjohann
  2003-05-19 16:57         ` Paul Jarc
  2 siblings, 1 reply; 16+ messages in thread
From: Rob Browning @ 2003-05-18 18:50 UTC (permalink / raw)


Rob Browning <rlb@defaultvalue.org> writes:

> It'd also be nice to have the ability to say "delete this article
> from the current group" *and* "delete this article from all groups".

This should have read:

  It'd also be nice to have the ability to say either "delete this
  article from the current group" or "delete this article from all
  groups".

i.e. I'd like to be able to pick which I wanted, and in the "delete
the article from the current group" case, I'd like the underlying
unique article to be garbage collected iff it's not referenced from
elsewhere.
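
The two kinds of delete, with collection of unreferenced articles, might
look roughly like this.  The `Store` class and its method names are
hypothetical; nothing here reflects how any existing Gnus backend stores
articles.

```python
class Store:
    """Hypothetical store: articles keyed by unique ID, groups hold references."""

    def __init__(self):
        self.articles = {}  # unique ID -> message text
        self.groups = {}    # group name -> set of unique IDs

    def add(self, uid, text, group):
        self.articles[uid] = text
        self.groups.setdefault(group, set()).add(uid)

    def link(self, uid, group):
        """Make an existing article visible in one more group."""
        self.groups.setdefault(group, set()).add(uid)

    def delete_from_group(self, uid, group):
        self.groups[group].discard(uid)
        # Collect the article iff no group references it any more.
        if not any(uid in members for members in self.groups.values()):
            self.articles.pop(uid, None)

    def delete_everywhere(self, uid):
        for members in self.groups.values():
            members.discard(uid)
        self.articles.pop(uid, None)


store = Store()
store.add(1, "hello", "inbox")
store.link(1, "work")
store.delete_from_group(1, "inbox")
still_there = 1 in store.articles   # "work" still references it
store.delete_from_group(1, "work")
collected = 1 not in store.articles  # last reference gone
```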

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592  F9A0 25C8 D377 8C7E 73A4




* Re: Question about article identification and backends.
  2003-05-18 18:50         ` Rob Browning
@ 2003-05-19 13:02           ` Kai Großjohann
  0 siblings, 0 replies; 16+ messages in thread
From: Kai Großjohann @ 2003-05-19 13:02 UTC (permalink / raw)


Rob Browning <rlb@defaultvalue.org> writes:

> Rob Browning <rlb@defaultvalue.org> writes:
>
>> It'd also be nice to have the ability to say "delete this article
>> from the current group" *and* "delete this article from all groups".
>
> This should have read:
>
>   It'd also be nice to have the ability to say either "delete this
>   article from the current group" or "delete this article from all
>   groups".
>
> i.e. I'd like to be able to pick which I wanted, and in the "delete
> the article from the current group" case, I'd like the underlying
> unique article to be garbage collected iff it's not referenced from
> elsewhere.

That would be nice.  Note that `B m' in an nnml server uses
hardlinks if available.  The "from current group" option already
exists, only "from all groups" is missing.  IIUC.
-- 
This line is not blank.




* Re: Question about article identification and backends.
  2003-05-18 18:38       ` Rob Browning
  2003-05-18 18:50         ` Rob Browning
@ 2003-05-19 13:02         ` Kai Großjohann
  2003-05-19 16:57         ` Paul Jarc
  2 siblings, 0 replies; 16+ messages in thread
From: Kai Großjohann @ 2003-05-19 13:02 UTC (permalink / raw)


Rob Browning <rlb@defaultvalue.org> writes:

> kai.grossjohann@gmx.net (Kai Großjohann) writes:
>
>> I guess it might not be that difficult.  It seems that the current
>> problem is: say you have read all of the articles in a certain group,
>> that is, articles 1 through 100.  Now you move an article numbered 42
>> into that group.  It is unread.  Then Gnus will think it is read.
>
> True, though for my own use, now that I think about it, I'd probably
> want at least some of the marks to follow the article, though I
> realize there are potentially sticky issues here.

Yeah, I was trying to say where Gnus currently *fails*.  Then I
suggested a workaround.
-- 
This line is not blank.




* Re: Question about article identification and backends.
  2003-05-18  9:52     ` Kai Großjohann
  2003-05-18 18:38       ` Rob Browning
@ 2003-05-19 16:47       ` Paul Jarc
  2003-05-19 20:49         ` Kai Großjohann
  1 sibling, 1 reply; 16+ messages in thread
From: Paul Jarc @ 2003-05-19 16:47 UTC (permalink / raw)


kai.grossjohann@gmx.net (Kai Großjohann) wrote:
> It seems that the current problem is: say you have read all of the
> articles in a certain group, that is, articles 1 through 100.  Now
> you move an article numbered 42 into that group.  It is unread.
> Then Gnus will think it is read.
>
> But it ought to be possible to tweak Gnus so that it just splits the
> read sequence from 1-100 into 1-41,43-100.

If all articles from 1 to 100 already exist, and we are inserting a
new article at position 42, then the new read list is 1-41,43-101.
The positions of all the later articles have to be changed.  Since
articles are currently *identified* by position, this is more work
than just a tweak.
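
A short sketch of why position-based numbering makes this more than a
tweak: every range at or after the insertion point has to be shifted.
The function name is hypothetical, and the read list is assumed to be a
list of inclusive (low, high) ranges.

```python
def insert_at_position(read_ranges, position):
    """Shift inclusive (low, high) read ranges to make room for a new,
    unread article at `position` when articles are numbered by position."""
    shifted = []
    for low, high in read_ranges:
        if high < position:
            shifted.append((low, high))          # entirely before: unchanged
        elif low >= position:
            shifted.append((low + 1, high + 1))  # entirely after: renumbered
        else:
            shifted.append((low, position - 1))  # split around the new slot
            shifted.append((position + 1, high + 1))
    return shifted


after_insert = insert_at_position([(1, 100)], 42)
```

Here the old articles 42 through 100 all change number, so anything
keyed on those numbers would have to be updated as well.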

> Maybe it is even possible using the backend interface: the backend
> could tell Gnus the new list 1-41,43-100 of read articles.

That's (sort of) how nnmaildir used to do it.  But it breaks the
agent, the cache, and 'seen marks, at least.


paul




* Re: Question about article identification and backends.
  2003-05-18 18:38       ` Rob Browning
  2003-05-18 18:50         ` Rob Browning
  2003-05-19 13:02         ` Kai Großjohann
@ 2003-05-19 16:57         ` Paul Jarc
  2 siblings, 0 replies; 16+ messages in thread
From: Paul Jarc @ 2003-05-19 16:57 UTC (permalink / raw)
  Cc: ding

Rob Browning <rlb@defaultvalue.org> wrote:
> Actually, I might just prefer to have gnus-summary-move-marking-as
> and gnus-summary-copy-marking-as.  With each you would invoke the
> function, it would expect a keypress to indicate the mark (?, u, U,
> d, etc.) and then it would process the article(s).  You could
> possibly also have a gnus-default-group-added-article-mark that
> would be initially set to " ".

That's similar to an idea I had a while ago: replacing
gnus-gcc-mark-as-read with gnus-gcc-marks, which would be a list of
mark symbols that would be automatically added to Gcc'ed articles.

NB: any code that works with marks should always use the symbols, not
the characters, whenever possible.  The symbols represent the true
state of an article, and are nearly orthogonal.  The characters are a
lossy, non-orthogonal representation of the symbols, used for the sake
of conserving screen space.  But there's no reason to inflict lossage
on other parts of the code.


paul




* Re: Question about article identification and backends.
  2003-05-19 16:47       ` Paul Jarc
@ 2003-05-19 20:49         ` Kai Großjohann
  2003-05-19 20:57           ` Paul Jarc
  2003-05-20 17:33           ` Rob Browning
  0 siblings, 2 replies; 16+ messages in thread
From: Kai Großjohann @ 2003-05-19 20:49 UTC (permalink / raw)


prj@po.cwru.edu (Paul Jarc) writes:

> kai.grossjohann@gmx.net (Kai Großjohann) wrote:
>> It seems that the current problem is: say you have read all of the
>> articles in a certain group, that is, articles 1 through 100.  Now
>> you move an article numbered 42 into that group.  It is unread.
>> Then Gnus will think it is read.
>>
>> But it ought to be possible to tweak Gnus so that it just splits the
>> read sequence from 1-100 into 1-41,43-100.
>
> If all articles from 1 to 100 already exist, and we are inserting a
> new article at position 42, then the new read list is 1-41,43-101.
> The positions of all the later articles have to be changed.  Since
> articles are currently *identified* by position, this is more work
> than just a tweak.

Why 101?  We were talking about all articles having a unique id in
the whole backend.  This means that there was no article 42
previously in the group -- there is only one article with number 42.

(Articles coming from other backends should get new, higher, numbers,
I guess.)

>> Maybe it is even possible using the backend interface: the backend
>> could tell Gnus the new list 1-41,43-100 of read articles.
>
> That's (sort of) how nnmaildir used to do it.  But it breaks the
> agent, the cache, and 'seen marks, at least.

Hm.  You have been there and done it.  So it's better to believe you :-)

-- 
This line is not blank.




* Re: Question about article identification and backends.
  2003-05-19 20:49         ` Kai Großjohann
@ 2003-05-19 20:57           ` Paul Jarc
  2003-05-20 10:56             ` Kai Großjohann
  2003-05-20 17:33           ` Rob Browning
  1 sibling, 1 reply; 16+ messages in thread
From: Paul Jarc @ 2003-05-19 20:57 UTC (permalink / raw)


kai.grossjohann@gmx.net (Kai Großjohann) wrote:
> prj@po.cwru.edu (Paul Jarc) writes:
>> If all articles from 1 to 100 already exist, and we are inserting a
>> new article at position 42, then the new read list is 1-41,43-101.
>> The positions of all the later articles have to be changed.
>
> Why 101?  We were talking about all articles having a unique id in
> the whole backend.  This means that there was no article 42
> previously in the group -- there is only one article with number 42.

Ah, ok.  That might work.  But then how far would we be from the
largest Emacs integer if we sequentially numbered all the articles in
all the groups on a large news server?
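
A rough back-of-the-envelope version of that question, under two loudly
stated assumptions: that Emacs integers are 28-bit fixnums (the minimum
range the Emacs Lisp manual of the time documented for 32-bit builds),
and a purely hypothetical article volume picked only to show the order
of magnitude.

```python
# Assumption: 28-bit fixnums, i.e. the range -2**27 .. 2**27 - 1.
MOST_POSITIVE_FIXNUM = 2**27 - 1


def days_until_overflow(articles_per_day, already_used=0):
    """Whole days before a server-wide counter would pass the fixnum limit."""
    return (MOST_POSITIVE_FIXNUM - already_used) // articles_per_day


# Hypothetical feed volume, not a measurement of any real server.
days = days_until_overflow(articles_per_day=500_000)
```

With those assumed numbers a single server-wide counter would last well
under a year, whereas per-group counters grow far more slowly.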


paul




* Re: Question about article identification and backends.
  2003-05-19 20:57           ` Paul Jarc
@ 2003-05-20 10:56             ` Kai Großjohann
  0 siblings, 0 replies; 16+ messages in thread
From: Kai Großjohann @ 2003-05-20 10:56 UTC (permalink / raw)


prj@po.cwru.edu (Paul Jarc) writes:

> Ah, ok.  That might work.  But then how far would we be from the
> largest Emacs integer if we sequentially numbered all the articles in
> all the groups on a large news server?

We need bignums!
-- 
This line is not blank.




* Re: Question about article identification and backends.
  2003-05-19 20:49         ` Kai Großjohann
  2003-05-19 20:57           ` Paul Jarc
@ 2003-05-20 17:33           ` Rob Browning
  2003-05-22  7:04             ` Kai Großjohann
  1 sibling, 1 reply; 16+ messages in thread
From: Rob Browning @ 2003-05-20 17:33 UTC (permalink / raw)


kai.grossjohann@gmx.net (Kai Großjohann) writes:

> Why 101?  We were talking about all articles having a unique id in
> the whole backend.  This means that there was no article 42
> previously in the group -- there is only one article with number 42.
>
> (Articles coming from other backends should get new, higher, numbers,
> I guess.)

So with respect to ordering the articles in a group, were you thinking
that gnus would:

  - just keep a separate per-group list ordering the articles in a
    group.

  - default to order by "Date:" (or some new "X-Gnus-Arrival-Date:" header).

  - default to order by the unique article ID, presuming that the ID
    included the arrival date (a la maildir?).

  - something else.

In particular, when I enter a large group and gnus asks me how many
articles, how would that article subset be chosen?  I'd really like
for it to always be the N most recent articles either by Date: or
arrival date by default.

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org; previously @cs.utexas.edu
GPG starting 2002-11-03 = 14DD 432F AE39 534D B592  F9A0 25C8 D377 8C7E 73A4




* Re: Question about article identification and backends.
  2003-05-20 17:33           ` Rob Browning
@ 2003-05-22  7:04             ` Kai Großjohann
  0 siblings, 0 replies; 16+ messages in thread
From: Kai Großjohann @ 2003-05-22  7:04 UTC (permalink / raw)


Rob Browning <rlb@defaultvalue.org> writes:

> So with respect to ordering the articles in a group, were you thinking
> that gnus would:
>
>   - just keep a separate per-group list ordering the articles in a
>     group.

No.

>   - default to order by "Date:" (or some new "X-Gnus-Arrival-Date:" header).

That would be slower, but possible.

>   - default to order by the unique article ID, presuming that the ID
>     included the arrival date (a la maildir?).

That's what I prefer.  In fact, I was kind of assuming that the
backend just counts articles in a server, just like nnml now counts
articles in a group.

> In particular, when I enter a large group and gnus asks me how many
> articles, how would that article subset be chosen?  I'd really like
> for it to always be the N most recent articles either by Date: or
> arrival date by default.

That's a thorny issue.  Hm.  But nnml in principle has the same
problem, so maybe a similar solution could be devised?

However, I'm afraid it might boil down to keeping a list of numbers
for each group, which is kinda like the first suggestion that you made
that I rejected.  Hm.  Well, I think the Agent already keeps such
lists, and it works in principle, so the amount of data involved is
not too big.
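
One way the per-group list could work is sketched below, assuming the
backend counts articles per server so that the ID itself encodes arrival
order.  The `Backend` class and its methods are hypothetical and do not
describe nnml or the Agent.

```python
import bisect
from collections import defaultdict


class Backend:
    """Hypothetical backend that counts articles per server, not per group."""

    def __init__(self):
        self.next_id = 1
        self.group_ids = defaultdict(list)  # group -> IDs, kept ascending

    def new_article(self, group):
        uid = self.next_id
        self.next_id += 1
        self.group_ids[group].append(uid)  # new IDs are always the largest
        return uid

    def move(self, uid, src, dst):
        self.group_ids[src].remove(uid)
        # The ID encodes arrival order, so a sorted insert puts the
        # article back into its arrival sequence in the target group.
        bisect.insort(self.group_ids[dst], uid)

    def most_recent(self, group, n):
        """Return the n most recently arrived article IDs in a group."""
        return self.group_ids[group][-n:] if n > 0 else []


backend = Backend()
old = backend.new_article("inbox")                       # ID 1
newer = [backend.new_article("lists") for _ in range(3)]  # IDs 2, 3, 4
backend.move(old, "inbox", "lists")
recent = backend.most_recent("lists", 2)
```

The older article moved in from the inbox lands at the front of the
target group rather than the end, and the "N most recent" subset is just
the tail of the per-group list.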

-- 
This line is not blank.



