Gnus development mailing list
* Abandoning the concept of groups as a storage medium?
From: Jan Rychter @ 2009-04-25 10:05 UTC
  To: ding

Here's some context: I've been a Gnus user for the last 13 years or
so. I use nnml as my mail backend. I used Linux for ten years, then
moved over to Mac OS X. You might say I'm not exactly a newbie.

I've always been trying to make my mail searchable. I tried namazu in
the past, but it didn't work well for my volume of mail. I hacked a
plugin for Spotlight that indexes an NNML spool. It works, but needs
polishing, and is only useful for finding a single E-mail message.

Then I discovered mairix and nnmairix. They are close to what I need,
but there is the issue of flag propagation, which nnmairix doesn't do
for an NNML spool.

Which brings me to my main point. I believe the notion of splitting all
mail into groups is fundamentally flawed. Yes, it makes sense for
mailing lists, but it doesn't for pretty much anything else. I would
much rather assign tags and have a good search interface. I want to
access my mail in a multitude of ways, searching by date, sender, tags,
and picking out entire conversations (threads). I want to be able to set
flags and tags on *any* E-mail message *anywhere*, not just in the "real
group it belongs to". I don't want my "groups" to have anything to do
with the way my messages are stored. E-mail should be stored in a
key/value store with metadata copied and indexed separately.

The problem I see with Gnus is that it is designed around a central
concept of a mail backend which exposes groups. The registry, if I
understand correctly, is a workaround for some of the problems people
encountered with this approach.

Is there any way to achieve what I want with Gnus? Is anybody working on
something of the kind? What would be the possible approaches?

--J.


* Re: Abandoning the concept of groups as a storage medium?
From: David Engster @ 2009-04-28  8:56 UTC
  To: ding

Jan Rychter <jan@rychter.com> writes:
> Then I discovered mairix and nnmairix. They are close to what I need,
> but there is the issue of flag propagation, which nnmairix doesn't do
> for an NNML spool.

I have fixed this. I'll send you a patch soon, but I want to test this a
bit further. nnml is tricky, because it can save marks in .newsrc.eld as
well as the .marks files.

However, mairix won't know anything about marks from nnml spools,
meaning you won't be able to search for ticked/read/replied
articles. This can only be fixed within mairix, but since I've never got
any reaction from the maintainer about the maildir patches, I'm not
terribly motivated to work on this feature.

[...]
> I want to be able to set flags and tags on *any* E-mail message
> *anywhere*, not just in the "real group it belongs to". I don't want
> my "groups" to have anything to do with the way my messages are
> stored. E-mail should be stored in a key/value store with metadata
> copied and indexed separately.
>
> The problem I see with Gnus is that it is designed around a central
> concept of a mail backend which exposes groups. 

Yes, but I don't see this as a restriction for what you would like to
have. Gnus can handle "dynamic" groups, where the contents change all
the time, although it requires a lot of work to get right. As you say,
nnmairix comes pretty close, and it strictly works with the Gnus group
back end API. My main problem with maintaining nnmairix is that the back
ends behave differently, especially when it comes to marks and unread
count.

> The registry, if I understand correctly, is a workaround for some of
> the problems people encountered with this approach.
>
> Is there any way to achieve what I want with Gnus? Is anybody working on
> something of the kind? What would be the possible approaches?

We just had this discussion, and Ted raised some interesting points
regarding the registry and how it could be extended, and I agree with
him.

I think the registry should save all important headers of a message, and
maybe also some MIME information, like attachment names. Of course, as
Ted also said, this information can't be saved anymore in plain text
files like gnus.registry.eld, but needs some kind of external database
back end.

We could then add a back end which can create virtual groups based on
registry information. One could extend nnir to do that, but I'd vote for
creating a completely new one.
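
A minimal sketch of that idea, assuming a plain in-memory hash table
stands in for the external database. Every `my-registry-*' name below is
hypothetical and not part of gnus-registry.el; the point is only that a
virtual group reduces to a predicate over stored header data.

```elisp
;;; Sketch only: `my-registry' and the `my-registry-*' functions are
;;; hypothetical illustrations, not the gnus-registry.el API.

(defvar my-registry (make-hash-table :test 'equal)
  "Hypothetical registry mapping Message-ID strings to header plists.")

(defun my-registry-put (message-id &rest headers)
  "Store HEADERS, a plist such as (:from F :subject S), under MESSAGE-ID."
  (puthash message-id headers my-registry))

(defun my-registry-select (predicate)
  "Return the sorted Message-IDs whose header plist satisfies PREDICATE.
The result is what a registry-based virtual group would contain."
  (let (ids)
    (maphash (lambda (id headers)
               (when (funcall predicate headers)
                 (push id ids)))
             my-registry)
    (sort ids #'string<)))

;; Usage: a "virtual group" of every message with a given attachment name.
(my-registry-put "<1@example.org>"
                 :from "jan@example.org"
                 :attachments '("patch.diff"))
(my-registry-select
 (lambda (headers)
   (member "patch.diff" (plist-get headers :attachments))))
```

A real back end would still have to map these Message-IDs to article
numbers, which is the hard part.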

However, full text search is another matter entirely. This simply cannot
be done in Emacs Lisp.

-David


* Re: Abandoning the concept of groups as a storage medium?
From: David Engster @ 2009-04-29 10:58 UTC
  To: ding

David Engster <deng@randomsample.de> writes:
> Jan Rychter <jan@rychter.com> writes:
>> Then I discovered mairix and nnmairix. They are close to what I need,
>> but there is the issue of flag propagation, which nnmairix doesn't do
>> for an NNML spool.
>
> I have fixed this. I'll send you a patch soon, but I want to test this a
> bit further. nnml is tricky, because it can save marks in .newsrc.eld as
> well as the .marks files.

OK, here it is. Please apply the attached patch against nnmairix from
the current Gnus CVS.

You do not need a patched mairix to use this. Just activate marks
propagation for one of your mairix groups and see what happens. I
couldn't test this in depth, so you had better keep a backup of your
mail in case nnmairix messes up your marks or the active file.

Regards,
David


[-- Attachment #2: nnmairix-nnml-patch.diff --]
[-- Type: text/x-patch, Size: 2644 bytes --]

Index: nnmairix.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/nnmairix.el,v
retrieving revision 7.13
diff -u -r7.13 nnmairix.el
--- nnmairix.el	22 Jan 2009 07:02:16 -0000	7.13
+++ nnmairix.el	29 Apr 2009 10:54:13 -0000
@@ -522,7 +522,7 @@
 	(when (eq nnmairix-backend 'nnml)
 	  (when nnmairix-rename-files-for-nnml
 	    (nnmairix-rename-files-consecutively mfolder))
-	  (nnml-generate-nov-databases-directory mfolder))
+	  (nnml-generate-nov-databases-directory mfolder nil t))
 	(nnmairix-call-backend
 	 "request-scan" folder nnmairix-backend-server)
 	(if (and fast allowfast)
@@ -936,7 +936,8 @@
 If PROPMARKS is a positive number, set parameter to t.
 If PROPMARKS is a negative number, set it to nil."
   (interactive)
-  (unless (nnmairix-check-mairix-version "maildirpatch")
+  (unless (or (eq nnmairix-backend 'nnml)
+	      (nnmairix-check-mairix-version "maildirpatch"))
     (error "You need a mairix binary with maildir patch to use this feature.  See docs for details"))
   (let ((group (gnus-group-group-name)))
     (when (or (not (string= (gnus-group-short-name group)
@@ -1154,12 +1155,18 @@
   (if nnmairix-marks-cache
       (let (number ogroup number-cache method mid-marks temp)
 	;; first we get the article numbers
-	(catch 'problem
-	  (while (setq ogroup (pop nnmairix-marks-cache))
+	(while (setq ogroup (pop nnmairix-marks-cache))
+	  (catch 'problem
 	    (while (setq mid-marks (pop (cdr ogroup)))
 	      (setq number
-		    (cdr
-		     (gnus-request-head (car mid-marks) (car ogroup))))
+		    (if (eq nnmairix-backend 'nnml)
+			(with-temp-buffer
+			  (nnml-find-id (gnus-group-short-name (car ogroup))
+					(car mid-marks)
+					(gnus-group-server (car ogroup))))
+		      (cdr
+		       (nnmairix-call-backend 'request-head
+					      (gnus-request-head (car mid-marks) (car ogroup))))))
 	      (unless number
 		(nnheader-message
 		 3 "Unable to set mark: couldn't determine article number for %s in %s"
@@ -1187,6 +1194,13 @@
 		   (gnus-group-short-name (car cur))
 		   (cdr cur)
 		   (list (nth 1 method)))
+	    ;; This is a hack to force a reread of the .marks file
+	    (when (and (eq nnmairix-backend 'nnml)
+		       (not (cdr-safe (assoc 'nnml-marks-is-evil method))))
+	      (let ((file (nnml-group-pathname (gnus-group-short-name (car cur)) 
+					       nnml-marks-file-name 
+					       (gnus-group-server (car cur)))))
+		(gnus-sethash file nil nnml-marks-modtime)))
 	    (gnus-group-jump-to-group (car cur))
 	    (gnus-group-get-new-news-this-group)))
 	(nnheader-message 5 "nnmairix: Propagating marks... done"))


* Re: Abandoning the concept of groups as a storage medium?
From: Ted Zlatanov @ 2009-04-30 19:27 UTC
  To: ding

On Tue, 28 Apr 2009 10:56:39 +0200 David Engster <deng@randomsample.de> wrote: 

DE> Jan Rychter <jan@rychter.com> writes:
>> I want to be able to set flags and tags on *any* E-mail message
>> *anywhere*, not just in the "real group it belongs to". I don't want
>> my "groups" to have anything to do with the way my messages are
>> stored. E-mail should be stored in a key/value store with metadata
>> copied and indexed separately.
>> 
>> The problem I see with Gnus is that it is designed around a central
>> concept of a mail backend which exposes groups. 

DE> Yes, but I don't see this as a restriction for what you would like to
DE> have. Gnus can handle "dynamic" groups, where the contents change all
DE> the time, although it requires a lot of work to get right. As you say,
DE> nnmairix comes pretty close, and it strictly works with the Gnus group
DE> back end API. My main problem with maintaining nnmairix is that the back
DE> ends behave differently, especially when it comes to marks and unread
DE> count.

The big problem I see is that Gnus can't build groups asynchronously.
Emacs Lisp itself is the impediment here.

>> The registry, if I understand correctly, is a workaround for some of
>> the problems people encountered with this approach.
>> 
>> Is there any way to achieve what I want with Gnus? Is anybody working on
>> something of the kind? What would be the possible approaches?

DE> We just had this discussion, and Ted raised some interesting points
DE> regarding the registry and how it could be extended, and I agree with
DE> him.

DE> I think the registry should save all important headers of a message, and
DE> maybe also some MIME information, like attachment names. Of course, as
DE> Ted also said, this information can't be saved anymore in plain text
DE> files like gnus.registry.eld, but needs some kind of external database
DE> back end.

DE> We could then add a back end which can create virtual groups based on
DE> registry information. One could extend nnir to do that, but I'd vote for
DE> creating a completely new one.

nnregistry?

group list: dynamic based on tags (labels) defined by user
article list in group: generated on entry with a tag search
article retrieve: uses the original article backend
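
A rough sketch of those three steps, assuming the tags live in a simple
hash table. The `my-tag-*' names are hypothetical and nothing here is an
existing Gnus back end: each tag acts as one group, entering a group
runs a tag search, and each hit remembers its original group so that
retrieval can be handed to whichever back end really stores the article.

```elisp
;;; Sketch only: all `my-tag-*' names are hypothetical.

(defvar my-tag-table (make-hash-table :test 'equal)
  "Hypothetical map: Message-ID -> plist (:tags LIST :group ORIGINAL-GROUP).")

(defun my-tag-group-list ()
  "Return the sorted list of tags in use; each tag is one virtual group."
  (let (tags)
    (maphash (lambda (_id entry)
               (dolist (tag (plist-get entry :tags))
                 (unless (member tag tags)
                   (push tag tags))))
             my-tag-table)
    (sort tags #'string<)))

(defun my-tag-group-articles (tag)
  "Return (MESSAGE-ID . ORIGINAL-GROUP) pairs for messages carrying TAG.
The pairs are sorted by Message-ID; the cdr names the group whose back
end would be asked for the article itself."
  (let (hits)
    (maphash (lambda (id entry)
               (when (member tag (plist-get entry :tags))
                 (push (cons id (plist-get entry :group)) hits)))
             my-tag-table)
    (sort hits (lambda (a b) (string< (car a) (car b))))))

;; Usage:
(puthash "<1@example.org>" '(:tags ("work" "todo") :group "nnml:inbox")
         my-tag-table)
(my-tag-group-list)
(my-tag-group-articles "todo")
```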

I'm definitely not going to get to it anytime soon, but if anyone else
feels adventurous, I'll help out any way I can.

DE> However, full text search is another matter entirely. This simply cannot
DE> be done in Emacs Lisp.

The index necessary for good search performance would be huge, but easy
to store in a database (on a server, on IMAP, whatever).  It's easy to
parallelize these searches (especially with IMAP as the backend).  So
there's some hope.

Ted


* Re: Abandoning the concept of groups as a storage medium?
From: David Engster @ 2009-05-04 11:07 UTC
  To: ding

Ted Zlatanov <tzz@lifelogs.com> writes:
> On Tue, 28 Apr 2009 10:56:39 +0200 David Engster <deng@randomsample.de> wrote: 
>
> DE> Jan Rychter <jan@rychter.com> writes:
>>> I want to be able to set flags and tags on *any* E-mail message
>>> *anywhere*, not just in the "real group it belongs to". I don't want
>>> my "groups" to have anything to do with the way my messages are
>>> stored. E-mail should be stored in a key/value store with metadata
>>> copied and indexed separately.
>>> 
>>> The problem I see with Gnus is that it is designed around a central
>>> concept of a mail backend which exposes groups. 
>
> DE> Yes, but I don't see this as a restriction for what you would like to
> DE> have. Gnus can handle "dynamic" groups, where the contents change all
> DE> the time, although it requires a lot of work to get right. As you say,
> DE> nnmairix comes pretty close, and it strictly works with the Gnus group
> DE> back end API. My main problem with maintaining nnmairix is that the back
> DE> ends behave differently, especially when it comes to marks and unread
> DE> count.
>
> The big problem I see is that Gnus can't build groups asynchronously.
> Emacs Lisp itself is the impediment here.

That's why it had better build those groups fast. ;-)

But you're right, of course. There has been some work on introducing
parallelism into Emacs Lisp on emacs-devel. Maybe Emacs 24 will have
something like that...

> DE> I think the registry should save all important headers of a message, and
> DE> maybe also some MIME information, like attachment names. Of course, as
> DE> Ted also said, this information can't be saved anymore in plain text
> DE> files like gnus.registry.eld, but needs some kind of external database
> DE> back end.
>
> DE> We could then add a back end which can create virtual groups based on
> DE> registry information. One could extend nnir to do that, but I'd vote for
> DE> creating a completely new one.
>
> nnregistry?

Makes sense. :-)

> group list: dynamic based on tags (labels) defined by user
> article list in group: generated on entry with a tag search
> article retrieve: uses the original article backend

Exactly. What makes this stuff difficult is that article numbers in Gnus
have to be unique, just like in IMAP. nnmairix "solves" this problem by
adding an offset to the article numbers each time a group is re-created.
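
To illustrate the offset trick in isolation (a sketch under my own
assumptions, not the actual nnmairix code; `my-number-offset' and
`my-renumber' are hypothetical names): every time the group is rebuilt,
the new articles get numbers strictly above everything handed out
before, so no number is ever reused.

```elisp
;;; Sketch only: hypothetical names, not taken from nnmairix.el.

(defvar my-number-offset 0
  "Hypothetical running offset; all numbers handed out so far are <= this.")

(defun my-renumber (count)
  "Return COUNT fresh article numbers and advance the offset past them."
  (let ((numbers (number-sequence (1+ my-number-offset)
                                  (+ my-number-offset count))))
    (setq my-number-offset (+ my-number-offset count))
    numbers))

;; Usage: the first search yields three hits, a later re-creation of the
;; group yields two; the second call starts where the first one stopped.
(my-renumber 3)
(my-renumber 2)
```

The cost is that the numbers grow without bound and the same message
gets a different number after each re-creation.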

> I'm definitely not going to get to it anytime soon, but if anyone else
> feels adventurous, I'll help out any way I can.

My guess is that most of the needed code already exists in nnir and
nnmairix. Unfortunately, I won't have time to do it either, at least not
in the coming months.

> DE> However, full text search is another matter entirely. This simply cannot
> DE> be done in Emacs Lisp.
>
> The index necessary for good search performance would be huge, but easy
> to store in a database (on a server, on IMAP, whatever).  It's easy to
> parallelize these searches (especially with IMAP as the backend).  So
> there's some hope.

Maybe. But just feeding such a database through Emacs Lisp would already
be a very time-consuming task, considering the huge amount of mail many
people have (I have about 1GB, and I think this is pretty average).

-David


* Re: Abandoning the concept of groups as a storage medium?
From: Ted Zlatanov @ 2009-05-08 18:18 UTC
  To: ding

On Mon, 04 May 2009 13:07:20 +0200 David Engster <deng@randomsample.de> wrote: 

DE> I think the registry should save all important headers of a message, and
DE> maybe also some MIME information, like attachment names. Of course, as
DE> Ted also said, this information can't be saved anymore in plain text
DE> files like gnus.registry.eld, but needs some kind of external database
DE> back end.

You're probably thinking of saving the keywords too, yes?  Generally any
key-val map should be storable; this is ridiculously easy with proper
hashtable serialization, which is coming in Emacs CVS once the pretest
is over.  I have a patch ready to go.  With that serialization, we can
finally write out the registry whole or piece by piece in one simple
command, and read it back just as easily.

With an IMAP storage backend for the registry, this could result in
registry entries stored in messages like this:

Subject: [entry subject]
Keywords: [entry keywords]
Message-ID: [entry ID].gnus.registry (so it doesn't collide with the original)
... other headers ...

[whole serialized entry as the body]

so then we could do IMAP SEARCH by keyword, for example.
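
A rough sketch of that message layout, assuming for illustration that an
entry is a plain alist written with `prin1-to-string' rather than a
serialized hash table. The `my-registry-entry-*' and
`my-registry-message-*' names are hypothetical.

```elisp
;;; Sketch only: hypothetical helpers, not part of gnus-registry.el.

(defun my-registry-entry-to-message (id entry)
  "Format ENTRY, an alist, as header lines plus a printed body.
ID is the original Message-ID; the `.gnus.registry' suffix keeps the
registry message from colliding with it."
  (let ((print-escape-newlines t))      ; keep the printed body on one line
    (format "Subject: %s\nKeywords: %s\nMessage-ID: %s.gnus.registry\n\n%s\n"
            (cdr (assq 'subject entry))
            (mapconcat #'identity (cdr (assq 'keywords entry)) ", ")
            id
            (prin1-to-string entry))))

(defun my-registry-message-to-entry (text)
  "Read back the alist stored in the body of TEXT."
  (let ((body-start (string-match "\n\n" text)))
    (car (read-from-string text (+ body-start 2)))))

;; Usage: write an entry out and read it back unchanged.
(my-registry-message-to-entry
 (my-registry-entry-to-message
  "<1@example.org>"
  '((subject . "Hello") (keywords "work" "todo"))))
```

Because the keywords are duplicated into a real header, the server can
match on them without knowing anything about the serialized body.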

>> nnregistry?
>> group list: dynamic based on tags (labels) defined by user
>> article list in group: generated on entry with a tag search
>> article retrieve: uses the original article backend

DE> Exactly. What makes this stuff difficult is that article numbers in Gnus
DE> have to be unique, just like in IMAP. nnmairix "solves" this problem by
DE> adding an offset to the article numbers each time a group is re-created.

I wonder if it would be possible to use message IDs as article numbers.

DE> However, full text search is another matter entirely. This simply cannot
DE> be done in Emacs Lisp.
>> 
>> The index necessary for good search performance would be huge, but easy
>> to store in a database (on a server, on IMAP, whatever).  It's easy to
>> parallelize these searches (especially with IMAP as the backend).  So
>> there's some hope.

DE> Maybe. But just feeding such a database through Emacs Lisp would already
DE> be a very time-consuming task, considering the huge amount of mail many
DE> people have (I have about 1GB, and I think this is pretty average).

I think it's possible, as long as we keep the indexing and searching on
the server, and Emacs Lisp is only the query and display agent.
Displaying large amounts of data is not so bad if you can get it 50
results at a time...

Ted
