Gnus development mailing list
* Re: Indexing Gnus (and other...) mails
       [not found]   ` <mailman.4911.1239209214.31690.info-gnus-english@gnu.org>
@ 2009-04-08 18:19     ` Ted Zlatanov
  2009-04-08 18:28       ` Tassilo Horn
                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ted Zlatanov @ 2009-04-08 18:19 UTC (permalink / raw)
  Cc: Ding Mailing List

The following message is a courtesy copy of an article
that has been posted to gnu.emacs.gnus as well.

On Wed, 08 Apr 2009 18:46:32 +0200 egallego@babel.ls.fi.upm.es (Emilio Jesús Gallego Arias) wrote: 

EJGA> Tassilo Horn <tassilo@member.fsf.org> writes:
>> You could try mairix (+ the gnus nnmairix backend).  According to its
>> homepage, it should be quite fast while indexing.  (But still 30GB is
>> quite a lot...)

EJGA> I couldn't be happier with mairix (just 10 GB of email, though).

EJGA> In fact, it is so fast that I use it to simulate "Gmail" like
EJGA> threads. Just index your sent mails archive and search for a subject,
EJGA> mairix will create a group with all the mails.

This made me think about IMAP specifically.

Gnus (imap.el AFAICT, so the support is missing all the way down) does
not support the IMAP SEARCH command, except by UID.  It probably should
allow SEARCH by TEXT, FROM, TO, SUBJECT, and probably all the other
standard search keys in RFC 3501 (section 6.4.4) [1].
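
For illustration, the criteria part of such a SEARCH command could be
assembled along these lines.  This is only a sketch: my-imap-quote and
my-imap-search-criteria are made-up names, not functions from imap.el or
nnir.el, and it only covers search keys that take a single string
argument (no literals and no CHARSET, so plain ASCII without newlines).

```elisp
;; Hypothetical helpers for illustration; not part of imap.el or nnir.el.

(defun my-imap-quote (string)
  "Return STRING as an IMAP quoted string.
Backslashes and double quotes are escaped with a backslash."
  (concat "\""
          (replace-regexp-in-string
           "[\\\"]"                         ; a backslash or a double quote
           (lambda (m) (concat "\\" m))     ; prefix the match with a backslash
           string t t)                      ; FIXEDCASE and LITERAL
          "\""))

(defun my-imap-search-criteria (alist)
  "Build an IMAP SEARCH criteria string from ALIST.
ALIST holds (KEY . VALUE) pairs, where KEY is a search key name such as
\"SUBJECT\", \"FROM\", \"TO\" or \"TEXT\" and VALUE is the string to look for."
  (mapconcat (lambda (pair)
               (concat (upcase (car pair)) " " (my-imap-quote (cdr pair))))
             alist
             " "))

(my-imap-search-criteria '(("SUBJECT" . "mairix") ("FROM" . "tassilo")))
;; => "SUBJECT \"mairix\" FROM \"tassilo\""
```

Per RFC 3501, several search keys given in one command are ANDed
together, so the example asks for messages matching both the subject
and the sender.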

I don't know how IMAP servers implement SEARCH.  Is the speed decent?
If not, that should be an issue for the server maintainers (or they can
allow search plugins, so things like mairix can be integrated).  It
seems to me that IMAP SEARCH is a good way to provide universal
searching in Gnus for IMAP backends.

Obviously mairix (with nnmairix) is still useful, and perhaps Gnus
should have backend searching capabilities that go beyond just limiting
the full list of articles.  But IMAP SEARCH support seems to me to be an
essential piece of building a good Gnus search solution that doesn't
depend on mairix or any other search tools, but can use them when they
are available.

Ideas?  Suggestions?  Did I overlook something?

Thanks
Ted

[1] http://www.ietf.org/rfc/rfc3501.txt




* Re: Indexing Gnus (and other...) mails
  2009-04-08 18:19     ` Indexing Gnus (and other...) mails Ted Zlatanov
@ 2009-04-08 18:28       ` Tassilo Horn
  2009-04-09  7:15         ` Tassilo Horn
  2009-04-09 13:02       ` Ted Zlatanov
       [not found]       ` <86ws9uugj0.fsf__11476.5163431459$1239285119$gmane$org@lifelogs.com>
  2 siblings, 1 reply; 7+ messages in thread
From: Tassilo Horn @ 2009-04-08 18:28 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: Ding Mailing List

Ted Zlatanov <tzz@lifelogs.com> writes:

Hi Ted,

> EJGA> Tassilo Horn <tassilo@member.fsf.org> writes:
>>> You could try mairix (+ the gnus nnmairix backend).  According to
>>> its homepage, it should be quite fast while indexing.  (But still
>>> 30GB is quite a lot...)
>
> EJGA> I couldn't be happier with mairix (just 10 GB of email, though).
>
> EJGA> In fact, it is so fast that I use it to simulate "Gmail" like
> EJGA> threads. Just index your sent mails archive and search for a
> EJGA> subject, mairix will create a group with all the mails.
>
> This made me think about IMAP specifically.
>
> Gnus (imap.el AFAICT, so the support is missing all the way down) does
> not support the IMAP SEARCH command, except by UID.  It probably
> should allow SEARCH by TEXT, FROM, TO, SUBJECT, and probably all the
> other standard search keys in RFC 3501 (section 6.4.4) [1].
>
> I don't know how IMAP servers implement SEARCH.  Is the speed decent?

I use a local dovecot server and it doesn't index anything except
message ids.  When I search for something (using nnir) it'll take quite
a long time for large groups and it constantly accesses the filesystem.
I guess something like grep is used here.

Bye,
Tassilo
-- 
No person,  no idea, and no  religion deserves to be  illegal to insult,
not even the Church of Emacs. (Richard M. Stallman)




* Re: Indexing Gnus (and other...) mails
  2009-04-08 18:28       ` Tassilo Horn
@ 2009-04-09  7:15         ` Tassilo Horn
  2009-04-09 12:22           ` Tassilo Horn
  2009-04-09 18:34           ` Tassilo Horn
  0 siblings, 2 replies; 7+ messages in thread
From: Tassilo Horn @ 2009-04-09  7:15 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: Ding Mailing List

Tassilo Horn <tassilo@member.fsf.org> writes:

Hi!

>> This made me think about IMAP specifically.
>>
>> Gnus (imap.el AFAICT, so the support is missing all the way down)
>> does not support the IMAP SEARCH command, except by UID.  It probably
>> should allow SEARCH by TEXT, FROM, TO, SUBJECT, and probably all the
>> other standard search keys in RFC 3501 (section 6.4.4) [1].
>>
>> I don't know how IMAP servers implement SEARCH.  Is the speed decent?
>
> I use a local dovecot server and it doesn't index anything except
> message ids.  When I search for something (using nnir) it'll take
> quite a long time for large groups and it constantly accesses the
> filesystem.  I guess something like grep is used here.

Oh, this is plain wrong!  Searching the BODY is very slow, but all
headers (or at least the important ones) are indexed by dovecot out of
the box.  Searching for, let's say, a word in the subject takes about
2-3 secs on my emacs-devel folder with about 20000 messages.

And for speeding up searches in the whole message including BODY,
dovecot offers plugins:

  http://wiki.dovecot.org/Plugins/FTS

So I'd love to see better support for IMAP SEARCH in Gnus.  nnir works,
but sometimes it's a bit bumpy.  For example, if I search for the same
string a second time, each result message appears twice in the search
result group.

And what bothers me most: when I want to see a result message in its
original group (I want to reply using the group's posting styles and
parameters), I hit G T on the message (gnus-summary-nnir-goto-thread),
but then I'm queried for my username / password for the local imap
server.  That information is already in my .authinfo...

Bye,
Tassilo
-- 
A child of five could understand this! Fetch me a child of five!




* Re: Indexing Gnus (and other...) mails
  2009-04-09  7:15         ` Tassilo Horn
@ 2009-04-09 12:22           ` Tassilo Horn
  2009-04-09 18:34           ` Tassilo Horn
  1 sibling, 0 replies; 7+ messages in thread
From: Tassilo Horn @ 2009-04-09 12:22 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: Ding Mailing List

Tassilo Horn <tassilo@member.fsf.org> writes:

> And for speeding up searches in the whole message including BODY,
> dovecot offers plugins:
>
>   http://wiki.dovecot.org/Plugins/FTS

I couldn't wait and configured the Squat plugin.  The first search took
a minute (the first search command initiated the building of the index)
and subsequent searches (in headers & body) take less than 1 sec on
emacs-devel with about 20000 messages.  Yay!

Bye,
Tassilo
-- 
The First rule of Chuck Norris is: you do not talk about Chuck Norris. 




* Re: Indexing Gnus (and other...) mails
  2009-04-08 18:19     ` Indexing Gnus (and other...) mails Ted Zlatanov
  2009-04-08 18:28       ` Tassilo Horn
@ 2009-04-09 13:02       ` Ted Zlatanov
       [not found]       ` <86ws9uugj0.fsf__11476.5163431459$1239285119$gmane$org@lifelogs.com>
  2 siblings, 0 replies; 7+ messages in thread
From: Ted Zlatanov @ 2009-04-09 13:02 UTC (permalink / raw)
  Cc: Ding Mailing List

The following message is a courtesy copy of an article
that has been posted to gnu.emacs.gnus as well.

On Wed, 08 Apr 2009 13:19:02 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: 

TZ> Gnus (imap.el AFAICT, so the support is missing all the way down) does
TZ> not support the IMAP SEARCH command, except by UID.  It probably should
TZ> allow SEARCH by TEXT, FROM, TO, SUBJECT, and probably all the other
TZ> standard search keys in RFC 3501 (section 6.4.4) [1].

Well, I mistyped "grep -i seach *.el" and assumed there was no IMAP
SEARCH support.  Sorry about the confusion--I was wrong.

nnir is the backend that implements search at the highest level in Gnus.
nnmairix is independent of it, but could probably be converted to an
nnir backend.  There are some TODO items with nnir, as Tassilo pointed
out, with duplicate search results being a pretty big one.

Finally, nnir doesn't support incremental results AFAICT, which are
important for people with 30 GB of mail.  It would be nice if it did.
In general Gnus backends do very little incrementally, and that causes
problems with entering large groups and elsewhere, not just searching.

TZ> I don't know how IMAP servers implement SEARCH.  Is the speed decent?
TZ> If not, that should be an issue for the server maintainers (or they can
TZ> allow search plugins, so things like mairix can be integrated).  It
TZ> seems to me that IMAP SEARCH is a good way to provide universal
TZ> searching in Gnus for IMAP backends.

If anyone has experience integrating mairix with Courier or Dovecot,
please let me know.

Ted




* Re: Indexing Gnus (and other...) mails
       [not found]       ` <86ws9uugj0.fsf__11476.5163431459$1239285119$gmane$org@lifelogs.com>
@ 2009-04-09 14:40         ` David Engster
  0 siblings, 0 replies; 7+ messages in thread
From: David Engster @ 2009-04-09 14:40 UTC (permalink / raw)
  Cc: ding

The following message is a courtesy copy of an article
that has been posted to gmane.emacs.gnus.user as well.

Ted Zlatanov <tzz@lifelogs.com> writes:
> nnir is the backend that implements search at the highest level in Gnus.
> nnmairix is independent of it, but could probably be converted to an
> nnir backend.

When I started with this, I thought about integrating mairix into nnir,
but the way nnir works internally doesn't really fit too well for
mairix.

Mairix does not care about mailboxes and article numbers; it works
strictly on the filesystem level, and search results are simply links to
the original message files. While this has some obvious advantages (it's
fast, and the resulting mailbox is "just there", but still occupies
almost no filespace), it makes other things pretty hard to do,
e.g. finding the original article in Gnus and propagating marks to it.

With IMAP SEARCH, it's pretty much the other way round - you know the
original articles, and the main work is to produce a mailbox which
integrates all the search results and transparently maps article numbers
in that mailbox to the original ones.
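
A rough sketch of that mapping, with invented names (nothing here is
taken from nnir.el): keep the hits in a vector, so that the article
number in the result mailbox is simply the index plus one, and look up
the original group and article number from there.

```elisp
;; Hypothetical sketch; these names are made up and do not come from nnir.el.

(defun my-search-build-artlist (results)
  "Turn RESULTS into a vector indexed by virtual article number.
RESULTS is a list of (GROUP . NUMBER) pairs, one per search hit.
Virtual article N is stored at index N-1, so numbering starts at 1."
  (vconcat results))

(defun my-search-original-article (artlist virtual-number)
  "Return the (GROUP . NUMBER) pair behind VIRTUAL-NUMBER in ARTLIST.
Return nil if VIRTUAL-NUMBER is not a valid article number."
  (when (and (integerp virtual-number)
             (>= virtual-number 1)
             (<= virtual-number (length artlist)))
    (aref artlist (1- virtual-number))))

(let ((artlist (my-search-build-artlist
                '(("INBOX" . 42) ("emacs-devel" . 7)))))
  (my-search-original-article artlist 2))
;; => ("emacs-devel" . 7)
```

With such a table, propagating a mark or jumping to the original
article is a lookup, which is the part that is hard to do with mairix's
filesystem links.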

> TZ> I don't know how IMAP servers implement SEARCH.  Is the speed decent?

Tassilo already gave numbers on that. Usually, searching in the body is
slow. Since building indexes for full text search puts quite some load
on the server and can take a lot of disk space, it's usually only an
option for people managing their own IMAP servers (for example, Squat
takes about 30% of the mailbox size in the default configuration).

> If anyone has experience integrating mairix with Courier or Dovecot,
> please let me know.

You mean as a plugin? Otherwise, it's pretty straightforward. I call it
via ssh slave connections directly on the server.

-David




* Re: Indexing Gnus (and other...) mails
  2009-04-09  7:15         ` Tassilo Horn
  2009-04-09 12:22           ` Tassilo Horn
@ 2009-04-09 18:34           ` Tassilo Horn
  1 sibling, 0 replies; 7+ messages in thread
From: Tassilo Horn @ 2009-04-09 18:34 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: Ding Mailing List

Tassilo Horn <tassilo@member.fsf.org> writes:

> And for speeding up searches in the whole message including BODY,
> dovecot offers plugins:
>
>   http://wiki.dovecot.org/Plugins/FTS
>
> So I'd love to see better support for IMAP SEARCH in Gnus.  nnir
> works, but sometimes it's a bit bumpy.  For example, if I search for
> the same string a second time, each result message appears twice in
> the search result group.
>
> And what bothers me most: when I want to see a result message in its
> original group (I want to reply using the group's posting styles and
> parameters), I hit G T on the message (gnus-summary-nnir-goto-thread),
> but then I'm queried for my username / password for the local imap
> server.  That information is already in my .authinfo...

OK, I have fixed that for myself by redefining
gnus-summary-nnir-goto-thread like this:

--8<---------------cut here---------------start------------->8---
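(require 'mailheader) ; `mail-header', `mail-header-extract-no-properties'

(defun gnus-summary-nnir-goto-thread ()
  "Go to the original group of the current nnir article and show its thread.
Redefines the function from nnir.el so that the article is looked up by
its Message-ID."
  (interactive)
  (unless (eq 'nnir (car (gnus-find-method-for-group gnus-newsgroup-name)))
    (error "Can't execute this command unless in nnir group"))
  (let* ((cur (gnus-summary-article-number))
         ;; Original group of the article under point.
         (group (nnir-artlist-artitem-group nnir-artlist cur))
         ;; Select the article and read its Message-ID from the headers
         ;; in the article buffer.
         (mid (mail-header 'message-id
                           (progn
                             (gnus-summary-select-article t)
                             (with-current-buffer gnus-article-buffer
                               (goto-char (point-min))
                               (mail-header-extract-no-properties))))))
    ;; Enter the original group, then fetch the article and its thread
    ;; by Message-ID.
    (gnus-group-read-group 1 t group nil)
    (gnus-summary-refer-article mid)
    (gnus-summary-refer-thread)
    ;; TODO: Is there nothing like limiting to the current thread?
    ))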
--8<---------------cut here---------------end--------------->8---

This works well for me and maybe it's the right approach, too.  I don't
quite understand why nnir uses an ephemeral group and black article
number magic while Gnus will do the right thing given a message-id
anyway.

Comments?

Bye,
Tassilo




