Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* Indexing Gnus (and other...) mails
@ 2009-04-08  9:25 Ronan Keryell
  2009-04-08  9:37 ` Tassilo Horn
  0 siblings, 1 reply; 12+ messages in thread
From: Ronan Keryell @ 2009-04-08  9:25 UTC (permalink / raw)
  To: info-gnus-english

Hello!

I'm looking for an efficient way to index my mail, which has become a
huge issue: around 30 GB since 1987...  I've played around with beagle
and tracker with no success, except spoiling my processors for weeks and
10 GB of index that was never completed...

Any ideas?
-- 
  Ronan KERYELL
  HPC Project
  FRANCE

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Indexing Gnus (and other...) mails
  2009-04-08  9:25 Indexing Gnus (and other...) mails Ronan Keryell
@ 2009-04-08  9:37 ` Tassilo Horn
  2009-04-08 16:46   ` Emilio Jesús Gallego Arias
                     ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Tassilo Horn @ 2009-04-08  9:37 UTC (permalink / raw)
  To: info-gnus-english

Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:

Hi Ronan,

> I'm looking for an efficient way to index my mail, which has become a
> huge issue: around 30 GB since 1987...  I've played around with beagle
> and tracker with no success, except spoiling my processors for weeks
> and 10 GB of index that was never completed...

You could try mairix (+ the gnus nnmairix backend).  According to its
homepage, it should be quite fast while indexing.  (But still 30GB is
quite a lot...)

Bye,
Tassilo

* Re: Indexing Gnus (and other...) mails
  2009-04-08  9:37 ` Tassilo Horn
@ 2009-04-08 16:46   ` Emilio Jesús Gallego Arias
       [not found]   ` <mailman.4911.1239209214.31690.info-gnus-english@gnu.org>
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Emilio Jesús Gallego Arias @ 2009-04-08 16:46 UTC (permalink / raw)
  To: info-gnus-english

Tassilo Horn <tassilo@member.fsf.org> writes:

> You could try mairix (+ the gnus nnmairix backend).  According to its
> homepage, it should be quite fast while indexing.  (But still 30GB is
> quite a lot...)

I couldn't be happier with mairix (just 10 GB of email, though).

In fact, it is so fast that I use it to simulate Gmail-like
threads. Just index your sent-mail archive and search for a subject;
mairix will create a group with all the mails.
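
A minimal sketch of the subject search described above, written in Python only for illustration. It builds the mairix command line without running it. The `s:` subject pattern and the `-f` (rc file) and `-t` (whole threads) options come from the mairix manual; the helper name `mairix_search_argv` and the sample words are assumptions.

```python
from typing import List, Optional


def mairix_search_argv(subject_words: List[str],
                       rcfile: Optional[str] = None,
                       whole_threads: bool = True) -> List[str]:
    """Build an argv list for a mairix subject search (it is not executed here)."""
    argv = ["mairix"]
    if rcfile is not None:
        argv += ["-f", rcfile]   # use a specific mairixrc instead of the default
    if whole_threads:
        argv.append("-t")        # also include the other messages of matching threads
    # One s: pattern per word; separate patterns on the command line must all match.
    argv += ["s:" + word for word in subject_words]
    return argv


print(mairix_search_argv(["indexing", "gnus"]))
```

Passing the list to `subprocess.run` would fill the results folder configured in the rc file, which nnmairix can then show as a group.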

* Re: Indexing Gnus (and other...) mails
       [not found]   ` <mailman.4911.1239209214.31690.info-gnus-english@gnu.org>
@ 2009-04-08 18:19     ` Ted Zlatanov
       [not found]     ` <86ab6rypnt.fsf@lifelogs.com>
  1 sibling, 0 replies; 12+ messages in thread
From: Ted Zlatanov @ 2009-04-08 18:19 UTC (permalink / raw)
  To: info-gnus-english; +Cc: Ding Mailing List

On Wed, 08 Apr 2009 18:46:32 +0200 egallego@babel.ls.fi.upm.es (Emilio Jesús Gallego Arias) wrote: 

EJGA> Tassilo Horn <tassilo@member.fsf.org> writes:
>> You could try mairix (+ the gnus nnmairix backend).  According to its
>> homepage, it should be quite fast while indexing.  (But still 30GB is
>> quite a lot...)

EJGA> I couldn't be happier with mairix (just 10 GB of email, though).

EJGA> In fact, it is so fast that I use it to simulate Gmail-like
EJGA> threads. Just index your sent-mail archive and search for a subject;
EJGA> mairix will create a group with all the mails.

This made me think about IMAP specifically.

Gnus (imap.el AFAICT, so the support is missing all the way down) does
not support the IMAP SEARCH command, except by UID.  It probably should
allow SEARCH by TEXT, FROM, TO, SUBJECT, and probably all the other
standard search keys in RFC 3501 (section 6.4.4) [1].

I don't know how IMAP servers implement SEARCH.  Is the speed decent?
If not, that should be an issue for the server maintainers (or they can
allow search plugins, so things like mairix can be integrated).  It
seems to me that IMAP SEARCH is a good way to provide universal
searching in Gnus for IMAP backends.

Obviously mairix (with nnmairix) is still useful, and perhaps Gnus
should have backend searching capabilities that go beyond just limiting
the full list of articles.  But IMAP SEARCH support seems to me to be an
essential piece of building a good Gnus search solution that doesn't
depend on mairix or any other search tools, but can use them when they
are available.

Ideas?  Suggestions?  Did I overlook something?

Thanks
Ted

[1] http://www.ietf.org/rfc/rfc3501.txt
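
For readers unfamiliar with the search keys mentioned above, here is a rough sketch of what a SEARCH by FROM and SUBJECT looks like from a client. It uses Python's standard `imaplib` rather than Gnus' imap.el, so it only illustrates the protocol side. The helper names, the set of allowed keys, and the host in the usage comment are assumptions; non-ASCII search strings would additionally need a CHARSET argument, which this sketch ignores.

```python
import imaplib
from typing import Dict, List

# A subset of the RFC 3501 section 6.4.4 search keys that take one string argument.
STRING_KEYS = {"FROM", "TO", "CC", "SUBJECT", "BODY", "TEXT"}


def quote(value: str) -> str:
    """Render a value as an IMAP quoted string, escaping backslash and double quote."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def search_criteria(keys: Dict[str, str]) -> str:
    """Join several search keys; the server treats multiple keys as a logical AND."""
    parts = []
    for name, value in keys.items():
        key = name.upper()
        if key not in STRING_KEYS:
            raise ValueError(f"unsupported search key: {name}")
        parts.append(f"{key} {quote(value)}")
    return " ".join(parts)


def uid_search(conn: imaplib.IMAP4, mailbox: str, criteria: str) -> List[bytes]:
    """Run UID SEARCH in a mailbox opened read-only and return the matching UIDs."""
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select {mailbox}")
    status, data = conn.uid("SEARCH", criteria)
    if status != "OK":
        raise RuntimeError("SEARCH failed")
    return data[0].split()


print(search_criteria({"FROM": "ronan", "SUBJECT": "indexing"}))

# Usage against a real server (hypothetical host and credentials):
#   conn = imaplib.IMAP4_SSL("imap.example.org")
#   conn.login("user", "secret")
#   uids = uid_search(conn, "INBOX", search_criteria({"TEXT": "mairix"}))
```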

* Re: Indexing Gnus (and other...) mails
       [not found]     ` <86ab6rypnt.fsf@lifelogs.com>
@ 2009-04-09 13:02       ` Ted Zlatanov
  2009-04-09 14:40         ` David Engster
       [not found]         ` <m2k55tevq8.fsf@arcor.de>
  0 siblings, 2 replies; 12+ messages in thread
From: Ted Zlatanov @ 2009-04-09 13:02 UTC (permalink / raw)
  To: info-gnus-english; +Cc: Ding Mailing List

On Wed, 08 Apr 2009 13:19:02 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote: 

TZ> Gnus (imap.el AFAICT, so the support is missing all the way down) does
TZ> not support the IMAP SEARCH command, except by UID.  It probably should
TZ> allow SEARCH by TEXT, FROM, TO, SUBJECT, and probably all the other
TZ> standard search keys in RFC 3501 (section 6.4.4) [1].

Well, I mistyped "grep -i seach *.el" and assumed there was no IMAP
SEARCH support.  Sorry about the confusion--I was wrong.

nnir is the backend that implements search at the highest level in Gnus.
nnmairix is independent of it, but could probably be converted to a nnir
backend.  There are some TODO items with nnir as Tassilo pointed out,
with duplicate searches being a pretty big one.

Finally, nnir doesn't support incremental results AFAICT, which are
important for people with 30 GB of mail.  It would be nice if it did.
In general Gnus backends do very little incrementally, and that causes
problems with entering large groups and elsewhere, not just searching.

TZ> I don't know how IMAP servers implement SEARCH.  Is the speed decent?
TZ> If not, that should be an issue for the server maintainers (or they can
TZ> allow search plugins, so things like mairix can be integrated).  It
TZ> seems to me that IMAP SEARCH is a good way to provide universal
TZ> searching in Gnus for IMAP backends.

If anyone has experience integrating mairix with Courier or Dovecot,
please let me know.

Ted
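
To make "incremental results" concrete, here is a small sketch in Python (not Gnus code, and not how nnir works). A search written as a generator hands back matches in small batches while it is still scanning, so a front end could display the first hits long before a 30 GB archive has been read. The function name, the batch size, and the sample messages are all illustrative assumptions.

```python
from typing import Callable, Iterable, Iterator, List


def search_incrementally(messages: Iterable[str],
                         matches: Callable[[str], bool],
                         batch_size: int = 2) -> Iterator[List[str]]:
    """Yield matching messages in batches instead of one list at the very end."""
    batch: List[str] = []
    for message in messages:
        if matches(message):
            batch.append(message)
            if len(batch) == batch_size:
                yield batch          # the caller can display these right away
                batch = []
    if batch:
        yield batch                  # flush the final, possibly shorter, batch


mail = ["mairix is fast", "lunch?", "nnmairix setup", "mairix rc file"]
for hits in search_incrementally(mail, lambda text: "mairix" in text):
    print(hits)
```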

* Re: Indexing Gnus (and other...) mails
  2009-04-09 13:02       ` Ted Zlatanov
@ 2009-04-09 14:40         ` David Engster
       [not found]         ` <m2k55tevq8.fsf@arcor.de>
  1 sibling, 0 replies; 12+ messages in thread
From: David Engster @ 2009-04-09 14:40 UTC (permalink / raw)
  To: info-gnus-english; +Cc: ding

Ted Zlatanov <tzz@lifelogs.com> writes:
> nnir is the backend that implements search at the highest level in Gnus.
> nnmairix is independent of it, but could probably be converted to a nnir
> backend. 

When I started with this, I thought about integrating mairix into nnir,
but the way nnir works internally doesn't really fit too well for
mairix.

Mairix does not care about mailboxes and article numbers; it works
strictly on the filesystem level, and search results are simply links to
the original message files. While this has some obvious advantages (it's
fast, and the resulting mailbox is "just there", but still occupies
almost no filespace), it makes other things pretty hard to do,
e.g. finding the original article in Gnus and propagating marks to it.

With IMAP SEARCH, it's pretty much the other way round - you know the
original articles, and the main work is to produce a mailbox which
integrates all the search results and transparently maps article numbers
in that mailbox to the original ones.

> TZ> I don't know how IMAP servers implement SEARCH.  Is the speed decent?

Tassilo already gave numbers on that. Usually, searching in the body is
slow. Since building indexes for full-text search puts quite some load
on the server and can take lots of file space, it's usually only an
option for people managing their own IMAP servers (for example, Squat
takes about 30% of the mailbox size in the default configuration).

> If anyone has experience integrating mairix with Courier or Dovecot,
> please let me know.

You mean as a plugin? Otherwise, it's pretty straightforward. I call it
via ssh slave connections directly on the server.

-David
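
The article-number mapping described above for the IMAP SEARCH case can be pictured with a small data structure. This is a sketch under assumptions, in Python rather than Emacs Lisp, and it is not how nnir or nnmairix is implemented: a virtual group hands out its own consecutive article numbers and remembers which original group and article each one stands for, so that marks set on a search result could be propagated back. The class and group names are hypothetical.

```python
from typing import Dict, List, Tuple

Origin = Tuple[str, int]  # (original group name, original article number)


class VirtualSearchGroup:
    """Maps 1-based virtual article numbers to the articles they came from."""

    def __init__(self) -> None:
        self._origins: List[Origin] = []
        self._numbers: Dict[Origin, int] = {}

    def add(self, group: str, article: int) -> int:
        """Register a search hit and return its virtual article number."""
        origin = (group, article)
        if origin not in self._numbers:      # the same hit keeps the same number
            self._origins.append(origin)
            self._numbers[origin] = len(self._origins)
        return self._numbers[origin]

    def original(self, virtual_number: int) -> Origin:
        """Look up where a virtual article really lives."""
        if not 1 <= virtual_number <= len(self._origins):
            raise KeyError(virtual_number)
        return self._origins[virtual_number - 1]


results = VirtualSearchGroup()
results.add("nnimap:INBOX", 4711)
results.add("nnimap:archive.2008", 12)
print(results.original(2))
```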

* Re: Indexing Gnus (and other...) mails
  2009-04-08  9:37 ` Tassilo Horn
  2009-04-08 16:46   ` Emilio Jesús Gallego Arias
       [not found]   ` <mailman.4911.1239209214.31690.info-gnus-english@gnu.org>
@ 2009-04-10 11:05   ` Ronan Keryell
  2009-04-10 21:13     ` David Engster
       [not found]   ` <mailman.5045.1239361534.31690.info-gnus-english@gnu.org>
  3 siblings, 1 reply; 12+ messages in thread
From: Ronan Keryell @ 2009-04-10 11:05 UTC (permalink / raw)
  To: info-gnus-english

Tassilo Horn <tassilo@member.fsf.org> writes:

> Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:
>
> Hi Ronan,
>
>> I'm looking for an efficient way to index my mail, which has become a
>> huge issue: around 30 GB since 1987...  I've played around with beagle
>> and tracker with no success, except spoiling my processors for weeks
>> and 10 GB of index that was never completed...
>
> You could try mairix (+ the gnus nnmairix backend).  According to its
> homepage, it should be quite fast while indexing.  (But still 30GB is
> quite a lot...)

Thank you!

It is a quite good suggestion! I'm going to dig into
http://www.gnus.org/manual/gnus_43.html

I've tried it but it failed quickly with an "Out of memory (at
rfc822.c:439, -1538 bytes)" whereas it was only using around 200 MB of
memory (I have 3 GB RAM + 8 GB swap available...). I do not have time to
investigate right now but I will try in 2 weeks.

It sounds like a 32-bit bug or limitation on the tool...

BTW, 30 GB is not a lot, it is only 2.28 € at 0.076 €/GB now in
France. :-) It is far less than the cost of all the people that kindly
reply to my question and the time I'm spending on this issue... :-)
-- 
  Ronan KERYELL
  HPC Project
  FRANCE

* Re: Indexing Gnus (and other...) mails
       [not found]   ` <mailman.5045.1239361534.31690.info-gnus-english@gnu.org>
@ 2009-04-10 14:00     ` Richard Riley
  2009-04-10 16:28       ` Ronan Keryell
       [not found]       ` <mailman.5076.1239380919.31690.info-gnus-english@gnu.org>
  0 siblings, 2 replies; 12+ messages in thread
From: Richard Riley @ 2009-04-10 14:00 UTC (permalink / raw)
  To: info-gnus-english

Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:

> Tassilo Horn <tassilo@member.fsf.org> writes:
>
>> Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:
>>
>> Hi Ronan,
>>
>>> I'm looking for an efficient way to index my mail, which has become a
>>> huge issue: around 30 GB since 1987...  I've played around with beagle
>>> and tracker with no success, except spoiling my processors for weeks
>>> and 10 GB of index that was never completed...
>>
>> You could try mairix (+ the gnus nnmairix backend).  According to its
>> homepage, it should be quite fast while indexing.  (But still 30GB is
>> quite a lot...)
>
> Thank you!
>
> It is a quite good suggestion! I'm going to dig into
> http://www.gnus.org/manual/gnus_43.html
>
> I've tried it but it failed quickly with an "Out of memory (at
> rfc822.c:439, -1538 bytes)" whereas it was only using around 200 MB of
> memory (I have 3 GB RAM + 8 GB swap available...). I do not have time to
> investigate right now but I will try in 2 weeks.
>
> It sounds like a 32-bit bug or limitation on the tool...
>
> BTW, 30 GB is not a lot, it is only 2.28 € at 0.076 €/GB now in
> France. :-) It is far less than the cost of all the people that kindly
> reply to my question and the time I'm spending on this issue... :-)

I think he meant it's rather a ridiculous amount of email. Even 20 years
of email amounting to 30 Gig is a LOT for a single person! Are you sure
you remembered to delete all the "oh so funny" emails with huge
PowerPoint attachments? :-)

* Re: Indexing Gnus (and other...) mails
  2009-04-10 14:00     ` Richard Riley
@ 2009-04-10 16:28       ` Ronan Keryell
       [not found]       ` <mailman.5076.1239380919.31690.info-gnus-english@gnu.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Ronan Keryell @ 2009-04-10 16:28 UTC (permalink / raw)
  To: info-gnus-english

Richard Riley <rileyrgdev@googlemail.com> writes:

> I think he meant it's rather a ridiculous amount of email.

Yes, I understand. And my reply is that just trying to reduce this CPU
and disk usage is not worth the price of the brain time. :-)

> Even 20 years of email amounting to 30 Gig is a LOT for a single
> person! Are you sure you remembered to delete all the "oh so funny"
> emails with huge PowerPoint attachments? :-)

A mail sent is a mail sent. So it is archived. :-)

Oops! I forgot to archive the SPAM. :-(

"640 KB of mail is enough for everybody" is not a good argument to
convince me. :-)
-- 
  Ronan KERYELL
  HPC Project
  FRANCE

* Re: Indexing Gnus (and other...) mails
       [not found]       ` <mailman.5076.1239380919.31690.info-gnus-english@gnu.org>
@ 2009-04-10 17:18         ` Richard Riley
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Riley @ 2009-04-10 17:18 UTC (permalink / raw)
  To: info-gnus-english

Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:

> Richard Riley <rileyrgdev@googlemail.com> writes:
>
>> I think he meant it's rather a ridiculous amount of email.
>
> Yes, I understand. And my reply is that just trying to reduce this CPU
> and disk usage is not worth the price of the brain time. :-)
>
>> Even 20 years of email amounting to 30 Gig is a LOT for a single
>> person! Are you sure you remembered to delete all the "oh so funny"
>> emails with huge PowerPoint attachments? :-)
>
> A mail sent is a mail sent. So it is archived. :-)
>
> Oops! I forgot to archive the SPAM. :-(
>
> "640 KB of mail is enough for everybody" is not a good argument to
> convince me. :-)

You send and receive 4 megabytes a day for twenty years?

Incredible.

And no one mentioned 640 Megs. If you truly need to index that much then
good luck!

Since I suspect you only really want a quick find on email from the past
year or so, you could index that separately.

I use mairix with Gnus and it's instant, and I have about 4 years of
emails indexed.
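
The figures traded in this exchange can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 10^9 bytes), 365-day years, and the twenty-year span used above; the 0.076 €/GB price is the one quoted earlier in the thread.

```python
ARCHIVE_GB = 30            # size of the mail archive
YEARS = 20                 # span assumed in the message above
PRICE_EUR_PER_GB = 0.076   # disk price quoted earlier in the thread

# Average daily volume: total bytes spread over every day of the span.
mb_per_day = ARCHIVE_GB * 10**9 / (YEARS * 365) / 10**6

# Cost of the disk space needed to hold the whole archive.
storage_cost_eur = ARCHIVE_GB * PRICE_EUR_PER_GB

print(f"{mb_per_day:.1f} MB/day, {storage_cost_eur:.2f} EUR")
```

Both numbers match the thread: roughly 4 MB a day, and 2.28 € of disk.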

* Re: Indexing Gnus (and other...) mails
  2009-04-10 11:05   ` Ronan Keryell
@ 2009-04-10 21:13     ` David Engster
  0 siblings, 0 replies; 12+ messages in thread
From: David Engster @ 2009-04-10 21:13 UTC (permalink / raw)
  To: info-gnus-english

Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:
> I've tried it but it failed quickly with an "Out of memory (at
> rfc822.c:439, -1538 bytes)" whereas it was only using around 200 MB of
> memory (I have 3 GB RAM + 8 GB swap available...). I do not have time to
> investigate right now but I will try in 2 weeks.
>
> It sounds like a 32-bit bug or limitation on the tool...

If you investigate further, please post your results on the mairix
mailing list.

FWIW, nnmairix can handle several mairix installations. So what you
could do is set up several different mairixrc configuration files which
index different parts of your mail (with a size that mairix can still
handle). You can then set up a nnmairix server for each configuration
file with "mairix -f <configfile>" for nnmairix-mairix-command.

This is just a workaround and has some disadvantages; for example, you
can only query one of the mairix servers at a time, so you will roughly
have to know which mairix index could get you what you're looking for.

> BTW, 30 GB is not a lot, it is only 2.28 € at 0.076 €/GB now in
> France. :-) 

Well yes, measured in Euros. But otherwise, I'd say that definitely is a
lot of mail. :-)

-David
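
As a rough illustration of the split-index workaround, the Python sketch below keeps one mairixrc per slice of the archive and produces the "mairix -f <configfile>" string that would go into nnmairix-mairix-command for each nnmairix server. The year ranges and rc file names are hypothetical; only the `-f` option comes from the message above.

```python
from typing import Dict, Tuple

# Hypothetical layout: one mairixrc per slice of the archive, keyed by year range.
INDEXES: Dict[Tuple[int, int], str] = {
    (1987, 1999): "~/.mairixrc-1987-1999",
    (2000, 2009): "~/.mairixrc-2000-2009",
}


def mairix_command(rcfile: str) -> str:
    """Command string for one nnmairix server, as described in the message."""
    return f"mairix -f {rcfile}"


def command_for_year(year: int) -> str:
    """Pick the index that should contain mail from the given year."""
    for (first, last), rcfile in INDEXES.items():
        if first <= year <= last:
            return mairix_command(rcfile)
    raise LookupError(f"no mairix index covers {year}")


print(command_for_year(1995))
```

The lookup also shows the disadvantage mentioned above: the caller has to know roughly when the mail was sent before choosing which index to query.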

* Re: Indexing Gnus (and other...) mails
       [not found]         ` <m2k55tevq8.fsf@arcor.de>
@ 2009-04-14 16:29           ` Ted Zlatanov
  0 siblings, 0 replies; 12+ messages in thread
From: Ted Zlatanov @ 2009-04-14 16:29 UTC (permalink / raw)
  To: info-gnus-english

On Thu, 09 Apr 2009 16:40:31 +0200 David Engster <deng@randomsample.de> wrote: 

DE> When I started with this, I thought about integrating mairix into nnir,
DE> but the way nnir works internally doesn't really fit too well for
DE> mairix.

DE> Mairix does not care about mailboxes and article numbers; it works
DE> strictly on the filesystem level, and search results are simply links to
DE> the original message files. While this has some obvious advantages (it's
DE> fast, and the resulting mailbox is "just there", but still occupies
DE> almost no filespace), it makes other things pretty hard to do,
DE> e.g. finding the original article in Gnus and propagating marks to it.

I see.  That's actually too bad, because it would be nice to get the
search results inside nnir.  Regardless, what bothers me is the
synchronous nature of all these backends.  If you use anything.el,
you'll see how it shows found items dynamically as they show up.  Gnus
should be able to do that too, but it can't because everything in the
backend architecture is synchronous and serialized.

>> If anyone has experience integrating mairix with Courier or Dovecot,
>> please let me know.

DE> You mean as a plugin? Otherwise, it's pretty straightforward. I call it
DE> via ssh slave connections directly on the server.

Yes, I meant as a plugin, so it would Just Work with IMAP SEARCH.

Ted
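
The contrast between a synchronous, serialized backend and one that shows items as they arrive can be sketched outside Emacs. The Python example below uses `asyncio` to query two pretend backends concurrently and hand each result set to the caller as soon as it is ready, instead of waiting for the slowest one. The backend names, delays, and hits are made up for illustration and say nothing about how Gnus or anything.el is implemented.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

Backend = Tuple[str, float, List[str]]  # (name, simulated latency in seconds, hits)


async def query_backend(name: str, delay: float, hits: List[str]) -> List[str]:
    """Stand-in for a real search backend: wait, then return labelled hits."""
    await asyncio.sleep(delay)
    return [f"{name}: {hit}" for hit in hits]


async def search_all(backends: List[Backend]) -> AsyncIterator[List[str]]:
    """Start every query at once and yield each result set as it completes."""
    tasks = [asyncio.ensure_future(query_backend(*backend)) for backend in backends]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main() -> List[str]:
    backends = [("imap", 0.1, ["msg 12"]), ("mairix", 0.01, ["msg 7", "msg 9"])]
    shown: List[str] = []
    async for hits in search_all(backends):
        shown.extend(hits)           # a real front end would redraw here
    return shown


print(asyncio.run(main()))
```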
