* Indexing Gnus (and other...) mails
From: Ronan Keryell @ 2009-04-08 9:25 UTC
To: info-gnus-english

Hello!

I'm looking for an efficient solution to index my mail that becomes a
huge issue: around 30 GB since 1987... I've played around with beagle
and tracker with no success, except spoiling my processors for weeks
and 10 GB of index that was never completed...

Any ideas?
--
Ronan KERYELL
HPC Project
FRANCE

* Re: Indexing Gnus (and other...) mails
From: Tassilo Horn @ 2009-04-08 9:37 UTC
To: info-gnus-english

Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:

Hi Ronan,

> I'm looking for an efficient solution to index my mail that becomes a
> huge issue: around 30 GB since 1987... I've played around with beagle
> and tracker with no success, except spoiling my processors for weeks
> and 10 GB of index that was never completed...

You could try mairix (+ the gnus nnmairix backend). According to its
homepage, it should be quite fast while indexing. (But still 30GB is
quite a lot...)

Bye,
Tassilo

* Re: Indexing Gnus (and other...) mails
From: Emilio Jesús Gallego Arias @ 2009-04-08 16:46 UTC
To: info-gnus-english

Tassilo Horn <tassilo@member.fsf.org> writes:

> You could try mairix (+ the gnus nnmairix backend). According to its
> homepage, it should be quite fast while indexing. (But still 30GB is
> quite a lot...)

I couldn't be happier with mairix (just 10 GB of email, though).

In fact, it is so fast that I use it to simulate "Gmail"-like
threads. Just index your sent mails archive and search for a subject;
mairix will create a group with all the mails.

* Re: Indexing Gnus (and other...) mails
From: Ted Zlatanov @ 2009-04-08 18:19 UTC
To: info-gnus-english; +Cc: Ding Mailing List

On Wed, 08 Apr 2009 18:46:32 +0200 egallego@babel.ls.fi.upm.es (Emilio Jesús Gallego Arias) wrote:

EJGA> Tassilo Horn <tassilo@member.fsf.org> writes:
>> You could try mairix (+ the gnus nnmairix backend). According to its
>> homepage, it should be quite fast while indexing. (But still 30GB is
>> quite a lot...)

EJGA> I couldn't be happier with mairix (just 10 GB of email, though).

EJGA> In fact, it is so fast that I use it to simulate "Gmail"-like
EJGA> threads. Just index your sent mails archive and search for a subject;
EJGA> mairix will create a group with all the mails.

This made me think about IMAP specifically.

Gnus (imap.el AFAICT, so the support is missing all the way down) does
not support the IMAP SEARCH command, except by UID. It probably should
allow SEARCH by TEXT, FROM, TO, SUBJECT, and probably all the other
standard search keys in RFC 3501 (section 6.4.4) [1].

I don't know how IMAP servers implement SEARCH. Is the speed decent?
If not, that should be an issue for the server maintainers (or they can
allow search plugins, so things like mairix can be integrated). It
seems to me that IMAP SEARCH is a good way to provide universal
searching in Gnus for IMAP backends.

Obviously mairix (with nnmairix) is still useful, and perhaps Gnus
should have backend searching capabilities that go beyond just limiting
the full list of articles. But IMAP SEARCH support seems to me to be an
essential piece of building a good Gnus search solution that doesn't
depend on mairix or any other search tools, but can use them when they
are available.

Ideas? Suggestions? Did I overlook something?

Thanks
Ted

[1] http://www.ietf.org/rfc/rfc3501.txt
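
For readers who want to see what searching by the RFC 3501 keys named above looks like from a client, here is a minimal sketch in Python using the standard `imaplib` module. It is not how Gnus or imap.el works; the helper names `build_search` and `search_mailbox`, and the restriction to five string-valued keys, are assumptions made for illustration.

```python
import imaplib


def quote(value: str) -> str:
    """Return value as an IMAP quoted string (backslash and quote escaped)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_search(criteria: dict) -> str:
    """Build a SEARCH criteria string such as: FROM "x" SUBJECT "y".

    Several keys in one SEARCH are ANDed together, per RFC 3501.
    Only a handful of string-valued keys are accepted in this sketch.
    """
    allowed = {"TEXT", "FROM", "TO", "SUBJECT", "BODY"}
    parts = []
    for name, value in criteria.items():
        key = name.upper()
        if key not in allowed:
            raise ValueError(f"unsupported search key: {name}")
        parts.append(f"{key} {quote(value)}")
    return " ".join(parts)


def search_mailbox(host, user, password, mailbox, criteria):
    """Run the search on a server and return matching message numbers."""
    # Hypothetical connection details; needs a reachable IMAP-over-TLS server.
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(mailbox, readonly=True)
        status, data = conn.search(None, build_search(criteria))
        if status != "OK":
            raise RuntimeError(f"SEARCH failed: {status}")
        return data[0].split()
```

The search itself runs on the server, so its speed depends entirely on how the server implements it, which is the open question in the message above. Non-ASCII search strings would additionally need a charset argument, which this sketch leaves out.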

* Re: Indexing Gnus (and other...) mails
From: Ted Zlatanov @ 2009-04-09 13:02 UTC
To: info-gnus-english; +Cc: Ding Mailing List

On Wed, 08 Apr 2009 13:19:02 -0500 Ted Zlatanov <tzz@lifelogs.com> wrote:

TZ> Gnus (imap.el AFAICT, so the support is missing all the way down) does
TZ> not support the IMAP SEARCH command, except by UID. It probably should
TZ> allow SEARCH by TEXT, FROM, TO, SUBJECT, and probably all the other
TZ> standard search keys in RFC 3501 (section 6.4.4) [1].

Well, I mistyped "grep -i seach *.el" and assumed there was no IMAP
SEARCH support. Sorry about the confusion--I was wrong.

nnir is the backend that implements search at the highest level in Gnus.
nnmairix is independent of it, but could probably be converted to a nnir
backend. There are some TODO items with nnir as Tassilo pointed out,
with duplicate searches being a pretty big one.

Finally, nnir doesn't support incremental results AFAICT, which are
important for people with 30 GB of mail. It would be nice if it did. In
general Gnus backends do very little incrementally, and that causes
problems with entering large groups and elsewhere, not just searching.

TZ> I don't know how IMAP servers implement SEARCH. Is the speed decent?
TZ> If not, that should be an issue for the server maintainers (or they can
TZ> allow search plugins, so things like mairix can be integrated). It
TZ> seems to me that IMAP SEARCH is a good way to provide universal
TZ> searching in Gnus for IMAP backends.

If anyone has experience integrating mairix with Courier or Dovecot,
please let me know.

Ted

* Re: Indexing Gnus (and other...) mails
From: David Engster @ 2009-04-09 14:40 UTC
To: info-gnus-english; +Cc: ding

Ted Zlatanov <tzz@lifelogs.com> writes:

> nnir is the backend that implements search at the highest level in Gnus.
> nnmairix is independent of it, but could probably be converted to a nnir
> backend.

When I started with this, I thought about integrating mairix into nnir,
but the way nnir works internally doesn't really fit too well for
mairix.

Mairix does not care about mailboxes and article numbers; it works
strictly on the filesystem level, and search results are simply links to
the original message files. While this has some obvious advantages (it's
fast, and the resulting mailbox is "just there", but still occupies
almost no filespace), it makes other things pretty hard to do,
e.g. finding the original article in Gnus and propagating marks to it.

With IMAP SEARCH, it's pretty much the other way round - you know the
original articles, and the main work is to produce a mailbox which
integrates all the search results and transparently maps article numbers
in that mailbox to the original ones.

> TZ> I don't know how IMAP servers implement SEARCH. Is the speed decent?

Tassilo already gave numbers on that. Usually, searching in the body is
slow. Since building indexes for full text search puts quite some load
on the server and can take lots of filespace, it's usually only an
option for people managing their own IMAP servers (for example, Squat
takes about 30% of the mailbox size in the default configuration).

> If anyone has experience integrating mairix with Courier or Dovecot,
> please let me know.

You mean as a plugin? Otherwise, it's pretty straightforward. I call it
via ssh slave connections directly on the server.

-David
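
The point that search results are "simply links to the original message files" can be illustrated with a small Python sketch. It assumes a maildir-style result folder whose `cur` and `new` subdirectories contain symbolic links, and a POSIX filesystem; the function name `resolve_results` is hypothetical and this is not part of nnmairix.

```python
import os
from pathlib import Path


def resolve_results(result_folder: str) -> dict:
    """Map each symlinked search hit to the original file it points at."""
    originals = {}
    for sub in ("cur", "new"):
        directory = Path(result_folder) / sub
        if not directory.is_dir():
            continue
        for entry in sorted(directory.iterdir()):
            if entry.is_symlink():
                # realpath follows the link back to the archived message.
                originals[str(entry)] = os.path.realpath(entry)
    return originals
```

This recovers the original file, but not the Gnus group or article number, which is exactly the hard part described above: a path on disk still has to be mapped back to an article before marks can be propagated.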

* Re: Indexing Gnus (and other...) mails
From: Ted Zlatanov @ 2009-04-14 16:29 UTC
To: info-gnus-english

On Thu, 09 Apr 2009 16:40:31 +0200 David Engster <deng@randomsample.de> wrote:

DE> When I started with this, I thought about integrating mairix into nnir,
DE> but the way nnir works internally doesn't really fit too well for
DE> mairix.

DE> Mairix does not care about mailboxes and article numbers; it works
DE> strictly on the filesystem level, and search results are simply links to
DE> the original message files. While this has some obvious advantages (it's
DE> fast, and the resulting mailbox is "just there", but still occupies
DE> almost no filespace), it makes other things pretty hard to do,
DE> e.g. finding the original article in Gnus and propagating marks to it.

I see. That's actually too bad, because it would be nice to get the
search results inside nnir.

Regardless, what bothers me is the synchronous nature of all these
backends. If you use anything.el, you'll see how it shows found items
dynamically as they show up. Gnus should be able to do that too, but it
can't because everything in the backend architecture is synchronous and
serialized.

>> If anyone has experience integrating mairix with Courier or Dovecot,
>> please let me know.

DE> You mean as a plugin? Otherwise, it's pretty straightforward. I call it
DE> via ssh slave connections directly on the server.

Yes, I meant as a plugin, so it would Just Work with IMAP SEARCH.

Ted
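
The difference between a synchronous search and one that shows items "as they show up" can be sketched with a Python generator. This is only an illustration of the idea, not Gnus code (Gnus is written in Emacs Lisp); the function name `search_incrementally` and the plain substring match are assumptions.

```python
from pathlib import Path
from typing import Iterator


def search_incrementally(root: str, needle: str) -> Iterator[Path]:
    """Yield each matching message file as soon as it is found."""
    wanted = needle.lower().encode()
    # Sorting lists the tree up front; the files themselves are read lazily.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and wanted in path.read_bytes().lower():
            yield path
```

A caller can display the first hit with `next(search_incrementally(root, "nnir"))` without waiting for the whole archive to be scanned, whereas wrapping the call in `list(...)` reproduces the blocking behaviour described above.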

* Re: Indexing Gnus (and other...) mails
From: Ronan Keryell @ 2009-04-10 11:05 UTC
To: info-gnus-english

Tassilo Horn <tassilo@member.fsf.org> writes:

> Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:
>
> Hi Ronan,
>
>> I'm looking for an efficient solution to index my mail that becomes a
>> huge issue: around 30 GB since 1987... I've played around with beagle
>> and tracker with no success, except spoiling my processors for weeks
>> and 10 GB of index that was never completed...
>
> You could try mairix (+ the gnus nnmairix backend). According to its
> homepage, it should be quite fast while indexing. (But still 30GB is
> quite a lot...)

Thank you!

It is quite a good suggestion! I'm going to dig into
http://www.gnus.org/manual/gnus_43.html

I've tried it but it failed quickly with an "Out of memory (at
rfc822.c:439, -1538 bytes)" whereas it was only using around 200 MB of
memory (I have 3 GB RAM + 8 GB swap available...). I do not have time to
investigate right now but I will try in 2 weeks.

It sounds like a 32-bit bug or limitation in the tool...

BTW, 30 GB is not a lot, it is only 2.28 € at 0.076 €/GB now in
France. :-) It is far less than the cost of all the people that kindly
reply to my question and the time I'm spending on this issue... :-)
--
Ronan KERYELL
HPC Project
FRANCE

* Re: Indexing Gnus (and other...) mails
From: David Engster @ 2009-04-10 21:13 UTC
To: info-gnus-english

Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:

> I've tried it but it failed quickly with an "Out of memory (at
> rfc822.c:439, -1538 bytes)" whereas it was only using around 200 MB of
> memory (I have 3 GB RAM + 8 GB swap available...). I do not have time to
> investigate right now but I will try in 2 weeks.
>
> It sounds like a 32-bit bug or limitation in the tool...

If you investigate further, please post your results on the mairix
mailing list.

FWIW, nnmairix can handle several mairix installations. So what you
could do is set up several different mairixrc configuration files which
index different parts of your mail (with a size that mairix can still
handle). You can then set up a nnmairix server for each configuration
file with "mairix -f <configfile>" for nnmairix-mairix-command.

This is just a workaround and has some disadvantages; for example, you
can only query one of the mairix servers at a time, so you will roughly
have to know which mairix index could get you what you're looking for.

> BTW, 30 GB is not a lot, it is only 2.28 € at 0.076 €/GB now in
> France. :-)

Well yes, measured in Euros. But otherwise, I'd say that definitely is a
lot of mail. :-)

-David
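
One way to prepare the several configuration files suggested above is to group mail folders by size and write one mairixrc per group. The Python sketch below assumes maildir folders under a common base directory; the size limit, the result-folder and database names, and both function names are hypothetical. `base`, `maildir`, `mfolder` and `database` are mairixrc keys, but check the mairixrc manual for your version before relying on the generated files.

```python
def partition_folders(sizes: dict, limit: int) -> list:
    """Greedily group folders so each group stays at or under limit bytes.

    A folder that alone exceeds the limit still gets a group of its own.
    """
    groups, current, used = [], [], 0
    for name, size in sorted(sizes.items()):
        if current and used + size > limit:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        groups.append(current)
    return groups


def render_mairixrc(base: str, folders: list, index: int) -> str:
    """Render the text of one mairixrc covering the given folders."""
    return "\n".join([
        f"base={base}",
        f"maildir={':'.join(folders)}",         # colon-separated folder list
        f"mfolder=mairix-results-{index}",      # hypothetical result folder
        f"database={base}/.mairix-db-{index}",  # hypothetical index file
        "",
    ])
```

Each rendered file can then be passed to "mairix -f <configfile>" as described in the message, one nnmairix server per file.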

* Re: Indexing Gnus (and other...) mails
From: Richard Riley @ 2009-04-10 14:00 UTC
To: info-gnus-english

Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:

> Tassilo Horn <tassilo@member.fsf.org> writes:
>
>> Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:
>>
>> Hi Ronan,
>>
>>> I'm looking for an efficient solution to index my mail that becomes a
>>> huge issue: around 30 GB since 1987... I've played around with beagle
>>> and tracker with no success, except spoiling my processors for weeks
>>> and 10 GB of index that was never completed...
>>
>> You could try mairix (+ the gnus nnmairix backend). According to its
>> homepage, it should be quite fast while indexing. (But still 30GB is
>> quite a lot...)
>
> Thank you!
>
> It is quite a good suggestion! I'm going to dig into
> http://www.gnus.org/manual/gnus_43.html
>
> I've tried it but it failed quickly with an "Out of memory (at
> rfc822.c:439, -1538 bytes)" whereas it was only using around 200 MB of
> memory (I have 3 GB RAM + 8 GB swap available...). I do not have time to
> investigate right now but I will try in 2 weeks.
>
> It sounds like a 32-bit bug or limitation in the tool...
>
> BTW, 30 GB is not a lot, it is only 2.28 € at 0.076 €/GB now in
> France. :-) It is far less than the cost of all the people that kindly
> reply to my question and the time I'm spending on this issue... :-)

I think he meant it's rather a ridiculous amount of email. Even 20 years
of email amounting to 30 Gig is a LOT for a single person! Are you sure
you remembered to delete all the "oh so funny" emails with huge
PowerPoint attachments? :-)

* Re: Indexing Gnus (and other...) mails
From: Ronan Keryell @ 2009-04-10 16:28 UTC
To: info-gnus-english

Richard Riley <rileyrgdev@googlemail.com> writes:

> I think he meant it's rather a ridiculous amount of email.

Yes, I understand. And my reply is that just trying to reduce this CPU
and disk usage is not worth the price of the brain time. :-)

> Even 20 years of email amounting to 30 Gig is a LOT for a single
> person! Are you sure you remembered to delete all the "oh so funny"
> emails with huge PowerPoint attachments? :-)

A mail sent is a mail sent. So it is archived. :-)

Oops! I forgot to archive the SPAM. :-(

"640 KB of mail is enough for everybody" is not a good argument to
convince me. :-)
--
Ronan KERYELL
HPC Project
FRANCE

* Re: Indexing Gnus (and other...) mails
From: Richard Riley @ 2009-04-10 17:18 UTC
To: info-gnus-english

Ronan Keryell <Ronan.Keryell@hpc-project.com> writes:

> Richard Riley <rileyrgdev@googlemail.com> writes:
>
>> I think he meant it's rather a ridiculous amount of email.
>
> Yes, I understand. And my reply is that just trying to reduce this CPU
> and disk usage is not worth the price of the brain time. :-)
>
>> Even 20 years of email amounting to 30 Gig is a LOT for a single
>> person! Are you sure you remembered to delete all the "oh so funny"
>> emails with huge PowerPoint attachments? :-)
>
> A mail sent is a mail sent. So it is archived. :-)
>
> Oops! I forgot to archive the SPAM. :-(
>
> "640 KB of mail is enough for everybody" is not a good argument to
> convince me. :-)

You send and receive 4 megabytes a day for twenty years? Incredible. And
no one mentioned 640 Megs. If you truly need to index that much then
good luck! Since I suspect you only really want a quick find on email
from the past year or so, you could index that separately. I use
mairix with Gnus and it's instant, and I have about 4 years of emails
indexed.
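
The "4 megabytes a day" figure follows from the numbers in the thread. A quick check in Python, assuming decimal units (1 GB = 1000 MB) and 365-day years; the function name is hypothetical:

```python
def daily_volume_mb(total_gb: float, years: float) -> float:
    """Average mail volume per day, in MB, for an archive of total_gb."""
    return total_gb * 1000 / (years * 365)


print(round(daily_volume_mb(30, 20), 1))  # 30 GB over twenty years
print(round(daily_volume_mb(30, 22), 1))  # 30 GB over 1987 to 2009
```

Over twenty years this gives about 4.1 MB per day; over the 22 years from 1987 to 2009 it is about 3.7 MB per day.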