Gnus development mailing list
From: Eric Abrahamsen <eric@ericabrahamsen.net>
To: ding@gnus.org
Subject: Re: nnimap backend performances ?
Date: Mon, 04 Jan 2016 09:50:36 +0800
Message-ID: <877fjqru37.fsf@ericabrahamsen.net>
In-Reply-To: <874meukwph.fsf@gmail.com>

myglc2 <myglc2@gmail.com> writes:

> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>>> myglc2 <myglc2@gmail.com> writes:
>>>
>>> FWIW, My goal is to sweep multiple gmail accounts into bullet-proof
>>> archives with tools for searching and clustering several decades of
>>> work. To get a handle on what is doable, I am experimenting with gnus
>>> backends and search schemes, mu/mu4e, dovecot, and mbsync.
>>>
>>> I would love to hear suggestions of other emacs-compatible solutions I
>>> should try.
>>
>> I might not be too helpful here, unfortunately. I've only ever used
>> gnus/dovecot/isync. As I said in that post, I think dovecot plus an FTS
>> plugin works great, but I really don't like the search syntax on the
>> Gnus/nnir side -- I find it very cumbersome. If there were a nicer
>> syntax for searching, I think I'd be happy with this.
>
> Thanks, that is helpful. Search syntax is important to me. Lord knows I
> don't need any new obscure stuff to learn. Doesn't nnir simply pass the
> search string to the backend? If so, is the syntax you don't like
> associated with nnimap, the IMAP spec, or Lucene?

Both nnimap and IMAP. Date-based IMAP searching in particular is hard to
use: I can never remember the keywords, or the proper date format, and
the results are often just wrong. It doesn't find messages that clearly
fall within the specified date range.
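
To illustrate the date syntax: the IMAP SEARCH keys SINCE and BEFORE take
dates written like 04-Jan-2016. SINCE is inclusive and BEFORE is
exclusive, and both compare against the server's internal date rather
than the Date: header (SENTSINCE and SENTBEFORE use the header). Whether
that accounts for the missing results described above is only a guess.
The following is a minimal sketch, not anything nnimap does; the helper
names imap_date and imap_date_range are made up for this example.

```python
from datetime import date, timedelta

# Month names are hard-coded because IMAP expects English abbreviations,
# while strftime("%b") depends on the current locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d):
    """Format a date the way IMAP SEARCH expects it, e.g. 04-Jan-2016."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def imap_date_range(first, last, sent=False):
    """Build search keys matching messages dated first..last inclusive.

    SINCE is inclusive but BEFORE is exclusive, so the upper bound is
    pushed forward by one day. With sent=True the Date: header is used
    (SENTSINCE/SENTBEFORE) instead of the server's internal date.
    """
    since, before = ("SENTSINCE", "SENTBEFORE") if sent else ("SINCE", "BEFORE")
    upper = last + timedelta(days=1)
    return f"{since} {imap_date(first)} {before} {imap_date(upper)}"


criteria = imap_date_range(date(2015, 12, 1), date(2016, 1, 4))
```

The resulting string could be handed to an IMAP client, for instance as
the criterion argument of imaplib's IMAP4.search.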

Nnimap is a little weird because it only lets you search on one header
by default. If you want to search on multiple headers, that's an extra
layer of querying. It's not a disaster, but it does make the process
more difficult.
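
At the IMAP level, searching several headers at once means combining
search keys with OR, which takes exactly two operands and so has to be
nested for three or more. A rough sketch of building such a query
follows; the function names are invented for illustration and this is
not how nnimap constructs its queries.

```python
def quote(term):
    """Wrap a term as an IMAP quoted string, escaping backslash and quote."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def any_header(term, keys=("FROM", "TO", "CC", "SUBJECT")):
    """Match term under any of the given IMAP search keys.

    OR is a prefix operator with two operands, so alternatives are
    folded from the right: OR a (OR b c), written without parentheses.
    """
    if not keys:
        raise ValueError("need at least one search key")
    parts = [f"{key} {quote(term)}" for key in keys]
    query = parts[-1]
    for part in reversed(parts[:-1]):
        query = f"OR {part} {query}"
    return query


query = any_header("dovecot", keys=("FROM", "TO", "SUBJECT"))
```

For three keys this yields one OR nested inside another, which is the
"extra layer of querying" one ends up writing by hand.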

> I have a toy installation of dovecot running with a small number of
> messages.  Search is very fast but article fetching is surprisingly
> slow. Have you experienced anything like this?

Yes, getting messages out of the server isn't super quick. That's why I
think plain old nnmaildir might be preferable for email archives.

The more I think about it, the more I'm leaning towards pulling my
archives out of Dovecot and just using nnmaildir/nnir/notmuch.

>> I could also imagine using nnmaildir -- it seems simpler, at least
>> conceptually, if all you want is to keep mails archived. In that case
>> you could probably use notmuch to index the mails, and set nnir to use
>> that notmuch installation as the search backend. If you don't need
>> accounts and authentication and everything for a simple local archive,
>> that should be enough. I'll bet the searching would be even faster, as
>> well.
>
> I currently have mu4e set up, which uses the same indexer (xapian) as
> notmuch with the same set of messages in Maildir. The search response
> seems as fast, and article fetch seems much faster.
>
> I plan to also set up notmuch and mairix. Then I will have:
>
>
> gnus --    nnir  --   nnimap  --  dovecot  --  maildir
>                       lucene                   messages
>                       (index)
>
> gnus --    nnir  --   notmuch --  maildir
>                       xapian      messages
>                       (index)
>
> mu4e --     mu   --   maildir
>           xapian      messages
>           (index)
>
> gnus --  mairix  --   maildir
>           flex        messages
>          (index)     smart group
>                    (search cache)
>
>
> Then I can inflate the size of the message stores and compare. Frankly,
> it is hard to imagine why performance shouldn't be pretty similar across
> the board.
>
> As far as I can tell from reading and other helpful exchanges on the
> list, the only major difference is that mairix caches a search result in
> a so-called "smart" permanent group. This is implemented by a maildir
> folder containing symbolic links to the actual messages.
>
> In theory, the second time we perform the exact same search, it should
> be faster. If we search for the same thing over and over it probably
> will be faster. If most of our searches are unique, making a cache
> probably will be slower.
>
> If in practice we search for the same term over and over and mairix lets
> us just click on a group containing the result instead of typing the
> search term, this could indeed be nicer even if getting the search
> result is slower.
>
> As far as syntax goes, I assume that mu4e and notmuch use a Xapian
> syntax. Mairix uses the Flex lexical analyzer, so I assume it is
> different.
>
> Meanwhile I am trying to determine if there are other meaningful mairix
> functional differences.  - George

Notmuch also caches search results somehow (it's transparent, I'm not
sure how it works), though it doesn't have mairix's smart search groups.
I used mairix for a bit, several years ago, and thought the way it
creates groups was a bit awkward, but it's probably not a big deal.
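
The symlink-based search folder George describes can be sketched in a
few lines. This is only an illustration of the idea, assuming a
Unix-like filesystem; it is not mairix's code, and the function name
make_search_folder and the message file names are hypothetical.

```python
import os
import tempfile


def make_search_folder(matches, folder):
    """Create a maildir-shaped folder of symlinks to matching messages."""
    for sub in ("cur", "new", "tmp"):
        os.makedirs(os.path.join(folder, sub), exist_ok=True)
    for path in matches:
        link = os.path.join(folder, "cur", os.path.basename(path))
        # Skip links that already exist so the folder can be refreshed.
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(path), link)
    return sorted(os.listdir(os.path.join(folder, "cur")))


# Demonstration with throwaway files standing in for real messages.
root = tempfile.mkdtemp()
archive = os.path.join(root, "archive", "cur")
os.makedirs(archive)
messages = []
for name in ("msg-0001", "msg-0002"):
    path = os.path.join(archive, name)
    with open(path, "w") as fh:
        fh.write(f"Subject: {name}\n\nbody\n")
    messages.append(path)

# Pretend a search matched only the first message.
linked = make_search_folder(messages[:1], os.path.join(root, "search"))
```

The messages themselves are never copied, so the search folder costs
little disk space, but it goes stale if the originals are moved or
renamed.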

Another consideration is non-ASCII searches. Not all the search backends
let you search for strings in funny encodings. IMAP requires you put the
strings in quotes, and at some point namazu didn't do non-ASCII searches
at all (or was it Mairix?). It's been a while now, so I don't want to
slander packages that may have upgraded in the meantime, but if that's
important to you it's something to look into.

Otherwise I think you're right -- there shouldn't be too much difference
in performance between the various backends.

E



