From: Eric Abrahamsen
Newsgroups: gmane.emacs.gnus.general
Subject: Re: nnimap backend performances ?
Date: Mon, 04 Jan 2016 09:50:36 +0800
Message-ID: <877fjqru37.fsf@ericabrahamsen.net>
References: <874mgh1amt.fsf@ericabrahamsen.net> <87twn19hwq.fsf@gmail.com>
 <87h9iz1789.fsf@ericabrahamsen.net> <87h9iyq7o8.fsf@gmail.com>
 <871ta08xcq.fsf@ericabrahamsen.net> <874meukwph.fsf@gmail.com>
To: ding@gnus.org
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.1.50 (gnu/linux)

myglc2 writes:

> Eric Abrahamsen writes:
>
>>> myglc2 writes:
>>>
>>> FWIW, My goal is to sweep multiple gmail accounts into bullet-proof
>>> archives with tools for searching and clustering several decades of
>>> work. To get a handle on what is doable, I am experimenting with gnus
>>> backends and search schemes, mu/mu4e, dovecot, and mbsync.
>>>
>>> I would love to hear suggestions of other emacs-compatible solutions I
>>> should try.
>>
>> I might not be too helpful here, unfortunately. I've only ever used
>> gnus/dovecot/isync. As I said in that post, I think dovecot plus a fts
>> plugin works great, but I really don't like the search syntax on the
>> Gnus/nnir side -- I find it very cumbersome. If there were a nicer
>> syntax for searching, I think I'd be happy with this.
>
> Thanks, that is helpful. Search syntax is important to me. Lord knows I
> don't need any new obscure stuff to learn. Doesn't nnir simply pass the
> search string to the backend? If so, is the syntax you don't like
> associated with nnimap, IMAP spec, or lucene?

Both nnimap and IMAP.
Date-based IMAP searching in particular is hard to use: I can never
remember the keywords, or the proper date format, and the results are
often just wrong. It doesn't find messages that clearly fall within the
specified date range.

Nnimap is a little weird because it only lets you search on one header
by default. If you want to search on multiple headers, that's an extra
layer of querying. It's not a disaster, but it does make the process
more difficult.

> I have a toy installation of dovecot running with a small number of
> messages. Search is very fast but article fetching is surprisingly
> slow. Have you experienced anything like this?

Yes, getting messages out of the server isn't super quick. That's why I
think plain old nnmaildir might be preferable for email archives. The
more I think about it, the more I'm leaning towards pulling my archives
out of Dovecot and just using nnmaildir/nnir/notmuch.

>> I could also imagine using nnmaildir -- it seems simpler, at least
>> conceptually, if all you want is to keep mails archived. In that case
>> you could probably use notmuch to index the mails, and set nnir to use
>> that notmuch installation as the search backend. If you don't need
>> accounts and authentication and everything for a simple local archive,
>> that should be enough. I'll bet the searching would be even faster, as
>> well.
>
> I currently have mu4e set up, which uses the same indexer (xapian) as
> notmuch with the same set of messages in Maildir. The search response
> seems as fast, and article fetch seems much faster.
>
> I plan to also set up notmuch and mairix. Then I will have:
>
>
> gnus -- nnir -- nnimap -- dovecot -- maildir
>                           lucene     messages
>                           (index)
>
> gnus -- nnir -- notmuch -- maildir
>                 xapian     messages
>                 (index)
>
> mu4e -- mu -- maildir
>         xapian  messages
>         (index)
>
> gnus -- mairix -- maildir
>         flex      messages
>         (index)   smart group
>                   (search cache)
>
>
> Then I can inflate the size of the message stores and compare.
> Frankly, it is hard to imagine why performance shouldn't be pretty
> similar across the board.
>
> As far as I can tell from reading and other helpful exchanges on the
> list, the only major difference is that mairix caches a search result in
> a so-called "smart" permanent group. This is implemented by a maildir
> folder containing symbolic links to the actual messages.
>
> In theory, the second time we perform the exact same search, it should
> be faster. If we search for the same thing over and over it probably
> will be faster. If most of our searches are unique, making a cache
> probably will be slower.
>
> If in practice we search for the same term over and over and mairix lets
> us just click on a group containing the result instead of typing the
> search term, this could indeed be nicer even if getting the search
> result is slower.
>
> As far as syntax, I assume that mu4e and notmuch use a xapian
> syntax. Mairix uses the Flex lexical analyzer, so I assume it is
> different.
>
> Meanwhile I am trying to determine if there are other meaningful mairix
> functional differences.
>
> - George

Notmuch also caches search results somehow (it's transparent, I'm not
sure how it works), though it doesn't have mairix's smart search groups.
I used mairix for a bit, several years ago, and thought the way it
creates groups was a bit awkward, but it's probably not a big deal.

Another consideration is non-ASCII searches. Not all the search backends
let you search for strings in funny encodings. IMAP requires you put the
strings in quotes, and at some point namazu didn't do non-ASCII searches
at all (or was it mairix?). It's been a while now, so I don't want to
slander packages that may have upgraded in the meantime, but if that's
important to you it's something to look into.

Otherwise I think you're right -- there shouldn't be too much difference
in performance between the various backends.

E
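
A footnote on the date-based IMAP searches mentioned above. As far as
the IMAP spec (RFC 3501) goes, the date keywords are SINCE, BEFORE and
ON, and dates are written day-month-year with an English month
abbreviation, e.g. 4-Jan-2016. Those three keys compare against the
server's internal (arrival) date rather than the Date: header, which
may be one reason results can look wrong; SENTSINCE, SENTBEFORE and
SENTON use the Date: header instead. Below is a minimal sketch of
building such a criteria string. The helper names imap_date and
date_range_criteria are made up for illustration and are not part of
nnimap, nnir, or any library.

```python
from datetime import date

# Month abbreviations as the IMAP spec spells them. They are listed
# explicitly so the output does not depend on the current locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d: date) -> str:
    """Format a date the way IMAP SEARCH expects it, e.g. 4-Jan-2016.

    Hypothetical helper, for illustration only.
    """
    return f"{d.day}-{_MONTHS[d.month - 1]}-{d.year}"


def date_range_criteria(start: date, end: date) -> str:
    """Build SEARCH criteria for messages dated start <= d < end.

    SINCE includes the given day, BEFORE excludes it, and both compare
    against the server's internal date, not the Date: header.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    return f"SINCE {imap_date(start)} BEFORE {imap_date(end)}"


if __name__ == "__main__":
    print(date_range_criteria(date(2015, 12, 1), date(2016, 1, 4)))
```

To match on the Date: header instead, the same date format applies
with SENTSINCE and SENTBEFORE in place of SINCE and BEFORE.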