Gnus development mailing list
* Experimental new Maildir backend
@ 2021-01-16 18:47 Juan José García-Ripoll
  2021-01-17  4:49 ` Eric Abrahamsen
  0 siblings, 1 reply; 8+ messages in thread
From: Juan José García-Ripoll @ 2021-01-16 18:47 UTC
  To: ding

Hi,

apologies if this topic is redundant with some other proposal or effort,
but I wanted to draw your attention to this small project of mine, and
also request help to polish and possibly contribute it to Emacs.

The project is called gnus-nnmaild and it is a new backend for Maildir
spool directories. It can be found here

   https://github.com/juanjosegarciaripoll/gnus-nnmaild

I have developed it because the nnmaildir backend does not work on
Windows, where "!" or ";" is used as the flag separator in file names
because ":" is not an allowed character. It also solves additional
problems with nnmaildir, namely that it creates one additional file for
each message to store nov files, plus additional directories and links
for other flags.
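
To make the separator issue concrete, here is a minimal Python sketch of
splitting a Maildir file name into its unique part and its flags. It is
not code from gnus-nnmaild or nnmaildir: the function name and the
`separators` parameter are made up for illustration, and it only assumes
the usual "2," prefix in front of the flags.

```python
# Illustrative only: not taken from gnus-nnmaild or nnmaildir.
# Maildir file names look like "<unique><sep>2,<flags>", where <sep> is ":"
# on Unix; this thread reports "!" or ";" being used on Windows instead.

def split_maildir_name(name, separators=":!;"):
    """Return (unique_part, flags) for a Maildir file name.

    `separators` is a hypothetical configuration value listing every
    character accepted as the info separator.
    """
    for i, ch in enumerate(name):
        # The info section starts at "<sep>2,"; anything before it is
        # part of the unique name.
        if ch in separators and name.startswith("2,", i + 1):
            return name[:i], set(name[i + 3:])
    return name, set()  # no info section, hence no flags
```

With this, "host:2,RS" and "host!2,RS" yield the same flags, so a single
code path could serve both Unix and Windows spools.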

Instead, I have adopted a brute-force philosophy, where all information
is cached in a single Elisp file, which is updated when new files are
found. That may seem a bit wasteful, but given SSDs it seems to be a
very good compromise between space and speed.
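
As a rough illustration of the single-file cache idea (in Python rather
than Elisp, and not the actual gnus-nnmaild code), the sketch below keeps
one cache file per folder and only reads messages it has not seen
before. The JSON format, the function name and the `read_headers`
callback are assumptions made for the example; keying the cache by the
full file name is also a simplification, since Maildir renames a file
when its flags change.

```python
import json
import os

def refresh_cache(maildir_cur, cache_file, read_headers):
    """Load the cache, add entries for unseen files, drop vanished ones.

    `read_headers` is a hypothetical callback that takes a message path
    and returns a JSON-serializable summary of its headers.
    """
    try:
        with open(cache_file, encoding="utf-8") as f:
            cache = json.load(f)
    except FileNotFoundError:
        cache = {}  # first run: nothing cached yet

    on_disk = set(os.listdir(maildir_cur))
    cached = set(cache)

    # Only new files cost a read; everything else comes from the cache.
    for name in on_disk - cached:
        cache[name] = read_headers(os.path.join(maildir_cur, name))
    # Forget messages that were deleted or moved away.
    for name in cached - on_disk:
        del cache[name]

    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return cache
```

The cost of a refresh is then one directory listing plus one read per
new message, instead of one extra file per message on disk.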

Feedback is really welcome. Also pull requests.

Cheers,

-- 
Juan José García Ripoll
http://juanjose.garciaripoll.com
http://quinfog.hbar.es




* Re: Experimental new Maildir backend
  2021-01-16 18:47 Experimental new Maildir backend Juan José García-Ripoll
@ 2021-01-17  4:49 ` Eric Abrahamsen
  2021-01-17  9:01   ` Juan José García-Ripoll
  2021-01-17 19:17   ` Eric S Fraga
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Abrahamsen @ 2021-01-17  4:49 UTC
  To: ding

Juan José García-Ripoll <juanjose.garciaripoll@gmail.com> writes:

> Hi,
>
> apologies if this topic is redundant with some other proposal or effort,
> but I wanted to draw your attention to this small project of mine, and
> also request help to polish and possibly contribute it to Emacs.
>
> The project is called gnus-nnmaild and it is a new backend for Maildir
> spool directories. It can be found here
>
>    https://github.com/juanjosegarciaripoll/gnus-nnmaild

Very cool! It's great to see work on more backends. I'm also in the
process of putting together fixes for nnmaildir, so there might be a
little bit of redundancy in our work, but it looks like your approach is
a more drastic rethinking -- my changes are mostly incremental tweaks.

> I have developed it because the nnmaildir backend does not work on
> Windows, where "!" or ";" are used as flag separator in the file names
> because ":" is not an allowed character. It also solves additional
> problems with nnmaildir, namely that it creates one additional file for
> each message to store nov files, plus additional directories and links
> for other flags.

I didn't realize that nnmaildir doesn't work on Windows at all!

> Instead, I have adopted a brute-force philosophy, where all information
> is cached in a single Elisp file, which is updated when new files are
> found. That may seem a bit wasteful, but given SSD's it seems to be a
> very good compromise between space and speed.
>
> Feedback is really welcome. Also pull requests.

I saw on the github page that this is based off nnml code, and there are
several functions (moving messages, creating groups) that haven't been
implemented yet. I'm curious if you were able to use any of Gnus'
backend inheritance features -- are you having to write everything from
scratch? I haven't read the code yet...

I've been toying with the idea of using sqlite as a store for Gnus'
caches and data: it seems like that would get us the biggest speedup
possible. I don't think it's feasible for Gnus' built-in backends, since
vanilla Emacs doesn't come with anything for talking to sqlite, but if
you made this an installable package, it could require the "sqlite"
package and do it that way... Just a thought!

Eric




* Re: Experimental new Maildir backend
  2021-01-17  4:49 ` Eric Abrahamsen
@ 2021-01-17  9:01   ` Juan José García-Ripoll
  2021-01-17 21:33     ` Eric Abrahamsen
  2021-01-17 19:17   ` Eric S Fraga
  1 sibling, 1 reply; 8+ messages in thread
From: Juan José García-Ripoll @ 2021-01-17  9:01 UTC
  To: ding

Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Very cool! It's great to see work on more backends. I'm also in the
> process of putting together fixes for nnmaildir, so there might be a
> little bit of redundancy in our work, but it looks like your approach is
> a more drastic rethinking -- my changes are mostly incremental tweaks.

Hi Eric, nice to see you also are interested in this format. I also
started tweaking nnmaildir. I got to a point where I fixed the
separator, introducing a configuration parameter that starts with ":"
but can be configured to other values. Unfortunately the result was
extremely slow.

I gather nnmaildir creates one NOV file for every message. It also
creates a folder for each flag and creates links (which I am not even
sure it does correctly on Windows, where links are not always possible)
for every article that has a mark. That made a folder with ~900 emails
take minutes to load. I know that this is a Windows limitation, but I
think it points out how wasteful it is. In contrast, the archives I have
based on nnml are very snappy.

> I saw on the github page that this is based off nnml code, and there are
> several functions (moving messages, creating groups) that haven't been
> implemented yet. I'm curious if you were able to use any of Gnus'
> backend inheritance features -- are you having to write everything from
> scratch? I haven't read the code yet...

I started using inheritance, but it did not work. The parent backend did
not get its variables properly assigned and there were lots of confusing
group names: i.e. Archives.2011 would have to be renamed
Archives.2011.cur so that nnml tries to first find that directory and
then attempts Archives.2011/cur. However, at some point I also was not
satisfied with nnml's approach, which only stores the NOV files and
enforces that all file names must be a number (the article number) and I
gave up. The result is not that bad, ~500 lines. The biggest hurdle was
figuring out which backend functions need to be created for the backend
to tell Gnus the attributes of a message. That is not at all clear in
the manual, and nnmaildir, nnimap and others seem to follow different,
poorly documented paths.

> I've been toying with the idea of using sqlite as a store for Gnus'
> caches and data: it seems like that would get us the biggest speedup
> possible. I don't think it's feasible for Gnus' built-in backends, since
> vanilla Emacs doesn't come with anything for talking to sqlite, but if
> you made this an installable package, it could require the "sqlite"
> package and do it that way... Just a thought!

It is a hurdle to install sqlite in Emacs on Windows. I had to create my
own "build-from-source" distribution for packaging Emacs with other
dependencies that are not standard
(https://github.com/juanjosegarciaripoll/emacs-build).

However, the good news is that the hashtable cache approach works very
well. I have folders with tens of thousands of files and loading the
hash table is not that bad. Plus, this can be done once and only updated
later based on time stamps, or by dropping the NOVs unless they are
needed -- the current code is not optimal in my implementation. I
believe there is an Emacs database package that offers a hashtable
backend as default. Maybe that would be a more reasonable approach to
general caching, and it would benefit other stakeholders, such as
org-roam.

Thanks for the questions and feedback. You brought up very good points
that would need to be addressed for Maildir to be more functional --
which I believe is relevant, given that Maildir is gaining traction
again due to mu4e and equivalent (albeit, once more, Windows-buggy)
packages.

Cheers,

-- 
Juan José García Ripoll
http://juanjose.garciaripoll.com
http://quinfog.hbar.es




* Re: Experimental new Maildir backend
  2021-01-17  4:49 ` Eric Abrahamsen
  2021-01-17  9:01   ` Juan José García-Ripoll
@ 2021-01-17 19:17   ` Eric S Fraga
  2021-01-17 22:23     ` Eric Abrahamsen
  1 sibling, 1 reply; 8+ messages in thread
From: Eric S Fraga @ 2021-01-17 19:17 UTC
  To: ding

On Saturday, 16 Jan 2021 at 20:49, Eric Abrahamsen wrote:
> I've been toying with the idea of using sqlite as a store for Gnus'
> caches and data: it seems like that would get us the biggest speedup
> possible. 

This sounds interesting.

I used to use nnmaildir all the time, primarily because I frequently
access my emails from three different devices which I keep in sync using
unison.  The maildir format is ideal for this as there is no chance of
conflict arising if I forget to sync any given device.

However, the performance of the current nnmaildir is atrocious in such a
scenario because the time stamp on the .overview file (I believe -- it's
been a while since I did the analysis) leads to the whole mailbox being
scanned again to build up the numerical indices used by gnus.  The scan
is O(n^3) which becomes intractable when you have 1000s of emails in the
mailbox.

Anything that enables quick scanning/updating of the index for maildir
groups would be a major improvement.


-- 
Eric S Fraga via Emacs 28.0.50 & org 9.4.4 on Debian bullseye/sid




* Re: Experimental new Maildir backend
  2021-01-17  9:01   ` Juan José García-Ripoll
@ 2021-01-17 21:33     ` Eric Abrahamsen
  2021-01-18  9:39       ` Juan José García-Ripoll
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Abrahamsen @ 2021-01-17 21:33 UTC
  To: ding

Juan José García-Ripoll <juanjose.garciaripoll@gmail.com> writes:

> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>> Very cool! It's great to see work on more backends. I'm also in the
>> process of putting together fixes for nnmaildir, so there might be a
>> little bit of redundancy in our work, but it looks like your approach is
>> a more drastic rethinking -- my changes are mostly incremental tweaks.
>
> Hi Eric, nice to see you also are interested in this format. I also
> started tweaking nnmaildir. I got to a point where I fixed the
> separator, introducing a configuration parameter that starts with ":"
> but can be configured to other values. Unfortunately the result was
> extremely slow.

You mean making that character configurable actually introduced a
slowdown? It seems odd that it would be that much of a factor.

> I gather Maildir creates one NOV file for every message, it also creates
> a folder for each flag and creates links (which I am not even sure
> whether it does correctly on Windows, where links are not always
> possible), for every article that has a mark. That made a folder with
> ~900 emails take minutes to load at all. I know that this is a Windows
> limitations but I think it points out how wasteful it is. In contrast,
> the archives I have based on nnml are very snappy.

Yeah, the whole setup is a little baroque. But I think this is the
dividing line between what I'm likely to do with nnmaildir, and where it
makes sense to write a new backend. I don't think I would change
nnmaildir's architecture, just try to fix some basic inefficiencies. If
users want a more drastic change, it makes more sense to just have a new
backend.

>> I saw on the github page that this is based off nnml code, and there are
>> several functions (moving messages, creating groups) that haven't been
>> implemented yet. I'm curious if you were able to use any of Gnus'
>> backend inheritance features -- are you having to write everything from
>> scratch? I haven't read the code yet...
>
> I started using inheritance, but it did not work. The parent backend did
> not get its variables properly assigned

To me this is usually a sign that there are code paths that don't hit
`nnoo-change-server'. I see you've got that in `nnmaild-open-server',
which gets called in `nnmaild-possibly-change-directory', but I would
take a look at the various code entry points and see if any of them
sneak past that. Also, you're calling `nnmaild-server-opened', but that
function doesn't seem to be defined?

> and there were lots of confusing group names: i.e. Archives.2011 would
> have to be renamed Archives.2011.cur so that nnml tries to first find
> that directory and then attempts Archives.2011/cur. However, at some
> point I also was not satisfied with nnml's approach, which only stores
> the NOV files and enforces that all file names must be a number (the
> article number) and I gave up. The result is not that bad, ~500 lines.
> The biggest hurdle was figuring out which backend functions need to be
> created for the backend to tell Gnus the attributes of a message. That
> is not at all clear in the manual and nnmaildir, nnimap and others
> seem to follow different not well documented paths.

This is too bad, and something else that I would (theoretically,
eventually) like to work on. If a new backend looks highly similar to an
existing backend, it should be able to share a lot of the code. What do
you mean exactly, "tell Gnus the attributes of a message"? If you have
concrete suggestions as to how Gnus could do a better job of allowing
inheritance, I'd love to hear it.

Eric




* Re: Experimental new Maildir backend
  2021-01-17 19:17   ` Eric S Fraga
@ 2021-01-17 22:23     ` Eric Abrahamsen
  2021-01-18 11:14       ` Eric S Fraga
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Abrahamsen @ 2021-01-17 22:23 UTC
  To: Eric S Fraga; +Cc: ding

Eric S Fraga <e.fraga@ucl.ac.uk> writes:

> On Saturday, 16 Jan 2021 at 20:49, Eric Abrahamsen wrote:
>> I've been toying with the idea of using sqlite as a store for Gnus'
>> caches and data: it seems like that would get us the biggest speedup
>> possible. 
>
> This sounds interesting.
>
> I used to use nnmaildir all the time, primarily because I frequently
> access my emails from three different devices which I keep in sync using
> unison.  The maildir format is ideal for this as there is no chance of
> conflict arising if I forget to sync any given device.
>
> However, the performance of the current nnmaildir is atrocious in such a
> scenario because the time stamp on the .overview file (I believe -- it's
> been a while since I did the analysis) leads to the whole mailbox being
> scanned again to build up the numerical indices used by gnus.  The scan
> is O(n^3) which becomes intractable when you have 1000s of emails in the
> mailbox.
>
> Anything that enables quick scanning/updating of the index for maildir
> groups would be a major improvement.

I think there are a couple of overlapping issues -- actual bugs vs
design problems -- that might need to be unpicked: it's obvious that
first-time nov database building when adopting lots of old mail has
terrible performance, and it shouldn't need to be that way. Then it
sounds like there might be a real bug in that nov databases are getting
rebuilt when they don't need to be? Lastly, perhaps there's further
inefficiency just when retrieving nov headers on a run-of-the-mill group
opening. Does that sound right? Is simply opening a group slow, even
when all the nov databases are built?

Anyway, no need to go digging up past research, I'm mostly just trying
to clarify things in my own head.



* Re: Experimental new Maildir backend
  2021-01-17 21:33     ` Eric Abrahamsen
@ 2021-01-18  9:39       ` Juan José García-Ripoll
  0 siblings, 0 replies; 8+ messages in thread
From: Juan José García-Ripoll @ 2021-01-18  9:39 UTC
  To: ding

Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Juan José García-Ripoll <juanjose.garciaripoll@gmail.com> writes:
>> Hi Eric, nice to see you also are interested in this format. I also
>> started tweaking nnmaildir. I got to a point where I fixed the
>> separator, introducing a configuration parameter that starts with ":"
>> but can be configured to other values. Unfortunately the result was
>> extremely slow.
>
> You mean making that character configurable actually introduced a
> slowdown? It seems odd that it would be that much of a factor.

No, sorry for the confusion: the change of the separator is trivial, and
does not affect overall performance, but the actual backend is simply
too slow for platforms where file creation, time stamp checking and
linking are not cheap operations.

>> I started using inheritance, but it did not work. The parent backend did
>> not get its variables properly assigned
>
> To me this is usually a sign that there are code paths that don't hit
> `nnoo-change-server'. I see you've got that in `nnmaild-open-server',
> which gets called in `nnmaild-possibly-change-directory', but I would
> take a look at the various code entry points and see if any of them
> sneak past that. Also, you're calling `nnmaild-server-opened', but that
> function doesn't seem to be defined?

Again, sorry for my confusing explanation. What follows is an
explanation of how I attempted it, not how it is done in my
repository. In the version from GitHub, everything works and no
inheritance is needed. To be fair no inheritance is possible because I
ended up doing things differently from nnml, with a different caching
protocol, and a totally different way of handling marks.

As for my first attempts, I started using inheritance as per the nndir
example. In that case, I wanted to reuse nnml's file scanning and NOV
caching, which is why I manually created the nnmaild-request-*
functions. These would perform some tasks before calling the functions
of the parent backend with the same name. The problem there is that
while all variables are properly defined in the child (including those
variables that in defvoo specify alternative names for the child
framework, i.e. nnmaild-directory translated to nnml-directory), the
variables were arbitrarily restored to default values when the parent
functions were called. I am sorry, but I do not keep that code around,
although it would not be difficult to try again.

> This is too bad, and something else that I would (theoretically,
> eventually) like to work on. If a new backend looks highly similar to an
> existing backend, it should be able to share a lot of the code.

There was no real code reuse. nnmaildir's functions are impossible to
reuse; nnml's code relies heavily on numbered files and uses a different
way of caching NOVs.

> What do you mean exactly, "tell Gnus the attributes of a message"?

It took me quite a lot of debugging to develop the functions that modify
the info structure, replacing the marks with those that are given by the
Maildir backend. It was not clear from the documentation where that is
supposed to happen. nnml has no implementation for marks, delegating
everything to the Gnus dribble files. nnimap has an undocumented
implementation that relies on having separate request and
finish-processing stages.

> If you have concrete suggestions as to how Gnus could do a better job
> of allowing inheritance, I'd love to hear it.

To be fair, I am quite lost regarding how inheritance is handled and how
the whole infrastructure works.

- I find the way defvoo works confusing, as well as the fact that
everything is based on variables instead of objects, methods and slots.

- In particular, coming from an OO background (both in Common Lisp and
C++), I do not understand when and how those variables are rewritten. In
OO paradigms, one fixes the slots of the parent classes during the
construction phase; here it seems there is some magic rewrite happening
at hidden places.

- This made it confusing to me how I can write my own defvoo method and,
from within that method call equivalent methods from other
backends. That does not seem to be supported.

- On a more architectural note, it is also strange that methods are
expected to work with global state, instead of getting properly formed
records. This makes the handling of group, server and info objects very
confusing -- when do they have a value, vs. when can they be just nil,
still puzzles me.

- I also find that, for proper inheritance, some of the backends should
make fewer assumptions about the underlying system (e.g. nnml should not
assume numbering of files), allowing the rewrite of specific methods
(e.g. retrieval and writing of marks), which in the current framework
seems very difficult, as there are no real virtual functions.
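
To show the kind of structure the points above have in mind, here is a
small Python sketch. It is purely hypothetical: none of these classes
correspond to Gnus or nnoo code, and the class and method names are
invented. The shared logic lives in a base class, state is fixed in the
constructor, and a child backend overrides only file naming and marks.

```python
import os

class SpoolBackend:
    """Shared logic; subclasses override only the storage-specific hooks."""

    def __init__(self, directory):
        # State is set once at construction time, not rebound later.
        self.directory = directory

    def article_file(self, number):
        raise NotImplementedError  # "virtual" hook: where an article lives

    def marks(self, number):
        return set()  # default: the backend stores no marks itself

    def request_article(self, number):
        # Common code path that relies only on the two hooks above.
        with open(self.article_file(number), encoding="utf-8") as f:
            return f.read(), self.marks(number)

class NumberedBackend(SpoolBackend):
    """nnml-like layout: the file name is the article number."""

    def article_file(self, number):
        return os.path.join(self.directory, str(number))

class MaildirLikeBackend(SpoolBackend):
    """Maildir-like layout: an index maps numbers to flagged file names."""

    def __init__(self, directory, index, separator=":"):
        super().__init__(directory)
        self.index = index          # hypothetical {number: file name} map
        self.separator = separator  # ":" on Unix, "!" or ";" on Windows

    def article_file(self, number):
        return os.path.join(self.directory, "cur", self.index[number])

    def marks(self, number):
        # Flags follow "<separator>2," in the file name.
        _, found, flags = self.index[number].partition(self.separator + "2,")
        return set(flags) if found else set()
```

Here `request_article` is written once and works for both layouts, which
is the sort of reuse that I could not get out of the current framework.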

Hope this helps clarify how confused I am about inheritance. This said,
I think I now understand how to write the backend _without_ inheritance.

Juanjo

-- 
Juan José García Ripoll
http://juanjose.garciaripoll.com
http://quinfog.hbar.es




* Re: Experimental new Maildir backend
  2021-01-17 22:23     ` Eric Abrahamsen
@ 2021-01-18 11:14       ` Eric S Fraga
  0 siblings, 0 replies; 8+ messages in thread
From: Eric S Fraga @ 2021-01-18 11:14 UTC
  To: Eric Abrahamsen; +Cc: ding

On Sunday, 17 Jan 2021 at 14:23, Eric Abrahamsen wrote:
> I think there are a couple of overlapping issues -- actual bugs vs
> design problems -- that might need to be unpicked: it's obvious that
> first-time nov database building when adopting lots of old mail has
> terrible performance, and it shouldn't need to be that way. Then it
> sounds like there might be a real bug in that nov databases are getting
> rebuilt when they don't need to be? 

I don't think it's a bug per se.  IIRC, the issue is that nnmaildir uses
the last modified time of various files and/or directories to see
whether the database needs to be rebuilt and the synchronization of my
files across various systems undermines the basis of this logic.
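
A small Python sketch of why this bites, under the assumption (from
memory, not from reading nnmaildir) that staleness is decided by
comparing modification times. The function names are invented, and the
listing-based check is only one possible alternative, not something
nnmaildir does.

```python
import hashlib
import os

def listing_fingerprint(directory):
    """Hash the sorted file names; timestamps play no role."""
    h = hashlib.sha256()
    for name in sorted(os.listdir(directory)):
        h.update(name.encode("utf-8") + b"\0")
    return h.hexdigest()

def stale_by_mtime(directory, recorded_mtime):
    # A sync tool that rewrites timestamps makes this true even when the
    # set of messages is unchanged, which forces a full rescan.
    return os.stat(directory).st_mtime > recorded_mtime

def stale_by_listing(directory, recorded_fingerprint):
    # Only an actual change in the set of file names triggers a rescan.
    return listing_fingerprint(directory) != recorded_fingerprint
```

If a sync bumps the directory's time stamp without adding or removing
messages, the first check reports the index as stale while the second
does not.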

-- 
Eric S Fraga via Emacs 28.0.50 & org 9.4.4 on Debian bullseye/sid



Thread overview: 8+ messages
2021-01-16 18:47 Experimental new Maildir backend Juan José García-Ripoll
2021-01-17  4:49 ` Eric Abrahamsen
2021-01-17  9:01   ` Juan José García-Ripoll
2021-01-17 21:33     ` Eric Abrahamsen
2021-01-18  9:39       ` Juan José García-Ripoll
2021-01-17 19:17   ` Eric S Fraga
2021-01-17 22:23     ` Eric Abrahamsen
2021-01-18 11:14       ` Eric S Fraga
