* Experimental new Maildir backend
From: Juan José García-Ripoll @ 2021-01-16 18:47 UTC
To: ding

Hi,

apologies if this topic is redundant with some other proposal or effort,
but I wanted to draw your attention to this small project of mine, and
also request help to polish and possibly contribute it to Emacs.

The project is called gnus-nnmaild and it is a new backend for Maildir
spool directories. It can be found here

https://github.com/juanjosegarciaripoll/gnus-nnmaild

I have developed it because the nnmaildir backend does not work on
Windows, where "!" or ";" are used as flag separator in the file names
because ":" is not an allowed character. It also solves additional
problems with nnmaildir, namely that it creates one additional file for
each message to store nov files, plus additional directories and links
for other flags.

Instead, I have adopted a brute-force philosophy, where all information
is cached in a single Elisp file, which is updated when new files are
found. That may seem a bit wasteful, but given SSD's it seems to be a
very good compromise between space and speed.

Feedback is really welcome. Also pull requests.

Cheers,

-- 
Juan José García Ripoll
http://juanjose.garciaripoll.com
http://quinfog.hbar.es
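The separator problem described above is easy to see in code. What follows is a minimal Python sketch, not code from gnus-nnmaild (which is Emacs Lisp): it splits a Maildir file name into its unique part and its flags, accepting ":" as well as the "!" and ";" variants mentioned for Windows. The function name `parse_maildir_name`, the `separators` parameter and the flag-name table are illustrative assumptions; the single-letter flag meanings follow the usual Maildir convention.

```python
# Conventional Maildir flag letters and a readable name for each.
MAILDIR_FLAGS = {
    "D": "draft",
    "F": "flagged",
    "P": "passed",
    "R": "replied",
    "S": "seen",
    "T": "trashed",
}


def parse_maildir_name(filename, separators=":!;"):
    """Return (unique_part, set_of_flag_names) for a Maildir file name.

    Each candidate separator is tried in turn, so "id:2,RS", "id!2,RS"
    and "id;2,RS" all yield the same result. A name without any
    "<sep>2," suffix is returned unchanged with an empty flag set.
    """
    for sep in separators:
        base, found, info = filename.rpartition(sep + "2,")
        if found:
            flags = {MAILDIR_FLAGS[c] for c in info if c in MAILDIR_FLAGS}
            return base, flags
    return filename, set()


if __name__ == "__main__":
    print(parse_maildir_name("1610822820.M1P2.host!2,RS"))
```

Trying several separators when reading, rather than hard-coding one, is only one possible design; a backend could equally take a single configured separator.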
* Re: Experimental new Maildir backend
From: Eric Abrahamsen @ 2021-01-17 4:49 UTC
To: ding

Juan José García-Ripoll <juanjose.garciaripoll@gmail.com> writes:

> Hi,
>
> apologies if this topic is redundant with some other proposal or effort,
> but I wanted to draw your attention to this small project of mine, and
> also request help to polish and possibly contribute it to Emacs.
>
> The project is called gnus-nnmaild and it is a new backend for Maildir
> spool directories. It can be found here
>
> https://github.com/juanjosegarciaripoll/gnus-nnmaild

Very cool! It's great to see work on more backends. I'm also in the
process of putting together fixes for nnmaildir, so there might be a
little bit of redundancy in our work, but it looks like your approach is
a more drastic rethinking -- my changes are mostly incremental tweaks.

> I have developed it because the nnmaildir backend does not work on
> Windows, where "!" or ";" are used as flag separator in the file names
> because ":" is not an allowed character. It also solves additional
> problems with nnmaildir, namely that it creates one additional file for
> each message to store nov files, plus additional directories and links
> for other flags.

I didn't realize that nnmaildir doesn't work on Windows at all!

> Instead, I have adopted a brute-force philosophy, where all information
> is cached in a single Elisp file, which is updated when new files are
> found. That may seem a bit wasteful, but given SSD's it seems to be a
> very good compromise between space and speed.
>
> Feedback is really welcome. Also pull requests.

I saw on the github page that this is based off nnml code, and there are
several functions (moving messages, creating groups) that haven't been
implemented yet. I'm curious if you were able to use any of Gnus'
backend inheritance features -- are you having to write everything from
scratch? I haven't read the code yet...

I've been toying with the idea of using sqlite as a store for Gnus'
caches and data: it seems like that would get us the biggest speedup
possible. I don't think it's feasible for Gnus' built-in backends, since
vanilla Emacs doesn't come with anything for talking to sqlite, but if
you made this an installable package, it could require the "sqlite"
package and do it that way... Just a thought!

Eric
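As a rough illustration of the sqlite idea floated above, here is a small Python sketch using the standard `sqlite3` module. It is not Gnus code and not a proposal for an actual schema: the table name `overview`, its columns, and the helper names `open_cache`, `store_headers` and `fetch_range` are all hypothetical. It only shows the shape of the approach, one indexed table holding overview data, queried by group and article-number range.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the cache database and ensure the table exists."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS overview (
               grp      TEXT    NOT NULL,
               number   INTEGER NOT NULL,
               filename TEXT    NOT NULL,
               subject  TEXT,
               sender   TEXT,
               PRIMARY KEY (grp, number))"""
    )
    return db


def store_headers(db, grp, rows):
    """Insert or update (number, filename, subject, sender) rows for a group."""
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR REPLACE INTO overview VALUES (?, ?, ?, ?, ?)",
            [(grp, *row) for row in rows],
        )


def fetch_range(db, grp, first, last):
    """Return (number, subject, sender) for articles first..last of a group."""
    cur = db.execute(
        "SELECT number, subject, sender FROM overview "
        "WHERE grp = ? AND number BETWEEN ? AND ? ORDER BY number",
        (grp, first, last),
    )
    return cur.fetchall()
```

The primary key on (grp, number) is what would make a range lookup cheap; whether that beats an in-memory hash table for typical group sizes is exactly the open question in this thread.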
* Re: Experimental new Maildir backend
From: Juan José García-Ripoll @ 2021-01-17 9:01 UTC
To: ding

Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Very cool! It's great to see work on more backends. I'm also in the
> process of putting together fixes for nnmaildir, so there might be a
> little bit of redundancy in our work, but it looks like your approach is
> a more drastic rethinking -- my changes are mostly incremental tweaks.

Hi Eric, nice to see you also are interested in this format. I also
started tweaking nnmaildir. I got to a point where I fixed the
separator, introducing a configuration parameter that starts with ":"
but can be configured to other values. Unfortunately the result was
extremely slow.

I gather nnmaildir creates one NOV file for every message, it also
creates a folder for each flag and creates links (which I am not even
sure whether it does correctly on Windows, where links are not always
possible), for every article that has a mark. That made a folder with
~900 emails take minutes to load at all. I know that this is a Windows
limitation but I think it points out how wasteful it is. In contrast,
the archives I have based on nnml are very snappy.

> I saw on the github page that this is based off nnml code, and there are
> several functions (moving messages, creating groups) that haven't been
> implemented yet. I'm curious if you were able to use any of Gnus'
> backend inheritance features -- are you having to write everything from
> scratch? I haven't read the code yet...

I started using inheritance, but it did not work. The parent backend did
not get its variables properly assigned and there were lots of confusing
group names: i.e. Archives.2011 would have to be renamed
Archives.2011.cur so that nnml tries to first find that directory and
then attempts Archives.2011/cur. However, at some point I also was not
satisfied with nnml's approach, which only stores the NOV files and
enforces that all file names must be a number (the article number) and I
gave up. The result is not that bad, ~500 lines. The biggest hurdle was
figuring out which backend functions need to be created for the backend
to tell Gnus the attributes of a message. That is not at all clear in
the manual and nnmaildir, nnimap and others seem to follow different,
poorly documented paths.

> I've been toying with the idea of using sqlite as a store for Gnus'
> caches and data: it seems like that would get us the biggest speedup
> possible. I don't think it's feasible for Gnus' built-in backends, since
> vanilla Emacs doesn't come with anything for talking to sqlite, but if
> you made this an installable package, it could require the "sqlite"
> package and do it that way... Just a thought!

It is a hurdle to install sqlite in Emacs on Windows. I had to create my
own "build-from-source" distribution for packaging Emacs with other
dependencies that are not standard
(https://github.com/juanjosegarciaripoll/emacs-build).

However, the good news is that the hashtable cache approach works very
well. I have folders with tens of thousands of files and loading the
hash table is not that bad. Plus, this can be done once and only updated
later based on times, or by dropping the NOVs unless they are needed --
the current code is not optimal in my implementation.

I believe there is an Emacs database package that offers a hashtable
backend as default. Maybe that would be a more reasonable approach to
general caching, and it would benefit other stakeholders, such as
org-roam.

Thanks for the questions and feedback. You brought up very good points
that would need to be addressed for Maildir to be more functional --
which I believe is relevant, given that Maildir is gaining traction
again due to mu4e and equivalent (albeit, once more, Windows-buggy)
packages.

Cheers,

-- 
Juan José García Ripoll
http://juanjose.garciaripoll.com
http://quinfog.hbar.es
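The "load once, then only update" idea for the hash-table cache can be sketched as follows. This is an illustrative Python version under stated assumptions, not the gnus-nnmaild implementation: the cache is a plain dict keyed by file name, `read_headers` and `refresh_cache` are hypothetical helper names, and only the Subject and From headers are kept. Only files that are not yet in the cache are opened and parsed; entries for files that have disappeared are dropped.

```python
import os
from email.parser import BytesHeaderParser


def read_headers(path):
    """Parse only the header block of one message file."""
    with open(path, "rb") as fh:
        msg = BytesHeaderParser().parse(fh)
    return {
        "subject": str(msg.get("Subject", "")),
        "from": str(msg.get("From", "")),
    }


def refresh_cache(cache, directory):
    """Bring CACHE (a dict keyed by file name) in line with DIRECTORY.

    Files already present in the cache are not re-read. Returns the
    sorted list of file names that were newly parsed.
    """
    present = set(os.listdir(directory))
    for name in set(cache) - present:
        del cache[name]  # message was deleted or renamed
    added = sorted(present - set(cache))
    for name in added:
        cache[name] = read_headers(os.path.join(directory, name))
    return added
```

One simplification to be aware of: because the key is the full file name, a flag change (which renames the file) makes the message look new and causes it to be parsed again. Keying on the unique part of the name would avoid that.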
* Re: Experimental new Maildir backend
From: Eric Abrahamsen @ 2021-01-17 21:33 UTC
To: ding

Juan José García-Ripoll <juanjose.garciaripoll@gmail.com> writes:

> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>> Very cool! It's great to see work on more backends. I'm also in the
>> process of putting together fixes for nnmaildir, so there might be a
>> little bit of redundancy in our work, but it looks like your approach is
>> a more drastic rethinking -- my changes are mostly incremental tweaks.
>
> Hi Eric, nice to see you also are interested in this format. I also
> started tweaking nnmaildir. I got to a point where I fixed the
> separator, introducing a configuration parameter that starts with ":"
> but can be configured to other values. Unfortunately the result was
> extremely slow.

You mean making that character configurable actually introduced a
slowdown? It seems odd that it would be that much of a factor.

> I gather nnmaildir creates one NOV file for every message, it also
> creates a folder for each flag and creates links (which I am not even
> sure whether it does correctly on Windows, where links are not always
> possible), for every article that has a mark. That made a folder with
> ~900 emails take minutes to load at all. I know that this is a Windows
> limitation but I think it points out how wasteful it is. In contrast,
> the archives I have based on nnml are very snappy.

Yeah, the whole setup is a little baroque. But I think this is the
dividing line between what I'm likely to do with nnmaildir, and where it
makes sense to write a new backend. I don't think I would change
nnmaildir's architecture, just try to fix some basic inefficiencies. If
users want a more drastic change, it makes more sense to just have a new
backend.

>> I saw on the github page that this is based off nnml code, and there are
>> several functions (moving messages, creating groups) that haven't been
>> implemented yet. I'm curious if you were able to use any of Gnus'
>> backend inheritance features -- are you having to write everything from
>> scratch? I haven't read the code yet...
>
> I started using inheritance, but it did not work. The parent backend did
> not get its variables properly assigned

To me this is usually a sign that there are code paths that don't hit
`nnoo-change-server'. I see you've got that in `nnmaild-open-server',
which gets called in `nnmaild-possibly-change-directory', but I would
take a look at the various code entry points and see if any of them
sneak past that. Also, you're calling `nnmaild-server-opened', but that
function doesn't seem to be defined?

> and there were lots of confusing group names: i.e. Archives.2011 would
> have to be renamed Archives.2011.cur so that nnml tries to first find
> that directory and then attempts Archives.2011/cur. However, at some
> point I also was not satisfied with nnml's approach, which only stores
> the NOV files and enforces that all file names must be a number (the
> article number) and I gave up. The result is not that bad, ~500 lines.
> The biggest hurdle was figuring out which backend functions need to be
> created for the backend to tell Gnus the attributes of a message. That
> is not at all clear in the manual and nnmaildir, nnimap and others
> seem to follow different, poorly documented paths.

This is too bad, and something else that I would (theoretically,
eventually) like to work on. If a new backend looks highly similar to an
existing backend, it should be able to share a lot of the code.

What do you mean exactly, "tell Gnus the attributes of a message"?

If you have concrete suggestions as to how Gnus could do a better job of
allowing inheritance, I'd love to hear them.

Eric
* Re: Experimental new Maildir backend
From: Juan José García-Ripoll @ 2021-01-18 9:39 UTC
To: ding

Eric Abrahamsen <eric@ericabrahamsen.net> writes:
> Juan José García-Ripoll <juanjose.garciaripoll@gmail.com> writes:
>> Hi Eric, nice to see you also are interested in this format. I also
>> started tweaking nnmaildir. I got to a point where I fixed the
>> separator, introducing a configuration parameter that starts with ":"
>> but can be configured to other values. Unfortunately the result was
>> extremely slow.
>
> You mean making that character configurable actually introduced a
> slowdown? It seems odd that it would be that much of a factor.

No, sorry for the confusion: the change of the separator is trivial, and
does not affect overall performance, but the actual backend is simply
too slow for platforms where file creation, time stamp checking and
linking are not cheap operations.

>> I started using inheritance, but it did not work. The parent backend did
>> not get its variables properly assigned
>
> To me this is usually a sign that there are code paths that don't hit
> `nnoo-change-server'. I see you've got that in `nnmaild-open-server',
> which gets called in `nnmaild-possibly-change-directory', but I would
> take a look at the various code entry points and see if any of them
> sneak past that. Also, you're calling `nnmaild-server-opened', but that
> function doesn't seem to be defined?

Again, sorry for my confusing explanation. What follows is an
explanation of how I attempted it, not how it is done in my repository.
In the version from GitHub, everything works and no inheritance is
needed. To be fair no inheritance is possible because I ended up doing
things differently from nnml, with a different caching protocol, and a
totally different way of handling marks.

As for my first attempts, I started using inheritance as per the nndir
example. In that case, I wanted to reuse nnml's file scanning and NOV
caching, which is why I manually created the nnmaild-request-*
functions. These attempts would perform some tasks before calling the
functions of the parent backend with the same name.

The problem there is that while all variables are properly defined in
the child (including those variables that in defvoo specify alternative
names for the child framework, i.e. nnmaild-directory translated to
nnml-directory), the variables were arbitrarily restored to default
values when the parent functions were called.

I am sorry, but I do not keep that code around, although it would not be
difficult to try again.

> This is too bad, and something else that I would (theoretically,
> eventually) like to work on. If a new backend looks highly similar to an
> existing backend, it should be able to share a lot of the code.

There was no real code reuse. nnmaildir's functions are impossible to
reuse; nnml's rely heavily on numbered files and use a different way of
caching NOVs.

> What do you mean exactly, "tell Gnus the attributes of a message"?

It took me quite a lot of debugging to develop the functions that modify
the info structure replacing the marks with those that are given by the
Maildir backend. It was not clear from the documentation where that is
supposed to happen. nnml has no implementation for marks, delegating
everything to the Gnus dribble files. nnimap has an undocumented
implementation that uses the fact that it has a request stage and a
finish-processing stage.

> If you have concrete suggestions as to how Gnus could do a better job
> of allowing inheritance, I'd love to hear it.

To be fair, I am quite lost regarding how inheritance is handled and how
the whole infrastructure works.

- I find the way defvoo works confusing, as well as the fact that
  everything is based on variables instead of objects, methods and
  slots.

- In particular, coming from an OO background (both in Common Lisp and
  C++), I do not understand when and how those variables are rewritten.
  In OO paradigms, one fixes the slots of the parent classes during the
  construction phase; here it seems there is some magic rewrite
  happening at hidden places.

- This made it confusing to me how I can write my own defvoo method and,
  from within that method, call equivalent methods from other backends.
  That does not seem to be supported.

- On a more architectural note, it is also strange that methods are
  expected to work with global state, instead of getting properly formed
  records. This makes the handling of group, server and info objects
  very confusing -- when do they have a value, vs. when can they be just
  nil, still puzzles me.

- I also find that, for proper inheritance, some of the backends should
  make fewer assumptions about the underlying system (e.g. nnml should
  not assume numbering of files), allowing the rewrite of specific
  methods (e.g. retrieval and writing of marks), which in the current
  framework seems very difficult, as there are no real virtual
  functions.

Hope this helps clarify how confused I am about inheritance. This said,
I think I now understand how to write the backend _without_ inheritance.

Juanjo

-- 
Juan José García Ripoll
http://juanjose.garciaripoll.com
http://quinfog.hbar.es
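The step of replacing the marks in the info structure with those given by the Maildir files amounts to turning per-message flags into per-mark ranges of article numbers. The Python sketch below shows that transformation only; it does not reproduce any Gnus function. The flag-to-mark table (S to read, F to tick, R to reply, P to forward) is an assumption made for illustration, and `compress` and `marks_from_flags` are hypothetical names.

```python
# Assumed mapping from Maildir flag letters to Gnus-style mark names.
FLAG_TO_MARK = {"S": "read", "F": "tick", "R": "reply", "P": "forward"}


def compress(numbers):
    """Collapse article numbers into sorted inclusive (first, last) ranges."""
    ranges = []
    for n in sorted(set(numbers)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current run
        else:
            ranges.append((n, n))  # start a new run
    return ranges


def marks_from_flags(flags_by_article):
    """Map {article_number: "FLAGS"} to {mark_name: [(first, last), ...]}.

    Flag letters without an entry in FLAG_TO_MARK are ignored.
    """
    collected = {}
    for number, flags in flags_by_article.items():
        for flag in flags:
            mark = FLAG_TO_MARK.get(flag)
            if mark is not None:
                collected.setdefault(mark, []).append(number)
    return {mark: compress(nums) for mark, nums in collected.items()}
```

With this shape, the backend would own the marks entirely and hand Gnus a fresh set on each group update, which matches the description above of replacing rather than merging them.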
* Re: Experimental new Maildir backend
From: Eric S Fraga @ 2021-01-17 19:17 UTC
To: ding

On Saturday, 16 Jan 2021 at 20:49, Eric Abrahamsen wrote:
> I've been toying with the idea of using sqlite as a store for Gnus'
> caches and data: it seems like that would get us the biggest speedup
> possible.

This sounds interesting.

I used to use nnmaildir all the time, primarily because I frequently
access my emails from three different devices which I keep in sync using
unison. The maildir format is ideal for this as there is no chance of
conflict arising if I forget to sync any given device.

However, the performance of the current nnmaildir is atrocious in such a
scenario because the time stamp on the .overview file (I believe -- it's
been a while since I did the analysis) leads to the whole mailbox being
scanned again to build up the numerical indices used by gnus. The scan
is O(n^3) which becomes intractable when you have 1000s of emails in the
mailbox.

Anything that enables quick scanning/updating of the index for maildir
groups would be a major improvement.

-- 
Eric S Fraga via Emacs 28.0.50 & org 9.4.4 on Debian bullseye/sid
* Re: Experimental new Maildir backend
From: Eric Abrahamsen @ 2021-01-17 22:23 UTC
To: Eric S Fraga; +Cc: ding

Eric S Fraga <e.fraga@ucl.ac.uk> writes:

> On Saturday, 16 Jan 2021 at 20:49, Eric Abrahamsen wrote:
>> I've been toying with the idea of using sqlite as a store for Gnus'
>> caches and data: it seems like that would get us the biggest speedup
>> possible.
>
> This sounds interesting.
>
> I used to use nnmaildir all the time, primarily because I frequently
> access my emails from three different devices which I keep in sync using
> unison. The maildir format is ideal for this as there is no chance of
> conflict arising if I forget to sync any given device.
>
> However, the performance of the current nnmaildir is atrocious in such a
> scenario because the time stamp on the .overview file (I believe -- it's
> been a while since I did the analysis) leads to the whole mailbox being
> scanned again to build up the numerical indices used by gnus. The scan
> is O(n^3) which becomes intractable when you have 1000s of emails in the
> mailbox.
>
> Anything that enables quick scanning/updating of the index for maildir
> groups would be a major improvement.

I think there are a couple of overlapping issues -- actual bugs vs
design problems -- that might need to be unpicked: it's obvious that
first-time nov database building when adopting lots of old mail has
terrible performance, and it shouldn't need to be that way. Then it
sounds like there might be a real bug in that nov databases are getting
rebuilt when they don't need to be? Lastly, perhaps there's further
inefficiency just when retrieving nov headers on a run-of-the-mill group
opening.

Does that sound right? Is simply opening a group slow, even when all the
nov databases are built?

Anyway, no need to go digging up past research, I'm mostly just trying
to clarify things in my own head.
* Re: Experimental new Maildir backend
From: Eric S Fraga @ 2021-01-18 11:14 UTC
To: Eric Abrahamsen; +Cc: ding

On Sunday, 17 Jan 2021 at 14:23, Eric Abrahamsen wrote:
> I think there are a couple of overlapping issues -- actual bugs vs
> design problems -- that might need to be unpicked: it's obvious that
> first-time nov database building when adopting lots of old mail has
> terrible performance, and it shouldn't need to be that way. Then it
> sounds like there might be a real bug in that nov databases are getting
> rebuilt when they don't need to be?

I don't think it's a bug per se. IIRC, the issue is that nnmaildir uses
the last modified time of various files and/or directories to see
whether the database needs to be rebuilt and the synchronization of my
files across various systems undermines the basis of this logic.

-- 
Eric S Fraga via Emacs 28.0.50 & org 9.4.4 on Debian bullseye/sid
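One way around the modification-time problem described above is to decide whether a rescan is needed from the directory contents rather than from time stamps. The Python sketch below is an illustration of that alternative, not a description of how nnmaildir or gnus-nnmaild work; `listing_fingerprint` and `needs_rescan` are hypothetical names. It hashes the sorted list of file names, so a sync tool that only touches time stamps leaves the fingerprint unchanged, while an added, removed or renamed message (including a flag change, which renames the file) changes it.

```python
import hashlib
import os


def listing_fingerprint(directory):
    """Return a hex digest that depends only on the file names in DIRECTORY."""
    digest = hashlib.sha256()
    for name in sorted(os.listdir(directory)):
        digest.update(name.encode("utf-8", "surrogateescape"))
        digest.update(b"\0")  # separator so "ab"+"c" differs from "a"+"bc"
    return digest.hexdigest()


def needs_rescan(directory, stored_fingerprint):
    """True when the set of file names differs from the one fingerprinted."""
    return listing_fingerprint(directory) != stored_fingerprint
```

Listing a directory is still linear in the number of messages, but it avoids opening any message file, and the fingerprint only says that something changed; finding out what changed would need a comparison against the cached names.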
Thread overview: 8+ messages
2021-01-16 18:47 Experimental new Maildir backend Juan José García-Ripoll
2021-01-17  4:49 ` Eric Abrahamsen
2021-01-17  9:01   ` Juan José García-Ripoll
2021-01-17 21:33     ` Eric Abrahamsen
2021-01-18  9:39       ` Juan José García-Ripoll
2021-01-17 19:17   ` Eric S Fraga
2021-01-17 22:23     ` Eric Abrahamsen
2021-01-18 11:14       ` Eric S Fraga