Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* nnimap-split-download-body removed?
@ 2020-11-30 15:03 Bodertz
  2020-11-30 17:10 ` Eric Abrahamsen
  0 siblings, 1 reply; 10+ messages in thread
From: Bodertz @ 2020-11-30 15:03 UTC
  To: info-gnus-english

I wrote a function which can do something useful with the body of an
email.  If worse comes to worse, I can run this function manually after
opening the email, but my plan was to use splitting to automate this.

As documented in "(gnus)Fancy Mail Splitting", splitting based on the
content of the email can be done with nnimap by setting
'nnimap-split-download-body' to 't'.  See the last sentence:

  ‘(: FUNCTION ARG1 ARG2 ...)’
      If the split is a list, and the first element is ‘:’, then the
      second element will be called as a function with ARGS given as
      arguments.  The function should return a SPLIT.
  
      For instance, the following function could be used to split based
      on the body of the messages:
  
           (defun split-on-body ()
             (save-excursion
               (save-restriction
                 (widen)
                 (goto-char (point-min))
                 (when (re-search-forward "Some.*string" nil t)
                   "string.group"))))
  
      The buffer is narrowed to the header of the message in question
      when FUNCTION is run.  That’s why ‘(widen)’ needs to be called
      after ‘save-excursion’ and ‘save-restriction’ in the example above.
      Also note that with the nnimap backend, message bodies will not be
      downloaded by default.  You need to set
      ‘nnimap-split-download-body’ to ‘t’ to do that (*note Client-Side
      IMAP Splitting).

However, this did not seem to work, although I haven't tested
thoroughly.  Looking through the git history of Emacs, it seems that
'nnimap-split-download-body' was removed a little over ten years ago.
An internal variable, 'nnimap-split-download-body-default' is still
present, but is as far as I can tell unused.

Have I missed something obvious?  Is there a new way to do this?  Am I
just out of luck?

Thanks.

_______________________________________________
info-gnus-english mailing list
info-gnus-english@gnu.org
https://lists.gnu.org/mailman/listinfo/info-gnus-english


* Re: nnimap-split-download-body removed?
  2020-11-30 15:03 nnimap-split-download-body removed? Bodertz
@ 2020-11-30 17:10 ` Eric Abrahamsen
  2020-12-01  0:15   ` Bodertz
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Abrahamsen @ 2020-11-30 17:10 UTC
  To: Bodertz; +Cc: info-gnus-english

Bodertz <bodertz@gmail.com> writes:

> I wrote a function which can do something useful with the body of an
> email.  If worse comes to worse, I can run this function manually after
> opening the email, but my plan was to use splitting to automate this.
>
> As documented in "(gnus)Fancy Mail Splitting", splitting based on the
> content of the email can be done with nnimap by setting
> 'nnimap-split-download-body' to 't'.  See the last sentence:
>
>   ‘(: FUNCTION ARG1 ARG2 ...)’
>       If the split is a list, and the first element is ‘:’, then the
>       second element will be called as a function with ARGS given as
>       arguments.  The function should return a SPLIT.
>   
>       For instance, the following function could be used to split based
>       on the body of the messages:
>   
>            (defun split-on-body ()
>              (save-excursion
>                (save-restriction
>                  (widen)
>                  (goto-char (point-min))
>                  (when (re-search-forward "Some.*string" nil t)
>                    "string.group"))))
>   
>       The buffer is narrowed to the header of the message in question
>       when FUNCTION is run.  That’s why ‘(widen)’ needs to be called
>       after ‘save-excursion’ and ‘save-restriction’ in the example above.
>       Also note that with the nnimap backend, message bodies will not be
>       downloaded by default.  You need to set
>       ‘nnimap-split-download-body’ to ‘t’ to do that (*note Client-Side
>       IMAP Splitting).
>
> However, this did not seem to work, although I haven't tested
> thoroughly.  Looking through the git history of Emacs, it seems that
> 'nnimap-split-download-body' was removed a little over ten years ago.
> An internal variable, 'nnimap-split-download-body-default' is still
> present, but is as far as I can tell unused.

I tried looking through the history, too, and that was pretty confusing.
It looks like the manual section talking about
`nnimap-split-download-body' was added *after* the actual defcustom was
removed. I suspect that's an artifact of git history merge, but the
current situation is still odd (the manual still refers to
`nnimap-split-download-body').

It looks like `nnimap-split-download-body-default' is in fact still
used, in `nnimap-fetch-inbox', and setting it to t looks like it should
work. Give that a shot?

No matter what, there's a mismatch between code and documentation. And
if split-download-body-default still has effect, then it should be a
defcustom, too. I don't see why there were two separate variables to
begin with.

Anyway, let me know if setting that variable to t works.

Eric


* Re: nnimap-split-download-body removed?
  2020-11-30 17:10 ` Eric Abrahamsen
@ 2020-12-01  0:15   ` Bodertz
  2020-12-01  1:51     ` Eric Abrahamsen
  0 siblings, 1 reply; 10+ messages in thread
From: Bodertz @ 2020-12-01  0:15 UTC
  To: Eric Abrahamsen; +Cc: info-gnus-english

I can confirm that setting `nnimap-split-download-body-default' to t
works as intended.  I really should have tried that first.
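
A minimal sketch of such a setup, for illustration only.  It assumes
the `split-on-body' function from the manual excerpt is already
defined; the server name "imap.example.com" and the fallback group
"mail.misc" are placeholders.

```elisp
;; Illustrative only; server name and group names are placeholders.
;; This is an internal variable, but setting it makes nnimap fetch
;; whole messages (headers and body) before splitting.
(setq nnimap-split-download-body-default t)

(setq gnus-secondary-select-methods
      '((nnimap "imap.example.com"
                (nnimap-inbox "INBOX")
                ;; `default' means: use the value of `nnmail-split-methods'.
                (nnimap-split-methods default))))

(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy
      '(| (: split-on-body)   ; returns "string.group" or nil
          "mail.misc"))       ; fallback when the body does not match
```

Because the function split sits inside a `|' split, a nil return simply
falls through to the next rule.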

It sounds like the removal wasn't intentional, so hopefully it can be
added back.

It seems to download the body before the splitting, which is
unfortunate.  It would be better if it only downloaded the body when the
split required it.  Or if it could match the sender against a whitelist
of accepted senders and only download when they matched.  Or the same
thing but with the summary, but then it's getting close to just adding
a simpler split system before the real one, which is a little odd.
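
The whitelist idea could be sketched roughly as below.  The variable
and function names are hypothetical, and nothing in nnimap currently
calls such a predicate; the sketch only shows the header-side check
that would decide whether a body is worth fetching.

```elisp
(require 'seq)

(defvar my/body-split-senders '("alerts@example.com" "reports@example.org")
  "Hypothetical list of sender addresses whose messages need body splitting.")

(defun my/sender-needs-body-p (from)
  "Return non-nil if the From header value FROM matches a listed sender."
  (seq-some (lambda (sender)
              ;; `regexp-quote' keeps the dots in addresses literal.
              (string-match-p (regexp-quote sender) from))
            my/body-split-senders))

(my/sender-needs-body-p "Alerts <alerts@example.com>")   ; non-nil
(my/sender-needs-body-p "Someone <someone@example.net>") ; nil
```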

So I don't forget, someone also suggested `nnimap-split-download-body
size', which might be a useful addition:


[-- Attachment #2: Type: message/rfc822, Size: 4564 bytes --]

From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: nnimap-split-download-body feature request
Date: Thu, 20 Nov 2003 09:20:09 -0500
Message-ID: <4n8ymbjf52.fsf@lockgroove.bwh.harvard.edu>

On Thu, 20 Nov 2003, jas@extundo.com wrote:

> Ted Zlatanov <tzz@lifelogs.com> writes:

>> Nah, just fetch the headers.  I think that's all you can reasonably
>> expect as a Gnus user.  Maybe fake the body with "BODY TOO LARGE"
>> or something like that, or add a header, but I personally think
>> that retrieving just the headers in such a case is a perfectly good
>> solution.
> 
> The asynchronous prefetch, agent, cache (and possibly more things)
> would cache this incomplete article.  How would they know the
> message was incomplete?  After requesting a re-fetch of the entire
> article, all those cached copies will need to be purged.  Sounds
> like work.

I'm not sure I understand.  Here's a patch to show you what I think
could be done, since nnimap-split-articles already decides between
the head and the whole body.  The nnimap-check-body-size function
needs to be provided, but I hope you see what I mean.

Ted.

--- nnimap.el   4 Sep 2003 22:22:18 -0000       6.71
+++ nnimap.el   20 Nov 2003 14:20:24 -0000
@@ -1271,9 +1271,10 @@
          (when (setq rule (nnimap-split-find-rule server inbox))
            ;; iterate over articles
            (dolist (article (imap-search nnimap-split-predicate))
-             (when (if (if (eq nnimap-split-download-body 'default)
-                           nnimap-split-download-body-default
-                         nnimap-split-download-body)
+             (when (if (and (nnimap-check-body-size article)
+                            (if (eq nnimap-split-download-body 'default)
+                                nnimap-split-download-body-default
+                              nnimap-split-download-body))
                        (and (nnimap-request-article article)
                             (with-current-buffer nntp-server-buffer (mail-narrow-to-head)))
                      (nnimap-request-head article))




* Re: nnimap-split-download-body removed?
  2020-12-01  0:15   ` Bodertz
@ 2020-12-01  1:51     ` Eric Abrahamsen
  2020-12-01  3:04       ` Bodertz
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Abrahamsen @ 2020-12-01  1:51 UTC
  To: info-gnus-english

Bodertz <bodertz@gmail.com> writes:

> I can confirm that setting `nnimap-split-download-body-default' to t
> works as intended.  I really should have tried that first.

No worries, that's good news.

> It sounds like the removal wasn't intentional, so hopefully it can be
> added back.

I think we could just resurrect the old `nnimap-split-download-body'
defcustom, then fix the docs. I don't see any need for this
`nnimap-split-download-body-default'.

> It seems to download the body before the splitting, which is
> unfortunate.  It would be better if it only downloaded the body when the
> split required it.  Or if it could match the sender against a whitelist
> of accepted senders and only download when they matched.  Or the same
> thing but with the summary, but then it's getting close to just adding
> a simpler split system before the real one, which is a little odd.
>
> So I don't forget, someone also suggested `nnimap-split-download-body
> size', which might be a useful addition:

The code Ted's referring to is long gone, and things are a lot simpler
now. We could re-introduce some of the complexity, but I wasn't there
for the earlier versions and don't know the arguments for why the code
is the way it is now. The problem with checking the headers or the
message size before downloading the body is that you're then issuing one
FETCH to get all the messages without their bodies, and then issuing
another to get the bodies you want, likely just the same list as before.

That seems like it would end up being pretty inefficient, and I wouldn't
be surprised if it turned out that we had to issue one FETCH per message
we wanted the body for. I'll admit I haven't looked at this part of the
code closely, but...

Eric




* Re: nnimap-split-download-body removed?
  2020-12-01  1:51     ` Eric Abrahamsen
@ 2020-12-01  3:04       ` Bodertz
  2020-12-01  3:35         ` Eric Abrahamsen
  0 siblings, 1 reply; 10+ messages in thread
From: Bodertz @ 2020-12-01  3:04 UTC
  To: Eric Abrahamsen; +Cc: info-gnus-english

Eric Abrahamsen <eric@ericabrahamsen.net> writes:

> The problem with checking the headers or the message size before
> downloading the body is that you're then issuing one FETCH to get all
> the messages without their bodies, and then issuing another to get the
> bodies you want, likely just the same list as before.

It may not be worth doing, but I challenge your assumption that they
would be the same list.  For my case at least, the list of messages
which I want split based on the body is a small subset of the list of
new messages I receive.  When I imagine other uses of splitting based on
the body, they are only cases where splitting on other matches such as
from or subject wasn't enough, and that would still be a small subset of
new messages.  Most messages can be split with just the headers.

> That seems like it would end up being pretty inefficient, and I wouldn't
> be surprised if it turned out that we had to issue one FETCH per message
> we wanted the body for. I'll admit I haven't looked at this part of the
> code closely, but...

Since `nnimap-fetch-inbox' accepts a list of articles, I don't see why
we couldn't feed the list of new articles to it with either "[HEADER]"
or "[1]" based on the result of `(nnimap-ver4-p)', and then build a new
list of articles which meet some criteria and send that to
`nnimap-fetch-inbox' with "[]".  But I don't actually understand the
code, so I'm sure things are more complicated than that.
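
The two-pass idea above amounts to sending two IMAP commands, which
could be sketched as follows.  The helper name is hypothetical and this
is not how nnimap builds its commands internally; it only shows the
shape of a headers-only fetch followed by a full fetch for a subset of
UIDs (the "[1]" case for older servers is left out).

```elisp
(defun my/imap-fetch-command (uids section)
  "Return a UID FETCH command string for the list UIDS.
SECTION is a body section such as \"[HEADER]\" or \"[]\"."
  (format "UID FETCH %s (UID BODY.PEEK%s)"
          (mapconcat #'number-to-string uids ",")
          section))

;; First pass: headers only, for every new article.
(my/imap-fetch-command '(101 102 103) "[HEADER]")
;; => "UID FETCH 101,102,103 (UID BODY.PEEK[HEADER])"

;; Second pass: whole message, only for articles that need it.
(my/imap-fetch-command '(102) "[]")
;; => "UID FETCH 102 (UID BODY.PEEK[])"
```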


* Re: nnimap-split-download-body removed?
  2020-12-01  3:04       ` Bodertz
@ 2020-12-01  3:35         ` Eric Abrahamsen
  2020-12-01  8:46           ` Bodertz
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Abrahamsen @ 2020-12-01  3:35 UTC
  To: info-gnus-english

Bodertz <bodertz@gmail.com> writes:

> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> The problem with checking the headers or the message size before
>> downloading the body is that you're then issuing one FETCH to get all
>> the messages without their bodies, and then issuing another to get the
>> bodies you want, likely just the same list as before.
>
> It may not be worth doing, but I challenge your assumption that they
> would be the same list.  For my case at least, the list of messages
> which I want split based on the body is a small subset of the list of
> new messages I receive.  When I imagine other uses of splitting based on
> the body, they are only cases where splitting on other matches such as
> from or subject wasn't enough, and that would still be a small subset of
> new messages.  Most messages can be split with just the headers.

Well, fair enough. I suppose it would also be one of those things where
we just warned the users it might be slow, and they could decide for
themselves.

>> That seems like it would end up being pretty inefficient, and I wouldn't
>> be surprised if it turned out that we had to issue one FETCH per message
>> we wanted the body for. I'll admit I haven't looked at this part of the
>> code closely, but...
>
> Since `nnimap-fetch-inbox' accepts a list of articles, I don't see why
> we couldn't feed the list of new articles to it with either "[HEADER]"
> or "[1]" based on the result of `(nnimap-ver4-p)', and then build a new
> list of articles which meet some criteria and send that to
> `nnimap-fetch-inbox' with "[]".  But I don't actually understand the
> code, so I'm sure things are more complicated than that.

Continuing our baseless speculation without looking at the code... I
wonder if it would be even possible to do this: it would require either
running splitting twice (once to split simpler messages, and return a
list of messages that needed further downloading and re-splitting), or
pausing in the middle of splitting to download messages.

Anyway, let me send in a bug report for the simpler change. I'll mention
your other request, and see if anyone else has a point of view.

Eric



* Re: nnimap-split-download-body removed?
  2020-12-01  3:35         ` Eric Abrahamsen
@ 2020-12-01  8:46           ` Bodertz
  2020-12-01 18:26             ` Eric Abrahamsen
  0 siblings, 1 reply; 10+ messages in thread
From: Bodertz @ 2020-12-01  8:46 UTC
  To: Eric Abrahamsen; +Cc: info-gnus-english

Eric Abrahamsen <eric@ericabrahamsen.net> writes:

> Continuing our baseless speculation without looking at the code... I
> wonder if it would be even possible to do this: it would require
> either running splitting twice (once to split simpler messages, and
> return a list of messages that needed further downloading and
> re-splitting), or pausing in the middle of splitting to download
> messages.

Yeah, I really don't understand much of the code.
`nnimap-split-incoming-mail' runs before `nnimap-fetch-inbox', so that
might be the place to change.  It would first fetch the new articles'
headers with a new function like `nnimap-fetch-inbox' that ignores
`nnimap-split-download-body', and then split those.  But then
`nnmail-split-it' would need to somehow know not to split messages whose
rules operate on the body.  So that seems complicated.

Maybe having the function given in the `(: function)' split do the work
of downloading the message would be easier.  I don't know if that's
possible, though.

> Anyway, let me send in a bug report for the simpler change.

Thanks for that.

> I'll mention your other request, and see if anyone else has a point of
> view.

Thanks for that as well.


* Re: nnimap-split-download-body removed?
  2020-12-01  8:46           ` Bodertz
@ 2020-12-01 18:26             ` Eric Abrahamsen
  2020-12-02  7:18               ` Bodertz
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Abrahamsen @ 2020-12-01 18:26 UTC
  To: info-gnus-english

Bodertz <bodertz@gmail.com> writes:

> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> Continuing our baseless speculation without looking at the code... I
>> wonder if it would be even possible to do this: it would require
>> either running splitting twice (once to split simpler messages, and
>> return a list of messages that needed further downloading and
>> re-splitting), or pausing in the middle of splitting to download
>> messages.
>
> Yeah, I really don't understand much of the code.
> `nnimap-split-incoming-mail' runs before `nnimap-fetch-inbox', so that
> might be the place to change.  It would first fetch the new articles'
> headers with a new function like `nnimap-fetch-inbox' that ignores
> `nnimap-split-download-body', and then split those.  But then
> `nnmail-split-it' would need to somehow know not to split messages whose
> rules operate on the body.  So that seems complicated.
>
> Maybe having the function given in the `(: function)' split do the work
> of downloading the message would be easier.  I don't know if that's
> possible, though.

I'm not sure how big of a rewrite this would require. First fetching
headers only wouldn't be hard, but then we'd need to somehow partially
fake a run of the splitting process in order to know which messages
needed more. How do you indicate in your splits that the body should be
examined?



* Re: nnimap-split-download-body removed?
  2020-12-01 18:26             ` Eric Abrahamsen
@ 2020-12-02  7:18               ` Bodertz
  2020-12-02 20:34                 ` Eric Abrahamsen
  0 siblings, 1 reply; 10+ messages in thread
From: Bodertz @ 2020-12-02  7:18 UTC
  To: Eric Abrahamsen; +Cc: info-gnus-english


> I'm not sure how big of a rewrite this would require. First fetching
> headers only wouldn't be hard, but then we'd need to somehow partially
> fake a run of the splitting process in order to know which messages
> needed more. How do you indicate in your splits that the body should
> be examined?

I guess an additional splitting character could be introduced (maybe @).
And instead of splitting then and there it would push the article number
onto some list, and then after all the normal non-body-splitting
happens, if that list is non-nil, `nnimap-fetch-inbox' would download
the bodies for those messages and run the split again.  But I don't
really know.

Going back to what I said earlier about just having the function in the
`(: function)' split download the body, it seems like that is possible.
Because it's in the split rule, it only deals with one message at a
time, so it is inefficient in that sense.  In my case, I expect
on-demand downloading of the occasional message body to be more
efficient than downloading the body of every message and not using most
of them, but maybe I'm mistaken.

Anyway, the code is messy, and not quite right (it doesn't clean up the
`^M's for example), but it does seem to work in the sense that I can
search for strings in the body and split based on that.  I don't know
which if any of these save-(excursion|restriction|match-data) forms are
required.


(setq nnimap-split-download-body-default nil)

(defun scratch/test-split ()
  (current-buffer) ;; => " *nntpd*"
  (save-excursion
    (save-restriction
      (save-match-data
	(goto-char (point-min))
	(re-search-forward (rx "X-nnimap-article: "
			       (group (+ digit))))
	(let* ((article (match-string 1))
	       (command (format "UID FETCH %s (UID BODY.PEEK[])" article))
	       (full-message
		(with-current-buffer (nnimap-buffer)
		  (let ((message-start (point-max)))
		    (nnimap-send-command command)
		    (buffer-substring message-start (point))))))
	  ;; Clear the original message (with only headers)
	  (delete-region (point-min)
			 (point-max))
	  ;; Insert the full message
	  (insert full-message)
	  ;; Finally split based on message body
	  (goto-char (point-min))
	  (if (search-forward "Test Gnus Splitting" nil t)
	      "mail.bodertz.test"
	    "mail.bodertz"))))))
	    


* Re: nnimap-split-download-body removed?
  2020-12-02  7:18               ` Bodertz
@ 2020-12-02 20:34                 ` Eric Abrahamsen
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Abrahamsen @ 2020-12-02 20:34 UTC
  To: info-gnus-english

Bodertz <bodertz@gmail.com> writes:

>> I'm not sure how big of a rewrite this would require. First fetching
>> headers only wouldn't be hard, but then we'd need to somehow partially
>> fake a run of the splitting process in order to know which messages
>> needed more. How do you indicate in your splits that the body should
>> be examined?
>
> I guess an additional splitting character could be introduced (maybe @).
> And instead of splitting then and there it would push the article number
> onto some list, and then after all the normal non-body-splitting
> happens, if that list is non-nil, `nnimap-fetch-inbox' would download
> the bodies for those messages and run the split again.  But I don't
> really know.
>
> Going back to what I said earlier about just having the function in the
> `(: function)' split download the body, it seems like that is possible.
> Because it's in the split rule, it only deals with one message at a
> time, so it is inefficient in that sense.  In my case, I expect
> on-demand downloading of the occasional message body to be more
> efficient than downloading the body of every message and not using most
> of them, but maybe I'm mistaken.

TBH I don't think it's very likely we'll do a major rewrite to
accommodate this case. Particularly if body-scanning is only likely to
happen for a smaller subset of messages, I'd just do it the way you're
doing it below.

> Anyway, the code is messy, and not quite right (it doesn't clean up the
> `^M's for example), but it does seem to work in the sense that I can
> search for strings in the body and split based on that.  I don't know
> which if any of these save-(excursion|restriction|match-data) forms are
> required.
>
>
> (setq nnimap-split-download-body-default nil)
>
> (defun scratch/test-split ()
>   (current-buffer) ;; => " *nntpd*"
>   (save-excursion
>     (save-restriction
>       (save-match-data
> 	(goto-char (point-min))
> 	(re-search-forward (rx "X-nnimap-article: "
> 			       (group (+ digit))))
> 	(let* ((article (match-string 1))
> 	       (command (format "UID FETCH %s (UID BODY.PEEK[])" article))
> 	       (full-message
> 		(with-current-buffer (nnimap-buffer)
> 		  (let ((message-start (point-max)))
> 		    (nnimap-send-command command)
> 		    (buffer-substring message-start (point))))))
> 	  ;; Clear the original message (with only headers)
> 	  (delete-region (point-min)
> 			 (point-max))
> 	  ;; Insert the full message
> 	  (insert full-message)
> 	  ;; Finally split based on message body
> 	  (goto-char (point-min))
> 	  (if (search-forward "Test Gnus Splitting" nil t)
> 	      "mail.bodertz.test"
> 	    "mail.bodertz"))))))

You might try putting the `full-message' text in a temp buffer and
running `nnimap-transform-split-mail' on it first, that might clear up
some of the oddities of the body text itself. I would keep both the
`save-restriction' and `save-match-data'. Otherwise looks like this will
do what you want!
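
As a rough illustration of the temp-buffer idea, the sketch below
copies the fetched text into a temporary buffer and converts CRLF line
endings to LF before searching.  It deliberately leaves out
`nnimap-transform-split-mail' and uses only basic buffer functions; the
helper name is hypothetical.

```elisp
(defun my/body-matches-p (full-message regexp)
  "Return t if REGEXP matches FULL-MESSAGE after CRLF cleanup.
FULL-MESSAGE is the raw text of a fetched message, as a string."
  (save-match-data
    (with-temp-buffer
      (insert full-message)
      ;; Convert IMAP's CRLF line endings to plain LF (removes the ^M's).
      (goto-char (point-min))
      (while (search-forward "\r\n" nil t)
        (replace-match "\n" t t))
      ;; Search the cleaned-up text from the top.
      (goto-char (point-min))
      (and (re-search-forward regexp nil t) t))))

;; With the carriage returns gone, `$' matches at the end of the line.
(my/body-matches-p "Subject: x\r\n\r\nTest Gnus Splitting\r\n"
                   "^Test Gnus Splitting$")
;; => t
```

Working in a temporary buffer also means the " *nntpd*" buffer is left
untouched while the body is examined.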


