* Scoring on basee64 encoded message body @ 2012-03-13 10:39 Jan Tatarik 2012-03-14 14:38 ` Lars Magne Ingebrigtsen 0 siblings, 1 reply; 20+ messages in thread From: Jan Tatarik @ 2012-03-13 10:39 UTC (permalink / raw) To: ding [-- Attachment #1.1: Type: text/plain, Size: 410 bytes --] Hi, I just spent some time trying to get scoring by message body work in one of my nnimap groups. I finally realized the content of the messages is base64 encoded, so matching on the raw body cannot work. The attached patch fixes the problem for me, but I have no idea whether it's a generally acceptable solution. I'm only using the body match in a low-traffic group, so speed is not an issue for me. Jan [-- Attachment #1.2: Type: text/html, Size: 534 bytes --] [-- Attachment #2: score-body-base64-encoded.diff --] [-- Type: application/octet-stream, Size: 1142 bytes --] diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el index 954295438c953c2500b9c1959a49e52312cc9653..85a1f83466a246c26be5f8b05263a045c7928f0e 100644 --- a/lisp/gnus-logic.el +++ b/lisp/gnus-logic.el @@ -204,6 +204,8 @@ (narrow-to-region (point) (or (search-forward "\n\n" nil t) (point-max))) + (when (string= (gnus-fetch-field "content-transfer-encoding") "base64") + (article-de-base64-unreadable t)) (narrow-to-region (or (search-forward "\n\n" nil t) (point)) (point-max)))) diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el index f86b6f837a70ce54b06668187821fe57c3f80f4c..231c942bb34c97d66563ca7a399c9808b1f66663 100644 --- a/lisp/gnus-score.el +++ b/lisp/gnus-score.el @@ -1781,6 +1781,8 @@ score in `gnus-newsgroup-scored' by SCORE." (narrow-to-region (point) (or (search-forward "\n\n" nil t) (point-max))) + (when (string= (gnus-fetch-field "content-transfer-encoding") "base64") + (article-de-base64-unreadable t)) (narrow-to-region (or (search-forward "\n\n" nil t) (point)) (point-max)))) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-03-13 10:39 Scoring on basee64 encoded message body Jan Tatarik @ 2012-03-14 14:38 ` Lars Magne Ingebrigtsen 2012-03-14 20:21 ` Reiner Steib 2012-03-15 21:05 ` Jan Tatarik 0 siblings, 2 replies; 20+ messages in thread From: Lars Magne Ingebrigtsen @ 2012-03-14 14:38 UTC (permalink / raw) To: Jan Tatarik; +Cc: ding Jan Tatarik <jan.tatarik@gmail.com> writes: > I finally realized the content of the messages is base64 encoded, so > matching on the raw body cannot work. > > The attached patch fixes the problem for me, but I have no idea > whether it's a generally acceptable solution. I'm only using the body > match in a low-traffic group, so speed is not an issue for me. [...] > + (when (string= (gnus-fetch-field "content-transfer-encoding") "base64") > + (article-de-base64-unreadable t)) This isn't a general enough solution here. QP-encoded messages also want decoding. But the more general issue is -- should scoring on bodies be done on the decoded messages or the encoded messages? I think it would make more sense to do it on decoded messages, and since these are body matches, speed don't really matter that much, because body matches are s-l-o-w anyway. Scoring on headers are done on the decoded headers, right? So it would be more consistent to decode the bodies, too. Wouldn't it? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog http://lars.ingebrigtsen.no/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-03-14 14:38 ` Lars Magne Ingebrigtsen @ 2012-03-14 20:21 ` Reiner Steib 2012-03-15 1:29 ` Lars Magne Ingebrigtsen 2012-03-15 21:05 ` Jan Tatarik 1 sibling, 1 reply; 20+ messages in thread From: Reiner Steib @ 2012-03-14 20:21 UTC (permalink / raw) To: ding On Wed, Mar 14 2012, Lars Magne Ingebrigtsen wrote: > But the more general issue is -- should scoring on bodies be done on the > decoded messages or the encoded messages? I think it would make more > sense to do it on decoded messages, Sure. I remember that we had this problem WRT mail splitting. And I'm a little surprised that `nnmail-mail-splitting-decodes' is nil by default. Probably because users might filter spam on "strange" charsets. > Scoring on headers are done on the decoded headers, right? So it would > be more consistent to decode the bodies, too. Wouldn't it? Bye, Reiner. -- ,,, (o o) ---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-03-14 20:21 ` Reiner Steib @ 2012-03-15 1:29 ` Lars Magne Ingebrigtsen 0 siblings, 0 replies; 20+ messages in thread From: Lars Magne Ingebrigtsen @ 2012-03-15 1:29 UTC (permalink / raw) To: ding Reiner Steib <reinersteib+gmane@imap.cc> writes: > I remember that we had this problem WRT mail splitting. And I'm a > little surprised that `nnmail-mail-splitting-decodes' is nil by > default. Probably because users might filter spam on "strange" > charsets. I do that -- for instance to filter all the Russian spam into the spam group. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog http://lars.ingebrigtsen.no/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-03-14 14:38 ` Lars Magne Ingebrigtsen 2012-03-14 20:21 ` Reiner Steib @ 2012-03-15 21:05 ` Jan Tatarik 2012-03-22 20:38 ` Lars Magne Ingebrigtsen 1 sibling, 1 reply; 20+ messages in thread From: Jan Tatarik @ 2012-03-15 21:05 UTC (permalink / raw) To: ding [-- Attachment #1: Type: text/plain, Size: 1000 bytes --] On Wed, Mar 14 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote: > Jan Tatarik <jan.tatarik@gmail.com> writes: >> I finally realized the content of the messages is base64 encoded, so >> matching on the raw body cannot work. >> The attached patch fixes the problem for me, but I have no idea >> whether it's a generally acceptable solution. I'm only using the body >> match in a low-traffic group, so speed is not an issue for me. > [...] >> + (when (string= (gnus-fetch-field "content-transfer-encoding") "base64") >> + (article-de-base64-unreadable t)) > This isn't a general enough solution here. QP-encoded messages also > want decoding. > But the more general issue is -- should scoring on bodies be done on the > decoded messages or the encoded messages? I think it would make more > sense to do it on decoded messages, and since these are body matches, > speed don't really matter that much, because body matches are s-l-o-w > anyway. This better? [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: decode-message-before-scoring-on-body.diff --] [-- Type: text/x-diff, Size: 3182 bytes --] diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el index 954295438c953c2500b9c1959a49e52312cc9653..1b4fc22bc11ee0d599fb40a0adef53588b4f9ca4 100644 --- a/lisp/gnus-logic.el +++ b/lisp/gnus-logic.el @@ -181,8 +181,10 @@ (with-current-buffer nntp-server-buffer (let* ((request-func (cond ((string= "head" header) 'gnus-request-head) + ;; We need to peek at the headers to detect the + ;; content encoding ((string= "body" header) - 'gnus-request-body) + 'gnus-request-article) (t 'gnus-request-article))) ofunc article) ;; Not all backends support partial fetching. In that case, we @@ -196,6 +198,14 @@ (gnus-message 7 "Scoring article %s..." article) (when (funcall request-func article gnus-newsgroup-name) (goto-char (point-min)) + ;; Searching base64/qp-encoded message body produces more + ;; satisfactory results if we decode the message first + (unless (or (eq ofunc 'gnus-request-head) + (eq request-func 'gnus-request-head)) + (let ((encoding (gnus-fetch-field "content-transfer-encoding"))) + (cond + ((string= "base64" encoding) (article-de-base64-unreadable t)) + ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t))))) ;; If just parts of the article is to be searched and the ;; backend didn't support partial fetching, we just narrow to ;; the relevant parts. diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el index f86b6f837a70ce54b06668187821fe57c3f80f4c..776194a31c6702441d3bae74c8f6048778270e67 100644 --- a/lisp/gnus-score.el +++ b/lisp/gnus-score.el @@ -1752,8 +1752,10 @@ score in `gnus-newsgroup-scored' by SCORE." (all-scores scores) (request-func (cond ((string= "head" header) 'gnus-request-head) + ;; We need to peek at the headers to detect + ;; the content encoding ((string= "body" header) - 'gnus-request-body) + 'gnus-request-article) (t 'gnus-request-article))) entries alist ofunc article last) (when articles @@ -1773,6 +1775,14 @@ score in `gnus-newsgroup-scored' by SCORE." (widen) (when (funcall request-func article gnus-newsgroup-name) (goto-char (point-min)) + ;; Searching base64/qp-encoded message body produces more + ;; satisfactory results if we decode the message first + (unless (or (eq ofunc 'gnus-request-head) + (eq request-func 'gnus-request-head)) + (let ((encoding (gnus-fetch-field "content-transfer-encoding"))) + (cond + ((string= "base64" encoding) (article-de-base64-unreadable t)) + ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t))))) ;; If just parts of the article is to be searched, but the ;; backend didn't support partial fetching, we just narrow ;; to the relevant parts. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-03-15 21:05 ` Jan Tatarik @ 2012-03-22 20:38 ` Lars Magne Ingebrigtsen 2012-03-23 12:11 ` Jan Tatarik 0 siblings, 1 reply; 20+ messages in thread From: Lars Magne Ingebrigtsen @ 2012-03-22 20:38 UTC (permalink / raw) To: Jan Tatarik; +Cc: ding Jan Tatarik <jan.tatarik@gmail.com> writes: > This better? Yes, that looks better, but it should probably just call `mm-decode-content-transfer-encoding' instead, I think? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog http://lars.ingebrigtsen.no/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-03-22 20:38 ` Lars Magne Ingebrigtsen @ 2012-03-23 12:11 ` Jan Tatarik 2012-04-10 19:32 ` Lars Magne Ingebrigtsen 0 siblings, 1 reply; 20+ messages in thread From: Jan Tatarik @ 2012-03-23 12:11 UTC (permalink / raw) To: ding [-- Attachment #1: Type: text/plain, Size: 271 bytes --] On Thu, Mar 22 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote: > Jan Tatarik <jan.tatarik@gmail.com> writes: >> This better? > Yes, that looks better, but it should probably just call > `mm-decode-content-transfer-encoding' instead, I think? Like this? [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: decode message body before scoring on it --] [-- Type: text/x-diff, Size: 3608 bytes --] diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el index 954295438c953c2500b9c1959a49e52312cc9653..9216f5699ce1ed9a8c39dd03257a17885f6e8490 100644 --- a/lisp/gnus-logic.el +++ b/lisp/gnus-logic.el @@ -181,8 +181,10 @@ (with-current-buffer nntp-server-buffer (let* ((request-func (cond ((string= "head" header) 'gnus-request-head) + ;; We need to peek at the headers to detect the + ;; content encoding ((string= "body" header) - 'gnus-request-body) + 'gnus-request-article) (t 'gnus-request-article))) ofunc article) ;; Not all backends support partial fetching. In that case, we @@ -196,6 +198,20 @@ (gnus-message 7 "Scoring article %s..." article) (when (funcall request-func article gnus-newsgroup-name) (goto-char (point-min)) + ;; Searching base64/qp-encoded message body produces more + ;; satisfactory results if we decode the message first + (unless (or (eq ofunc 'gnus-request-head) + (eq request-func 'gnus-request-head)) + (let ((encoding (gnus-fetch-field "content-transfer-encoding"))) + (when encoding + (save-excursion + (save-restriction + ;; narrow to body + (narrow-to-region + (or (search-forward "\n\n" nil t) (point)) + (point-max)) + (mm-decode-content-transfer-encoding + (intern (downcase encoding)))))))) ;; If just parts of the article is to be searched and the ;; backend didn't support partial fetching, we just narrow to ;; the relevant parts. diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el index f86b6f837a70ce54b06668187821fe57c3f80f4c..322aed78fa374b873fb8604482adead77362c5be 100644 --- a/lisp/gnus-score.el +++ b/lisp/gnus-score.el @@ -1752,8 +1752,10 @@ score in `gnus-newsgroup-scored' by SCORE." (all-scores scores) (request-func (cond ((string= "head" header) 'gnus-request-head) + ;; We need to peek at the headers to detect + ;; the content encoding ((string= "body" header) - 'gnus-request-body) + 'gnus-request-article) (t 'gnus-request-article))) entries alist ofunc article last) (when articles @@ -1773,6 +1775,20 @@ score in `gnus-newsgroup-scored' by SCORE." (widen) (when (funcall request-func article gnus-newsgroup-name) (goto-char (point-min)) + ;; Searching base64/qp-encoded message body produces more + ;; satisfactory results if we decode the message first + (unless (or (eq ofunc 'gnus-request-head) + (eq request-func 'gnus-request-head)) + (let ((encoding (gnus-fetch-field "content-transfer-encoding"))) + (when encoding + (save-excursion + (save-restriction + ;; narrow to body + (narrow-to-region + (or (search-forward "\n\n" nil t) (point)) + (point-max)) + (mm-decode-content-transfer-encoding + (intern (downcase encoding)))))))) ;; If just parts of the article is to be searched, but the ;; backend didn't support partial fetching, we just narrow ;; to the relevant parts. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-03-23 12:11 ` Jan Tatarik @ 2012-04-10 19:32 ` Lars Magne Ingebrigtsen 2012-04-11 7:30 ` Jan Tatarik 2012-04-11 19:34 ` Jan Tatarik 0 siblings, 2 replies; 20+ messages in thread From: Lars Magne Ingebrigtsen @ 2012-04-10 19:32 UTC (permalink / raw) To: Jan Tatarik; +Cc: ding Jan Tatarik <jan.tatarik@gmail.com> writes: >> Yes, that looks better, but it should probably just call >> `mm-decode-content-transfer-encoding' instead, I think? > > Like this? Yes. And I almost applied it, but then I remembered that this wouldn't, in general, really give us what we wanted, anyway. :-/ That is, a lot of messages are MIME multipart messages. If we start decoding, we really should decode all the parts, and score on the decoded text of them all, I think. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog http://lars.ingebrigtsen.no/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-04-10 19:32 ` Lars Magne Ingebrigtsen @ 2012-04-11 7:30 ` Jan Tatarik 2012-04-11 19:34 ` Jan Tatarik 1 sibling, 0 replies; 20+ messages in thread From: Jan Tatarik @ 2012-04-11 7:30 UTC (permalink / raw) To: ding On Tue, Apr 10 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote: >> Like this? > Yes. And I almost applied it, but then I remembered that this > wouldn't, in general, really give us what we wanted, anyway. :-/ > That is, a lot of messages are MIME multipart messages. If we start > decoding, we really should decode all the parts, and score on the > decoded text of them all, I think. Any other scenario I should be aware of, before I send the next patch? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-04-10 19:32 ` Lars Magne Ingebrigtsen 2012-04-11 7:30 ` Jan Tatarik @ 2012-04-11 19:34 ` Jan Tatarik 2012-04-12 18:45 ` Lars Magne Ingebrigtsen 1 sibling, 1 reply; 20+ messages in thread From: Jan Tatarik @ 2012-04-11 19:34 UTC (permalink / raw) To: ding On Tue, Apr 10 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote: > Yes. And I almost applied it, but then I remembered that this > wouldn't, in general, really give us what we wanted, anyway. :-/ > That is, a lot of messages are MIME multipart messages. If we start > decoding, we really should decode all the parts, and score on the > decoded text of them all, I think. All text/* parts, I assume? I had a quick look at mm handling in gnus. My idea would be to - run mm-dissect-buffer on the message body (any idea which args would be appropriate?) - for multipart messages, pick the handles with text/* type, run them through their respective mm-inline-* function as defined in mm-inline-media-tests - score on all the decoded text parts Is this the way to go? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-04-11 19:34 ` Jan Tatarik @ 2012-04-12 18:45 ` Lars Magne Ingebrigtsen 2012-04-12 22:58 ` Jan Tatarik 0 siblings, 1 reply; 20+ messages in thread From: Lars Magne Ingebrigtsen @ 2012-04-12 18:45 UTC (permalink / raw) To: Jan Tatarik; +Cc: ding Jan Tatarik <jan.tatarik@gmail.com> writes: > All text/* parts, I assume? It's not uncommon to stash texts bits in application/octet-stream parts, too, but I guess limiting this to text/* would be sensible... > I had a quick look at mm handling in gnus. My idea would be to > > - run mm-dissect-buffer on the message body (any idea which args would > be appropriate?) NO-STRICT-MIME, I think. > - for multipart messages, pick the handles with text/* type, run them > through their respective mm-inline-* function as defined in > mm-inline-media-tests > > - score on all the decoded text parts > > Is this the way to go? Yup; sounds good. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog http://lars.ingebrigtsen.no/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-04-12 18:45 ` Lars Magne Ingebrigtsen @ 2012-04-12 22:58 ` Jan Tatarik 2012-06-10 21:08 ` Lars Magne Ingebrigtsen 0 siblings, 1 reply; 20+ messages in thread From: Jan Tatarik @ 2012-04-12 22:58 UTC (permalink / raw) To: ding [-- Attachment #1: Type: text/plain, Size: 486 bytes --] On Thu, Apr 12 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote: >> - run mm-dissect-buffer on the message body (any idea which args would >> be appropriate?) > NO-STRICT-MIME, I think. >> - for multipart messages, pick the handles with text/* type, run them >> through their respective mm-inline-* function as defined in >> mm-inline-media-tests >> - score on all the decoded text parts >> Is this the way to go? > Yup; sounds good. And here is the new patch. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: decode mm messages when scoring on body --] [-- Type: text/x-diff, Size: 3002 bytes --] diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el index 954295438c953c2500b9c1959a49e52312cc9653..38442a406dd6fb6ef47cd468248618e68337a26a 100644 --- a/lisp/gnus-logic.el +++ b/lisp/gnus-logic.el @@ -181,8 +181,10 @@ (with-current-buffer nntp-server-buffer (let* ((request-func (cond ((string= "head" header) 'gnus-request-head) + ;; We need to peek at the headers to detect the + ;; content encoding ((string= "body" header) - 'gnus-request-body) + 'gnus-request-article) (t 'gnus-request-article))) ofunc article) ;; Not all backends support partial fetching. In that case, we @@ -196,6 +198,7 @@ (gnus-message 7 "Scoring article %s..." article) (when (funcall request-func article gnus-newsgroup-name) (goto-char (point-min)) + (gnus-score-decode-text-parts) ;; If just parts of the article is to be searched and the ;; backend didn't support partial fetching, we just narrow to ;; the relevant parts. diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el index f86b6f837a70ce54b06668187821fe57c3f80f4c..003355dd2c91847241dc67263f83d26ae52920de 100644 --- a/lisp/gnus-score.el +++ b/lisp/gnus-score.el @@ -1736,6 +1736,24 @@ score in `gnus-newsgroup-scored' by SCORE." (setq entries rest))))) nil) +(defun gnus-score-decode-text-parts () + (let ((handles (mm-dissect-buffer t))) + (cond ((stringp (car handles)) (pop handles)) + ((and (bufferp (car handles)) + (stringp (car (mm-handle-type handles)))) + (setq handles (list handles)))) + + (save-excursion + (article-goto-body) + (delete-region (point) (point-max)) + (save-restriction + (narrow-to-region (point) (point)) + (mapc #'mm-display-inline + (remove-if-not + (lambda (handle) + (string-match "^text/" (mm-handle-media-type handle))) + handles)))))) + (defun gnus-score-body (scores header now expire &optional trace) (if gnus-agent-fetching nil @@ -1752,8 +1770,10 @@ score in `gnus-newsgroup-scored' by SCORE." (all-scores scores) (request-func (cond ((string= "head" header) 'gnus-request-head) + ;; We need to peek at the headers to detect + ;; the content encoding ((string= "body" header) - 'gnus-request-body) + 'gnus-request-article) (t 'gnus-request-article))) entries alist ofunc article last) (when articles @@ -1773,6 +1793,7 @@ score in `gnus-newsgroup-scored' by SCORE." (widen) (when (funcall request-func article gnus-newsgroup-name) (goto-char (point-min)) + (gnus-score-decode-text-parts) ;; If just parts of the article is to be searched, but the ;; backend didn't support partial fetching, we just narrow ;; to the relevant parts. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-04-12 22:58 ` Jan Tatarik @ 2012-06-10 21:08 ` Lars Magne Ingebrigtsen 2012-06-28 9:45 ` Jan Tatarik 0 siblings, 1 reply; 20+ messages in thread From: Lars Magne Ingebrigtsen @ 2012-06-10 21:08 UTC (permalink / raw) To: Jan Tatarik; +Cc: ding Jan Tatarik <jan.tatarik@gmail.com> writes: > And here is the new patch. Sorry for the extreme delays here. The patch looks good. I don't remember whether I asked you whether you had FSF copyright assignments on file or not? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog http://lars.ingebrigtsen.no/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-06-10 21:08 ` Lars Magne Ingebrigtsen @ 2012-06-28 9:45 ` Jan Tatarik 2012-09-05 13:40 ` Lars Ingebrigtsen 0 siblings, 1 reply; 20+ messages in thread From: Jan Tatarik @ 2012-06-28 9:45 UTC (permalink / raw) To: ding; +Cc: Lars Magne Ingebrigtsen [-- Attachment #1: Type: text/plain, Size: 329 bytes --] On Sun, Jun 10 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote: > Jan Tatarik <jan.tatarik@gmail.com> writes: >> And here is the new patch. > Sorry for the extreme delays here. > The patch looks good. > I don't remember whether I asked you whether you had FSF copyright > assignments on file or not? I do now. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: score by body --] [-- Type: text/x-diff, Size: 14559 bytes --] diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el index 954295438c953c2500b9c1959a49e52312cc9653..a440b7799303ddf46674c35c20f5bba51388b50c 100644 --- a/lisp/gnus-logic.el +++ b/lisp/gnus-logic.el @@ -180,46 +180,51 @@ (setq header "article")) (with-current-buffer nntp-server-buffer (let* ((request-func (cond ((string= "head" header) - 'gnus-request-head) - ((string= "body" header) - 'gnus-request-body) - (t 'gnus-request-article))) - ofunc article) + 'gnus-request-head) + ;; We need to peek at the headers to detect the + ;; content encoding + ((string= "body" header) + 'gnus-request-article) + (t 'gnus-request-article))) + ofunc article handles) ;; Not all backends support partial fetching. In that case, we ;; just fetch the entire article. (unless (gnus-check-backend-function - (intern (concat "request-" header)) - gnus-newsgroup-name) - (setq ofunc request-func) - (setq request-func 'gnus-request-article)) + (intern (concat "request-" header)) + gnus-newsgroup-name) + (setq ofunc request-func) + (setq request-func 'gnus-request-article)) (setq article (mail-header-number gnus-advanced-headers)) (gnus-message 7 "Scoring article %s..." article) (when (funcall request-func article gnus-newsgroup-name) - (goto-char (point-min)) - ;; If just parts of the article is to be searched and the - ;; backend didn't support partial fetching, we just narrow to - ;; the relevant parts. - (when ofunc - (if (eq ofunc 'gnus-request-head) - (narrow-to-region - (point) - (or (search-forward "\n\n" nil t) (point-max))) - (narrow-to-region - (or (search-forward "\n\n" nil t) (point)) - (point-max)))) - (let* ((case-fold-search (not (eq (downcase (symbol-name type)) - (symbol-name type)))) - (search-func - (cond ((memq type '(r R regexp Regexp)) - 're-search-forward) - ((memq type '(s S string String)) - 'search-forward) - (t - (error "Invalid match type: %s" type))))) - (goto-char (point-min)) - (prog1 - (funcall search-func match nil t) - (widen))))))) + (when (string= "body" header) + (setq handles (gnus-score-decode-text-parts))) + (goto-char (point-min)) + ;; If just parts of the article is to be searched and the + ;; backend didn't support partial fetching, we just narrow to + ;; the relevant parts. + (when ofunc + (if (eq ofunc 'gnus-request-head) + (narrow-to-region + (point) + (or (search-forward "\n\n" nil t) (point-max))) + (narrow-to-region + (or (search-forward "\n\n" nil t) (point)) + (point-max)))) + (let* ((case-fold-search (not (eq (downcase (symbol-name type)) + (symbol-name type)))) + (search-func + (cond ((memq type '(r R regexp Regexp)) + 're-search-forward) + ((memq type '(s S string String)) + 'search-forward) + (t + (error "Invalid match type: %s" type))))) + (goto-char (point-min)) + (prog1 + (funcall search-func match nil t) + (widen))) + (when handles (mm-destroy-parts handles)))))) (provide 'gnus-logic) diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el index f86b6f837a70ce54b06668187821fe57c3f80f4c..d50b3cc0cf02812ef32f53beaf85cd27171680d4 100644 --- a/lisp/gnus-score.el +++ b/lisp/gnus-score.el @@ -1736,105 +1736,140 @@ score in `gnus-newsgroup-scored' by SCORE." (setq entries rest))))) nil) +(defun gnus-score-decode-text-parts () + (labels ((mm-text-parts (handle) + (cond ((stringp (car handle)) + (let ((parts (mapcan 'mm-text-parts (cdr handle)))) + (if (equal "multipart/alternative" (car handle)) + ;; pick the first supported alternative + (list (car parts)) + parts))) + + ((bufferp (car handle)) + (when (string-match "^text/" (mm-handle-media-type handle)) + (list handle))) + + (t (mapcan 'mm-text-parts handle)))) + (my-mm-display-part (handle) + (when handle + (save-restriction + (narrow-to-region (point) (point)) + (mm-display-inline handle) + (goto-char (point-max)))))) + + (let (;(mm-text-html-renderer 'w3m-standalone) + (handles (mm-dissect-buffer t))) + (save-excursion + (article-goto-body) + (delete-region (point) (point-max)) + (mapc 'my-mm-display-part (mm-text-parts handles)) + handles)))) + (defun gnus-score-body (scores header now expire &optional trace) - (if gnus-agent-fetching - nil - (save-excursion - (setq gnus-scores-articles - (sort gnus-scores-articles - (lambda (a1 a2) - (< (mail-header-number (car a1)) - (mail-header-number (car a2)))))) - (set-buffer nntp-server-buffer) - (save-restriction - (let* ((buffer-read-only nil) - (articles gnus-scores-articles) - (all-scores scores) - (request-func (cond ((string= "head" header) - 'gnus-request-head) - ((string= "body" header) - 'gnus-request-body) - (t 'gnus-request-article))) - entries alist ofunc article last) - (when articles - (setq last (mail-header-number (caar (last articles)))) - ;; Not all backends support partial fetching. In that case, - ;; we just fetch the entire article. - (unless (gnus-check-backend-function - (and (string-match "^gnus-" (symbol-name request-func)) - (intern (substring (symbol-name request-func) - (match-end 0)))) - gnus-newsgroup-name) - (setq ofunc request-func) - (setq request-func 'gnus-request-article)) - (while articles - (setq article (mail-header-number (caar articles))) - (gnus-message 7 "Scoring article %s of %s..." article last) - (widen) - (when (funcall request-func article gnus-newsgroup-name) - (goto-char (point-min)) - ;; If just parts of the article is to be searched, but the - ;; backend didn't support partial fetching, we just narrow - ;; to the relevant parts. - (when ofunc - (if (eq ofunc 'gnus-request-head) - (narrow-to-region - (point) - (or (search-forward "\n\n" nil t) (point-max))) - (narrow-to-region - (or (search-forward "\n\n" nil t) (point)) - (point-max)))) - (setq scores all-scores) - ;; Find matches. - (while scores - (setq alist (pop scores) - entries (assoc header alist)) - (while (cdr entries) ;First entry is the header index. - (let* ((rest (cdr entries)) - (kill (car rest)) - (match (nth 0 kill)) - (type (or (nth 3 kill) 's)) - (score (or (nth 1 kill) - gnus-score-interactive-default-score)) - (date (nth 2 kill)) - (found nil) - (case-fold-search - (not (or (eq type 'R) (eq type 'S) - (eq type 'Regexp) (eq type 'String)))) - (search-func - (cond ((or (eq type 'r) (eq type 'R) - (eq type 'regexp) (eq type 'Regexp)) - 're-search-forward) - ((or (eq type 's) (eq type 'S) - (eq type 'string) (eq type 'String)) - 'search-forward) - (t - (error "Invalid match type: %s" type))))) - (goto-char (point-min)) - (when (funcall search-func match nil t) - ;; Found a match, update scores. - (setcdr (car articles) (+ score (cdar articles))) - (setq found t) - (when trace - (push - (cons (car-safe (rassq alist gnus-score-cache)) - kill) - gnus-score-trace))) - ;; Update expire date - (unless trace - (cond - ((null date)) ;Permanent entry. - ((and found gnus-update-score-entry-dates) - ;; Match, update date. - (gnus-score-set 'touched '(t) alist) - (setcar (nthcdr 2 kill) now)) - ((and expire (< date expire)) ;Old entry, remove. - (gnus-score-set 'touched '(t) alist) - (setcdr entries (cdr rest)) - (setq rest entries)))) - (setq entries rest))))) - (setq articles (cdr articles))))))) - nil)) + (if gnus-agent-fetching + nil + (save-excursion + (setq gnus-scores-articles + (sort gnus-scores-articles + (lambda (a1 a2) + (< (mail-header-number (car a1)) + (mail-header-number (car a2)))))) + (set-buffer nntp-server-buffer) + (save-restriction + (let* ((buffer-read-only nil) + (articles gnus-scores-articles) + (all-scores scores) + (request-func (cond ((string= "head" header) + 'gnus-request-head) + ;; We need to peek at the headers to detect + ;; the content encoding + ((string= "body" header) + 'gnus-request-article) + (t 'gnus-request-article))) + entries alist ofunc article last) + (when articles + (setq last (mail-header-number (caar (last articles)))) + ;; Not all backends support partial fetching. In that case, + ;; we just fetch the entire article. + (unless (gnus-check-backend-function + (and (string-match "^gnus-" (symbol-name request-func)) + (intern (substring (symbol-name request-func) + (match-end 0)))) + gnus-newsgroup-name) + (setq ofunc request-func) + (setq request-func 'gnus-request-article)) + (while articles + (setq article (mail-header-number (caar articles))) + (gnus-message 7 "Scoring article %s of %s..." article last) + (widen) + (let (handles) + (when (funcall request-func article gnus-newsgroup-name) + (when (string= "body" header) + (setq handles (gnus-score-decode-text-parts))) + (goto-char (point-min)) + ;; If just parts of the article is to be searched, but the + ;; backend didn't support partial fetching, we just narrow + ;; to the relevant parts. + (when ofunc + (if (eq ofunc 'gnus-request-head) + (narrow-to-region + (point) + (or (search-forward "\n\n" nil t) (point-max))) + (narrow-to-region + (or (search-forward "\n\n" nil t) (point)) + (point-max)))) + (setq scores all-scores) + ;; Find matches. + (while scores + (setq alist (pop scores) + entries (assoc header alist)) + (while (cdr entries) ;First entry is the header index. + (let* ((rest (cdr entries)) + (kill (car rest)) + (match (nth 0 kill)) + (type (or (nth 3 kill) 's)) + (score (or (nth 1 kill) + gnus-score-interactive-default-score)) + (date (nth 2 kill)) + (found nil) + (case-fold-search + (not (or (eq type 'R) (eq type 'S) + (eq type 'Regexp) (eq type 'String)))) + (search-func + (cond ((or (eq type 'r) (eq type 'R) + (eq type 'regexp) (eq type 'Regexp)) + 're-search-forward) + ((or (eq type 's) (eq type 'S) + (eq type 'string) (eq type 'String)) + 'search-forward) + (t + (error "Invalid match type: %s" type))))) + (goto-char (point-min)) + (when (funcall search-func match nil t) + ;; Found a match, update scores. + (setcdr (car articles) (+ score (cdar articles))) + (setq found t) + (when trace + (push + (cons (car-safe (rassq alist gnus-score-cache)) + kill) + gnus-score-trace))) + ;; Update expire date + (unless trace + (cond + ((null date)) ;Permanent entry. + ((and found gnus-update-score-entry-dates) + ;; Match, update date. + (gnus-score-set 'touched '(t) alist) + (setcar (nthcdr 2 kill) now)) + ((and expire (< date expire)) ;Old entry, remove. + (gnus-score-set 'touched '(t) alist) + (setcdr entries (cdr rest)) + (setq rest entries)))) + (setq entries rest)))) + (when handles (mm-destroy-parts handles)))) + (setq articles (cdr articles))))))) + nil)) (defun gnus-score-thread (scores header now expire &optional trace) (gnus-score-followup scores header now expire trace t)) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-06-28 9:45 ` Jan Tatarik @ 2012-09-05 13:40 ` Lars Ingebrigtsen 2012-09-05 14:39 ` Jan Tatarik 0 siblings, 1 reply; 20+ messages in thread From: Lars Ingebrigtsen @ 2012-09-05 13:40 UTC (permalink / raw) To: Jan Tatarik; +Cc: ding Jan Tatarik <jan.tatarik@gmail.com> writes: >> I don't remember whether I asked you whether you had FSF copyright >> assignments on file or not? > > I do now. Thanks; I've now applied your patch to Ma Gnus. -- (domestic pets only, the antidote for overdose, milk.) http://lars.ingebrigtsen.no * Sent from my Emacs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-09-05 13:40 ` Lars Ingebrigtsen @ 2012-09-05 14:39 ` Jan Tatarik 2012-09-05 14:43 ` Lars Ingebrigtsen 0 siblings, 1 reply; 20+ messages in thread From: Jan Tatarik @ 2012-09-05 14:39 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: ding [-- Attachment #1: Type: text/plain, Size: 364 bytes --] On Wed, Sep 05 2012, Lars Ingebrigtsen wrote: > Jan Tatarik <jan.tatarik@gmail.com> writes: >>> I don't remember whether I asked you whether you had FSF copyright >>> assignments on file or not? >> I do now. > Thanks; I've now applied your patch to Ma Gnus. Cool. There's one small issue, however, that started popping up for me with recent Emacs versions: [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: proper function quoting --] [-- Type: text/x-diff, Size: 573 bytes --] diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el index 948c1c3b62dcebb3f71b0b97bb4f99ee5f4b00af..bc35cf3dea5ff791c8d53dd32330e584567ea1ee 100644 --- a/lisp/gnus-score.el +++ b/lisp/gnus-score.el @@ -1743,7 +1743,7 @@ score in `gnus-newsgroup-scored' by SCORE." (save-excursion (article-goto-body) (delete-region (point) (point-max)) - (mapc 'my-mm-display-part (mm-text-parts handles)) + (mapc #'my-mm-display-part (mm-text-parts handles)) handles)))) (defun gnus-score-body (scores header now expire &optional trace) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-09-05 14:39 ` Jan Tatarik @ 2012-09-05 14:43 ` Lars Ingebrigtsen 2012-09-05 15:07 ` Jan Tatarik 2012-09-05 15:42 ` Andreas Schwab 0 siblings, 2 replies; 20+ messages in thread From: Lars Ingebrigtsen @ 2012-09-05 14:43 UTC (permalink / raw) To: Jan Tatarik; +Cc: ding Jan Tatarik <jan.tatarik@gmail.com> writes: > Cool. There's one small issue, however, that started popping up for me > with recent Emacs versions: [...] > - (mapc 'my-mm-display-part (mm-text-parts handles)) > + (mapc #'my-mm-display-part (mm-text-parts handles)) Has #' started to have a meaning in newer Emacsen? It used to ... not mean very much. Anyway, I've applied the patch. -- (domestic pets only, the antidote for overdose, milk.) http://lars.ingebrigtsen.no * Lars Magne Ingebrigtsen ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-09-05 14:43 ` Lars Ingebrigtsen @ 2012-09-05 15:07 ` Jan Tatarik 2012-09-05 15:35 ` Lars Ingebrigtsen 2012-09-05 15:42 ` Andreas Schwab 1 sibling, 1 reply; 20+ messages in thread From: Jan Tatarik @ 2012-09-05 15:07 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: ding On Wed, Sep 05 2012, Lars Ingebrigtsen wrote: > Jan Tatarik <jan.tatarik@gmail.com> writes: >> Cool. There's one small issue, however, that started popping up for me >> with recent Emacs versions: > [...] >> - (mapc 'my-mm-display-part (mm-text-parts handles)) >> + (mapc #'my-mm-display-part (mm-text-parts handles)) > Has #' started to have a meaning in newer Emacsen? It used to ... not > mean very much. I think it's only problem with temporary functions (labels). Then I get mapc: Symbol's function definition is void: my-mm-display-part unless I use #'my-mm-display-part. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-09-05 15:07 ` Jan Tatarik @ 2012-09-05 15:35 ` Lars Ingebrigtsen 0 siblings, 0 replies; 20+ messages in thread From: Lars Ingebrigtsen @ 2012-09-05 15:35 UTC (permalink / raw) To: Jan Tatarik; +Cc: ding Jan Tatarik <jan.tatarik@gmail.com> writes: > I think it's only problem with temporary functions (labels). Then I get > > mapc: Symbol's function definition is void: my-mm-display-part > > unless I use #'my-mm-display-part. Ah, I see. -- (domestic pets only, the antidote for overdose, milk.) http://lars.ingebrigtsen.no * Lars Magne Ingebrigtsen ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body 2012-09-05 14:43 ` Lars Ingebrigtsen 2012-09-05 15:07 ` Jan Tatarik @ 2012-09-05 15:42 ` Andreas Schwab 1 sibling, 0 replies; 20+ messages in thread From: Andreas Schwab @ 2012-09-05 15:42 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: Jan Tatarik, ding Lars Ingebrigtsen <larsi@gnus.org> writes: > Has #' started to have a meaning in newer Emacsen? Only for lambda expressions (it allows them to be byte-compiled). Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2012-09-05 15:42 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-03-13 10:39 Scoring on basee64 encoded message body Jan Tatarik 2012-03-14 14:38 ` Lars Magne Ingebrigtsen 2012-03-14 20:21 ` Reiner Steib 2012-03-15 1:29 ` Lars Magne Ingebrigtsen 2012-03-15 21:05 ` Jan Tatarik 2012-03-22 20:38 ` Lars Magne Ingebrigtsen 2012-03-23 12:11 ` Jan Tatarik 2012-04-10 19:32 ` Lars Magne Ingebrigtsen 2012-04-11 7:30 ` Jan Tatarik 2012-04-11 19:34 ` Jan Tatarik 2012-04-12 18:45 ` Lars Magne Ingebrigtsen 2012-04-12 22:58 ` Jan Tatarik 2012-06-10 21:08 ` Lars Magne Ingebrigtsen 2012-06-28 9:45 ` Jan Tatarik 2012-09-05 13:40 ` Lars Ingebrigtsen 2012-09-05 14:39 ` Jan Tatarik 2012-09-05 14:43 ` Lars Ingebrigtsen 2012-09-05 15:07 ` Jan Tatarik 2012-09-05 15:35 ` Lars Ingebrigtsen 2012-09-05 15:42 ` Andreas Schwab
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).