Gnus development mailing list
 help / color / mirror / Atom feed
* Scoring on basee64 encoded message body
@ 2012-03-13 10:39 Jan Tatarik
  2012-03-14 14:38 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-03-13 10:39 UTC (permalink / raw)
  To: ding


[-- Attachment #1.1: Type: text/plain, Size: 410 bytes --]

Hi,

I just spent some time trying to get scoring by message body work in one of
my nnimap groups.

I finally realized the content of the messages is base64 encoded, so
matching on the raw body cannot work.

The attached patch fixes the problem for me, but I have no idea whether
it's a generally acceptable solution. I'm only using the body match in a
low-traffic group, so speed is not an issue for me.

Jan

[-- Attachment #1.2: Type: text/html, Size: 534 bytes --]

[-- Attachment #2: score-body-base64-encoded.diff --]
[-- Type: application/octet-stream, Size: 1142 bytes --]

diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..85a1f83466a246c26be5f8b05263a045c7928f0e 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -204,6 +204,8 @@
 	      (narrow-to-region
 	       (point)
 	       (or (search-forward "\n\n" nil t) (point-max)))
+        (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
+          (article-de-base64-unreadable t))
 	    (narrow-to-region
 	     (or (search-forward "\n\n" nil t) (point))
 	     (point-max))))
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..231c942bb34c97d66563ca7a399c9808b1f66663 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1781,6 +1781,8 @@ score in `gnus-newsgroup-scored' by SCORE."
 		      (narrow-to-region
 		       (point)
 		       (or (search-forward "\n\n" nil t) (point-max)))
+            (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
+              (article-de-base64-unreadable t))
 		    (narrow-to-region
 		     (or (search-forward "\n\n" nil t) (point))
 		     (point-max))))

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-03-13 10:39 Scoring on basee64 encoded message body Jan Tatarik
@ 2012-03-14 14:38 ` Lars Magne Ingebrigtsen
  2012-03-14 20:21   ` Reiner Steib
  2012-03-15 21:05   ` Jan Tatarik
  0 siblings, 2 replies; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-14 14:38 UTC (permalink / raw)
  To: Jan Tatarik; +Cc: ding

Jan Tatarik <jan.tatarik@gmail.com> writes:

> I finally realized the content of the messages is base64 encoded, so
> matching on the raw body cannot work.
>
> The attached patch fixes the problem for me, but I have no idea
> whether it's a generally acceptable solution. I'm only using the body
> match in a low-traffic group, so speed is not an issue for me.

[...]

> +            (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
> +              (article-de-base64-unreadable t))

This isn't a general enough solution here.  QP-encoded messages also
want decoding.

But the more general issue is -- should scoring on bodies be done on the
decoded messages or the encoded messages?  I think it would make more
sense to do it on decoded messages, and since these are body matches,
speed don't really matter that much, because body matches are s-l-o-w
anyway.

Scoring on headers are done on the decoded headers, right?  So it would
be more consistent to decode the bodies, too.  Wouldn't it?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-03-14 14:38 ` Lars Magne Ingebrigtsen
@ 2012-03-14 20:21   ` Reiner Steib
  2012-03-15  1:29     ` Lars Magne Ingebrigtsen
  2012-03-15 21:05   ` Jan Tatarik
  1 sibling, 1 reply; 20+ messages in thread
From: Reiner Steib @ 2012-03-14 20:21 UTC (permalink / raw)
  To: ding

On Wed, Mar 14 2012, Lars Magne Ingebrigtsen wrote:

> But the more general issue is -- should scoring on bodies be done on the
> decoded messages or the encoded messages?  I think it would make more
> sense to do it on decoded messages, 

Sure.

I remember that we had this problem WRT mail splitting.  And I'm a
little surprised that `nnmail-mail-splitting-decodes' is nil by
default.  Probably because users might filter spam on "strange"
charsets.

> Scoring on headers are done on the decoded headers, right?  So it would
> be more consistent to decode the bodies, too.  Wouldn't it?

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-03-14 20:21   ` Reiner Steib
@ 2012-03-15  1:29     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-15  1:29 UTC (permalink / raw)
  To: ding

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> I remember that we had this problem WRT mail splitting.  And I'm a
> little surprised that `nnmail-mail-splitting-decodes' is nil by
> default.  Probably because users might filter spam on "strange"
> charsets.

I do that -- for instance to filter all the Russian spam into the spam
group.

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-03-14 14:38 ` Lars Magne Ingebrigtsen
  2012-03-14 20:21   ` Reiner Steib
@ 2012-03-15 21:05   ` Jan Tatarik
  2012-03-22 20:38     ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-03-15 21:05 UTC (permalink / raw)
  To: ding

[-- Attachment #1: Type: text/plain, Size: 1000 bytes --]

On Wed, Mar 14 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:

> Jan Tatarik <jan.tatarik@gmail.com> writes:

>> I finally realized the content of the messages is base64 encoded, so
>> matching on the raw body cannot work.

>> The attached patch fixes the problem for me, but I have no idea
>> whether it's a generally acceptable solution. I'm only using the body
>> match in a low-traffic group, so speed is not an issue for me.

> [...]

>> +            (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
>> +              (article-de-base64-unreadable t))

> This isn't a general enough solution here.  QP-encoded messages also
> want decoding.

> But the more general issue is -- should scoring on bodies be done on the
> decoded messages or the encoded messages?  I think it would make more
> sense to do it on decoded messages, and since these are body matches,
> speed don't really matter that much, because body matches are s-l-o-w
> anyway.

This better?


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: decode-message-before-scoring-on-body.diff --]
[-- Type: text/x-diff, Size: 3182 bytes --]

diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..1b4fc22bc11ee0d599fb40a0adef53588b4f9ca4 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -181,8 +181,10 @@
   (with-current-buffer nntp-server-buffer
     (let* ((request-func (cond ((string= "head" header)
 				'gnus-request-head)
+                               ;; We need to peek at the headers to detect the
+                               ;; content encoding
 			       ((string= "body" header)
-				'gnus-request-body)
+                                'gnus-request-article)
 			       (t 'gnus-request-article)))
 	   ofunc article)
       ;; Not all backends support partial fetching.  In that case, we
@@ -196,6 +198,14 @@
       (gnus-message 7 "Scoring article %s..." article)
       (when (funcall request-func article gnus-newsgroup-name)
 	(goto-char (point-min))
+        ;; Searching base64/qp-encoded message body produces more
+        ;; satisfactory results if we decode the message first
+        (unless (or (eq ofunc 'gnus-request-head)
+                    (eq request-func 'gnus-request-head))
+          (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+            (cond
+             ((string= "base64" encoding) (article-de-base64-unreadable t))
+             ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
 	;; If just parts of the article is to be searched and the
 	;; backend didn't support partial fetching, we just narrow to
 	;; the relevant parts.
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..776194a31c6702441d3bae74c8f6048778270e67 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1752,8 +1752,10 @@ score in `gnus-newsgroup-scored' by SCORE."
 	       (all-scores scores)
 	       (request-func (cond ((string= "head" header)
 				    'gnus-request-head)
+                                   ;; We need to peek at the headers to detect
+                                   ;; the content encoding
 				   ((string= "body" header)
-				    'gnus-request-body)
+                                    'gnus-request-article)
 				   (t 'gnus-request-article)))
 	       entries alist ofunc article last)
 	  (when articles
@@ -1773,6 +1775,14 @@ score in `gnus-newsgroup-scored' by SCORE."
 	      (widen)
 	      (when (funcall request-func article gnus-newsgroup-name)
 		(goto-char (point-min))
+                ;; Searching base64/qp-encoded message body produces more
+                ;; satisfactory results if we decode the message first
+                (unless (or (eq ofunc 'gnus-request-head)
+                            (eq request-func 'gnus-request-head))
+                  (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+                    (cond
+                     ((string= "base64" encoding) (article-de-base64-unreadable t))
+                     ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
 	    ;; If just parts of the article is to be searched, but the
 	    ;; backend didn't support partial fetching, we just narrow
 		;; to the relevant parts.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-03-15 21:05   ` Jan Tatarik
@ 2012-03-22 20:38     ` Lars Magne Ingebrigtsen
  2012-03-23 12:11       ` Jan Tatarik
  0 siblings, 1 reply; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-22 20:38 UTC (permalink / raw)
  To: Jan Tatarik; +Cc: ding

Jan Tatarik <jan.tatarik@gmail.com> writes:

> This better?

Yes, that looks better, but it should probably just call
`mm-decode-content-transfer-encoding' instead, I think?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-03-22 20:38     ` Lars Magne Ingebrigtsen
@ 2012-03-23 12:11       ` Jan Tatarik
  2012-04-10 19:32         ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-03-23 12:11 UTC (permalink / raw)
  To: ding

[-- Attachment #1: Type: text/plain, Size: 271 bytes --]

On Thu, Mar 22 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:

> Jan Tatarik <jan.tatarik@gmail.com> writes:

>> This better?

> Yes, that looks better, but it should probably just call
> `mm-decode-content-transfer-encoding' instead, I think?

Like this?


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: decode message body before scoring on it --]
[-- Type: text/x-diff, Size: 3608 bytes --]

diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..9216f5699ce1ed9a8c39dd03257a17885f6e8490 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -181,8 +181,10 @@
   (with-current-buffer nntp-server-buffer
     (let* ((request-func (cond ((string= "head" header)
 				'gnus-request-head)
+                               ;; We need to peek at the headers to detect the
+                               ;; content encoding
 			       ((string= "body" header)
-				'gnus-request-body)
+                                'gnus-request-article)
 			       (t 'gnus-request-article)))
 	   ofunc article)
       ;; Not all backends support partial fetching.  In that case, we
@@ -196,6 +198,20 @@
       (gnus-message 7 "Scoring article %s..." article)
       (when (funcall request-func article gnus-newsgroup-name)
 	(goto-char (point-min))
+        ;; Searching base64/qp-encoded message body produces more
+        ;; satisfactory results if we decode the message first
+        (unless (or (eq ofunc 'gnus-request-head)
+                    (eq request-func 'gnus-request-head))
+          (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+            (when encoding
+              (save-excursion
+                (save-restriction
+                  ;; narrow to body
+                  (narrow-to-region
+                   (or (search-forward "\n\n" nil t) (point))
+                   (point-max))
+                  (mm-decode-content-transfer-encoding
+                   (intern (downcase encoding))))))))
 	;; If just parts of the article is to be searched and the
 	;; backend didn't support partial fetching, we just narrow to
 	;; the relevant parts.
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..322aed78fa374b873fb8604482adead77362c5be 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1752,8 +1752,10 @@ score in `gnus-newsgroup-scored' by SCORE."
 	       (all-scores scores)
 	       (request-func (cond ((string= "head" header)
 				    'gnus-request-head)
+                                   ;; We need to peek at the headers to detect
+                                   ;; the content encoding
 				   ((string= "body" header)
-				    'gnus-request-body)
+                                    'gnus-request-article)
 				   (t 'gnus-request-article)))
 	       entries alist ofunc article last)
 	  (when articles
@@ -1773,6 +1775,20 @@ score in `gnus-newsgroup-scored' by SCORE."
 	      (widen)
 	      (when (funcall request-func article gnus-newsgroup-name)
 		(goto-char (point-min))
+                ;; Searching base64/qp-encoded message body produces more
+                ;; satisfactory results if we decode the message first
+                (unless (or (eq ofunc 'gnus-request-head)
+                            (eq request-func 'gnus-request-head))
+                  (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+                    (when encoding
+                      (save-excursion
+                        (save-restriction
+                          ;; narrow to body
+                          (narrow-to-region
+                           (or (search-forward "\n\n" nil t) (point))
+                           (point-max))
+                          (mm-decode-content-transfer-encoding
+                           (intern (downcase encoding))))))))
 	    ;; If just parts of the article is to be searched, but the
 	    ;; backend didn't support partial fetching, we just narrow
 		;; to the relevant parts.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-03-23 12:11       ` Jan Tatarik
@ 2012-04-10 19:32         ` Lars Magne Ingebrigtsen
  2012-04-11  7:30           ` Jan Tatarik
  2012-04-11 19:34           ` Jan Tatarik
  0 siblings, 2 replies; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-04-10 19:32 UTC (permalink / raw)
  To: Jan Tatarik; +Cc: ding

Jan Tatarik <jan.tatarik@gmail.com> writes:

>> Yes, that looks better, but it should probably just call
>> `mm-decode-content-transfer-encoding' instead, I think?
>
> Like this?

Yes.  And I almost applied it, but then I remembered that this
wouldn't, in general, really give us what we wanted, anyway.  :-/

That is, a lot of messages are MIME multipart messages.  If we start
decoding, we really should decode all the parts, and score on the
decoded text of them all, I think.

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-04-10 19:32         ` Lars Magne Ingebrigtsen
@ 2012-04-11  7:30           ` Jan Tatarik
  2012-04-11 19:34           ` Jan Tatarik
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Tatarik @ 2012-04-11  7:30 UTC (permalink / raw)
  To: ding

On Tue, Apr 10 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:

>> Like this?

> Yes.  And I almost applied it, but then I remembered that this
> wouldn't, in general, really give us what we wanted, anyway.  :-/

> That is, a lot of messages are MIME multipart messages.  If we start
> decoding, we really should decode all the parts, and score on the
> decoded text of them all, I think.

Any other scenario I should be aware of, before I send the next patch?



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-04-10 19:32         ` Lars Magne Ingebrigtsen
  2012-04-11  7:30           ` Jan Tatarik
@ 2012-04-11 19:34           ` Jan Tatarik
  2012-04-12 18:45             ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-04-11 19:34 UTC (permalink / raw)
  To: ding

On Tue, Apr 10 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:

> Yes.  And I almost applied it, but then I remembered that this
> wouldn't, in general, really give us what we wanted, anyway.  :-/

> That is, a lot of messages are MIME multipart messages.  If we start
> decoding, we really should decode all the parts, and score on the
> decoded text of them all, I think.

All text/* parts, I assume?

I had a quick look at mm handling in gnus. My idea would be to

- run mm-dissect-buffer on the message body (any idea which args would
  be appropriate?)

- for multipart messages, pick the handles with text/* type, run them
  through their respective mm-inline-* function as defined in
  mm-inline-media-tests

- score on all the decoded text parts

Is this the way to go?



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-04-11 19:34           ` Jan Tatarik
@ 2012-04-12 18:45             ` Lars Magne Ingebrigtsen
  2012-04-12 22:58               ` Jan Tatarik
  0 siblings, 1 reply; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-04-12 18:45 UTC (permalink / raw)
  To: Jan Tatarik; +Cc: ding

Jan Tatarik <jan.tatarik@gmail.com> writes:

> All text/* parts, I assume?

It's not uncommon to stash texts bits in application/octet-stream parts,
too, but I guess limiting this to text/* would be sensible...

> I had a quick look at mm handling in gnus. My idea would be to
>
> - run mm-dissect-buffer on the message body (any idea which args would
>   be appropriate?)

NO-STRICT-MIME, I think.

> - for multipart messages, pick the handles with text/* type, run them
>   through their respective mm-inline-* function as defined in
>   mm-inline-media-tests
>
> - score on all the decoded text parts
>
> Is this the way to go?

Yup; sounds good.

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-04-12 18:45             ` Lars Magne Ingebrigtsen
@ 2012-04-12 22:58               ` Jan Tatarik
  2012-06-10 21:08                 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-04-12 22:58 UTC (permalink / raw)
  To: ding

[-- Attachment #1: Type: text/plain, Size: 486 bytes --]

On Thu, Apr 12 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:

>> - run mm-dissect-buffer on the message body (any idea which args would
>>   be appropriate?)

> NO-STRICT-MIME, I think.

>> - for multipart messages, pick the handles with text/* type, run them
>>   through their respective mm-inline-* function as defined in
>>   mm-inline-media-tests

>> - score on all the decoded text parts

>> Is this the way to go?

> Yup; sounds good.

And here is the new patch.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: decode mm messages when scoring on body --]
[-- Type: text/x-diff, Size: 3002 bytes --]

diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..38442a406dd6fb6ef47cd468248618e68337a26a 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -181,8 +181,10 @@
   (with-current-buffer nntp-server-buffer
     (let* ((request-func (cond ((string= "head" header)
 				'gnus-request-head)
+                               ;; We need to peek at the headers to detect the
+                               ;; content encoding
 			       ((string= "body" header)
-				'gnus-request-body)
+                    'gnus-request-article)
 			       (t 'gnus-request-article)))
 	   ofunc article)
       ;; Not all backends support partial fetching.  In that case, we
@@ -196,6 +198,7 @@
       (gnus-message 7 "Scoring article %s..." article)
       (when (funcall request-func article gnus-newsgroup-name)
 	(goto-char (point-min))
+    (gnus-score-decode-text-parts)
 	;; If just parts of the article is to be searched and the
 	;; backend didn't support partial fetching, we just narrow to
 	;; the relevant parts.
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..003355dd2c91847241dc67263f83d26ae52920de 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1736,6 +1736,24 @@ score in `gnus-newsgroup-scored' by SCORE."
 	  (setq entries rest)))))
   nil)
 
+(defun gnus-score-decode-text-parts ()
+  (let ((handles (mm-dissect-buffer t)))
+    (cond ((stringp (car handles)) (pop handles))
+          ((and (bufferp (car handles))
+                (stringp (car (mm-handle-type handles))))
+           (setq handles (list handles))))
+
+    (save-excursion
+      (article-goto-body)
+      (delete-region (point) (point-max))
+      (save-restriction
+        (narrow-to-region (point) (point))
+        (mapc #'mm-display-inline
+              (remove-if-not
+               (lambda (handle)
+                 (string-match "^text/" (mm-handle-media-type handle)))
+               handles))))))
+
 (defun gnus-score-body (scores header now expire &optional trace)
   (if gnus-agent-fetching
       nil
@@ -1752,8 +1770,10 @@ score in `gnus-newsgroup-scored' by SCORE."
 	       (all-scores scores)
 	       (request-func (cond ((string= "head" header)
 				    'gnus-request-head)
+                               ;; We need to peek at the headers to detect
+                               ;; the content encoding
 				   ((string= "body" header)
-				    'gnus-request-body)
+                    'gnus-request-article)
 				   (t 'gnus-request-article)))
 	       entries alist ofunc article last)
 	  (when articles
@@ -1773,6 +1793,7 @@ score in `gnus-newsgroup-scored' by SCORE."
 	      (widen)
 	      (when (funcall request-func article gnus-newsgroup-name)
 		(goto-char (point-min))
+        (gnus-score-decode-text-parts)
 	    ;; If just parts of the article is to be searched, but the
 	    ;; backend didn't support partial fetching, we just narrow
 		;; to the relevant parts.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-04-12 22:58               ` Jan Tatarik
@ 2012-06-10 21:08                 ` Lars Magne Ingebrigtsen
  2012-06-28  9:45                   ` Jan Tatarik
  0 siblings, 1 reply; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-06-10 21:08 UTC (permalink / raw)
  To: Jan Tatarik; +Cc: ding

Jan Tatarik <jan.tatarik@gmail.com> writes:

> And here is the new patch.

Sorry for the extreme delays here.

The patch looks good.

I don't remember whether I asked you whether you had FSF copyright
assignments on file or not?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-06-10 21:08                 ` Lars Magne Ingebrigtsen
@ 2012-06-28  9:45                   ` Jan Tatarik
  2012-09-05 13:40                     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-06-28  9:45 UTC (permalink / raw)
  To: ding; +Cc: Lars Magne Ingebrigtsen

[-- Attachment #1: Type: text/plain, Size: 329 bytes --]

On Sun, Jun 10 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:

> Jan Tatarik <jan.tatarik@gmail.com> writes:

>> And here is the new patch.

> Sorry for the extreme delays here.

> The patch looks good.

> I don't remember whether I asked you whether you had FSF copyright
> assignments on file or not?

I do now.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: score by body --]
[-- Type: text/x-diff, Size: 14559 bytes --]

diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..a440b7799303ddf46674c35c20f5bba51388b50c 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -180,46 +180,51 @@
     (setq header "article"))
   (with-current-buffer nntp-server-buffer
     (let* ((request-func (cond ((string= "head" header)
-				'gnus-request-head)
-			       ((string= "body" header)
-				'gnus-request-body)
-			       (t 'gnus-request-article)))
-	   ofunc article)
+                                'gnus-request-head)
+                               ;; We need to peek at the headers to detect the
+                               ;; content encoding
+                               ((string= "body" header)
+                                'gnus-request-article)
+                               (t 'gnus-request-article)))
+           ofunc article handles)
       ;; Not all backends support partial fetching.  In that case, we
       ;; just fetch the entire article.
       (unless (gnus-check-backend-function
-	       (intern (concat "request-" header))
-	       gnus-newsgroup-name)
-	(setq ofunc request-func)
-	(setq request-func 'gnus-request-article))
+               (intern (concat "request-" header))
+               gnus-newsgroup-name)
+        (setq ofunc request-func)
+        (setq request-func 'gnus-request-article))
       (setq article (mail-header-number gnus-advanced-headers))
       (gnus-message 7 "Scoring article %s..." article)
       (when (funcall request-func article gnus-newsgroup-name)
-	(goto-char (point-min))
-	;; If just parts of the article is to be searched and the
-	;; backend didn't support partial fetching, we just narrow to
-	;; the relevant parts.
-	(when ofunc
-	  (if (eq ofunc 'gnus-request-head)
-	      (narrow-to-region
-	       (point)
-	       (or (search-forward "\n\n" nil t) (point-max)))
-	    (narrow-to-region
-	     (or (search-forward "\n\n" nil t) (point))
-	     (point-max))))
-	(let* ((case-fold-search (not (eq (downcase (symbol-name type))
-					  (symbol-name type))))
-	       (search-func
-		(cond ((memq type '(r R regexp Regexp))
-		       're-search-forward)
-		      ((memq type '(s S string String))
-		       'search-forward)
-		      (t
-		       (error "Invalid match type: %s" type)))))
-	  (goto-char (point-min))
-	  (prog1
-	      (funcall search-func match nil t)
-	    (widen)))))))
+        (when (string= "body" header)
+          (setq handles (gnus-score-decode-text-parts)))
+        (goto-char (point-min))
+        ;; If just parts of the article is to be searched and the
+        ;; backend didn't support partial fetching, we just narrow to
+        ;; the relevant parts.
+        (when ofunc
+          (if (eq ofunc 'gnus-request-head)
+              (narrow-to-region
+               (point)
+               (or (search-forward "\n\n" nil t) (point-max)))
+            (narrow-to-region
+             (or (search-forward "\n\n" nil t) (point))
+             (point-max))))
+        (let* ((case-fold-search (not (eq (downcase (symbol-name type))
+                                          (symbol-name type))))
+               (search-func
+                (cond ((memq type '(r R regexp Regexp))
+                       're-search-forward)
+                      ((memq type '(s S string String))
+                       'search-forward)
+                      (t
+                       (error "Invalid match type: %s" type)))))
+          (goto-char (point-min))
+          (prog1
+              (funcall search-func match nil t)
+            (widen)))
+        (when handles (mm-destroy-parts handles))))))
 
 (provide 'gnus-logic)
 
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..d50b3cc0cf02812ef32f53beaf85cd27171680d4 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1736,105 +1736,140 @@ score in `gnus-newsgroup-scored' by SCORE."
 	  (setq entries rest)))))
   nil)
 
+(defun gnus-score-decode-text-parts ()
+  (labels ((mm-text-parts (handle)
+                        (cond ((stringp (car handle))
+                               (let ((parts (mapcan 'mm-text-parts (cdr handle))))
+                                 (if (equal "multipart/alternative" (car handle))
+                                     ;; pick the first supported alternative
+                                     (list (car parts))
+                                   parts)))
+
+                              ((bufferp (car handle))
+                               (when (string-match "^text/" (mm-handle-media-type handle))
+                                 (list handle)))
+
+                              (t (mapcan 'mm-text-parts handle))))
+           (my-mm-display-part (handle)
+                               (when handle
+                                 (save-restriction
+                                   (narrow-to-region (point) (point))
+                                   (mm-display-inline handle)
+                                   (goto-char (point-max))))))
+
+    (let (;(mm-text-html-renderer 'w3m-standalone)
+          (handles (mm-dissect-buffer t)))
+      (save-excursion
+        (article-goto-body)
+        (delete-region (point) (point-max))
+        (mapc 'my-mm-display-part (mm-text-parts handles))
+        handles))))
+
 (defun gnus-score-body (scores header now expire &optional trace)
-  (if gnus-agent-fetching
-      nil
-    (save-excursion
-      (setq gnus-scores-articles
-	    (sort gnus-scores-articles
-		  (lambda (a1 a2)
-		    (< (mail-header-number (car a1))
-		       (mail-header-number (car a2))))))
-      (set-buffer nntp-server-buffer)
-      (save-restriction
-	(let* ((buffer-read-only nil)
-	       (articles gnus-scores-articles)
-	       (all-scores scores)
-	       (request-func (cond ((string= "head" header)
-				    'gnus-request-head)
-				   ((string= "body" header)
-				    'gnus-request-body)
-				   (t 'gnus-request-article)))
-	       entries alist ofunc article last)
-	  (when articles
-	    (setq last (mail-header-number (caar (last articles))))
-	  ;; Not all backends support partial fetching.  In that case,
-	    ;; we just fetch the entire article.
-	    (unless (gnus-check-backend-function
-		     (and (string-match "^gnus-" (symbol-name request-func))
-			  (intern (substring (symbol-name request-func)
-					     (match-end 0))))
-		     gnus-newsgroup-name)
-	      (setq ofunc request-func)
-	      (setq request-func 'gnus-request-article))
-	    (while articles
-	      (setq article (mail-header-number (caar articles)))
-	      (gnus-message 7 "Scoring article %s of %s..." article last)
-	      (widen)
-	      (when (funcall request-func article gnus-newsgroup-name)
-		(goto-char (point-min))
-	    ;; If just parts of the article is to be searched, but the
-	    ;; backend didn't support partial fetching, we just narrow
-		;; to the relevant parts.
-		(when ofunc
-		  (if (eq ofunc 'gnus-request-head)
-		      (narrow-to-region
-		       (point)
-		       (or (search-forward "\n\n" nil t) (point-max)))
-		    (narrow-to-region
-		     (or (search-forward "\n\n" nil t) (point))
-		     (point-max))))
-		(setq scores all-scores)
-		;; Find matches.
-		(while scores
-		  (setq alist (pop scores)
-			entries (assoc header alist))
-		  (while (cdr entries) ;First entry is the header index.
-		    (let* ((rest (cdr entries))
-			   (kill (car rest))
-			   (match (nth 0 kill))
-			   (type (or (nth 3 kill) 's))
-			   (score (or (nth 1 kill)
-				      gnus-score-interactive-default-score))
-			   (date (nth 2 kill))
-			   (found nil)
-			   (case-fold-search
-			    (not (or (eq type 'R) (eq type 'S)
-				     (eq type 'Regexp) (eq type 'String))))
-			   (search-func
-			    (cond ((or (eq type 'r) (eq type 'R)
-				       (eq type 'regexp) (eq type 'Regexp))
-				   're-search-forward)
-				  ((or (eq type 's) (eq type 'S)
-				       (eq type 'string) (eq type 'String))
-				   'search-forward)
-				  (t
-				   (error "Invalid match type: %s" type)))))
-		      (goto-char (point-min))
-		      (when (funcall search-func match nil t)
-			;; Found a match, update scores.
-			(setcdr (car articles) (+ score (cdar articles)))
-			(setq found t)
-			(when trace
-			  (push
-			   (cons (car-safe (rassq alist gnus-score-cache))
-				 kill)
-			   gnus-score-trace)))
-		      ;; Update expire date
-		      (unless trace
-			(cond
-			 ((null date))	;Permanent entry.
-			 ((and found gnus-update-score-entry-dates)
-			  ;; Match, update date.
-			  (gnus-score-set 'touched '(t) alist)
-			  (setcar (nthcdr 2 kill) now))
-			 ((and expire (< date expire)) ;Old entry, remove.
-			  (gnus-score-set 'touched '(t) alist)
-			  (setcdr entries (cdr rest))
-			  (setq rest entries))))
-		      (setq entries rest)))))
-	      (setq articles (cdr articles)))))))
-    nil))
+    (if gnus-agent-fetching
+       nil
+     (save-excursion
+       (setq gnus-scores-articles
+             (sort gnus-scores-articles
+                   (lambda (a1 a2)
+                     (< (mail-header-number (car a1))
+                        (mail-header-number (car a2))))))
+       (set-buffer nntp-server-buffer)
+       (save-restriction
+         (let* ((buffer-read-only nil)
+                (articles gnus-scores-articles)
+                (all-scores scores)
+                (request-func (cond ((string= "head" header)
+                                     'gnus-request-head)
+                                    ;; We need to peek at the headers to detect
+                                    ;; the content encoding
+                                    ((string= "body" header)
+                                     'gnus-request-article)
+                                    (t 'gnus-request-article)))
+                entries alist ofunc article last)
+           (when articles
+             (setq last (mail-header-number (caar (last articles))))
+             ;; Not all backends support partial fetching.  In that case,
+             ;; we just fetch the entire article.
+             (unless (gnus-check-backend-function
+                      (and (string-match "^gnus-" (symbol-name request-func))
+                           (intern (substring (symbol-name request-func)
+                                              (match-end 0))))
+                      gnus-newsgroup-name)
+               (setq ofunc request-func)
+               (setq request-func 'gnus-request-article))
+             (while articles
+               (setq article (mail-header-number (caar articles)))
+               (gnus-message 7 "Scoring article %s of %s..." article last)
+               (widen)
+               (let (handles)
+                 (when (funcall request-func article gnus-newsgroup-name)
+                  (when (string= "body" header)
+                    (setq handles (gnus-score-decode-text-parts)))
+                  (goto-char (point-min))
+                  ;; If just parts of the article is to be searched, but the
+                  ;; backend didn't support partial fetching, we just narrow
+                  ;; to the relevant parts.
+                  (when ofunc
+                    (if (eq ofunc 'gnus-request-head)
+                        (narrow-to-region
+                         (point)
+                         (or (search-forward "\n\n" nil t) (point-max)))
+                      (narrow-to-region
+                       (or (search-forward "\n\n" nil t) (point))
+                       (point-max))))
+                  (setq scores all-scores)
+                  ;; Find matches.
+                  (while scores
+                    (setq alist (pop scores)
+                          entries (assoc header alist))
+                    (while (cdr entries) ;First entry is the header index.
+                      (let* ((rest (cdr entries))
+                             (kill (car rest))
+                             (match (nth 0 kill))
+                             (type (or (nth 3 kill) 's))
+                             (score (or (nth 1 kill)
+                                        gnus-score-interactive-default-score))
+                             (date (nth 2 kill))
+                             (found nil)
+                             (case-fold-search
+                              (not (or (eq type 'R) (eq type 'S)
+                                       (eq type 'Regexp) (eq type 'String))))
+                             (search-func
+                              (cond ((or (eq type 'r) (eq type 'R)
+                                         (eq type 'regexp) (eq type 'Regexp))
+                                     're-search-forward)
+                                    ((or (eq type 's) (eq type 'S)
+                                         (eq type 'string) (eq type 'String))
+                                     'search-forward)
+                                    (t
+                                     (error "Invalid match type: %s" type)))))
+                        (goto-char (point-min))
+                        (when (funcall search-func match nil t)
+                          ;; Found a match, update scores.
+                          (setcdr (car articles) (+ score (cdar articles)))
+                          (setq found t)
+                          (when trace
+                            (push
+                             (cons (car-safe (rassq alist gnus-score-cache))
+                                   kill)
+                             gnus-score-trace)))
+                        ;; Update expire date
+                        (unless trace
+                          (cond
+                           ((null date)) ;Permanent entry.
+                           ((and found gnus-update-score-entry-dates)
+                            ;; Match, update date.
+                            (gnus-score-set 'touched '(t) alist)
+                            (setcar (nthcdr 2 kill) now))
+                           ((and expire (< date expire)) ;Old entry, remove.
+                            (gnus-score-set 'touched '(t) alist)
+                            (setcdr entries (cdr rest))
+                            (setq rest entries))))
+                        (setq entries rest))))
+                  (when handles (mm-destroy-parts handles))))
+               (setq articles (cdr articles)))))))
+     nil))
 
 (defun gnus-score-thread (scores header now expire &optional trace)
   (gnus-score-followup scores header now expire trace t))

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-06-28  9:45                   ` Jan Tatarik
@ 2012-09-05 13:40                     ` Lars Ingebrigtsen
  2012-09-05 14:39                       ` Jan Tatarik
  0 siblings, 1 reply; 20+ messages in thread
From: Lars Ingebrigtsen @ 2012-09-05 13:40 UTC (permalink / raw)
  To: Jan Tatarik; +Cc: ding

Jan Tatarik <jan.tatarik@gmail.com> writes:

>> I don't remember whether I asked you whether you had FSF copyright
>> assignments on file or not?
>
> I do now.

Thanks; I've now applied your patch to Ma Gnus.

-- 
(domestic pets only, the antidote for overdose, milk.)
  http://lars.ingebrigtsen.no  *  Sent from my Emacs



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-09-05 13:40                     ` Lars Ingebrigtsen
@ 2012-09-05 14:39                       ` Jan Tatarik
  2012-09-05 14:43                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-09-05 14:39 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: ding

[-- Attachment #1: Type: text/plain, Size: 364 bytes --]

On Wed, Sep 05 2012, Lars Ingebrigtsen wrote:

> Jan Tatarik <jan.tatarik@gmail.com> writes:

>>> I don't remember whether I asked you whether you had FSF copyright
>>> assignments on file or not?

>> I do now.

> Thanks; I've now applied your patch to Ma Gnus.

Cool. There's one small issue, however, that started popping up for me
with recent Emacs versions:



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: proper function quoting --]
[-- Type: text/x-diff, Size: 573 bytes --]

diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index 948c1c3b62dcebb3f71b0b97bb4f99ee5f4b00af..bc35cf3dea5ff791c8d53dd32330e584567ea1ee 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1743,7 +1743,7 @@ score in `gnus-newsgroup-scored' by SCORE."
       (save-excursion
         (article-goto-body)
         (delete-region (point) (point-max))
-        (mapc 'my-mm-display-part (mm-text-parts handles))
+        (mapc #'my-mm-display-part (mm-text-parts handles))
         handles))))
 
 (defun gnus-score-body (scores header now expire &optional trace)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-09-05 14:39                       ` Jan Tatarik
@ 2012-09-05 14:43                         ` Lars Ingebrigtsen
  2012-09-05 15:07                           ` Jan Tatarik
  2012-09-05 15:42                           ` Andreas Schwab
  0 siblings, 2 replies; 20+ messages in thread
From: Lars Ingebrigtsen @ 2012-09-05 14:43 UTC (permalink / raw)
  To: Jan Tatarik; +Cc: ding

Jan Tatarik <jan.tatarik@gmail.com> writes:

> Cool. There's one small issue, however, that started popping up for me
> with recent Emacs versions:

[...]

> -        (mapc 'my-mm-display-part (mm-text-parts handles))
> +        (mapc #'my-mm-display-part (mm-text-parts handles))

Has #' started to have a meaning in newer Emacsen?  It used to ... not
mean very much.

Anyway, I've applied the patch.

-- 
(domestic pets only, the antidote for overdose, milk.)
  http://lars.ingebrigtsen.no  *  Lars Magne Ingebrigtsen



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-09-05 14:43                         ` Lars Ingebrigtsen
@ 2012-09-05 15:07                           ` Jan Tatarik
  2012-09-05 15:35                             ` Lars Ingebrigtsen
  2012-09-05 15:42                           ` Andreas Schwab
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-09-05 15:07 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: ding

On Wed, Sep 05 2012, Lars Ingebrigtsen wrote:

> Jan Tatarik <jan.tatarik@gmail.com> writes:

>> Cool. There's one small issue, however, that started popping up for me
>> with recent Emacs versions:

> [...]

>> -        (mapc 'my-mm-display-part (mm-text-parts handles))
>> +        (mapc #'my-mm-display-part (mm-text-parts handles))

> Has #' started to have a meaning in newer Emacsen?  It used to ... not
> mean very much.

I think it's only problem with temporary functions (labels). Then I get

   mapc: Symbol's function definition is void: my-mm-display-part

unless I use #'my-mm-display-part.






^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-09-05 15:07                           ` Jan Tatarik
@ 2012-09-05 15:35                             ` Lars Ingebrigtsen
  0 siblings, 0 replies; 20+ messages in thread
From: Lars Ingebrigtsen @ 2012-09-05 15:35 UTC (permalink / raw)
  To: Jan Tatarik; +Cc: ding

Jan Tatarik <jan.tatarik@gmail.com> writes:

> I think it's only problem with temporary functions (labels). Then I get
>
>    mapc: Symbol's function definition is void: my-mm-display-part
>
> unless I use #'my-mm-display-part.

Ah, I see.

-- 
(domestic pets only, the antidote for overdose, milk.)
  http://lars.ingebrigtsen.no  *  Lars Magne Ingebrigtsen



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Scoring on basee64 encoded message body
  2012-09-05 14:43                         ` Lars Ingebrigtsen
  2012-09-05 15:07                           ` Jan Tatarik
@ 2012-09-05 15:42                           ` Andreas Schwab
  1 sibling, 0 replies; 20+ messages in thread
From: Andreas Schwab @ 2012-09-05 15:42 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Jan Tatarik, ding

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Has #' started to have a meaning in newer Emacsen?

Only for lambda expressions (it allows them to be byte-compiled).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-09-05 15:42 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-13 10:39 Scoring on basee64 encoded message body Jan Tatarik
2012-03-14 14:38 ` Lars Magne Ingebrigtsen
2012-03-14 20:21   ` Reiner Steib
2012-03-15  1:29     ` Lars Magne Ingebrigtsen
2012-03-15 21:05   ` Jan Tatarik
2012-03-22 20:38     ` Lars Magne Ingebrigtsen
2012-03-23 12:11       ` Jan Tatarik
2012-04-10 19:32         ` Lars Magne Ingebrigtsen
2012-04-11  7:30           ` Jan Tatarik
2012-04-11 19:34           ` Jan Tatarik
2012-04-12 18:45             ` Lars Magne Ingebrigtsen
2012-04-12 22:58               ` Jan Tatarik
2012-06-10 21:08                 ` Lars Magne Ingebrigtsen
2012-06-28  9:45                   ` Jan Tatarik
2012-09-05 13:40                     ` Lars Ingebrigtsen
2012-09-05 14:39                       ` Jan Tatarik
2012-09-05 14:43                         ` Lars Ingebrigtsen
2012-09-05 15:07                           ` Jan Tatarik
2012-09-05 15:35                             ` Lars Ingebrigtsen
2012-09-05 15:42                           ` Andreas Schwab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).