Gnus development mailing list
From: Jan Tatarik <jan.tatarik@gmail.com>
To: ding@gnus.org
Subject: Re: Scoring on base64 encoded message body
Date: Thu, 15 Mar 2012 22:05:01 +0100
Message-ID: <5n5x2rvcm5zibm.fsf@nb-jtatarik2.xing.hh>
In-Reply-To: <m3r4wvp7rl.fsf@stories.gnus.org> (Lars Magne Ingebrigtsen's message of "Wed, 14 Mar 2012 15:38:38 +0100")

On Wed, Mar 14 2012, Lars Magne Ingebrigtsen wrote:

> Jan Tatarik <jan.tatarik@gmail.com> writes:

>> I finally realized the content of the messages is base64 encoded, so
>> matching on the raw body cannot work.

>> The attached patch fixes the problem for me, but I have no idea
>> whether it's a generally acceptable solution. I'm only using the body
>> match in a low-traffic group, so speed is not an issue for me.

> [...]

>> +            (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
>> +              (article-de-base64-unreadable t))

> This isn't a general enough solution here.  QP-encoded messages also
> want decoding.

> But the more general issue is -- should scoring on bodies be done on the
> decoded messages or the encoded messages?  I think it would make more
> sense to do it on decoded messages, and since these are body matches,
> speed doesn't really matter that much, because body matches are s-l-o-w
> anyway.

This better?


[-- Attachment #2: decode-message-before-scoring-on-body.diff --]
[-- Type: text/x-diff, Size: 3182 bytes --]

diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..1b4fc22bc11ee0d599fb40a0adef53588b4f9ca4 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -181,8 +181,10 @@
   (with-current-buffer nntp-server-buffer
     (let* ((request-func (cond ((string= "head" header)
 				'gnus-request-head)
+                               ;; We need to peek at the headers to detect the
+                               ;; content encoding
 			       ((string= "body" header)
-				'gnus-request-body)
+                                'gnus-request-article)
 			       (t 'gnus-request-article)))
 	   ofunc article)
       ;; Not all backends support partial fetching.  In that case, we
@@ -196,6 +198,14 @@
       (gnus-message 7 "Scoring article %s..." article)
       (when (funcall request-func article gnus-newsgroup-name)
 	(goto-char (point-min))
+        ;; Searching base64/qp-encoded message body produces more
+        ;; satisfactory results if we decode the message first
+        (unless (or (eq ofunc 'gnus-request-head)
+                    (eq request-func 'gnus-request-head))
+          (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+            (cond
+             ((string= "base64" encoding) (article-de-base64-unreadable t))
+             ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
 	;; If just parts of the article is to be searched and the
 	;; backend didn't support partial fetching, we just narrow to
 	;; the relevant parts.
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..776194a31c6702441d3bae74c8f6048778270e67 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1752,8 +1752,10 @@ score in `gnus-newsgroup-scored' by SCORE."
 	       (all-scores scores)
 	       (request-func (cond ((string= "head" header)
 				    'gnus-request-head)
+                                   ;; We need to peek at the headers to detect
+                                   ;; the content encoding
 				   ((string= "body" header)
-				    'gnus-request-body)
+                                    'gnus-request-article)
 				   (t 'gnus-request-article)))
 	       entries alist ofunc article last)
 	  (when articles
@@ -1773,6 +1775,14 @@ score in `gnus-newsgroup-scored' by SCORE."
 	      (widen)
 	      (when (funcall request-func article gnus-newsgroup-name)
 		(goto-char (point-min))
+                ;; Searching base64/qp-encoded message body produces more
+                ;; satisfactory results if we decode the message first
+                (unless (or (eq ofunc 'gnus-request-head)
+                            (eq request-func 'gnus-request-head))
+                  (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+                    (cond
+                     ((string= "base64" encoding) (article-de-base64-unreadable t))
+                     ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
 	    ;; If just parts of the article is to be searched, but the
 	    ;; backend didn't support partial fetching, we just narrow
 		;; to the relevant parts.
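
For anyone who wants to try the decode-before-matching idea outside of
Gnus, here is a minimal standalone sketch. It is not the code from the
patch: the function name my-decode-body-for-scoring is hypothetical, and
it uses the generic base64-decode-region and
quoted-printable-decode-region instead of the article-de-*-unreadable
commands. It assumes the current buffer holds one complete article whose
headers end at the first blank line. Because MIME treats
Content-Transfer-Encoding values as case-insensitive, the sketch
downcases the value before comparing. Charset decoding and multipart
messages are deliberately left out.

```elisp
;; Hypothetical helper for illustration only; not part of Gnus.
(require 'mail-utils)   ; mail-fetch-field
(require 'qp)           ; quoted-printable-decode-region
(require 'subr-x)       ; string-trim on older Emacs versions

(defun my-decode-body-for-scoring ()
  "Decode the body of the article in the current buffer in place.
Assumes the headers are separated from the body by the first blank
line.  Bodies with any other transfer encoding are left untouched."
  (save-excursion
    (goto-char (point-min))
    ;; Point ends up just after the blank line, i.e. at the body start.
    (when (re-search-forward "^\r?\n" nil t)
      (let* ((body-start (point))
             (raw (save-restriction
                    ;; Look only at the headers so a matching line in
                    ;; the body cannot be mistaken for the field.
                    (narrow-to-region (point-min) body-start)
                    (mail-fetch-field "content-transfer-encoding")))
             ;; The value is case-insensitive and may carry whitespace.
             (encoding (and raw (downcase (string-trim raw)))))
        (cond
         ((equal encoding "base64")
          ;; Malformed base64 signals an error; keep the raw body then.
          (ignore-errors
            (base64-decode-region body-start (point-max))))
         ((equal encoding "quoted-printable")
          (quoted-printable-decode-region body-start (point-max))))))))
```

To try it, insert a small article into a temporary buffer with
with-temp-buffer, call the function, and then search the buffer for the
plain text you expect to match.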
