From: Jan Tatarik <jan.tatarik@gmail.com>
To: ding@gnus.org
Subject: Re: Scoring on base64 encoded message body
Date: Thu, 15 Mar 2012 22:05:01 +0100
Message-ID: <5n5x2rvcm5zibm.fsf@nb-jtatarik2.xing.hh>
In-Reply-To: <m3r4wvp7rl.fsf@stories.gnus.org> (Lars Magne Ingebrigtsen's message of "Wed, 14 Mar 2012 15:38:38 +0100")
[-- Attachment #1: Type: text/plain, Size: 1000 bytes --]
On Wed, Mar 14 2012, Lars Magne Ingebrigtsen wrote:
> Jan Tatarik <jan.tatarik@gmail.com> writes:
>> I finally realized the content of the messages is base64 encoded, so
>> matching on the raw body cannot work.
>> The attached patch fixes the problem for me, but I have no idea
>> whether it's a generally acceptable solution. I'm only using the body
>> match in a low-traffic group, so speed is not an issue for me.
> [...]
>> + (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
>> + (article-de-base64-unreadable t))
> This isn't a general enough solution here. QP-encoded messages also
> want decoding.
> But the more general issue is -- should scoring on bodies be done on the
> decoded messages or the encoded messages? I think it would make more
> sense to do it on decoded messages, and since these are body matches,
> speed doesn't really matter that much, because body matches are s-l-o-w
> anyway.
This better?
[-- Attachment #2: decode-message-before-scoring-on-body.diff --]
[-- Type: text/x-diff, Size: 3182 bytes --]
diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..1b4fc22bc11ee0d599fb40a0adef53588b4f9ca4 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -181,8 +181,10 @@
(with-current-buffer nntp-server-buffer
(let* ((request-func (cond ((string= "head" header)
'gnus-request-head)
+ ;; We need to peek at the headers to detect the
+ ;; content encoding
((string= "body" header)
- 'gnus-request-body)
+ 'gnus-request-article)
(t 'gnus-request-article)))
ofunc article)
;; Not all backends support partial fetching. In that case, we
@@ -196,6 +198,14 @@
(gnus-message 7 "Scoring article %s..." article)
(when (funcall request-func article gnus-newsgroup-name)
(goto-char (point-min))
+ ;; Searching base64/qp-encoded message body produces more
+ ;; satisfactory results if we decode the message first
+ (unless (or (eq ofunc 'gnus-request-head)
+ (eq request-func 'gnus-request-head))
+ (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+ (cond
+ ((string= "base64" encoding) (article-de-base64-unreadable t))
+ ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
;; If just parts of the article is to be searched and the
;; backend didn't support partial fetching, we just narrow to
;; the relevant parts.
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..776194a31c6702441d3bae74c8f6048778270e67 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1752,8 +1752,10 @@ score in `gnus-newsgroup-scored' by SCORE."
(all-scores scores)
(request-func (cond ((string= "head" header)
'gnus-request-head)
+ ;; We need to peek at the headers to detect
+ ;; the content encoding
((string= "body" header)
- 'gnus-request-body)
+ 'gnus-request-article)
(t 'gnus-request-article)))
entries alist ofunc article last)
(when articles
@@ -1773,6 +1775,14 @@ score in `gnus-newsgroup-scored' by SCORE."
(widen)
(when (funcall request-func article gnus-newsgroup-name)
(goto-char (point-min))
+ ;; Searching base64/qp-encoded message body produces more
+ ;; satisfactory results if we decode the message first
+ (unless (or (eq ofunc 'gnus-request-head)
+ (eq request-func 'gnus-request-head))
+ (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+ (cond
+ ((string= "base64" encoding) (article-de-base64-unreadable t))
+ ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
;; If just parts of the article is to be searched, but the
;; backend didn't support partial fetching, we just narrow
;; to the relevant parts.
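For illustration only, here is a minimal sketch of why the body has to be decoded before matching. It is not the Gnus implementation: `my/decode-body-for-scoring` and `my/body-match-demo` are hypothetical names, and the sketch assumes the body is available as a string rather than in the article buffer. It relies only on the built-in `base64-decode-string` and on `quoted-printable-decode-string` from qp.el. Unlike the patch, it lowercases the header value before comparing, since Content-Transfer-Encoding values are case-insensitive.

```elisp
(require 'qp) ; provides `quoted-printable-decode-string'

(defun my/decode-body-for-scoring (encoding body)
  "Return BODY decoded according to ENCODING.
ENCODING is a Content-Transfer-Encoding value or nil.  Unknown or
missing encodings return BODY unchanged.
Hypothetical helper for illustration; not part of Gnus."
  (let ((enc (and encoding (downcase encoding))))
    (cond
     ((equal enc "base64") (base64-decode-string body))
     ((equal enc "quoted-printable") (quoted-printable-decode-string body))
     (t body))))

(defun my/body-match-demo ()
  "Return (RAW-MATCH DECODED-MATCH) for a base64-encoded sample body."
  (let* ((raw (base64-encode-string "Buy cheap watches now"))
         (decoded (my/decode-body-for-scoring "Base64" raw)))
    ;; The raw base64 text contains no spaces, so the regexp cannot
    ;; match it; the decoded text matches.
    (list (string-match-p "cheap watches" raw)
          (string-match-p "cheap watches" decoded))))
```

Calling `(my/body-match-demo)` should give nil for the raw match and a non-nil position for the decoded one, which is the behaviour the patch is after for body score entries.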