* Scoring on basee64 encoded message body
@ 2012-03-13 10:39 Jan Tatarik
2012-03-14 14:38 ` Lars Magne Ingebrigtsen
0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-03-13 10:39 UTC (permalink / raw)
To: ding
[-- Attachment #1.1: Type: text/plain, Size: 410 bytes --]
Hi,
I just spent some time trying to get scoring by message body work in one of
my nnimap groups.
I finally realized the content of the messages is base64 encoded, so
matching on the raw body cannot work.
The attached patch fixes the problem for me, but I have no idea whether
it's a generally acceptable solution. I'm only using the body match in a
low-traffic group, so speed is not an issue for me.
Jan
[-- Attachment #1.2: Type: text/html, Size: 534 bytes --]
[-- Attachment #2: score-body-base64-encoded.diff --]
[-- Type: application/octet-stream, Size: 1142 bytes --]
diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..85a1f83466a246c26be5f8b05263a045c7928f0e 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -204,6 +204,8 @@
(narrow-to-region
(point)
(or (search-forward "\n\n" nil t) (point-max)))
+ (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
+ (article-de-base64-unreadable t))
(narrow-to-region
(or (search-forward "\n\n" nil t) (point))
(point-max))))
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..231c942bb34c97d66563ca7a399c9808b1f66663 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1781,6 +1781,8 @@ score in `gnus-newsgroup-scored' by SCORE."
(narrow-to-region
(point)
(or (search-forward "\n\n" nil t) (point-max)))
+ (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
+ (article-de-base64-unreadable t))
(narrow-to-region
(or (search-forward "\n\n" nil t) (point))
(point-max))))
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-03-13 10:39 Scoring on basee64 encoded message body Jan Tatarik
@ 2012-03-14 14:38 ` Lars Magne Ingebrigtsen
2012-03-14 20:21 ` Reiner Steib
2012-03-15 21:05 ` Jan Tatarik
0 siblings, 2 replies; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-14 14:38 UTC (permalink / raw)
To: Jan Tatarik; +Cc: ding
Jan Tatarik <jan.tatarik@gmail.com> writes:
> I finally realized the content of the messages is base64 encoded, so
> matching on the raw body cannot work.
>
> The attached patch fixes the problem for me, but I have no idea
> whether it's a generally acceptable solution. I'm only using the body
> match in a low-traffic group, so speed is not an issue for me.
[...]
> + (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
> + (article-de-base64-unreadable t))
This isn't a general enough solution here. QP-encoded messages also
want decoding.
But the more general issue is -- should scoring on bodies be done on the
decoded messages or the encoded messages? I think it would make more
sense to do it on decoded messages, and since these are body matches,
speed don't really matter that much, because body matches are s-l-o-w
anyway.
Scoring on headers are done on the decoded headers, right? So it would
be more consistent to decode the bodies, too. Wouldn't it?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-03-14 14:38 ` Lars Magne Ingebrigtsen
@ 2012-03-14 20:21 ` Reiner Steib
2012-03-15 1:29 ` Lars Magne Ingebrigtsen
2012-03-15 21:05 ` Jan Tatarik
1 sibling, 1 reply; 20+ messages in thread
From: Reiner Steib @ 2012-03-14 20:21 UTC (permalink / raw)
To: ding
On Wed, Mar 14 2012, Lars Magne Ingebrigtsen wrote:
> But the more general issue is -- should scoring on bodies be done on the
> decoded messages or the encoded messages? I think it would make more
> sense to do it on decoded messages,
Sure.
I remember that we had this problem WRT mail splitting. And I'm a
little surprised that `nnmail-mail-splitting-decodes' is nil by
default. Probably because users might filter spam on "strange"
charsets.
> Scoring on headers are done on the decoded headers, right? So it would
> be more consistent to decode the bodies, too. Wouldn't it?
Bye, Reiner.
--
,,,
(o o)
---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-03-14 20:21 ` Reiner Steib
@ 2012-03-15 1:29 ` Lars Magne Ingebrigtsen
0 siblings, 0 replies; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-15 1:29 UTC (permalink / raw)
To: ding
Reiner Steib <reinersteib+gmane@imap.cc> writes:
> I remember that we had this problem WRT mail splitting. And I'm a
> little surprised that `nnmail-mail-splitting-decodes' is nil by
> default. Probably because users might filter spam on "strange"
> charsets.
I do that -- for instance to filter all the Russian spam into the spam
group.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-03-14 14:38 ` Lars Magne Ingebrigtsen
2012-03-14 20:21 ` Reiner Steib
@ 2012-03-15 21:05 ` Jan Tatarik
2012-03-22 20:38 ` Lars Magne Ingebrigtsen
1 sibling, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-03-15 21:05 UTC (permalink / raw)
To: ding
[-- Attachment #1: Type: text/plain, Size: 1000 bytes --]
On Wed, Mar 14 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:
> Jan Tatarik <jan.tatarik@gmail.com> writes:
>> I finally realized the content of the messages is base64 encoded, so
>> matching on the raw body cannot work.
>> The attached patch fixes the problem for me, but I have no idea
>> whether it's a generally acceptable solution. I'm only using the body
>> match in a low-traffic group, so speed is not an issue for me.
> [...]
>> + (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
>> + (article-de-base64-unreadable t))
> This isn't a general enough solution here. QP-encoded messages also
> want decoding.
> But the more general issue is -- should scoring on bodies be done on the
> decoded messages or the encoded messages? I think it would make more
> sense to do it on decoded messages, and since these are body matches,
> speed don't really matter that much, because body matches are s-l-o-w
> anyway.
This better?
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: decode-message-before-scoring-on-body.diff --]
[-- Type: text/x-diff, Size: 3182 bytes --]
diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..1b4fc22bc11ee0d599fb40a0adef53588b4f9ca4 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -181,8 +181,10 @@
(with-current-buffer nntp-server-buffer
(let* ((request-func (cond ((string= "head" header)
'gnus-request-head)
+ ;; We need to peek at the headers to detect the
+ ;; content encoding
((string= "body" header)
- 'gnus-request-body)
+ 'gnus-request-article)
(t 'gnus-request-article)))
ofunc article)
;; Not all backends support partial fetching. In that case, we
@@ -196,6 +198,14 @@
(gnus-message 7 "Scoring article %s..." article)
(when (funcall request-func article gnus-newsgroup-name)
(goto-char (point-min))
+ ;; Searching base64/qp-encoded message body produces more
+ ;; satisfactory results if we decode the message first
+ (unless (or (eq ofunc 'gnus-request-head)
+ (eq request-func 'gnus-request-head))
+ (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+ (cond
+ ((string= "base64" encoding) (article-de-base64-unreadable t))
+ ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
;; If just parts of the article is to be searched and the
;; backend didn't support partial fetching, we just narrow to
;; the relevant parts.
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..776194a31c6702441d3bae74c8f6048778270e67 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1752,8 +1752,10 @@ score in `gnus-newsgroup-scored' by SCORE."
(all-scores scores)
(request-func (cond ((string= "head" header)
'gnus-request-head)
+ ;; We need to peek at the headers to detect
+ ;; the content encoding
((string= "body" header)
- 'gnus-request-body)
+ 'gnus-request-article)
(t 'gnus-request-article)))
entries alist ofunc article last)
(when articles
@@ -1773,6 +1775,14 @@ score in `gnus-newsgroup-scored' by SCORE."
(widen)
(when (funcall request-func article gnus-newsgroup-name)
(goto-char (point-min))
+ ;; Searching base64/qp-encoded message body produces more
+ ;; satisfactory results if we decode the message first
+ (unless (or (eq ofunc 'gnus-request-head)
+ (eq request-func 'gnus-request-head))
+ (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+ (cond
+ ((string= "base64" encoding) (article-de-base64-unreadable t))
+ ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
;; If just parts of the article is to be searched, but the
;; backend didn't support partial fetching, we just narrow
;; to the relevant parts.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-03-15 21:05 ` Jan Tatarik
@ 2012-03-22 20:38 ` Lars Magne Ingebrigtsen
2012-03-23 12:11 ` Jan Tatarik
0 siblings, 1 reply; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-03-22 20:38 UTC (permalink / raw)
To: Jan Tatarik; +Cc: ding
Jan Tatarik <jan.tatarik@gmail.com> writes:
> This better?
Yes, that looks better, but it should probably just call
`mm-decode-content-transfer-encoding' instead, I think?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-03-22 20:38 ` Lars Magne Ingebrigtsen
@ 2012-03-23 12:11 ` Jan Tatarik
2012-04-10 19:32 ` Lars Magne Ingebrigtsen
0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-03-23 12:11 UTC (permalink / raw)
To: ding
[-- Attachment #1: Type: text/plain, Size: 271 bytes --]
On Thu, Mar 22 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:
> Jan Tatarik <jan.tatarik@gmail.com> writes:
>> This better?
> Yes, that looks better, but it should probably just call
> `mm-decode-content-transfer-encoding' instead, I think?
Like this?
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: decode message body before scoring on it --]
[-- Type: text/x-diff, Size: 3608 bytes --]
diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..9216f5699ce1ed9a8c39dd03257a17885f6e8490 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -181,8 +181,10 @@
(with-current-buffer nntp-server-buffer
(let* ((request-func (cond ((string= "head" header)
'gnus-request-head)
+ ;; We need to peek at the headers to detect the
+ ;; content encoding
((string= "body" header)
- 'gnus-request-body)
+ 'gnus-request-article)
(t 'gnus-request-article)))
ofunc article)
;; Not all backends support partial fetching. In that case, we
@@ -196,6 +198,20 @@
(gnus-message 7 "Scoring article %s..." article)
(when (funcall request-func article gnus-newsgroup-name)
(goto-char (point-min))
+ ;; Searching base64/qp-encoded message body produces more
+ ;; satisfactory results if we decode the message first
+ (unless (or (eq ofunc 'gnus-request-head)
+ (eq request-func 'gnus-request-head))
+ (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+ (when encoding
+ (save-excursion
+ (save-restriction
+ ;; narrow to body
+ (narrow-to-region
+ (or (search-forward "\n\n" nil t) (point))
+ (point-max))
+ (mm-decode-content-transfer-encoding
+ (intern (downcase encoding))))))))
;; If just parts of the article is to be searched and the
;; backend didn't support partial fetching, we just narrow to
;; the relevant parts.
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..322aed78fa374b873fb8604482adead77362c5be 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1752,8 +1752,10 @@ score in `gnus-newsgroup-scored' by SCORE."
(all-scores scores)
(request-func (cond ((string= "head" header)
'gnus-request-head)
+ ;; We need to peek at the headers to detect
+ ;; the content encoding
((string= "body" header)
- 'gnus-request-body)
+ 'gnus-request-article)
(t 'gnus-request-article)))
entries alist ofunc article last)
(when articles
@@ -1773,6 +1775,20 @@ score in `gnus-newsgroup-scored' by SCORE."
(widen)
(when (funcall request-func article gnus-newsgroup-name)
(goto-char (point-min))
+ ;; Searching base64/qp-encoded message body produces more
+ ;; satisfactory results if we decode the message first
+ (unless (or (eq ofunc 'gnus-request-head)
+ (eq request-func 'gnus-request-head))
+ (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+ (when encoding
+ (save-excursion
+ (save-restriction
+ ;; narrow to body
+ (narrow-to-region
+ (or (search-forward "\n\n" nil t) (point))
+ (point-max))
+ (mm-decode-content-transfer-encoding
+ (intern (downcase encoding))))))))
;; If just parts of the article is to be searched, but the
;; backend didn't support partial fetching, we just narrow
;; to the relevant parts.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-03-23 12:11 ` Jan Tatarik
@ 2012-04-10 19:32 ` Lars Magne Ingebrigtsen
2012-04-11 7:30 ` Jan Tatarik
2012-04-11 19:34 ` Jan Tatarik
0 siblings, 2 replies; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-04-10 19:32 UTC (permalink / raw)
To: Jan Tatarik; +Cc: ding
Jan Tatarik <jan.tatarik@gmail.com> writes:
>> Yes, that looks better, but it should probably just call
>> `mm-decode-content-transfer-encoding' instead, I think?
>
> Like this?
Yes. And I almost applied it, but then I remembered that this
wouldn't, in general, really give us what we wanted, anyway. :-/
That is, a lot of messages are MIME multipart messages. If we start
decoding, we really should decode all the parts, and score on the
decoded text of them all, I think.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-04-10 19:32 ` Lars Magne Ingebrigtsen
@ 2012-04-11 7:30 ` Jan Tatarik
2012-04-11 19:34 ` Jan Tatarik
1 sibling, 0 replies; 20+ messages in thread
From: Jan Tatarik @ 2012-04-11 7:30 UTC (permalink / raw)
To: ding
On Tue, Apr 10 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:
>> Like this?
> Yes. And I almost applied it, but then I remembered that this
> wouldn't, in general, really give us what we wanted, anyway. :-/
> That is, a lot of messages are MIME multipart messages. If we start
> decoding, we really should decode all the parts, and score on the
> decoded text of them all, I think.
Any other scenario I should be aware of, before I send the next patch?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-04-10 19:32 ` Lars Magne Ingebrigtsen
2012-04-11 7:30 ` Jan Tatarik
@ 2012-04-11 19:34 ` Jan Tatarik
2012-04-12 18:45 ` Lars Magne Ingebrigtsen
1 sibling, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-04-11 19:34 UTC (permalink / raw)
To: ding
On Tue, Apr 10 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:
> Yes. And I almost applied it, but then I remembered that this
> wouldn't, in general, really give us what we wanted, anyway. :-/
> That is, a lot of messages are MIME multipart messages. If we start
> decoding, we really should decode all the parts, and score on the
> decoded text of them all, I think.
All text/* parts, I assume?
I had a quick look at mm handling in gnus. My idea would be to
- run mm-dissect-buffer on the message body (any idea which args would
be appropriate?)
- for multipart messages, pick the handles with text/* type, run them
through their respective mm-inline-* function as defined in
mm-inline-media-tests
- score on all the decoded text parts
Is this the way to go?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-04-11 19:34 ` Jan Tatarik
@ 2012-04-12 18:45 ` Lars Magne Ingebrigtsen
2012-04-12 22:58 ` Jan Tatarik
0 siblings, 1 reply; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-04-12 18:45 UTC (permalink / raw)
To: Jan Tatarik; +Cc: ding
Jan Tatarik <jan.tatarik@gmail.com> writes:
> All text/* parts, I assume?
It's not uncommon to stash texts bits in application/octet-stream parts,
too, but I guess limiting this to text/* would be sensible...
> I had a quick look at mm handling in gnus. My idea would be to
>
> - run mm-dissect-buffer on the message body (any idea which args would
> be appropriate?)
NO-STRICT-MIME, I think.
> - for multipart messages, pick the handles with text/* type, run them
> through their respective mm-inline-* function as defined in
> mm-inline-media-tests
>
> - score on all the decoded text parts
>
> Is this the way to go?
Yup; sounds good.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-04-12 18:45 ` Lars Magne Ingebrigtsen
@ 2012-04-12 22:58 ` Jan Tatarik
2012-06-10 21:08 ` Lars Magne Ingebrigtsen
0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-04-12 22:58 UTC (permalink / raw)
To: ding
[-- Attachment #1: Type: text/plain, Size: 486 bytes --]
On Thu, Apr 12 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:
>> - run mm-dissect-buffer on the message body (any idea which args would
>> be appropriate?)
> NO-STRICT-MIME, I think.
>> - for multipart messages, pick the handles with text/* type, run them
>> through their respective mm-inline-* function as defined in
>> mm-inline-media-tests
>> - score on all the decoded text parts
>> Is this the way to go?
> Yup; sounds good.
And here is the new patch.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: decode mm messages when scoring on body --]
[-- Type: text/x-diff, Size: 3002 bytes --]
diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..38442a406dd6fb6ef47cd468248618e68337a26a 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -181,8 +181,10 @@
(with-current-buffer nntp-server-buffer
(let* ((request-func (cond ((string= "head" header)
'gnus-request-head)
+ ;; We need to peek at the headers to detect the
+ ;; content encoding
((string= "body" header)
- 'gnus-request-body)
+ 'gnus-request-article)
(t 'gnus-request-article)))
ofunc article)
;; Not all backends support partial fetching. In that case, we
@@ -196,6 +198,7 @@
(gnus-message 7 "Scoring article %s..." article)
(when (funcall request-func article gnus-newsgroup-name)
(goto-char (point-min))
+ (gnus-score-decode-text-parts)
;; If just parts of the article is to be searched and the
;; backend didn't support partial fetching, we just narrow to
;; the relevant parts.
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..003355dd2c91847241dc67263f83d26ae52920de 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1736,6 +1736,24 @@ score in `gnus-newsgroup-scored' by SCORE."
(setq entries rest)))))
nil)
+(defun gnus-score-decode-text-parts ()
+ (let ((handles (mm-dissect-buffer t)))
+ (cond ((stringp (car handles)) (pop handles))
+ ((and (bufferp (car handles))
+ (stringp (car (mm-handle-type handles))))
+ (setq handles (list handles))))
+
+ (save-excursion
+ (article-goto-body)
+ (delete-region (point) (point-max))
+ (save-restriction
+ (narrow-to-region (point) (point))
+ (mapc #'mm-display-inline
+ (remove-if-not
+ (lambda (handle)
+ (string-match "^text/" (mm-handle-media-type handle)))
+ handles))))))
+
(defun gnus-score-body (scores header now expire &optional trace)
(if gnus-agent-fetching
nil
@@ -1752,8 +1770,10 @@ score in `gnus-newsgroup-scored' by SCORE."
(all-scores scores)
(request-func (cond ((string= "head" header)
'gnus-request-head)
+ ;; We need to peek at the headers to detect
+ ;; the content encoding
((string= "body" header)
- 'gnus-request-body)
+ 'gnus-request-article)
(t 'gnus-request-article)))
entries alist ofunc article last)
(when articles
@@ -1773,6 +1793,7 @@ score in `gnus-newsgroup-scored' by SCORE."
(widen)
(when (funcall request-func article gnus-newsgroup-name)
(goto-char (point-min))
+ (gnus-score-decode-text-parts)
;; If just parts of the article is to be searched, but the
;; backend didn't support partial fetching, we just narrow
;; to the relevant parts.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-04-12 22:58 ` Jan Tatarik
@ 2012-06-10 21:08 ` Lars Magne Ingebrigtsen
2012-06-28 9:45 ` Jan Tatarik
0 siblings, 1 reply; 20+ messages in thread
From: Lars Magne Ingebrigtsen @ 2012-06-10 21:08 UTC (permalink / raw)
To: Jan Tatarik; +Cc: ding
Jan Tatarik <jan.tatarik@gmail.com> writes:
> And here is the new patch.
Sorry for the extreme delays here.
The patch looks good.
I don't remember whether I asked you whether you had FSF copyright
assignments on file or not?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-06-10 21:08 ` Lars Magne Ingebrigtsen
@ 2012-06-28 9:45 ` Jan Tatarik
2012-09-05 13:40 ` Lars Ingebrigtsen
0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-06-28 9:45 UTC (permalink / raw)
To: ding; +Cc: Lars Magne Ingebrigtsen
[-- Attachment #1: Type: text/plain, Size: 329 bytes --]
On Sun, Jun 10 2012, Lars Magne Ingebrigtsen Lars Magne Ingebrigtsen wrote:
> Jan Tatarik <jan.tatarik@gmail.com> writes:
>> And here is the new patch.
> Sorry for the extreme delays here.
> The patch looks good.
> I don't remember whether I asked you whether you had FSF copyright
> assignments on file or not?
I do now.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: score by body --]
[-- Type: text/x-diff, Size: 14559 bytes --]
diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..a440b7799303ddf46674c35c20f5bba51388b50c 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -180,46 +180,51 @@
(setq header "article"))
(with-current-buffer nntp-server-buffer
(let* ((request-func (cond ((string= "head" header)
- 'gnus-request-head)
- ((string= "body" header)
- 'gnus-request-body)
- (t 'gnus-request-article)))
- ofunc article)
+ 'gnus-request-head)
+ ;; We need to peek at the headers to detect the
+ ;; content encoding
+ ((string= "body" header)
+ 'gnus-request-article)
+ (t 'gnus-request-article)))
+ ofunc article handles)
;; Not all backends support partial fetching. In that case, we
;; just fetch the entire article.
(unless (gnus-check-backend-function
- (intern (concat "request-" header))
- gnus-newsgroup-name)
- (setq ofunc request-func)
- (setq request-func 'gnus-request-article))
+ (intern (concat "request-" header))
+ gnus-newsgroup-name)
+ (setq ofunc request-func)
+ (setq request-func 'gnus-request-article))
(setq article (mail-header-number gnus-advanced-headers))
(gnus-message 7 "Scoring article %s..." article)
(when (funcall request-func article gnus-newsgroup-name)
- (goto-char (point-min))
- ;; If just parts of the article is to be searched and the
- ;; backend didn't support partial fetching, we just narrow to
- ;; the relevant parts.
- (when ofunc
- (if (eq ofunc 'gnus-request-head)
- (narrow-to-region
- (point)
- (or (search-forward "\n\n" nil t) (point-max)))
- (narrow-to-region
- (or (search-forward "\n\n" nil t) (point))
- (point-max))))
- (let* ((case-fold-search (not (eq (downcase (symbol-name type))
- (symbol-name type))))
- (search-func
- (cond ((memq type '(r R regexp Regexp))
- 're-search-forward)
- ((memq type '(s S string String))
- 'search-forward)
- (t
- (error "Invalid match type: %s" type)))))
- (goto-char (point-min))
- (prog1
- (funcall search-func match nil t)
- (widen)))))))
+ (when (string= "body" header)
+ (setq handles (gnus-score-decode-text-parts)))
+ (goto-char (point-min))
+ ;; If just parts of the article is to be searched and the
+ ;; backend didn't support partial fetching, we just narrow to
+ ;; the relevant parts.
+ (when ofunc
+ (if (eq ofunc 'gnus-request-head)
+ (narrow-to-region
+ (point)
+ (or (search-forward "\n\n" nil t) (point-max)))
+ (narrow-to-region
+ (or (search-forward "\n\n" nil t) (point))
+ (point-max))))
+ (let* ((case-fold-search (not (eq (downcase (symbol-name type))
+ (symbol-name type))))
+ (search-func
+ (cond ((memq type '(r R regexp Regexp))
+ 're-search-forward)
+ ((memq type '(s S string String))
+ 'search-forward)
+ (t
+ (error "Invalid match type: %s" type)))))
+ (goto-char (point-min))
+ (prog1
+ (funcall search-func match nil t)
+ (widen)))
+ (when handles (mm-destroy-parts handles))))))
(provide 'gnus-logic)
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..d50b3cc0cf02812ef32f53beaf85cd27171680d4 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1736,105 +1736,140 @@ score in `gnus-newsgroup-scored' by SCORE."
(setq entries rest)))))
nil)
+(defun gnus-score-decode-text-parts ()
+ (labels ((mm-text-parts (handle)
+ (cond ((stringp (car handle))
+ (let ((parts (mapcan 'mm-text-parts (cdr handle))))
+ (if (equal "multipart/alternative" (car handle))
+ ;; pick the first supported alternative
+ (list (car parts))
+ parts)))
+
+ ((bufferp (car handle))
+ (when (string-match "^text/" (mm-handle-media-type handle))
+ (list handle)))
+
+ (t (mapcan 'mm-text-parts handle))))
+ (my-mm-display-part (handle)
+ (when handle
+ (save-restriction
+ (narrow-to-region (point) (point))
+ (mm-display-inline handle)
+ (goto-char (point-max))))))
+
+ (let (;(mm-text-html-renderer 'w3m-standalone)
+ (handles (mm-dissect-buffer t)))
+ (save-excursion
+ (article-goto-body)
+ (delete-region (point) (point-max))
+ (mapc 'my-mm-display-part (mm-text-parts handles))
+ handles))))
+
(defun gnus-score-body (scores header now expire &optional trace)
- (if gnus-agent-fetching
- nil
- (save-excursion
- (setq gnus-scores-articles
- (sort gnus-scores-articles
- (lambda (a1 a2)
- (< (mail-header-number (car a1))
- (mail-header-number (car a2))))))
- (set-buffer nntp-server-buffer)
- (save-restriction
- (let* ((buffer-read-only nil)
- (articles gnus-scores-articles)
- (all-scores scores)
- (request-func (cond ((string= "head" header)
- 'gnus-request-head)
- ((string= "body" header)
- 'gnus-request-body)
- (t 'gnus-request-article)))
- entries alist ofunc article last)
- (when articles
- (setq last (mail-header-number (caar (last articles))))
- ;; Not all backends support partial fetching. In that case,
- ;; we just fetch the entire article.
- (unless (gnus-check-backend-function
- (and (string-match "^gnus-" (symbol-name request-func))
- (intern (substring (symbol-name request-func)
- (match-end 0))))
- gnus-newsgroup-name)
- (setq ofunc request-func)
- (setq request-func 'gnus-request-article))
- (while articles
- (setq article (mail-header-number (caar articles)))
- (gnus-message 7 "Scoring article %s of %s..." article last)
- (widen)
- (when (funcall request-func article gnus-newsgroup-name)
- (goto-char (point-min))
- ;; If just parts of the article is to be searched, but the
- ;; backend didn't support partial fetching, we just narrow
- ;; to the relevant parts.
- (when ofunc
- (if (eq ofunc 'gnus-request-head)
- (narrow-to-region
- (point)
- (or (search-forward "\n\n" nil t) (point-max)))
- (narrow-to-region
- (or (search-forward "\n\n" nil t) (point))
- (point-max))))
- (setq scores all-scores)
- ;; Find matches.
- (while scores
- (setq alist (pop scores)
- entries (assoc header alist))
- (while (cdr entries) ;First entry is the header index.
- (let* ((rest (cdr entries))
- (kill (car rest))
- (match (nth 0 kill))
- (type (or (nth 3 kill) 's))
- (score (or (nth 1 kill)
- gnus-score-interactive-default-score))
- (date (nth 2 kill))
- (found nil)
- (case-fold-search
- (not (or (eq type 'R) (eq type 'S)
- (eq type 'Regexp) (eq type 'String))))
- (search-func
- (cond ((or (eq type 'r) (eq type 'R)
- (eq type 'regexp) (eq type 'Regexp))
- 're-search-forward)
- ((or (eq type 's) (eq type 'S)
- (eq type 'string) (eq type 'String))
- 'search-forward)
- (t
- (error "Invalid match type: %s" type)))))
- (goto-char (point-min))
- (when (funcall search-func match nil t)
- ;; Found a match, update scores.
- (setcdr (car articles) (+ score (cdar articles)))
- (setq found t)
- (when trace
- (push
- (cons (car-safe (rassq alist gnus-score-cache))
- kill)
- gnus-score-trace)))
- ;; Update expire date
- (unless trace
- (cond
- ((null date)) ;Permanent entry.
- ((and found gnus-update-score-entry-dates)
- ;; Match, update date.
- (gnus-score-set 'touched '(t) alist)
- (setcar (nthcdr 2 kill) now))
- ((and expire (< date expire)) ;Old entry, remove.
- (gnus-score-set 'touched '(t) alist)
- (setcdr entries (cdr rest))
- (setq rest entries))))
- (setq entries rest)))))
- (setq articles (cdr articles)))))))
- nil))
+ (if gnus-agent-fetching
+ nil
+ (save-excursion
+ (setq gnus-scores-articles
+ (sort gnus-scores-articles
+ (lambda (a1 a2)
+ (< (mail-header-number (car a1))
+ (mail-header-number (car a2))))))
+ (set-buffer nntp-server-buffer)
+ (save-restriction
+ (let* ((buffer-read-only nil)
+ (articles gnus-scores-articles)
+ (all-scores scores)
+ (request-func (cond ((string= "head" header)
+ 'gnus-request-head)
+ ;; We need to peek at the headers to detect
+ ;; the content encoding
+ ((string= "body" header)
+ 'gnus-request-article)
+ (t 'gnus-request-article)))
+ entries alist ofunc article last)
+ (when articles
+ (setq last (mail-header-number (caar (last articles))))
+ ;; Not all backends support partial fetching. In that case,
+ ;; we just fetch the entire article.
+ (unless (gnus-check-backend-function
+ (and (string-match "^gnus-" (symbol-name request-func))
+ (intern (substring (symbol-name request-func)
+ (match-end 0))))
+ gnus-newsgroup-name)
+ (setq ofunc request-func)
+ (setq request-func 'gnus-request-article))
+ (while articles
+ (setq article (mail-header-number (caar articles)))
+ (gnus-message 7 "Scoring article %s of %s..." article last)
+ (widen)
+ (let (handles)
+ (when (funcall request-func article gnus-newsgroup-name)
+ (when (string= "body" header)
+ (setq handles (gnus-score-decode-text-parts)))
+ (goto-char (point-min))
+ ;; If just parts of the article is to be searched, but the
+ ;; backend didn't support partial fetching, we just narrow
+ ;; to the relevant parts.
+ (when ofunc
+ (if (eq ofunc 'gnus-request-head)
+ (narrow-to-region
+ (point)
+ (or (search-forward "\n\n" nil t) (point-max)))
+ (narrow-to-region
+ (or (search-forward "\n\n" nil t) (point))
+ (point-max))))
+ (setq scores all-scores)
+ ;; Find matches.
+ (while scores
+ (setq alist (pop scores)
+ entries (assoc header alist))
+ (while (cdr entries) ;First entry is the header index.
+ (let* ((rest (cdr entries))
+ (kill (car rest))
+ (match (nth 0 kill))
+ (type (or (nth 3 kill) 's))
+ (score (or (nth 1 kill)
+ gnus-score-interactive-default-score))
+ (date (nth 2 kill))
+ (found nil)
+ (case-fold-search
+ (not (or (eq type 'R) (eq type 'S)
+ (eq type 'Regexp) (eq type 'String))))
+ (search-func
+ (cond ((or (eq type 'r) (eq type 'R)
+ (eq type 'regexp) (eq type 'Regexp))
+ 're-search-forward)
+ ((or (eq type 's) (eq type 'S)
+ (eq type 'string) (eq type 'String))
+ 'search-forward)
+ (t
+ (error "Invalid match type: %s" type)))))
+ (goto-char (point-min))
+ (when (funcall search-func match nil t)
+ ;; Found a match, update scores.
+ (setcdr (car articles) (+ score (cdar articles)))
+ (setq found t)
+ (when trace
+ (push
+ (cons (car-safe (rassq alist gnus-score-cache))
+ kill)
+ gnus-score-trace)))
+ ;; Update expire date
+ (unless trace
+ (cond
+ ((null date)) ;Permanent entry.
+ ((and found gnus-update-score-entry-dates)
+ ;; Match, update date.
+ (gnus-score-set 'touched '(t) alist)
+ (setcar (nthcdr 2 kill) now))
+ ((and expire (< date expire)) ;Old entry, remove.
+ (gnus-score-set 'touched '(t) alist)
+ (setcdr entries (cdr rest))
+ (setq rest entries))))
+ (setq entries rest))))
+ (when handles (mm-destroy-parts handles))))
+ (setq articles (cdr articles)))))))
+ nil))
(defun gnus-score-thread (scores header now expire &optional trace)
(gnus-score-followup scores header now expire trace t))
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-06-28 9:45 ` Jan Tatarik
@ 2012-09-05 13:40 ` Lars Ingebrigtsen
2012-09-05 14:39 ` Jan Tatarik
0 siblings, 1 reply; 20+ messages in thread
From: Lars Ingebrigtsen @ 2012-09-05 13:40 UTC (permalink / raw)
To: Jan Tatarik; +Cc: ding
Jan Tatarik <jan.tatarik@gmail.com> writes:
>> I don't remember whether I asked you whether you had FSF copyright
>> assignments on file or not?
>
> I do now.
Thanks; I've now applied your patch to Ma Gnus.
--
(domestic pets only, the antidote for overdose, milk.)
http://lars.ingebrigtsen.no * Sent from my Emacs
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-09-05 13:40 ` Lars Ingebrigtsen
@ 2012-09-05 14:39 ` Jan Tatarik
2012-09-05 14:43 ` Lars Ingebrigtsen
0 siblings, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-09-05 14:39 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: ding
[-- Attachment #1: Type: text/plain, Size: 364 bytes --]
On Wed, Sep 05 2012, Lars Ingebrigtsen wrote:
> Jan Tatarik <jan.tatarik@gmail.com> writes:
>>> I don't remember whether I asked you whether you had FSF copyright
>>> assignments on file or not?
>> I do now.
> Thanks; I've now applied your patch to Ma Gnus.
Cool. There's one small issue, however, that started popping up for me
with recent Emacs versions:
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: proper function quoting --]
[-- Type: text/x-diff, Size: 573 bytes --]
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index 948c1c3b62dcebb3f71b0b97bb4f99ee5f4b00af..bc35cf3dea5ff791c8d53dd32330e584567ea1ee 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1743,7 +1743,7 @@ score in `gnus-newsgroup-scored' by SCORE."
(save-excursion
(article-goto-body)
(delete-region (point) (point-max))
- (mapc 'my-mm-display-part (mm-text-parts handles))
+ (mapc #'my-mm-display-part (mm-text-parts handles))
handles))))
(defun gnus-score-body (scores header now expire &optional trace)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-09-05 14:39 ` Jan Tatarik
@ 2012-09-05 14:43 ` Lars Ingebrigtsen
2012-09-05 15:07 ` Jan Tatarik
2012-09-05 15:42 ` Andreas Schwab
0 siblings, 2 replies; 20+ messages in thread
From: Lars Ingebrigtsen @ 2012-09-05 14:43 UTC (permalink / raw)
To: Jan Tatarik; +Cc: ding
Jan Tatarik <jan.tatarik@gmail.com> writes:
> Cool. There's one small issue, however, that started popping up for me
> with recent Emacs versions:
[...]
> - (mapc 'my-mm-display-part (mm-text-parts handles))
> + (mapc #'my-mm-display-part (mm-text-parts handles))
Has #' started to have a meaning in newer Emacsen? It used to ... not
mean very much.
Anyway, I've applied the patch.
--
(domestic pets only, the antidote for overdose, milk.)
http://lars.ingebrigtsen.no * Lars Magne Ingebrigtsen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-09-05 14:43 ` Lars Ingebrigtsen
@ 2012-09-05 15:07 ` Jan Tatarik
2012-09-05 15:35 ` Lars Ingebrigtsen
2012-09-05 15:42 ` Andreas Schwab
1 sibling, 1 reply; 20+ messages in thread
From: Jan Tatarik @ 2012-09-05 15:07 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: ding
On Wed, Sep 05 2012, Lars Ingebrigtsen wrote:
> Jan Tatarik <jan.tatarik@gmail.com> writes:
>> Cool. There's one small issue, however, that started popping up for me
>> with recent Emacs versions:
> [...]
>> - (mapc 'my-mm-display-part (mm-text-parts handles))
>> + (mapc #'my-mm-display-part (mm-text-parts handles))
> Has #' started to have a meaning in newer Emacsen? It used to ... not
> mean very much.
I think it's only problem with temporary functions (labels). Then I get
mapc: Symbol's function definition is void: my-mm-display-part
unless I use #'my-mm-display-part.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-09-05 15:07 ` Jan Tatarik
@ 2012-09-05 15:35 ` Lars Ingebrigtsen
0 siblings, 0 replies; 20+ messages in thread
From: Lars Ingebrigtsen @ 2012-09-05 15:35 UTC (permalink / raw)
To: Jan Tatarik; +Cc: ding
Jan Tatarik <jan.tatarik@gmail.com> writes:
> I think it's only problem with temporary functions (labels). Then I get
>
> mapc: Symbol's function definition is void: my-mm-display-part
>
> unless I use #'my-mm-display-part.
Ah, I see.
--
(domestic pets only, the antidote for overdose, milk.)
http://lars.ingebrigtsen.no * Lars Magne Ingebrigtsen
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scoring on basee64 encoded message body
2012-09-05 14:43 ` Lars Ingebrigtsen
2012-09-05 15:07 ` Jan Tatarik
@ 2012-09-05 15:42 ` Andreas Schwab
1 sibling, 0 replies; 20+ messages in thread
From: Andreas Schwab @ 2012-09-05 15:42 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: Jan Tatarik, ding
Lars Ingebrigtsen <larsi@gnus.org> writes:
> Has #' started to have a meaning in newer Emacsen?
Only for lambda expressions (it allows them to be byte-compiled).
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2012-09-05 15:42 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-13 10:39 Scoring on basee64 encoded message body Jan Tatarik
2012-03-14 14:38 ` Lars Magne Ingebrigtsen
2012-03-14 20:21 ` Reiner Steib
2012-03-15 1:29 ` Lars Magne Ingebrigtsen
2012-03-15 21:05 ` Jan Tatarik
2012-03-22 20:38 ` Lars Magne Ingebrigtsen
2012-03-23 12:11 ` Jan Tatarik
2012-04-10 19:32 ` Lars Magne Ingebrigtsen
2012-04-11 7:30 ` Jan Tatarik
2012-04-11 19:34 ` Jan Tatarik
2012-04-12 18:45 ` Lars Magne Ingebrigtsen
2012-04-12 22:58 ` Jan Tatarik
2012-06-10 21:08 ` Lars Magne Ingebrigtsen
2012-06-28 9:45 ` Jan Tatarik
2012-09-05 13:40 ` Lars Ingebrigtsen
2012-09-05 14:39 ` Jan Tatarik
2012-09-05 14:43 ` Lars Ingebrigtsen
2012-09-05 15:07 ` Jan Tatarik
2012-09-05 15:35 ` Lars Ingebrigtsen
2012-09-05 15:42 ` Andreas Schwab
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).