From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Tatarik
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Scoring on basee64 encoded message body
Date: Thu, 15 Mar 2012 22:05:01 +0100
Message-ID: <5n5x2rvcm5zibm.fsf@nb-jtatarik2.xing.hh>
To: ding@gnus.org
In-Reply-To: (Lars Magne Ingebrigtsen's message of "Wed, 14 Mar 2012 15:38:38 +0100")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.94 (gnu/linux)
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

On Wed, Mar 14 2012, Lars Magne Ingebrigtsen wrote:

> Jan Tatarik writes:
>
>> I finally realized the content of the messages is base64 encoded, so
>> matching on the raw body cannot work.
>>
>> The attached patch fixes the problem for me, but I have no idea
>> whether it's a generally acceptable solution. I'm only using the body
>> match in a low-traffic group, so speed is not an issue for me.
>
> [...]
>
>> + (when (string= (gnus-fetch-field "content-transfer-encoding") "base64")
>> + (article-de-base64-unreadable t))
>
> This isn't a general enough solution here. QP-encoded messages also
> want decoding.
>
> But the more general issue is -- should scoring on bodies be done on the
> decoded messages or the encoded messages?
> I think it would make more
> sense to do it on decoded messages, and since these are body matches,
> speed don't really matter that much, because body matches are s-l-o-w
> anyway.

This better?

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=decode-message-before-scoring-on-body.diff

diff --git a/lisp/gnus-logic.el b/lisp/gnus-logic.el
index 954295438c953c2500b9c1959a49e52312cc9653..1b4fc22bc11ee0d599fb40a0adef53588b4f9ca4 100644
--- a/lisp/gnus-logic.el
+++ b/lisp/gnus-logic.el
@@ -181,8 +181,10 @@
     (with-current-buffer nntp-server-buffer
       (let* ((request-func (cond ((string= "head" header)
                                   'gnus-request-head)
+                                 ;; We need to peek at the headers to detect the
+                                 ;; content encoding
                                  ((string= "body" header)
-                                  'gnus-request-body)
+                                  'gnus-request-article)
                                  (t 'gnus-request-article)))
              ofunc article)
         ;; Not all backends support partial fetching. In that case, we
@@ -196,6 +198,14 @@
           (gnus-message 7 "Scoring article %s..." article)
           (when (funcall request-func article gnus-newsgroup-name)
             (goto-char (point-min))
+            ;; Searching base64/qp-encoded message body produces more
+            ;; satisfactory results if we decode the message first
+            (unless (or (eq ofunc 'gnus-request-head)
+                        (eq request-func 'gnus-request-head))
+              (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+                (cond
+                 ((string= "base64" encoding) (article-de-base64-unreadable t))
+                 ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
             ;; If just parts of the article is to be searched and the
             ;; backend didn't support partial fetching, we just narrow to
             ;; the relevant parts.
diff --git a/lisp/gnus-score.el b/lisp/gnus-score.el
index f86b6f837a70ce54b06668187821fe57c3f80f4c..776194a31c6702441d3bae74c8f6048778270e67 100644
--- a/lisp/gnus-score.el
+++ b/lisp/gnus-score.el
@@ -1752,8 +1752,10 @@ score in `gnus-newsgroup-scored' by SCORE."
          (all-scores scores)
          (request-func (cond ((string= "head" header)
                               'gnus-request-head)
+                              ;; We need to peek at the headers to detect
+                              ;; the content encoding
                              ((string= "body" header)
-                               'gnus-request-body)
+                               'gnus-request-article)
                              (t 'gnus-request-article)))
          entries alist ofunc article last)
     (when articles
@@ -1773,6 +1775,14 @@ score in `gnus-newsgroup-scored' by SCORE."
           (widen)
           (when (funcall request-func article gnus-newsgroup-name)
             (goto-char (point-min))
+            ;; Searching base64/qp-encoded message body produces more
+            ;; satisfactory results if we decode the message first
+            (unless (or (eq ofunc 'gnus-request-head)
+                        (eq request-func 'gnus-request-head))
+              (let ((encoding (gnus-fetch-field "content-transfer-encoding")))
+                (cond
+                 ((string= "base64" encoding) (article-de-base64-unreadable t))
+                 ((string= "quoted-printable" encoding) (article-de-quoted-unreadable t)))))
             ;; If just parts of the article is to be searched, but the
             ;; backend didn't support partial fetching, we just narrow
             ;; to the relevant parts.

--=-=-=--
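
For anyone following the thread, here is a minimal sketch of why a body
match against the raw article cannot work when the body is base64
encoded. It is an illustration only and not part of the patch:
`my-body-matches-p' is a hypothetical helper, and it uses only the
Emacs built-ins `base64-encode-string', `base64-decode-string' and
`string-match-p', not the Gnus scoring code.

```elisp
;; Illustration only, not part of the patch.  `my-body-matches-p' is a
;; hypothetical helper name; nothing here touches Gnus itself.
(defun my-body-matches-p (regexp body &optional encoding)
  "Return non-nil if REGEXP matches BODY.
If ENCODING is the string \"base64\", decode BODY before matching."
  (let ((case-fold-search nil)
        (text (if (equal encoding "base64")
                  (base64-decode-string body)
                body)))
    (string-match-p regexp text)))

;; The same text, as it would appear in a base64-encoded article body.
(let ((raw (base64-encode-string "Please review the attached patch")))
  (list (my-body-matches-p "patch" raw)            ; raw base64 text: nil
        (my-body-matches-p "patch" raw "base64"))) ; decoded first: non-nil
```

The patch applies the same idea inside the scoring loop: it fetches the
whole article so that the Content-Transfer-Encoding header is visible,
then decodes the buffer before the body match runs.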