Gnus development mailing list
From: Oystein Viggen <oysteivi@tihlde.org>
Subject: Re: spam-stat and base64 encoded messages
Date: Sat, 07 Jun 2003 01:21:19 +0200
Message-ID: <0365niyeq8.fsf@msgid.viggen.net>
In-Reply-To: <m3llwfz0ja.fsf@defun.localdomain>

* [Jesper Harder] 

> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> You're right.  Assuming we don't care about the attachments as
>> entities, but only want to inline them in the message as plain text,
>> what Gnus functionality can I use to do this?

As for HTML, I think we might as well inline it in the message as raw
HTML code instead of rendering it to plain text.  The Bayesian filter
might benefit from recognizing words like "href" as spammy.  (In short,
I'd like some code that identifies and decodes any base64 parts but does
nothing else to the buffer.)

> I don't think there's any existing functionality that does exactly
> what we want.  `gnus-display-mime' is the closest, but it does far
> too much.  

I tried wrapping some `let' bindings around `gnus-display-mime', but
wasn't able to get it to work reliably.  As you say, the function does
far too much.

I also experimented with `article-de-base64-unreadable', which seems the
closest to what I wanted.  With a de-base64 hack added,
`spam-stat-test-directory' recognizes 1643 messages in my 1737-message
spam folder, compared to 1622 without the patch.  This was with no
retraining of spam-stat, so any text previously hidden in base64 is new
to spam-stat.  Not much of an improvement, but it's measurable.  (And
there are still no false positives in my trained ham folders.)

After retraining with base64 decoding, the number of recognized spams in
the folder dropped to 1636.  I don't really know why.
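
To put the three counts on the same scale, here is a small sketch that
turns them into recognition rates over the 1737-message folder.  The
function name `my-recognition-rate' and the run labels are made up for
this illustration; nothing here is part of spam-stat.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-recognition-rate (hits total)
  "Return HITS out of TOTAL as a percentage (a float)."
  (/ (* 100.0 hits) total))

;; Print one line per test run, using the counts quoted above.
(let ((total 1737))
  (dolist (run '(("no decoding" . 1622)
                 ("decoding, old training" . 1643)
                 ("decoding, retrained" . 1636)))
    (message "%-24s %d/%d = %.2f%%"
             (car run) (cdr run) total
             (my-recognition-rate (cdr run) total))))
```

That works out to roughly 93.38%, 94.59% and 94.19%, so the whole
spread is a little over one percentage point.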

A small patch:

Index: spam-stat.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/spam-stat.el,v
retrieving revision 6.12
diff -u -r6.12 spam-stat.el
--- spam-stat.el	1 May 2003 14:14:31 -0000	6.12
+++ spam-stat.el	6 Jun 2003 23:28:27 -0000
@@ -229,6 +229,9 @@
       (set-buffer (get-buffer-create spam-stat-buffer-name))
       (erase-buffer)
       (insert str)
+      (ignore-errors 
+	(let ((gnus-original-article-buffer (current-buffer)))
+	  (article-de-base64-unreadable)))
       (setq spam-stat-buffer (current-buffer)))))
 
 (defun spam-stat-store-gnus-article-buffer ()
@@ -509,6 +512,9 @@
 	  (setq count (1+ count))
 	  (message "Reading %s: %.2f%%" dir (/ count max))
 	  (insert-file-contents f)
+	  (ignore-errors 
+	    (let ((gnus-original-article-buffer (current-buffer)))
+	      (article-de-base64-unreadable)))
 	  (funcall func)
 	  (erase-buffer))))))
 
@@ -547,6 +553,9 @@
 	  (message "Reading %.2f%%, score %.2f%%"
 		   (/ count max) (/ score count))
 	  (insert-file-contents f)
+	  (ignore-errors 
+	    (let ((gnus-original-article-buffer (current-buffer)))
+	      (article-de-base64-unreadable)))
 	  (when (> (spam-stat-score-buffer) 0.9)
 	    (setq score (1+ score)))
 	  (erase-buffer))))


`ignore-errors' is used so that malformed base64 doesn't make the
process choke and quit, which would be quite irritating.  There's
probably a better way to fix this; comments are very welcome  :)
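
One possible direction, as an untested sketch rather than a proposal
for spam-stat.el: put the three repeated lines into a single helper and
use `condition-case' instead of `ignore-errors', so that a decoding
failure is at least reported.  The name `my-spam-stat-de-base64' is
hypothetical, and I'm assuming `article-de-base64-unreadable' and
`gnus-original-article-buffer' both come from gnus-art.

```elisp
(require 'gnus-art)

;; Hypothetical helper; not part of spam-stat.el.
(defun my-spam-stat-de-base64 ()
  "Try to decode a base64 article in the current buffer.
Return non-nil if decoding ran without error.  On error, log a
message and return nil, leaving it to the caller to carry on."
  (condition-case err
      ;; As in the patch: let the decoder read the headers from the
      ;; buffer we are already in.
      (let ((gnus-original-article-buffer (current-buffer)))
        (article-de-base64-unreadable)
        t)
    (error
     (message "base64 decoding failed: %s" (error-message-string err))
     nil)))
```

Each of the three hunks would then shrink to a single call to
(my-spam-stat-de-base64), and a malformed message would show up in
*Messages* instead of being skipped silently.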

> You can hack it a bit and wrap some `flet's and `let's around it to
> make it sort of work, but it's not really the right way (at least
> without some more work):
>
> (require 'cl)
>
> (defun my-decode (&optional ihandles)

Haven't looked at your-decode yet.  I'll check it out later when I have
time.  (hopefully, someone who knows gnus and lisp will beat me to it :)

Øystein
-- 
If it ain't broke, don't break it.


