Gnus development mailing list
From: Christian Lynbech <christian@defun.dk>
To: ding@gnus.org
Subject: Handling spam
Date: Wed, 22 Mar 2023 08:49:07 +0100
Message-ID: <m27cv9ql8s.fsf@defun.dk>

Do any of you use Gnus to handle spam, and if so, how do you do it?

I have been using the spam-stat library that is bundled with Emacs for
quite some time, but it is not working very well for me.

Spam-stat uses the statistical distribution of words to distinguish spam
from non-spam, and for that it needs to be trained. It has a function
that processes a directory of messages and another function that can be
added as a split rule to filter away the spam messages.
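
For readers who have not used spam-stat, the two pieces mentioned above
can be sketched roughly as follows. This follows the pattern in the Gnus
manual; the directory paths and group names are assumptions and are not
taken from my actual setup.

```elisp
(require 'spam-stat)

;; One-off training run. The directory names are assumptions; each file
;; in a directory is treated as one message.
(spam-stat-process-spam-directory "~/Mail/mail/spam")
(spam-stat-process-non-spam-directory "~/Mail/mail/misc")
(spam-stat-save)   ; write the dictionary to `spam-stat-file'

;; In the Gnus init file: load the dictionary and add the split rule.
(spam-stat-load)
(setq spam-stat-split-fancy-spam-group "mail.spam"  ; assumed group name
      nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy
      '(| (: spam-stat-split-fancy)
          "mail.misc"))                             ; assumed fallback group
```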

I use Gnus to download emails into nnml groups, so it is quite easy to
process the spam folder to seed spam-stat. There is a problem, however:
the messages in the nnml directory are in raw form, but the split
function looks at the formatted message (I think), so for an HTML
formatted mail the word distribution can be quite different. I have
verified this with a recent spam message: scoring the formatted text
gives a strong indication that it is non-spam, while scoring the raw
message gives a strong indication that it is indeed spam.
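
A minimal sketch of how the raw side of that comparison can be
reproduced, assuming the dictionary has already been loaded with
`spam-stat-load'. The helper name and the file path are hypothetical;
only `spam-stat-score-buffer' and `insert-file-contents-literally' are
existing functions.

```elisp
(require 'spam-stat)

(defun my/spam-stat-score-raw-file (file)
  "Return the spam-stat score for the raw contents of FILE.
Hypothetical helper; assumes the dictionary is already loaded
with `spam-stat-load'."
  (with-temp-buffer
    ;; Insert the message exactly as stored on disk, without decoding
    ;; or rendering it.
    (insert-file-contents-literally file)
    ;; Score whatever is in the current buffer.
    (spam-stat-score-buffer)))

;; The path is an assumption; nnml stores one message per numbered file.
(my/spam-stat-score-raw-file "~/Mail/mail/spam/1234")
```

Calling `spam-stat-score-buffer' in a buffer that holds the formatted
text of the same message gives the other number to compare against; a
higher score points towards spam.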

So I am not sure what to do. Either I need to teach the split rule to
look at the raw message, or I need to retrain my spam detection on
formatted messages. The latter I can certainly do, but it is perhaps
less effective at distinguishing between spam and non-spam, and being
able to quickly process whole directories is rather convenient.

So, any ideas or recommendations?


--

------------------------+-----------------------------------------------------
Christian Lynbech       | christian #\@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - petonic@hal.com (Michael A. Petonic)
