Gnus development mailing list
* Handling spam
From: Christian Lynbech @ 2023-03-22  7:49 UTC
  To: ding

Do any of you use Gnus to handle spam, and if so, how do you do it?

For quite some time I have been using the spam-stat library that is
bundled with Emacs, but it is not working very well for me.

Spam-stat uses the statistical distribution of words to distinguish
spam from non-spam, and for that it needs to be trained. It has a
function that processes a directory of messages and another function
that can be added as a split rule to filter away the spam messages.
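
As a rough illustration of the train-then-score idea described above,
here is a minimal Python sketch of a word-frequency classifier. It is
not spam-stat's actual formula; it assumes a simple naive Bayes model
with add-one smoothing, and the names `tokenize`, `train` and
`spam_probability`, as well as the sample messages, are made up for
this example.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters and digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return WORD_RE.findall(text.lower())


def train(messages):
    """Count, for each token, how many messages contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Naive Bayes estimate with add-one smoothing (an assumption of this sketch)."""
    log_odds = math.log(n_spam / n_ham)  # prior from the corpus sizes
    for tok in set(tokenize(text)):
        p_spam = (spam_counts[tok] + 1) / (n_spam + 2)
        p_ham = (ham_counts[tok] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


# Made-up training data standing in for a spam and a non-spam directory.
spam_msgs = ["cheap pills buy now", "buy cheap watches now"]
ham_msgs = ["meeting notes attached", "lunch tomorrow at noon"]
spam_counts = train(spam_msgs)
ham_counts = train(ham_msgs)

p_spammy = spam_probability("buy cheap pills", spam_counts, ham_counts,
                            len(spam_msgs), len(ham_msgs))
p_normal = spam_probability("meeting tomorrow", spam_counts, ham_counts,
                            len(spam_msgs), len(ham_msgs))
```

The point of the sketch is only that the score depends entirely on
which tokens the scorer sees, so training and scoring have to look at
the same form of the message.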

I use Gnus to download email into nnml groups, so it is quite easy to
process the spam folder to seed spam-stat. Here is the problem,
however: the messages in the nnml directory are in raw form, but the
split function looks at the formatted message (I think), so for an
HTML-formatted mail the word distribution can be quite different. I
have verified this with a recent spam message: scoring the formatted
text yields a strong indication of non-spam, while scoring the raw
message gives a strong indication that it is indeed spam.
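
To make the raw-versus-formatted mismatch concrete, the following
Python sketch tokenizes the same HTML message twice, once as raw source
and once after stripping the markup. It only illustrates the effect; it
does not reproduce how Gnus renders mail or how spam-stat tokenizes,
and the sample message and helper names are invented for the example.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes, roughly what a reader sees after rendering."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def rendered_text(html):
    """Return the visible text of an HTML fragment, with tags dropped."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokens(text):
    """Set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Made-up HTML message body.
raw = '<html><body><font color="red"><b>Hello</b> friend</font></body></html>'

raw_tokens = tokens(raw)                    # what a scan of the raw file sees
shown_tokens = tokens(rendered_text(raw))   # what a scan of the formatted text sees
markup_only = raw_tokens - shown_tokens     # tokens present only in the raw form
```

Here the raw form contributes tokens such as `font` and `color` that
never appear in the formatted text, so a dictionary trained on raw
files can weigh a message very differently from one applied to the
formatted view.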

So I am not sure what to do. Either I need to teach the split rule to
look at the raw message, or I need to retrain my spam detection on
formatted messages. The latter I can certainly do, but it is perhaps
less efficient at distinguishing between spam and non-spam. Certainly,
being able to quickly process whole directories is rather convenient.

So, any ideas or recommendations?


--

------------------------+-----------------------------------------------------
Christian Lynbech       | christian #\@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - petonic@hal.com (Michael A. Petonic)

