Gnus development mailing list
* Handling spam
@ 2023-03-22  7:49 Christian Lynbech
  2023-03-22  8:11 ` Andrew Cohen
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Christian Lynbech @ 2023-03-22  7:49 UTC
  To: ding

Do any of you use Gnus to handle spam, and if so, how do you do it?

I have for quite some time been using the spam-stat library that is
bundled with Emacs, but it is not working so well for me.

Spam-stat uses the statistical distribution of words to distinguish spam
from non-spam, and for that it needs to be trained. It has a function
that processes a directory of messages and another function that can be
added as a split rule to filter away the spam messages.
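
For reference, a minimal sketch of the setup described above, using the
spam-stat entry points documented in the Gnus manual. The directory
paths and the group names "mail.spam" and "mail.misc" are placeholders,
not taken from this thread.

```elisp
(require 'spam-stat)

;; One-off training run over existing nnml directories
;; (placeholder paths; adjust to your own mail layout).
(spam-stat-process-spam-directory "~/Mail/mail/spam")
(spam-stat-process-non-spam-directory "~/Mail/mail/misc")
(spam-stat-reduce-size)   ; drop words seen fewer than 5 times (default)
(spam-stat-save)          ; write the dictionary to `spam-stat-file'

;; In the Gnus init file: load the dictionary and add the split rule.
(spam-stat-load)
(setq nnmail-split-methods 'nnmail-split-fancy
      spam-stat-split-fancy-spam-group "mail.spam"
      nnmail-split-fancy '(| (: spam-stat-split-fancy)
                             "mail.misc"))
```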

I use Gnus to download emails into nnml groups, so it is quite easy to
process the spam folder to seed spam-stat. However, there is a problem:
the messages in the nnml directory are in raw form, but the split
function looks at the formatted message (I think), so for an
HTML-formatted mail the word distribution can be quite different. I have
verified this with a recent spam message: scoring the formatted text
yields a strong indication of non-spam, while scoring the raw message
yields a strong indication that it is indeed spam.
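
A comparison of this kind can be sketched as follows, assuming a
trained dictionary has already been saved and `spam-stat-washing-hook'
is left at its default of nil. The helper name and the file path are
hypothetical; `spam-stat-score-buffer' returns a spam probability
between 0 and 1 for the current buffer.

```elisp
(require 'spam-stat)
(spam-stat-load)

(defun my/spam-stat-score-file (file)
  "Return the spam-stat score of the raw contents of FILE.
Hypothetical helper, not part of spam-stat."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (spam-stat-score-buffer)))

;; Score one raw nnml article (placeholder path).
(my/spam-stat-score-file "~/Mail/mail/spam/1234")

;; Score the formatted text of the article currently displayed in Gnus.
(with-current-buffer gnus-article-buffer
  (spam-stat-score-buffer))
```

A large gap between the two numbers for the same message would point to
the raw/formatted mismatch described above.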

So I am not sure what to do. Either I need to teach the split rule to
look at the raw message, or I need to retrain my spam detection on
formatted messages, which I can certainly do but which is perhaps less
efficient at distinguishing between spam and non-spam. Certainly,
being able to quickly process whole directories is rather convenient.

So, any ideas or recommendations?


--

------------------------+-----------------------------------------------------
Christian Lynbech       | christian #\@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - petonic@hal.com (Michael A. Petonic)



* Re: Handling spam
  2023-03-22  7:49 Handling spam Christian Lynbech
@ 2023-03-22  8:11 ` Andrew Cohen
  2023-03-22 14:37   ` Christian Lynbech
  2023-03-22 12:06 ` Byung-Hee HWANG
  2023-03-22 17:27 ` Peter Münster
  2 siblings, 1 reply; 8+ messages in thread
From: Andrew Cohen @ 2023-03-22  8:11 UTC
  To: ding

>>>>> "CL" == Christian Lynbech <christian@defun.dk> writes:

    CL> Does any of you use gnus to handle spam, and if so, how do you
    CL> do it?  I have for quite some time been using the spam-stat
    CL> library that is bundled with emacs, but it is not working so
    CL> well for me.

[...]

    CL> So I am not sure what to do, either I need to teach the split
    CL> rule to look at the raw message or I need to retrain my spam
    CL> detection on formatted messages, something I can certainly do
    CL> but which perhaps is less efficient in distinguishing between
    CL> spam and non-spam. Certainly, being able to quickly process
    CL> whole directories is rather convenient.

You need to look at 'spam-stat-washing-hook:
  "Hook applied to each message before analysis."

With this you can manipulate the article before the spam analysis. A
common choice is

(require 'spam-wash)
(add-hook 'spam-stat-washing-hook 'spam-wash)

which will decode MIME encodings before doing the spam analysis.  I
don't recall if it deals with html email but you should be able to
modify the function 'spam-wash easily enough to do what you want.
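
As an illustration of that kind of modification, here is a rough
sketch that adds a second, hypothetical washing function after
`spam-wash'. The tag-stripping regexp is an assumption made for this
example and is deliberately crude; it is not what spam-wash itself
does, and a dictionary trained without the hook would presumably need
retraining once it is active.

```elisp
(require 'spam-stat)
(require 'spam-wash)

(defun my/spam-stat-strip-html-tags ()
  "Replace anything that looks like an HTML tag with a space.
Hypothetical helper; crude on purpose, it also hits <addr@host>."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "<[^<>]*>" nil t)
      (replace-match " "))))

(add-hook 'spam-stat-washing-hook #'spam-wash)
;; The final t appends, so this runs after `spam-wash'.
(add-hook 'spam-stat-washing-hook #'my/spam-stat-strip-html-tags t)
```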

Best,
Andy
-- 
Andrew Cohen




* Re: Handling spam
  2023-03-22  7:49 Handling spam Christian Lynbech
  2023-03-22  8:11 ` Andrew Cohen
@ 2023-03-22 12:06 ` Byung-Hee HWANG
  2023-03-22 14:33   ` Christian Lynbech
  2023-03-23 13:31   ` Emanuel Berg
  2023-03-22 17:27 ` Peter Münster
  2 siblings, 2 replies; 8+ messages in thread
From: Byung-Hee HWANG @ 2023-03-22 12:06 UTC
  To: The Gnus

Christian Lynbech <christian@defun.dk> writes:

> Does any of you use gnus to handle spam, and if so, how do you do it?

I use IMAP with Gmail, so Gmail filters the spam for me.
Sorry for taking the easy way...

> [...sorry for long line snip...]

Sincerely,

-- 
^고맙습니다 _地平天成_ 감사합니다_^))//



* Re: Handling spam
  2023-03-22 12:06 ` Byung-Hee HWANG
@ 2023-03-22 14:33   ` Christian Lynbech
  2023-03-23 13:31   ` Emanuel Berg
  1 sibling, 0 replies; 8+ messages in thread
From: Christian Lynbech @ 2023-03-22 14:33 UTC
  To: The Gnus


It does not really have anything to do with IMAP; it is rather that your
mail provider (in this case Google) does the spam filtering and is good
enough at it. I use IMAP too, although (for this particular account) I
only use it to download the messages and then use the nnml backend for
reading.

I might be able to enable something at my provider (which is not Google)
and will also look into that.

                               /Christian



* Re: Handling spam
  2023-03-22  8:11 ` Andrew Cohen
@ 2023-03-22 14:37   ` Christian Lynbech
  0 siblings, 0 replies; 8+ messages in thread
From: Christian Lynbech @ 2023-03-22 14:37 UTC
  To: Andrew Cohen; +Cc: ding

Thanks for the hint. I guess this then shifts the analysis to work on
the formatted message rather than the raw one.

It will actually not be hard to change my workflow to work on the
formatted message. All uncaught spam is manually moved to a separate
folder, which I then process with a command I have written myself; this
command takes care to go to the raw message, but I can simply skip that
step.

                               /Christian


* Re: Handling spam
  2023-03-22  7:49 Handling spam Christian Lynbech
  2023-03-22  8:11 ` Andrew Cohen
  2023-03-22 12:06 ` Byung-Hee HWANG
@ 2023-03-22 17:27 ` Peter Münster
  2 siblings, 0 replies; 8+ messages in thread
From: Peter Münster @ 2023-03-22 17:27 UTC
  To: ding

On Wed, Mar 22 2023, Christian Lynbech wrote:

> So, any ideas or recommendations?

A bit OT, but perhaps valuable advice: whatever you do, IMHO the sender
should be informed when their message won't be read because of the
filtering. Here is how I do that:

fetchmail (IMAP) -> procmail -> Gnus (nnml)

And in ~/.procmailrc I have something like this:

--8<---------------cut here---------------start------------->8---
:0fw
* < $SPAM_MAX_SIZE
| spamassassin

:0
* 1^1    ^X-Spam-Flag: YES
* -10^0  ^Subject:.*no-spam
* -3^0   ^Content-Type: .*signed
* -3^0   ^To:.*special@address-1
* -3^0   ^From: Special <special@address-2>
* -1^0   ^List-Id:
{
	:0 ch
	| (formail -r -t -A"From: $NOREPLY" \
	-A"Content-Type: text/plain; charset=utf-8"; \
	cat $SPAMMSG $SIG) | $SENDMAIL -r $NOREPLY -t

	:0:
	spam.spool
}
--8<---------------cut here---------------end--------------->8---

And in $SPAMMSG:

--8<---------------cut here---------------start------------->8---
Your message has been rejected and won't be read, because it appears
to be spam. If this is not the case, please prepend `no-spam' to the
beginning of the subject field and send again.
--8<---------------cut here---------------end--------------->8---

Cheers,
-- 
           Peter




* Re: Handling spam
  2023-03-22 12:06 ` Byung-Hee HWANG
  2023-03-22 14:33   ` Christian Lynbech
@ 2023-03-23 13:31   ` Emanuel Berg
  2023-04-23 19:28     ` Christian Lynbech
  1 sibling, 1 reply; 8+ messages in thread
From: Emanuel Berg @ 2023-03-23 13:31 UTC
  To: ding

Byung-Hee HWANG wrote:

>> Does any of you use gnus to handle spam, and if so, how do
>> you do it?
>
> I use IMAP, so Gmail does filter spam mails. Sorry for easy
> way...

I almost never get any spam, maybe once a year? I don't know why, but
I decided to stop blaming myself for it ...

And I even have my mail address in plain text (not just in mailtos) on
my web pile.

But when it happens I downscore the sender permanently; no
idea if that's a good way to do it ...
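
For readers unfamiliar with scoring: a permanent downscore of a sender
is stored as an entry in a Gnus score file. A minimal sketch, with a
placeholder address and an arbitrary score of -1000; the nil date makes
the entry permanent and `s' requests substring matching. The same kind
of entry can be created interactively from the summary buffer with
L a s p.

```elisp
;; Entry in a score file, for example all.SCORE (placeholder address).
(("from"
  ("spammer@example.invalid" -1000 nil s)))
```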

-- 
underground experts united
https://dataswamp.org/~incal




* Re: Handling spam
  2023-03-23 13:31   ` Emanuel Berg
@ 2023-04-23 19:28     ` Christian Lynbech
  0 siblings, 0 replies; 8+ messages in thread
From: Christian Lynbech @ 2023-04-23 19:28 UTC
  To: ding


I do get relatively small amounts of spam, a couple of messages a week,
so just downscoring might be enough. Certainly something to think about.

Thanks.

                               /Christian


