From mboxrd@z Thu Jan  1 00:00:00 1970
From: Christian Lynbech <christian@defun.dk>
To: ding@gnus.org
Subject: Handling spam
Date: Wed, 22 Mar 2023 08:49:07 +0100
Message-ID: <m27cv9ql8s.fsf@defun.dk>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
List-ID: <ding.gnus.org>
Precedence: bulk

Do any of you use Gnus to handle spam, and if so, how do you do it?

I have been using the spam-stat library that is bundled with Emacs for
quite some time, but it is not working very well for me.

spam-stat uses the statistical distribution of words to distinguish spam
from non-spam, and for that it needs to be trained. It has a function
that processes a directory of messages and another function that can be
added as a split rule to filter out the spam messages.
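
To make the train-then-score idea concrete, here is a minimal Python
sketch of a word-frequency classifier. It is a simplified stand-in, not
spam-stat's actual algorithm; the function names, the smoothing and the
tiny corpora are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize(text):
    """Lower-case the text and split it into word-like tokens."""
    return WORD_RE.findall(text.lower())


def train(messages):
    """Count in how many messages each word occurs."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts


def word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | word), using add-one smoothing on both corpora."""
    p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
    return p_word_spam / (p_word_spam + p_word_ham)


def score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine the per-word probabilities into one score between 0 and 1."""
    log_odds = 0.0
    for word in set(tokenize(text)):
        p = word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Clamp so that math.exp cannot overflow on long messages.
    log_odds = max(-50.0, min(50.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical training data standing in for a spam and a non-spam folder.
spam_corpus = ["cheap pills buy now", "win money now", "buy cheap watches"]
ham_corpus = ["meeting notes attached", "gnus split rule question", "lunch at noon"]

spam_counts = train(spam_corpus)
ham_counts = train(ham_corpus)

spam_score = score("buy cheap pills", spam_counts, ham_counts,
                   len(spam_corpus), len(ham_corpus))
ham_score = score("question about the meeting", spam_counts, ham_counts,
                  len(spam_corpus), len(ham_corpus))
```

The point of the sketch is that the score depends entirely on which
tokens the classifier sees, so training and scoring have to look at the
same form of the message.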

I use Gnus to download email into nnml groups, so it is quite easy to
process the spam folder to seed spam-stat. Here is the problem, however:
the messages in the nnml directory are in raw form, but the split
function looks at the formatted message (I think), so for an
HTML-formatted mail the word distribution can be quite different. I have
verified this with a recent spam message: scoring the formatted text
gives a strong indication that it is non-spam, while scoring the raw
message gives a strong indication that it is indeed spam.
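
The effect can be illustrated outside Gnus with a small Python sketch
that tokenizes the same HTML message twice, once as raw source and once
as extracted text. The sample message, the tokenizer and the helper
names are hypothetical, and the text extraction only approximates what
a real HTML renderer would show.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the character data, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_text(html_source):
    """Return a rough approximation of the text a reader would see."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.chunks)


def tokens(text):
    """Return the set of lower-cased alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical raw HTML body of a spam message.
raw = ('<html><body><p style="color:red">Claim your <b>prize</b> today</p>'
       '<a href="http://example.invalid/win">click</a></body></html>')

raw_tokens = tokens(raw)
text_tokens = tokens(rendered_text(raw))

# Tokens a classifier trained on raw files has learned from, but that
# never show up when the formatted text is scored.
only_in_raw = raw_tokens - text_tokens
print(sorted(only_in_raw))
```

Markup tokens such as tag and attribute names exist only in the raw
form, so a dictionary trained on raw files can weight words that the
formatted text never contains.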

So I am not sure what to do. Either I need to teach the split rule to
look at the raw message, or I need to retrain my spam detection on
formatted messages. I can certainly do the latter, but it may be less
effective at distinguishing spam from non-spam, and being able to
quickly process whole directories is rather convenient.
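
As a rough illustration of the second option, the following Python
sketch walks a directory of one-message-per-file mail (as nnml stores
it) and yields each message's decoded body, which could then be fed to
a trainer in place of the raw file. The function names and the sample
message are assumptions; the sketch only undoes MIME transfer encoding
and does not render HTML or reproduce what Gnus does at split time.

```python
import base64
import email
import email.policy
import tempfile
from pathlib import Path


def decoded_body(path):
    """Return the decoded text of the preferred body part, or ''."""
    with open(path, "rb") as handle:
        msg = email.message_from_binary_file(handle,
                                             policy=email.policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    return part.get_content() if part is not None else ""


def decoded_bodies(directory):
    """Yield (file name, decoded body) for every regular file."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            yield path.name, decoded_body(path)


# Hypothetical base64-encoded message; the raw bytes do not contain the
# word "prize" at all, but the decoded body does.
sample = (b"From: sender@example.invalid\r\n"
          b"Subject: test\r\n"
          b"MIME-Version: 1.0\r\n"
          b"Content-Type: text/plain; charset=utf-8\r\n"
          b"Content-Transfer-Encoding: base64\r\n"
          b"\r\n" + base64.b64encode(b"Claim your prize") + b"\r\n")

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "1").write_bytes(sample)
    bodies = dict(decoded_bodies(tmp))

print(bodies["1"])
```

This keeps the convenience of processing whole directories while
training on something closer to what gets scored later.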

So, any ideas or recommendations?


--

------------------------+-----------------------------------------------------
Christian Lynbech       | christian #\@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - petonic@hal.com (Michael A. Petonic)