From: Christian Lynbech
To: ding@gnus.org
Subject: Handling spam
Date: Wed, 22 Mar 2023 08:49:07 +0100
User-Agent: Gnus/5.13 (Gnus v5.13)

Does any of you use gnus to handle
spam, and if so, how do you do it?

I have for quite some time been using the spam-stat library that is
bundled with Emacs, but it is not working so well for me.

Spam-stat uses the statistical distribution of words to distinguish spam
from non-spam, and for that it needs to be trained. It has a function
that will process a directory of messages and another function that can
be added as a split rule to filter away the spam messages.

I use Gnus to download emails into nnml groups, so it is quite easy to
process the spam folder to seed spam-stat.

However, here is the problem: the messages in the nnml directory are in
raw form, but the split function looks at the formatted message (I
think), so for an HTML-formatted mail the word distribution can be quite
different.

I have verified this with a recent spam message: scoring the formatted
text yields a strong indication of it being non-spam, while doing the
same on the raw message gives a strong indication that it is indeed
spam.

So I am not sure what to do. Either I need to teach the split rule to
look at the raw message, or I need to retrain my spam detection on
formatted messages, something I can certainly do but which is perhaps
less efficient at distinguishing between spam and non-spam. Certainly,
being able to quickly process whole directories is rather convenient.

So, any ideas or recommendations?

-- 
------------------------+-----------------------------------------------------
Christian Lynbech       | christian #\@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - petonic@hal.com (Michael A. Petonic)