From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/40048
Path: main.gmane.org!not-for-mail
From: Stainless Steel Rat
Newsgroups: gmane.emacs.gnus.general
Subject: Re: thoughts on spam
Date: 05 Nov 2001 20:52:31 -0500
Organization: The Happy Fun Ball Brigade
Sender: owner-ding@hpc.uh.edu
Message-ID:
References: <87y9m9fs6b.fsf@squeaker.lickey.com> <20011102160930.CC3D1BD52@squeaker.lickey.com> <87wv192jzh.fsf_-_@mclinux.com> <861yjgbygz.fsf@duchess.twilley.org> <20011102235444.E9C73BD48@squeaker.lickey.com>
NNTP-Posting-Host: coloc-standby.netfonds.no
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1035175660 31184 80.91.224.250 (21 Oct 2002 04:47:40 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 21 Oct 2002 04:47:40 +0000 (UTC)
Return-Path:
Original-Received: (qmail 3208 invoked from network); 6 Nov 2001 01:53:23 -0000
Original-Received: from malifon.math.uh.edu (mail@129.7.128.13) by mastaler.com with SMTP; 6 Nov 2001 01:53:23 -0000
Original-Received: from sina.hpc.uh.edu ([129.7.128.10] ident=lists) by malifon.math.uh.edu with esmtp (Exim 3.20 #1) id 160vQ3-0001nL-00; Mon, 05 Nov 2001 19:52:51 -0600
Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07)); Mon, 05 Nov 2001 19:52:32 -0600 (CST)
Original-Received: from sclp3.sclp.com (qmailr@sclp3.sclp.com [209.196.61.66]) by sina.hpc.uh.edu (8.9.3/8.9.3) with SMTP id TAA15095 for ; Mon, 5 Nov 2001 19:52:18 -0600 (CST)
Original-Received: (qmail 3191 invoked by alias); 6 Nov 2001 01:52:32 -0000
Original-Received: (qmail 3186 invoked from network); 6 Nov 2001 01:52:32 -0000
Original-Received: from h0060978d8c91.ne.mediaone.net (HELO peorth.gweep.net) (hdeabt@24.218.202.161) by gnus.org with SMTP; 6 Nov 2001 01:52:32 -0000
Original-Received: (from ratinox@localhost) by peorth.gweep.net (8.11.6/8.11.6) id fA61qVp02659; Mon, 5 Nov 2001 20:52:31 -0500
Original-To: "(ding)"
X-Attribution: Rat
In-Reply-To:
Original-Lines: 50
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.1 (Cuyahoga Valley)
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:40048
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:40048

* "Jason R. Mastaler" on Mon, 05 Nov 2001
| Yup. That works until the spammers figure this out and start to use
| different words and phrases. This is an time-wasting infinite loop.

No, it isn't. Watch.

Take a well-known spam-friendly but otherwise legitimate mail service like
mail.ru. Yes, they really should lock things down, but they do not, and my
company has several customers who use mail.ru as their mail service. Sucks,
but we're stuck with it.

Add spam for "Super Pheromones" that originated there, or at least appears
to have. Here is how it is dealt with heuristically using a scoring system.
The spam-friendly ISP gives the message a base score of 1000. The first
occurrence of the word "pheromone" is worth 500 points and each additional
occurrence is worth 100 points (the message I am using as an example has four
occurrences, so 500 + 3 x 100 = 800 points). The message being HTML-only, with
no text part, is worth another 500 points. There are additional checks that
can be applied, but I am keeping this example simple.

We decide to set our threshold to 2200. This will cause any message that
originates from mail.ru or another known spam-friendly source (1000), is
HTML-only (500), and contains the word "pheromone" three or more times
(500 + 100 + 100 = 700) to be marked as spam. The example message scores
1000 + 500 + 800 = 2300, so it is caught.

No matter how much the spammer changes his form letter, the product he is
advertising is going to be mentioned several times; if he doesn't mention
his product and what it does, his advertising is going to fail. That is what
you scan for.
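To make the arithmetic concrete, here is a minimal Python sketch of that scoring, assuming the point values above. The names (`SPAM_FRIENDLY_SOURCES`, `keyword_score`, `escalating_score`, `score_message`, `is_spam`) are hypothetical, and the keyword match is a plain case-insensitive substring count, not whatever spambouncer or any real filter actually does. It also includes the escalating 50/250/500 weighting proposed further down in this post, as an alternative function.

```python
# Hypothetical sketch of the heuristic scoring described above.
# Point values come from the post; all names here are illustrative assumptions.
import re
from typing import Sequence

SPAM_FRIENDLY_SOURCES = {"mail.ru"}  # hypothetical list of spam-friendly origins
THRESHOLD = 2200


def keyword_score(count: int, first: int = 500, extra: int = 100) -> int:
    """Linear scheme: `first` points for the first hit, `extra` for each further hit."""
    if count <= 0:
        return 0
    return first + extra * (count - 1)


def escalating_score(count: int, weights: Sequence[int] = (50, 250, 500)) -> int:
    """Escalating scheme: the n-th hit is worth weights[n-1].

    Hits beyond the table reuse the last weight; that is an assumption,
    since the post only gives values for the first three hits.
    """
    return sum(weights[min(i, len(weights) - 1)] for i in range(count))


def score_message(sender_domain: str, html_only: bool, body: str,
                  keyword: str = "pheromone") -> int:
    score = 0
    if sender_domain.lower() in SPAM_FRIENDLY_SOURCES:
        score += 1000  # base score for a known spam-friendly source
    if html_only:
        score += 500   # HTML-only message, no text part
    # Case-insensitive substring count, so "Pheromones" also matches.
    hits = len(re.findall(re.escape(keyword), body, flags=re.IGNORECASE))
    score += keyword_score(hits)
    return score


def is_spam(score: int, threshold: int = THRESHOLD) -> bool:
    return score >= threshold
```

With four hits, `score_message("mail.ru", True, "pheromone " * 4)` gives 1000 + 500 + 800 = 2300 and `is_spam` returns True; with only two hits the total is 2100 and the message stays under the threshold. `escalating_score(3)` gives the 50 + 250 + 500 = 800 points mentioned below.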
This also works for "You gotta see this", which dubiously advertises
unsecured credit cards, satellite descramblers, long distance telephone
theft, water and electric theft, X-Ray envelope spray, internet sleuth, radar
jammers, anonymous mail relaying, some kind of test-cheating scam, another
credit scam, how to pass drug tests, cable TV theft, lie detector fakeout,
and lockpicking tricks.

The actual scores and threshold values are arbitrary, and there are some
useful formulae for calculating keyword weights depending on how aggressive
you wish to be. Now, this message would get an absurdly high score because
of the 14 "you gotta see this" products (for only $19.95!), which is why a
better keyword weighting system might be to start small and increase each
subsequent weight by some scaling factor. So the first instance of
"pheromone" would be 50 points, the second 250 points and the third 500
points, for a total of 800 points.

I'll let you know what spambouncer says the score of this message is once I
get it back from the list :).

-- 
Rat                                \ Warning: pregnant women, the elderly, and
Minion of Nathan - Nathan says Hi! \ children under 10 should avoid prolonged
PGP Key: at a key server near you! \ exposure to Happy Fun Ball.