From: David Engster
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Splitting based on character sets
Date: Thu, 14 Apr 2011 10:28:24 +0200
To: ding@gnus.org

Ted Zlatanov writes:

> On Tue, 12 Apr 2011 18:44:49 +0200 David Engster wrote:
>
> DE> There is a crm114 plugin for spamassassin; it's in the "CoolThings"
> DE> section of the crm114 site. It may be that it's well suited for foreign
> DE> languages, but I tried it some time ago, and wasn't particularly
> DE> impressed, especially regarding the elaborated setup. The thing which
> DE> made me drop it was that I got false positives (yes, I read the docs and
> DE> trained it correctly). Middle-of-the-road Spamassassin in combination
> DE> with the Bayes-plugin, Razor and a few blacklists catches practically
> DE> all spam for me, without any false positives.
>
> I've been happy with CRM114 (since the first Spam Conference :) so I
> can't say why it didn't work for you. I like that it only has one way
> to classify spam, as opposed to the SA multi-pronged approach.

See, that's exactly what I like about SA. :-) I've long tried to find a
single, pure black-box-machine-learning spam detection which could rival
Spamassassin, main reason being that SA can be such a memory hog on
smaller servers. But I always came back.

> It was definitely better against foreign languages than SA 5 years ago,
> when I tested it.

At least for German, I subscribe to a custom rule list which is
regularly updated. But most of the German spam is actually caught by
iXhash (similar to Razor/Pyzor) and iXRBL, which are also located in
Germany.

But I guess what Lars originally asked was not to classify mails in
Russian, but to just classify them as spam, since he doesn't get Russian
ham. For this, one can use the Textcat plugin from SA, which will try to
guess the language of the mail and include an X-Language header.

-David