Gnus development mailing list
From: David Engster <deng@randomsample.de>
To: ding@gnus.org
Subject: Re: Splitting based on character sets
Date: Thu, 14 Apr 2011 10:28:24 +0200
Message-ID: <m2zkntgrtj.fsf@randomsample.de>
In-Reply-To: <87lizfxu7r.fsf@lifelogs.com> (Ted Zlatanov's message of "Tue, 12 Apr 2011 12:20:08 -0500")

Ted Zlatanov writes:
> On Tue, 12 Apr 2011 18:44:49 +0200 David Engster <deng@randomsample.de> wrote: 
> DE> There is a crm114 plugin for spamassassin; it's in the "CoolThings"
> DE> section of the crm114 site. It may be that it's well suited for foreign
> DE> languages, but I tried it some time ago, and wasn't particularly
> DE> impressed, especially regarding the elaborated setup. The thing which
> DE> made me drop it was that I got false positives (yes, I read the docs and
> DE> trained it correctly). Middle-of-the-road Spamassassin in combination
> DE> with the Bayes-plugin, Razor and a few blacklists catches practically
> DE> all spam for me, without any false positives.
>
> I've been happy with CRM114 (since the first Spam Conference :) so I
> can't say why it didn't work for you.  I like that it only has one way
> to classify spam, as opposed to the SA multi-pronged approach.

See, that's exactly what I like about SA. :-)

I've long tried to find a single, pure black-box machine-learning
spam detector that could rival SpamAssassin, the main reason being that
SA can be such a memory hog on smaller servers. But I always came back.

> It was definitely better against foreign languages than SA 5 years ago,
> when I tested it.

At least for German, I subscribe to a custom rule list which is
regularly updated. But most of the German spam is actually caught by
iXhash (similar to Razor/Pyzor) and iXRBL, which are also located in
Germany.

But I guess what Lars originally asked for was not to classify mails
written in Russian, but to just mark them as spam, since he doesn't get
Russian ham. For this, one can use the TextCat plugin from SA, which will
try to guess the language of the mail and include an X-Language header.
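
To illustrate the idea, here is a minimal sketch in Python of splitting on
such a header once SA has added it. It is not Gnus or SA code. The function
name split_by_language and the "spam"/"inbox" labels are hypothetical, and
the header name is simply taken from the paragraph above; the name SA
actually emits depends on how the plugin and its headers are configured.

```python
from email import message_from_string

# Assumption: header name as mentioned in the message above. Adjust it to
# whatever your SpamAssassin setup really writes into the mail.
LANGUAGE_HEADER = "X-Language"


def split_by_language(raw_message, unwanted=frozenset({"ru"})):
    """Return 'spam' if any guessed language is unwanted, else 'inbox'."""
    msg = message_from_string(raw_message)
    value = msg.get(LANGUAGE_HEADER, "")
    # Treat the header value as a list of language codes separated by
    # whitespace or commas (an assumption about the header format).
    guessed = {code.lower() for code in value.replace(",", " ").split()}
    return "spam" if guessed & unwanted else "inbox"
```

A mail without the header ends up in "inbox", so a missing or failed
language guess never causes a false positive by itself.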

-David


