Gnus development mailing list
From: Russ Allbery <rra@stanford.edu>
Subject: Re: Spam spam spam spam spam
Date: Sun, 31 Mar 2002 10:19:44 -0800
Message-ID: <yleli0ob0f.fsf@windlord.stanford.edu>
In-Reply-To: <m3663c4v75.fsf@quimbies.gnus.org> (Lars Magne Ingebrigtsen's message of "Sun, 31 Mar 2002 17:23:58 +0200")

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Now, about the white-list thing -- there should also be a black-list
> thing as well, I guess.  "Classify all mail from this From address as
> spam."  `C-u M-y', perhaps.

> What should the format of the black-and-white-lists be?  It could
> just be a one-address-per-line thing, but perhaps it would be nice to
> allow regexps?  Or perhaps a GLOB thing would be better?

> larsi@gnus.org
> larsi@gnus\.org
> .*@gnus\.org
> *@gnus.org

> I think it's easier for people to edit GLOBs than to edit the
> regexps, and I don't think people really need regexps here...

It would be great to somehow plug in the same logic that scoring has so
that we can use all of the same options that we can with score file
entries.  If you think glob would be particularly useful, it could be
added as a scoring type too.  :)

Incidentally, while this isn't a good heuristic for everyone, for those of
us who don't speak any Asian language and don't live in Asia, the
following is remarkably good.  All by itself, it catches something like
90% of all of my spam.  (Asian language spam has increased drastically in
the past year or so.)

 '(nnmail-split-abbrev-alist

[...]

         (cons 'content-spam
               (concat "big5\\|gb2312\\|ks_c_.*\\|shift_jis"
                       "\\|default_charset"))
         (cons 'subject-spam
               (concat ".*=\\?\\(big5\\|gb2312\\|ks_c_\\|shift_jis"
                       "\\|euc[-_]kr\\).*"
                       "\\|.*[¹²°¶÷¾].*"))

I then check content-spam against the Content-Type header and subject-spam
against the Subject header.  The last bit of subject-spam catches a lot of
spam with unencoded Asian languages in the subject header by looking for
characters that are fairly unlikely to be in the Subject for ISO 8859-1 or
-15 languages, but again isn't going to be applicable to everyone.
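
For anyone who wants to try this, here is a minimal sketch of how the two
abbreviations might be wired into a fancy split.  It assumes that
nnmail-split-abbrev-alist already contains the content-spam and
subject-spam entries shown above; the group names ("list.ding", "spam",
"mail.misc") and the mailing list rule are placeholders, not my actual
configuration.

```elisp
;; Use the fancy split syntax for incoming mail.
(setq nnmail-split-methods 'nnmail-split-fancy)

(setq nnmail-split-fancy
      '(| ;; Split out mailing list traffic first so that list mail
          ;; never reaches the spam checks (placeholder rule).
          ("to" "ding@gnus\\.org" "list.ding")
          ;; content-spam and subject-spam are symbols, so they are
          ;; looked up in nnmail-split-abbrev-alist.
          ("content-type" content-spam "spam")
          ("subject" subject-spam "spam")
          ;; Everything else falls through to a default group
          ;; (hypothetical name).
          "mail.misc"))
```

The order of the rules matters: the first matching split in the `|' form
wins, so the list rule has to come before the two spam rules.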

Not sure if something like this would be worth offering as an option or
example.  It's wholly inappropriate for people who correspond in Asian
languages, or in any language that uses those code points (which may
include anything encoded in UTF-8), but for those of us who only
correspond in English or western European languages it's staggeringly
effective.

Obviously, this is the sort of check that you want to put after you have
already split out your mailing list traffic, so that you don't get false
positives on mail to public mailing lists from people who just want to
spell their name correctly.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>



