caml-list - the Caml user's mailing list
* [Caml-list] posting policy and spam
@ 2004-01-03  9:24 Xavier Leroy
  2004-01-03 11:37 ` Claudio Sacerdoti Coen
  2004-01-03 23:28 ` Sven Luther
  0 siblings, 2 replies; 9+ messages in thread
From: Xavier Leroy @ 2004-01-03  9:24 UTC (permalink / raw)
  To: caml-list

There have been several complaints recently about spam getting through
the caml-list.

For your information, the list is filtered through SpamOracle, and the
posting address receives several hundred spams a day.  Due to spammers
getting more clever, the efficiency of the filtering went from perfect
to about 99%.  That's enough to let significant amounts of spam slip
through.

By popular demand, I just put the list in "subscribers only" posting
mode.  That should get rid of almost all spam.  (Keep in mind,
however, that a forged From address on a spam can match that of a list
subscriber; it happened in the past.)

The problem with the "subscribers only" policy is that many
subscribers receive the list at one address and post through another.
This will no longer work.  In the past, posts coming from
non-subscribers were run through a moderator.  Moderation is a boring
and time-consuming task for which we no longer have volunteers.

Consequently, all posts not coming from a subscribed address are now
summarily and irremediably discarded.  If your posts don't appear on
the list, that's why.  You need either to subscribe to the list with
the same address that you use for posting (see below for hints), or
adjust the From address in your posts so that it matches the
subscribed address.
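
A minimal sketch of the "subscribers only" check described above, assuming the list manager simply compares the address in the From header with its subscriber list. The subscriber set and the name `may_post` are hypothetical; the actual list software is not described in this message.

```python
from email.utils import parseaddr

# Hypothetical subscriber list, for illustration only.
SUBSCRIBERS = {"foo@bar.com", "new-address@bar.com"}

def may_post(from_header: str, subscribers=SUBSCRIBERS) -> bool:
    """Return True if the address in the From header is a subscribed address."""
    _display_name, address = parseaddr(from_header)
    # Compare case-insensitively (an assumption; a real manager may differ).
    return address.lower() in {s.lower() for s in subscribers}
```

Such a check only looks at the From header, so a forged From address that matches a subscriber would still pass, as noted above.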

Yes, this is a bit drastic, but there is no middle ground between
accepting too many posts (including spams) and rejecting too many
(including legitimate posters).  Pick your poison.

To finish, a few hints on using the caml-list manager:

Q1: How do I know which address I'm subscribed from?

A: Look at the "Received" headers in any message sent from caml-list.
If you're lucky, there will be a line 
  "Received from concorde.inria.fr ... for foo@bar.com"
or
  "Received from nez-perce.inria.fr ... for foo@bar.com"

In this case, your subscription address is "foo@bar.com".
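
The lookup in Q1 can also be scripted. The following is only a sketch: it assumes the message is available as raw text and that the relay recorded a "for <address>" clause in a Received header, which is not always the case. The function name and the regular expression are illustrative.

```python
import re
from email import message_from_string

# Matches the "for foo@bar.com" (or "for <foo@bar.com>") clause of a Received header.
FOR_CLAUSE = re.compile(r"\bfor\s+<?([^\s<>;]+@[^\s<>;]+)>?", re.IGNORECASE)

def subscription_address(raw_message: str):
    """Return the first address found in a Received "for" clause, or None."""
    msg = message_from_string(raw_message)
    for received in msg.get_all("Received", []):
        match = FOR_CLAUSE.search(received)
        if match:
            return match.group(1)
    return None  # not lucky: no Received header carries a "for" clause
```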

Q2: How do I change my subscription address?

A: Send a message to caml-list-request@inria.fr containing the
following two lines:

        unsubscribe caml-list old-address@foo.com
        subscribe caml-list new-address@bar.com

Note that this message can be sent from any mail address, not just 
old-address@foo.com or new-address@bar.com.  You'll receive a message
at new-address asking you to confirm your subscription.
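
For those who prefer to script it, here is a sketch that builds the request from Q2 as a mail message. Only the recipient address and the two command lines come from the instructions above; the function name and the sender argument are illustrative, and actually sending the message is left to whatever mail setup is at hand.

```python
from email.message import EmailMessage

def address_change_request(old: str, new: str, sender: str) -> EmailMessage:
    """Build the two-line request that moves a subscription from old to new."""
    msg = EmailMessage()
    msg["From"] = sender  # may be any address, per the note above
    msg["To"] = "caml-list-request@inria.fr"
    msg.set_content(
        f"unsubscribe caml-list {old}\n"
        f"subscribe caml-list {new}\n"
    )
    return msg
```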

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] posting policy and spam
  2004-01-03  9:24 [Caml-list] posting policy and spam Xavier Leroy
@ 2004-01-03 11:37 ` Claudio Sacerdoti Coen
  2004-01-03 13:34   ` Xavier Leroy
  2004-01-03 23:28 ` Sven Luther
  1 sibling, 1 reply; 9+ messages in thread
From: Claudio Sacerdoti Coen @ 2004-01-03 11:37 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

 Dear Xavier,

 are you sure of your diagnosis?
 I am asking since the spam messages I have received from caml-list
 were marked correctly by spam-oracle as spam:

<<<<
X-Original-To: sacerdot@cs.unibo.it
X-SPAM-Warning: Sending machine is listed in blackholes.five-ten-sg.com
From: Michael Bienenfeld <patrick.trekels1@pi.be>
To: ytolun@isnet.net.tr
Subject: [Caml-list] Don't Miss This Opportunity. iieo
X-Spam: yes; 1.00; iieo:99 biz:99 biz:99 gif:99 $$$:99 monthly:99 images:98
+htm:97 htm:97 click:97 patrick:04 www:91 www:91 michael:08 opportunity:90
X-Spam: unknown; 0.28; ocaml:01 biz:98 htm:98 biz:98 htm:98 gif:98 $$$:98
+beginners:01 bin:01 archives:01 inria:01 inria:01 opportunity:98 bug:02
caml:02
>>>>

 The first X-Spam was added by INRIA, the second one by my copy of
 spamoracle.  (I know for sure, since I removed the two header lines
 and applied spamoracle -mark by hand, obtaining exactly the
 second line.)

 IMHO, the conclusion is that spammers have not become smarter.
 There is simply something wrong with the INRIA procmail setup, and the
 X-Spam: yes messages are no longer filtered out.
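
 A sketch of how the X-Spam headers quoted above could be read by a program. It assumes the layout seen in those headers: a verdict, an overall score, then word:NN pairs, where NN is taken to be a per-word spam probability in percent (an interpretation, not something stated in this thread). The function names are illustrative.

```python
def parse_x_spam(value: str):
    """Split an X-Spam header value into (verdict, score, [(word, probability)])."""
    verdict, score, details = (part.strip() for part in value.split(";", 2))
    words = []
    for item in details.split():
        # Continuation lines in the quoted headers start with "+"; drop it.
        word, _, percent = item.lstrip("+").rpartition(":")
        words.append((word, int(percent) / 100))
    return verdict, float(score), words

def should_discard(value: str) -> bool:
    """The filtering step that seems to be skipped: drop messages marked "yes"."""
    return parse_x_spam(value)[0] == "yes"
```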

						Regards,
						C.S.C.

-- 
----------------------------------------------------------------
Real name: Claudio Sacerdoti Coen
PhD Student in Computer Science at University of Bologna
E-mail: sacerdot@cs.unibo.it
http://www.cs.unibo.it/~sacerdot
----------------------------------------------------------------


* Re: [Caml-list] posting policy and spam
  2004-01-03 11:37 ` Claudio Sacerdoti Coen
@ 2004-01-03 13:34   ` Xavier Leroy
  0 siblings, 0 replies; 9+ messages in thread
From: Xavier Leroy @ 2004-01-03 13:34 UTC (permalink / raw)
  To: Claudio Sacerdoti Coen; +Cc: caml-list

>  are you sure of your diagnosis?

You mean, my lack of diagnosis? :-)

>  I am asking since the spam messages I have received from caml-list
>  were marked correctly by spam-oracle as spam:
>  IMHO, the conclusion is that spammers have not become smarter.
>  Simply there is something wrong with the INRIA procmail and the X-Spam:yes
>  messages are no longer filtered out.

You're probably right, and I think I found the problem.  It's a year
2004 problem, or more exactly a New Year's problem: the mailbox where
spam is saved is suffixed by the year, and for some reason procmail
couldn't create the "2004" mailbox on Jan 1st.  
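
To illustrate the failure mode: a sketch of a year-suffixed mailbox path that is created up front instead of on first delivery. The directory layout and the "spam-" prefix are hypothetical, and why procmail could not create the mailbox is not known.

```python
from datetime import date
from pathlib import Path

def spam_mailbox(base_dir: Path, today: date) -> Path:
    """Return the year-suffixed spam mailbox, creating it if it is missing."""
    path = base_dir / f"spam-{today.year}"  # hypothetical naming scheme
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)  # so the first delivery of the year cannot fail here
    return path
```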

I think I've fixed the problem, and I went back to the previous
posting policy: everyone can post, and SpamOracle filters the spam.
Let's wait a couple of days; please tell me if the spam problem
persists.  (I didn't notice it myself because of my own filtering on
my mail.)

Still, spam filtering is getting increasingly hard.  The latest
fashion among spammers is to pad messages with lots of randomly-chosen
words, which gives Bayesian spam filters a really hard time...
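
A small sketch of why padding hurts, using the combination rule from Paul Graham's "A Plan for Spam". Whether SpamOracle combines per-word probabilities exactly this way is an assumption here, and the word probabilities are made up for illustration.

```python
from math import prod

def combined_spam_probability(word_probs):
    """Combine per-word spam probabilities with Graham's naive Bayes rule."""
    # Clamp to [0.01, 0.99] so that no single word can force the result to 0 or 1.
    clamped = [min(0.99, max(0.01, p)) for p in word_probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)

spam_words = [0.99, 0.99, 0.98, 0.97]  # made-up scores for spammy words
padding = [0.01, 0.01, 0.02]           # made-up scores for padding words that look like ham

unpadded = combined_spam_probability(spam_words)
padded = combined_spam_probability(spam_words + padding)
```

With these numbers, `padded` is lower than `unpadded`: every padding word that the filter has mostly seen in legitimate mail pulls the combined score down.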

For this reason, it might be necessary in the future to go back to a
"subscribers only" posting policy.

Thanks for the quick feedback.

- Xavier Leroy


* Re: [Caml-list] posting policy and spam
  2004-01-03  9:24 [Caml-list] posting policy and spam Xavier Leroy
  2004-01-03 11:37 ` Claudio Sacerdoti Coen
@ 2004-01-03 23:28 ` Sven Luther
  2004-01-04 10:23   ` Richard Zidlicky
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Sven Luther @ 2004-01-03 23:28 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

On Sat, Jan 03, 2004 at 10:24:49AM +0100, Xavier Leroy wrote:
> There have been several complaints recently about spam getting through
> the caml-list.
> 
> For your information, the list is filtered through SpamOracle, and the
> posting address receives several hundred spams a day.  Due to spammers
> getting more clever, the efficiency of the filtering went from perfect
> to about 99%.  That's enough to let significant amounts of spam slip
> through.

Well, on a similar subject, is there any chance of implementing a
workaround in spamoracle to counter those spams specifically designed to
fool the Bayesian filters?  You know, those that have 4 lines of random
words in a text attachment, and then some HTML spam.

I don't know whether Bayesian filters, or a modification thereof, are able
to counter this kind of email, but I don't think so.

> By popular demand, I just put the list in "subscribers only" posting
> mode.  That should get rid of almost all spam.  (Keep in mind,
> however, that a forged From address on a spam can match that of a list
> subscriber; it happened in the past.)
> 
> The problem with the "subscribers only" policy is that many
> subscribers receive the list at one address and post through another.
> This will no longer work.  In the past, posts coming from
> non-subscribers were run through a moderator.  Moderation is a boring
> and time-consuming task for which we no longer have volunteers.

What about adding a way to whitelist a given email address, so you could
post from a list of email addresses you have previously declared,
without necessarily subscribing all of them?

Another idea would be to give every subscriber some hash value
that they can add to a specific header, which the list software then
removes once it has checked that it is valid.  This could easily be
automated on both sides, and would allow posting from anywhere.
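
One way the hash idea could look, as a sketch only: the list server derives a token from each subscribed address with a keyed hash (HMAC), and accepts a post if the token found in the header matches some subscriber. The secret key and the function names are hypothetical; nothing like this exists in the list software as far as this thread says.

```python
import hashlib
import hmac

SECRET_KEY = b"list-server-secret"  # hypothetical; known only to the list server

def posting_token(address: str, key: bytes = SECRET_KEY) -> str:
    """Token handed to a subscriber, derived from the subscribed address."""
    return hmac.new(key, address.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def subscriber_for_token(token: str, subscribers, key: bytes = SECRET_KEY):
    """Return the subscriber whose token matches, or None if there is none."""
    for address in subscribers:
        # compare_digest avoids leaking information through timing differences.
        if hmac.compare_digest(posting_token(address, key), token):
            return address
    return None
```

The server would strip the header before redistributing the post; otherwise every recipient would learn the poster's token.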

Friendly,

Sven Luther


* Re: [Caml-list] posting policy and spam
  2004-01-03 23:28 ` Sven Luther
@ 2004-01-04 10:23   ` Richard Zidlicky
  2004-01-04 16:22     ` Scott Alexander
  2004-01-04 13:40   ` Vitaly Lugovsky
  2004-01-04 15:43   ` Damien Doligez
  2 siblings, 1 reply; 9+ messages in thread
From: Richard Zidlicky @ 2004-01-04 10:23 UTC (permalink / raw)
  To: Sven Luther; +Cc: Xavier Leroy, caml-list

On Sun, Jan 04, 2004 at 12:28:37AM +0100, Sven Luther wrote:
> On Sat, Jan 03, 2004 at 10:24:49AM +0100, Xavier Leroy wrote:
> > There have been several complaints recently about spam getting through
> > the caml-list.
> > 
> > For your information, the list is filtered through SpamOracle, and the
> > posting address receives several hundred spams a day.  Due to spammers
> > getting more clever, the efficiency of the filtering went from perfect
> > to about 99%.  That's enough to let significant amounts of spam slip
> > through.
> 
> Well, on a similar subject, is there any chance of implementing a
> workaround in spamoracle to counter those spams specifically designed to
> fool the Bayesian filters?  You know, those that have 4 lines of random
> words in a text attachment, and then some HTML spam.
> 
> I don't know whether Bayesian filters, or a modification thereof, are able
> to counter this kind of email, but I don't think so.

n-grams should be able to cope with the random words.  There is already
at least one library on SourceForge implementing them, so I am not sure
it is worth reimplementing them in spamoracle.
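
For illustration, a sketch of word n-grams used as filter features. It assumes word-level rather than character-level n-grams, which the message does not specify, and the function name is illustrative.

```python
def word_ngrams(text: str, n: int = 2):
    """Return the sequences of n consecutive words in text, joined by spaces."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# word_ngrams("Don't miss this opportunity")
# returns ["don't miss", "miss this", "this opportunity"]
```

Randomly chosen words rarely form pairs that a filter has seen in legitimate mail, which is why features of this kind should resist random padding better than single words do.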

Richard


* Re: [Caml-list] posting policy and spam
  2004-01-03 23:28 ` Sven Luther
  2004-01-04 10:23   ` Richard Zidlicky
@ 2004-01-04 13:40   ` Vitaly Lugovsky
  2004-01-04 15:43   ` Damien Doligez
  2 siblings, 0 replies; 9+ messages in thread
From: Vitaly Lugovsky @ 2004-01-04 13:40 UTC (permalink / raw)
  To: Sven Luther; +Cc: Xavier Leroy, caml-list


On Sun, 4 Jan 2004, Sven Luther wrote:

> Well, on a similar subject, is there any chance of implementing a
> workaround in spamoracle to counter those spams specifically designed to
> fool the Bayesian filters?  You know, those that have 4 lines of random
> words in a text attachment, and then some HTML spam.

 It's possible to calculate the entropy of a text.  If the words
aren't correlated, and the distribution of correlation weights is
flat enough, then it's random text without any meaning
(information content).  That is how advanced search engines work.

 I'd be glad to implement this approach if I had some free
time. :(
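
 A much simplified sketch of the idea: the Shannon entropy of the word frequencies in a text. This ignores correlations between words, so it is only a first approximation of the measure described above; the function name is illustrative.

```python
from collections import Counter
from math import log2

def word_entropy(text: str) -> float:
    """Shannon entropy, in bits per word, of the word distribution of text."""
    words = text.lower().split()
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

 A block of words that never repeat scores close to the maximum, log2 of the number of words, while ordinary prose repeats common words and scores lower.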




* Re: [Caml-list] posting policy and spam
  2004-01-03 23:28 ` Sven Luther
  2004-01-04 10:23   ` Richard Zidlicky
  2004-01-04 13:40   ` Vitaly Lugovsky
@ 2004-01-04 15:43   ` Damien Doligez
  2 siblings, 0 replies; 9+ messages in thread
From: Damien Doligez @ 2004-01-04 15:43 UTC (permalink / raw)
  To: caml-list

On Sunday, January 4, 2004, at 12:28 AM, Sven Luther wrote:

> What about adding a way to whitelist a given email address, so you could
> post from a list of email addresses you have previously declared,
> without necessarily subscribing all of them?
>
> Another idea would be to give every subscriber some hash value
> that they can add to a specific header, which the list software then
> removes once it has checked that it is valid.  This could easily be
> automated on both sides, and would allow posting from anywhere.

Without going quite so high-tech...  The moderator of comp.risks
has asked people to include the word "notsp" in their subject
lines, in order to quickly tell the spam from the legitimate
mail, and he says it works quite well.

We could transpose this technique, for example by saying that
mails whose subject already starts with "[Caml-list]" or
"Re: [Caml-list]" go through even if they don't come from a
subscribed address.
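
A sketch of that rule, assuming the check runs on the Subject line as received, before the list adds its own tag. The function name is illustrative.

```python
import re

# Subject already starts with "[Caml-list]" or "Re: [Caml-list]".
LIST_TAG = re.compile(r"^(?:Re: )?\[Caml-list\]")

def accept_post(sender_is_subscribed: bool, subject: str) -> bool:
    """Accept posts from subscribers, or from anyone whose subject carries the tag."""
    return sender_is_subscribed or bool(LIST_TAG.match(subject))
```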

-- Damien


* Re: [Caml-list] posting policy and spam
  2004-01-04 10:23   ` Richard Zidlicky
@ 2004-01-04 16:22     ` Scott Alexander
  0 siblings, 0 replies; 9+ messages in thread
From: Scott Alexander @ 2004-01-04 16:22 UTC (permalink / raw)
  To: caml-list; +Cc: Richard Zidlicky

On Sun, 2004-01-04 at 05:23, Richard Zidlicky wrote:
> On Sun, Jan 04, 2004 at 12:28:37AM +0100, Sven Luther wrote:
> > On Sat, Jan 03, 2004 at 10:24:49AM +0100, Xavier Leroy wrote:
> > > There have been several complaints recently about spam getting through
> > > the caml-list.
> > > 
> > > For your information, the list is filtered through SpamOracle, and the
> > > posting address receives several hundred spams a day.  Due to spammers
> > > getting more clever, the efficiency of the filtering went from perfect
> > > to about 99%.  That's enough to let significant amounts of spam slip
> > > through.
> > 
> > Well, on a similar subject, is there any chance of implementing a
> > workaround in spamoracle to counter those spams specifically designed to
> > fool the Bayesian filters?  You know, those that have 4 lines of random
> > words in a text attachment, and then some HTML spam.
> > 
> > I don't know whether Bayesian filters, or a modification thereof, are able
> > to counter this kind of email, but I don't think so.
> 
> n-grams should be able to cope with the random words.  There is already
> at least one library on SourceForge implementing them, so I am not sure
> it is worth reimplementing them in spamoracle.

FWIW, I've found the Bayesian stuff to do pretty well even with random
words given enough training.  (I'm using spambayes if it matters.)  Most
of the random words they pick aren't in my common words list as it turns
out.  And so many of the words in their actual message are in my spam
list.  (Obviously, this isn't a correct statement of how the algorithm
actually works, but I think it gives the right idea.)  After reading
Paul Graham's look back on Bayesian filtering after a year,
(http://www.paulgraham.com/sofar.html), I looked more closely at how
some of my spam and ham were scoring.  Looking at the misspelling
approach, I currently score "viagra" as 0.974978, "vi@gra" as 0.844828,
and "v1@gra" as 0.908163.
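
As an aside, two of these figures can be reproduced with Robinson's smoothed word estimate, f(w) = (s*x + n*p) / (s + n). The sketch below assumes s = 0.45 and x = 0.5, which are believed to be the spambayes defaults, and assumes the misspelled words were seen only in spam (p = 1). Both assumptions are inferred from the numbers, not stated in the message.

```python
def smoothed_word_score(n: int, p: float, s: float = 0.45, x: float = 0.5) -> float:
    """Robinson's estimate: blend the observed spam ratio p with the prior x.

    n is how many trained messages contained the word, s is the weight
    given to the prior (both defaults are assumptions, see above).
    """
    return (s * x + n * p) / (s + n)

print(round(smoothed_word_score(1, 1.0), 6))  # 0.844828, the "vi@gra" score
print(round(smoothed_word_score(2, 1.0), 6))  # 0.908163, the "v1@gra" score
```

Under these assumptions, "vi@gra" would have appeared in one trained spam and "v1@gra" in two, and neither in any ham.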

As for random words, looking through my list of messages to be trained,
I have a typical spam titled "Re: YGOCP, to the procurator".  With a
long list of random words and breaking up their message ("<p>O</rigid>ur
U</immature>S Li</prominent>censed Doc</shepherd>tors wi</calve>ll<BR>
Prescr</violate>ibes Y</esophagi>our Me</antonym>dication
F</eigenvector>or F</irreversible>ree"), it scores as 99.79%.  Not only
do they have some URL elements (like biz) which are high on my spam
list, but some of the random words have become spam identifiers (euclid,
metalwork, adequacy, bourgeoisie, cornish, rectilinear).  It did hit a
few on the ham list (oregon, weird, and laminar appear in spam for the
first time with this message), but not enough to be significant.

I do train on (almost) every message that I receive and have done so for
several months.  According to the statistics section I have "Total
emails trained: Spam: 3893 Ham: 12685".  And I am having a false
positive problem with Caml-list after the rash of spams.  It seems to be
getting close to being trained back, but Caml-list is a relatively low
volume list for me.

Anyway, enough nattering on.  I'm amazed by the Bayesian stuff and find
it interesting.

Best,
Scott
-- 
Scott Alexander <salex@dsl.cis.upenn.edu>


* Re: [Caml-list] posting policy and spam
@ 2004-01-03 12:20 Claudio Sacerdoti Coen
  0 siblings, 0 replies; 9+ messages in thread
From: Claudio Sacerdoti Coen @ 2004-01-03 12:20 UTC (permalink / raw)
  To: caml-list; +Cc: xavier.leroy

[ Message posted again since it was caught by spamoracle ;-( ]

 Dear Xavier,

 are you sure of your diagnosis?
 I am asking since the spam messages I have received from caml-list
 were marked correctly by spam-oracle as spam:

 [ Note: this time I put double underscores here and there to avoid the
   spamoracle check. It is not easy to send a non-spam mail whose
   subject is spam ;-) ]

<<<<
X-Original-To: sacerdot@cs.unibo.it
X-S__PAM-W__arning: Sending machine is listed in blackholes.five-ten-sg.com
From: Michael Bienenfeld <patrick.trekels1@pi.be>
To: ytolun@isnet.net.tr
Subject: [Caml-list] Don't M__iss T__his O__pportunity. i__ieo
X-S__pam: yes; 1.00; i__ieo:99 b__iz:99 b__iz:99 g__if:99 $__$__$:99 m__onthly:99 i__mages:98
+h__tm:97 h__tm:97 c__lick:97 p__atrick:04 w__ww:91 w__ww:91 m__ichael:08 o__pportunity:90
X-S__pam: unknown; 0.28; o__caml:01 b__iz:98 h__tm:98 b__iz:98 h__tm:98 g__if:98 $__$__$:98
+b__eginners:01 b__in:01 a__rchives:01 i__nria:01 i__nria:01 o__pportunity:98 b__ug:02
c__aml:02
>>>>

 The first X-Spam was added by INRIA, the second one by my copy of
 spamoracle.  (I know for sure, since I removed the two header lines
 and applied spamoracle -mark by hand, obtaining exactly the
 second line.)

 IMHO, the conclusion is that spammers have not become smarter.
 There is simply something wrong with the INRIA procmail setup, and the
 X-Spam: yes messages are no longer filtered out.

						Regards,
						C.S.C.

-- 
----------------------------------------------------------------
Real name: Claudio Sacerdoti Coen
PhD Student in Computer Science at University of Bologna
E-mail: sacerdot@cs.unibo.it
http://www.cs.unibo.it/~sacerdot
----------------------------------------------------------------


----- End forwarded message -----

-- 
----------------------------------------------------------------
Real name: Claudio Sacerdoti Coen
PhD Student in Computer Science at University of Bologna
E-mail: sacerdot@cs.unibo.it
http://www.cs.unibo.it/~sacerdot
----------------------------------------------------------------


Thread overview: 9+ messages
2004-01-03  9:24 [Caml-list] posting policy and spam Xavier Leroy
2004-01-03 11:37 ` Claudio Sacerdoti Coen
2004-01-03 13:34   ` Xavier Leroy
2004-01-03 23:28 ` Sven Luther
2004-01-04 10:23   ` Richard Zidlicky
2004-01-04 16:22     ` Scott Alexander
2004-01-04 13:40   ` Vitaly Lugovsky
2004-01-04 15:43   ` Damien Doligez
2004-01-03 12:20 Claudio Sacerdoti Coen
