Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* ifile or similar
@ 2002-08-18 19:33 Adam Sjøgren
       [not found] ` <ajotn9$1db2j7$1@ID-125932.news.dfncis.de>
       [not found] ` <ifcrja.d4k.ln@obelix.bakkelygaard.dk>
  0 siblings, 2 replies; 5+ messages in thread
From: Adam Sjøgren @ 2002-08-18 19:33 UTC


  Hi.


Has anyone integrated ifile <http://www.ai.mit.edu/~jrennie/ifile/> or
some similar system with Gnus?

Pointers much appreciated!


  Best regards,

-- 
 "Fra én som sover for lidt,                                   Adam Sjøgren
  som synes verden er stor"                               asjo@koldfront.dk



* Re: ifile or similar
       [not found]     ` <ajp14m$1dacii$1@ID-125932.news.dfncis.de>
@ 2002-08-18 21:23       ` Adam Sjøgren
  2002-08-18 23:01         ` Bruce Stephens
  2002-08-19 15:59         ` Christopher Browne
  0 siblings, 2 replies; 5+ messages in thread
From: Adam Sjøgren @ 2002-08-18 21:23 UTC


On 18 Aug 2002 20:47:18 GMT, Christopher Browne wrote:

>> I guess the easiest would just be to have procmail/something add an
>> X-header and have Gnus split on that.
[...]

> If you take that approach, I suggest that you have a _lot_ more than
> just one spam category.  There is little reason to expect "phone
> sex" ads to be particularly similar to "Nigerian financial scams" or
> for either to strongly resemble ads about enlarging sexual organs.
> If you put them all together in one folder, that will muddy
> discrimination.

Really? Paul Graham's recent article "A Plan for Spam" seems to
indicate otherwise:

 http://www.paulgraham.com/spam.html

(which is where I found a pointer to ifile). I don't know if ifile
works exactly like Paul Graham's scheme, though. An elisp implementation
of that would be even more fun... :-)

I was thinking of making nonspam, spam and virus. Virus-emails seem to
me to be likely to have a different "pattern".
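
A rough sketch of the X-header idea, for what it is worth: a shell filter
that asks a classifier for a category and writes the message back out with
one extra header, which Gnus could then split on. The header name
X-Ifile-Folder, the CLASSIFIER variable and the "nonspam" fallback are all
made up for this example, and I am assuming "ifile -c -q" reads the message
on stdin and prints the best-matching folder first. Untested against a real
ifile database.

```shell
#!/bin/sh
# Sketch only. X-Ifile-Folder and CLASSIFIER are hypothetical names;
# the "ifile -c -q" default and its output format are assumptions.
CLASSIFIER=${CLASSIFIER:-"ifile -c -q"}

tag_message() {
    tmp=$(mktemp) || return 1
    cat >"$tmp"                     # buffer stdin: the message is read twice
    # First word of the first output line is taken as the category.
    folder=$($CLASSIFIER <"$tmp" 2>/dev/null | head -n 1 | cut -d' ' -f1)
    # Fall back to an assumed default if the reply is empty or odd-looking.
    case $folder in
        ''|*[!A-Za-z0-9_.-]*) folder=nonspam ;;
    esac
    # Insert the new header just before the first blank line (end of headers);
    # if there is no blank line at all, append it at the end.
    awk -v h="X-Ifile-Folder: $folder" '
        !seen && /^$/ { print h; seen = 1 }
        { print }
        END { if (!seen) print h }
    ' "$tmp"
    rm -f "$tmp"
}
```

From procmail this would presumably be called as a filter, e.g. a script
that sources the function and runs "tag_message", with the Gnus split rule
matching on the added header.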

> You want better results?  Set up several folders; nnml:pyramid,
> nnml:snakeoil, nnml:creditcards, nnml:gambling, nnml:porn, and such.

That would defeat the purpose of not spending time on spam (if I have
to sort my entire backlog of spam into categories first).


  Best wishes,

-- 
 "Fra én som sover for lidt,                                   Adam Sjøgren
  som synes verden er stor"                               asjo@koldfront.dk



* Re: ifile or similar
  2002-08-18 21:23       ` Adam Sjøgren
@ 2002-08-18 23:01         ` Bruce Stephens
  2002-08-19 15:59         ` Christopher Browne
  1 sibling, 0 replies; 5+ messages in thread
From: Bruce Stephens @ 2002-08-18 23:01 UTC


spamtrap@koldfront.dk (Adam Sjøgren) writes:

[...]

> I don't know if ifile works exactly like Paul Graham's scheme,
> though. An elisp implementation of that would be even more
> fun... :-)

See <http://www.emacswiki.org/cgi-bin/wiki.pl?SpamStat>, which was posted
to gnu.emacs.sources a day or two ago.

[...]



* Re: ifile or similar
  2002-08-18 21:23       ` Adam Sjøgren
  2002-08-18 23:01         ` Bruce Stephens
@ 2002-08-19 15:59         ` Christopher Browne
  1 sibling, 0 replies; 5+ messages in thread
From: Christopher Browne @ 2002-08-19 15:59 UTC



spamtrap@koldfront.dk (Adam Sjøgren) wrote:
> On 18 Aug 2002 20:47:18 GMT, Christopher Browne wrote:
>
>>> I guess the easiest would just be to have procmail/something add an
>>> X-header and have Gnus split on that.
> [...]
>
>> If you take that approach, I suggest that you have a _lot_ more than
>> just one spam category.  There is little reason to expect "phone
>> sex" ads to be particularly similar to "Nigerian financial scams" or
>> for either to strongly resemble ads about enlarging sexual organs.
>> If you put them all together in one folder, that will muddy
>> discrimination.
>
> Really? Paul Graham's recent article "A Plan for Spam" seems to
> indicate otherwise:
>
>  http://www.paulgraham.com/spam.html
>
> (which is where I found a pointer to ifile). I don't know if ifile
> works exactly like Paul Graham's scheme, though. An elisp implementation
> of that would be even more fun... :-)

His "plan for spam" seems a rather new toy.  I have been using Ifile
for about five years now.

> I was thinking of making nonspam, spam and virus. Virus-emails seem to
> me to be likely to have a different "pattern".
>
>> You want better results?  Set up several folders; nnml:pyramid,
>> nnml:snakeoil, nnml:creditcards, nnml:gambling, nnml:porn, and such.
>
> That would defeat the purpose of not spending time on spam (if I have
> to sort my entire backlog of spam into categories first).

All the scheme is about is classifying messages.

If you make up one "pool" that is murky because it combines a lot of
quite different stuff (Nigerian pyramids versus porn versus credit
card), you can't expect to get as good results as you get if you have
a few more categories.

It should be pretty straightforward:

  Better quality corpus -> better quality results.

Consider: If you spend an hour setting up a better corpus, and this
provides better results for the next five years, that's a pretty good
investment of your time, isn't it?
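
As a sketch of what that hour might look like once the backlog is sorted
into groups: a loop that feeds every message file in each category
directory to the trainer. TRAINER and train_corpus are names invented for
this example, and the "ifile -i FOLDER FILE" form (add a file's statistics
to FOLDER) is an assumption to check against "ifile --help" before use.

```shell
#!/bin/sh
# Sketch only. TRAINER and train_corpus are hypothetical names; the
# "ifile -i FOLDER FILE" default is an assumption, not verified here.
TRAINER=${TRAINER:-"ifile -i"}

train_corpus() {
    # $1 is the mail root; remaining arguments are category directory names.
    root=$1; shift
    for category in "$@"; do
        dir=$root/$category
        [ -d "$dir" ] || { echo "skipping $category: no $dir" >&2; continue; }
        for msg in "$dir"/*; do
            [ -f "$msg" ] || continue          # skip subdirectories, empty globs
            $TRAINER "$category" "$msg" || return 1
        done
    done
}

# Example (not run here):
#   train_corpus "$HOME/Mail" pyramid snakeoil creditcards gambling porn
```

Hidden files such as nnml's .overview are not matched by the glob, so only
the numbered message files should get fed in.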
-- 
(reverse (concatenate 'string "gro.mca@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/lsf.html
"I  doubt this language  difference would  confuse anybody  unless you
were providing instructions on the insertion of a caffeine enema."
-- On alt.coffee



* Re: ifile or similar
       [not found] ` <ifcrja.d4k.ln@obelix.bakkelygaard.dk>
@ 2002-08-22 12:26   ` Clemens Fischer
  0 siblings, 0 replies; 5+ messages in thread
From: Clemens Fischer @ 2002-08-22 12:26 UTC


Anders Wegge Jakobsen <wegge@bakkelygaard.dk> writes:

> "Adam" == Adam Sjøgren <spamtrap@koldfront.dk> writes:
>
>  If you're already using procmail for splitting mail, it is pretty
> straightforward to do:
>
>         FOLDER=`ifile -gwc -Q |cut -d' ' -f1`.spool
>         #
>         :0:
>         $FOLDER

i'm not sure about the options.  the most recent version ifile-1.0.7
has (finally!) an option for easier scripting.  the call should be
more like:

# -ino: 2-20.08.02-20:08 new option "-c" with version 1.0.7:
FOLDER=`/usr/local/bin/ifile -cQ`

the option "-c" or "--concise" works with "-q" (query database for
classification) and "-Q", which queries the database /and/ inserts the
statistics of the current document into it.  this obsoletes:

#FOLDER=`ifile -g -v0 -Q |head -1 |cut -d' ' -f1`

the "-c" option basically does what the extraneous "|cut ..."  pipe is
(probably) supposed to do.

the "-w" option in your invocation would make ifile lex for white
space separated words.  i think it is better to let it use the default
lexer, which considers punctuation as well, thus including statistics
on typical host name particles and header values.
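
a slightly more defensive variant of the same call, as a sketch: it keeps
only the first word of the first output line and falls back to a default
folder when the classifier prints nothing usable, so mail does not end up
in a folder with an empty or odd name. CLASSIFIER, DEFAULT_FOLDER and
pick_folder are names made up for this example, and the exact output
format of "ifile -cQ" is assumed rather than checked.

```shell
#!/bin/sh
# Sketch only. CLASSIFIER, DEFAULT_FOLDER and pick_folder are hypothetical
# names; the default command is the invocation quoted above.
CLASSIFIER=${CLASSIFIER:-"ifile -cQ"}
DEFAULT_FOLDER=${DEFAULT_FOLDER:-inbox}

pick_folder() {
    # The message arrives on stdin; keep the first word of the first line.
    folder=$($CLASSIFIER 2>/dev/null | head -n 1 | cut -d' ' -f1)
    # Accept only plain names, so an empty or strange reply (slashes,
    # leading dot) cannot point delivery outside the mail directory.
    case $folder in
        ''|.*|*[!A-Za-z0-9_.-]*) folder=$DEFAULT_FOLDER ;;
    esac
    printf '%s\n' "$folder"
}
```

in a procmailrc the result could then be assigned the same way as before,
with the backticks running a small script that calls pick_folder.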

clemens



Thread overview: 5+ messages
2002-08-18 19:33 ifile or similar Adam Sjøgren
     [not found] ` <ajotn9$1db2j7$1@ID-125932.news.dfncis.de>
     [not found]   ` <87n0rkudz3.fsf@virgil.koldfront.dk>
     [not found]     ` <ajp14m$1dacii$1@ID-125932.news.dfncis.de>
2002-08-18 21:23       ` Adam Sjøgren
2002-08-18 23:01         ` Bruce Stephens
2002-08-19 15:59         ` Christopher Browne
     [not found] ` <ifcrja.d4k.ln@obelix.bakkelygaard.dk>
2002-08-22 12:26   ` Clemens Fischer
