Gnus development mailing list
* Problems with spam filtering and bogofilter.
@ 2003-01-10 12:33 Malcolm Purvis
  2003-01-10 12:56 ` Ted Zlatanov
  0 siblings, 1 reply; 26+ messages in thread
From: Malcolm Purvis @ 2003-01-10 12:33 UTC (permalink / raw)


All this discussion about spam filtering has made me try spam.el, in my case
with bogofilter and POP.

However, if I add (: spam-split) to the start of nnmail-split-fancy, errors
are produced during the split and all my mail gets sent to the bogus group.

I am using a fresh version of gnus and the latest version of bogofilter from
SourceForge (0.9.1.2).  I see that spam.el's documentation refers to version
0.4 so perhaps there is some incompatibility?  In particular, the output of
bogofilter -v is:

         X-Bogosity: No, tests=bogofilter, spamicity=0.355906, version=0.9.1.2

while spam-check-bogofilter is searching for 

	    (re-search-forward "Spamicity: \\(0\\.9\\|1\\.0\\)" nil t)
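
The mismatch can be seen directly: the old pattern expects "Spamicity: "
followed by 0.9 or 1.0, while the header above carries "spamicity=" inside a
comma-separated list.  A minimal sketch of a pattern for the newer format
follows, assuming the header always looks like the 0.9.1.2 sample above.  The
function name my-bogosity-parse is hypothetical and is not part of spam.el.

```elisp
;; Hypothetical helper for illustration; not part of spam.el.
(defun my-bogosity-parse (header)
  "Return (VERDICT . SPAMICITY) parsed from an X-Bogosity HEADER, or nil.
VERDICT is the string \"Yes\" or \"No\"; SPAMICITY is a float."
  (when (string-match
         "X-Bogosity: \\(Yes\\|No\\),.*spamicity=\\([0-9.]+\\)" header)
    (cons (match-string 1 header)
          (string-to-number (match-string 2 header)))))
```

Applied to the sample header, this should yield the verdict "No" together
with the numeric score, while a header in any other layout yields nil.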

This is all running under XEmacs 21.4.11 on PPC Linux.

Malcolm

-- 
	       Malcolm Purvis <malcolmpurvis@optushome.com.au>

The hidden, terrible cost of nuclear warfare is Really Bad Public Art.
			        - Angus McIntyre, alt.peeves, 13/3/02.





* Re: Problems with spam filtering and bogofilter.
  2003-01-10 12:33 Problems with spam filtering and bogofilter Malcolm Purvis
@ 2003-01-10 12:56 ` Ted Zlatanov
  2003-01-10 17:24   ` Raja R Harinath
  2003-01-11 12:17   ` Problems with spam filtering and bogofilter Malcolm Purvis
  0 siblings, 2 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-10 12:56 UTC (permalink / raw)
  Cc: ding

On Fri, 10 Jan 2003, malcolmpurvis@optushome.com.au wrote:
> All this discussion about spam filtering has made me try spam.el, in
> my case with bogofilter and POP.
> 
> However, if I add (: spam-split) to the start of nnmail-split-fancy,
> errors are produced during the split and all my mail gets sent to
> the bogus group.
> 
> I am using a fresh version of gnus and the latest version of
> bogofilter from SourceForge (0.9.1.2).  I see that spam.el's
> documentation refers to version 0.4 so perhaps there is some
> incompatibility?  In particular, the output of bogofilter -v is:
> 
>          X-Bogosity: No, tests=bogofilter, spamicity=0.355906,
>          version=0.9.1.2
> 
> while spam-check-bogofilter is searching for 
> 
> 	    (re-search-forward "Spamicity: \\(0\\.9\\|1\\.0\\)" nil t)
> 
> This is all running under XEmacs 21.4.11 on PPC Linux.

Yes, you need 0.4 currently.  The bogofilter functionality is older
than the more modular implementation I did for ifile and spam-stat, so
it's a little more outdated.  The question is, should I expend the
effort to keep up with bogofilter?  Can you check to see if 0.9.1.2
has the same flags as 0.4?  If so, and all I need to fix is the regex,
no big deal, but if the interface has changed then maybe I need to
rewrite the bogofilter section of spam.el anyway.

This will have to wait until Monday so I hope you don't have urgent
spam meanwhile :)

Ted





* Re: Problems with spam filtering and bogofilter.
  2003-01-10 12:56 ` Ted Zlatanov
@ 2003-01-10 17:24   ` Raja R Harinath
  2003-01-13 19:17     ` Ted Zlatanov
  2003-01-15 19:31     ` Ted Zlatanov
  2003-01-11 12:17   ` Problems with spam filtering and bogofilter Malcolm Purvis
  1 sibling, 2 replies; 26+ messages in thread
From: Raja R Harinath @ 2003-01-10 17:24 UTC (permalink / raw)
  Cc: ding

Ted Zlatanov <tzz@lifelogs.com> writes:
[snip]
> Yes, you need 0.4 currently.  The bogofilter functionality is older
> than the more modular implementation I did for ifile and spam-stat, so
> it's a little more outdated.  The question is, should I expend the
> effort to keep up with bogofilter?  Can you check to see if 0.9.1.2
> has the same flags as 0.4?  If so, and all I need to fix is the regex,
> no big deal, but if the interface has changed then maybe I need to
> rewrite the bogofilter section of spam.el anyway.
>
> This will have to wait until Monday so I hope you don't have urgent
> spam meanwhile :)

The recommended procmail script for bogofilter 0.9.1.2 seems to be:

       The following recipe (a) spam-bins anything that bogofilter
       rates as spam, (b) adds the words in messages rated as spam
       to the spam wordlist, and (c) adds the words in messages
       rated as non-spam to the non-spam wordlist. With this in
       place, it will normally only be necessary for the user to
       intervene (with -N or -S) when bogofilter miscategorizes
       something.

       # filter mail through bogofilter, tagging it as spam and
       # updating the word lists

       :0fw
       | bogofilter -u -e -p

       # if bogofilter failed, return the mail to the queue, the MTA will
       # retry to deliver it later
       # 75 is the value for EX_TEMPFAIL in /usr/include/sysexits.h
       :0e
       { EXITCODE=75 HOST }

       # file the mail to spam-bogofilter if it's spam.
       :0:
       * ^X-Bogosity: Yes, tests=bogofilter
       spam-bogofilter

The '-u' however means that bogofilter has already integrated the
counts for the mail.  So, the trick would be to do the following in
spam-process-bogofilter:

  * ham marked and X-Bogosity: No        => do nothing
  * ham marked and X-Bogosity: Yes       => | bogofilter -N
  * ham marked and no X-Bogosity header  => | bogofilter -n

  * spam marked and X-Bogosity: No       => | bogofilter -S
  * spam marked and X-Bogosity: Yes      => do nothing
  * spam marked and no X-Bogosity header => | bogofilter -s

This should also handle the older version of bogofilter and the recipe
mentioned in 'spam.el'.  (Also, the name of the X-Bogosity header
should be configurable, as it is configurable on the bogofilter side).
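
The table above can be sketched as a small lookup.  This is only an
illustration under stated assumptions: the function name
my-bogofilter-correction-flag is hypothetical, the mark is assumed to be the
symbol ham or spam, and the header value is assumed to have been reduced to
"Yes", "No", or nil (no header) beforehand.

```elisp
;; Hypothetical helper for illustration; not part of spam.el.
(defun my-bogofilter-correction-flag (mark bogosity)
  "Return the bogofilter flag for an article, or nil to do nothing.
MARK is the symbol `ham' or `spam'.  BOGOSITY is \"Yes\", \"No\",
or nil when the article has no X-Bogosity header."
  (cond
   ;; No header: the message was never counted, so register it plainly.
   ((null bogosity) (if (eq mark 'spam) "-s" "-n"))
   ;; Counted as spam but marked ham: undo and re-register as ham.
   ((and (eq mark 'ham) (equal bogosity "Yes")) "-N")
   ;; Counted as ham but marked spam: undo and re-register as spam.
   ((and (eq mark 'spam) (equal bogosity "No")) "-S")
   ;; Mark and header agree: nothing to do.
   (t nil)))
```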

- Hari
-- 
Raja R Harinath ------------------------------ harinath@cs.umn.edu




* Re: Problems with spam filtering and bogofilter.
  2003-01-10 12:56 ` Ted Zlatanov
  2003-01-10 17:24   ` Raja R Harinath
@ 2003-01-11 12:17   ` Malcolm Purvis
  2003-01-13 19:16     ` Ted Zlatanov
  1 sibling, 1 reply; 26+ messages in thread
From: Malcolm Purvis @ 2003-01-11 12:17 UTC (permalink / raw)
  Cc: ding

>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:

Ted> Can you check to see if 0.9.1.2 has the same flags as 0.4?

Alas, I think that version 0.4 is no longer available (it's not kept in the
SourceForge project archives as far as I can see).  However all the flags you
pass to bogofilter are still there so it should work (and indeed, the group
exit processing works fine).

One question:  Within spam-check-bogofilter, the message being looked at is
referred to via gnus-summary-article-number.  Is this valid within the context
of a fancy split?  The description of what environment is available to
functions within a fancy split is, ahh, skimpy at best but since I get split
errors on startup, and there is no summary buffer then, perhaps this is the
cause?


Malcolm

-- 
	       Malcolm Purvis <malcolmpurvis@optushome.com.au>

The hidden, terrible cost of nuclear warfare is Really Bad Public Art.
			        - Angus McIntyre, alt.peeves, 13/3/02.





* Re: Problems with spam filtering and bogofilter.
  2003-01-11 12:17   ` Problems with spam filtering and bogofilter Malcolm Purvis
@ 2003-01-13 19:16     ` Ted Zlatanov
  2003-01-13 21:24       ` requesting articles from a nnxyz backend: what's the fastest Ted Zlatanov
  0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-13 19:16 UTC (permalink / raw)
  Cc: ding

On Sat, 11 Jan 2003, malcolmpurvis@optushome.com.au wrote:
>>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:
> 
> Ted> Can you check to see if 0.9.1.2 has the same flags as 0.4?
> 
> Alas, I think that version 0.4 is no longer available (it's not kept
> in the SourceForge project archives as far as I can see).  However
> all the flags you pass to bogofilter are still there so it should
> work (and indeed, the group exit processing works fine).

I'm thinking of a rewrite of the bogofilter functionality into
something that looks more like the ifile functionality - cleaner,
takes articles as strings, etc.

> One question: Within spam-check-bogofilter, the message being looked
> at is referred to via gnus-summary-article-number.  Is this valid
> within the context of a fancy split?  The description of what
> environment is available to functions within a fancy split is, ahh,
> skimpy at best but since I get split errors on startup, and there is
> no summary buffer then, perhaps this is the cause?

I didn't write the bogofilter functionality, so I'm not familiar with
all the assumptions it makes.  I think the rewrite will be better in
that it will only work with the current article buffer, like all split
functions are supposed to.  It may have *other* bugs but this one will
be gone.  Hurrah for progress :)

I'll probably get around to it today or tomorrow...

Ted




* Re: Problems with spam filtering and bogofilter.
  2003-01-10 17:24   ` Raja R Harinath
@ 2003-01-13 19:17     ` Ted Zlatanov
  2003-01-15 19:31     ` Ted Zlatanov
  1 sibling, 0 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-13 19:17 UTC (permalink / raw)
  Cc: Malcolm Purvis, ding

On Fri, 10 Jan 2003, harinath@cs.umn.edu wrote:
>   * ham marked and X-Bogosity: No        => do nothing
>   * ham marked and X-Bogosity: Yes       => | bogofilter -N
>   * ham marked and no X-Bogosity header  => | bogofilter -n
> 
>   * spam marked and X-Bogosity: No       => | bogofilter -S
>   * spam marked and X-Bogosity: Yes      => do nothing
>   * spam marked and no X-Bogosity header => | bogofilter -s

Thanks for the summary, it was very helpful.

Ted




* requesting articles from a nnxyz backend: what's the fastest
  2003-01-13 19:16     ` Ted Zlatanov
@ 2003-01-13 21:24       ` Ted Zlatanov
  2003-01-14  7:22         ` Kai Großjohann
  2003-01-14 18:16         ` Lars Magne Ingebrigtsen
  0 siblings, 2 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-13 21:24 UTC (permalink / raw)


What's the fastest safe way to request an article from a nnxyz
backend?  Special cases are OK - for instance, optimizing for
nnml/nnmaildir.

I'd like to optimize spam/ham processing, which currently takes an
article as a string out of the article buffer.  This is slow, so I'd
like to map an article number to a file when possible.

This would also allow fast training of the spam/ham backends.

I don't need the article to be treated, washed, or prepared in any
way.  That's why just the file name for file-based backends like
nnml/nnmaildir would be fine.

Thanks
Ted




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-13 21:24       ` requesting articles from a nnxyz backend: what's the fastest Ted Zlatanov
@ 2003-01-14  7:22         ` Kai Großjohann
  2003-01-14 16:42           ` Ted Zlatanov
  2003-01-14 18:16         ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 26+ messages in thread
From: Kai Großjohann @ 2003-01-14  7:22 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> I'd like to optimize spam/ham processing, which currently takes an
> article as a string out of the article buffer.  This is slow, so I'd
> like to map an article number to a file when possible.

I wonder if it is faster to fetch the article into a buffer and to
pipe it to the command?  Would that work with the commands used by
spam.el?

If this is fast enough, it would avoid special-casing for
file-per-msg backends.
-- 
Ambibibentists unite!




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-14  7:22         ` Kai Großjohann
@ 2003-01-14 16:42           ` Ted Zlatanov
  2003-01-14 20:39             ` Kai Großjohann
  0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-14 16:42 UTC (permalink / raw)
  Cc: ding

On Tue, 14 Jan 2003, kai.grossjohann@uni-duisburg.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> I'd like to optimize spam/ham processing, which currently takes an
>> article as a string out of the article buffer.  This is slow, so
>> I'd like to map an article number to a file when possible.
> 
> I wonder if it is faster to fetch the article into a buffer and to
> pipe it to the command?  Would that work with the commands used by
> spam.el?

That's exactly how spam.el operates now, except for bogofilter - the
whole article is passed to the spam/ham processor as a string.

> If this is fast enough, it would avoid special-casing for
> file-per-msg backends.

I am sure it's faster to invoke ifile (for instance) once on a
directory full of files, than it is to invoke ifile repeatedly on each
file in that directory, fetching that file as an article through Gnus.
That's why I asked if there was an "approved" way to map articles to
files for backends that support it.

Ted




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-13 21:24       ` requesting articles from a nnxyz backend: what's the fastest Ted Zlatanov
  2003-01-14  7:22         ` Kai Großjohann
@ 2003-01-14 18:16         ` Lars Magne Ingebrigtsen
  2003-01-14 20:42           ` Kai Großjohann
  2003-01-15 19:21           ` Ted Zlatanov
  1 sibling, 2 replies; 26+ messages in thread
From: Lars Magne Ingebrigtsen @ 2003-01-14 18:16 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> What's the fastest safe way to request an article from a nnxyz
> backend?  Special cases are OK - for instance, optimizing for
> nnml/nnmaildir.

In general, you have to call `nnxyx-request-article', but if you have
complete control over which (virtual) server is active and stuff like
that, you can use

   (nnml-possibly-change-directory group)

to set the current group, and

   
   (expand-file-name "number" nnml-current-directory)

to get the path name of article NUMBER.

-- 
(domestic pets only, the antidote for overdose, milk.)
   larsi@gnus.org * Lars Magne Ingebrigtsen




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-14 16:42           ` Ted Zlatanov
@ 2003-01-14 20:39             ` Kai Großjohann
  2003-01-15 17:27               ` Ted Zlatanov
  0 siblings, 1 reply; 26+ messages in thread
From: Kai Großjohann @ 2003-01-14 20:39 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> That's exactly how spam.el operates now, except for bogofilter - the
> whole article is passed to the spam/ham processor as a string.

Your wording "as a string" sounds as if buffer-string or
buffer-substring is involved.

What I meant is call-process-region...

(I suspect that string processing in Emacs is slower than doing it
with text in buffers.)
-- 
Ambibibentists unite!




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-14 18:16         ` Lars Magne Ingebrigtsen
@ 2003-01-14 20:42           ` Kai Großjohann
  2003-01-15 19:21           ` Ted Zlatanov
  1 sibling, 0 replies; 26+ messages in thread
From: Kai Großjohann @ 2003-01-14 20:42 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> What's the fastest safe way to request an article from a nnxyz
>> backend?  Special cases are OK - for instance, optimizing for
>> nnml/nnmaildir.
>
> In general, you have to call `nnxyx-request-article',

nnxyz, nnxyx, what will we have next?

Sheesh.  Watch your language!  It's nnchoke!
-- 
Ambibibentists unite!




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-14 20:39             ` Kai Großjohann
@ 2003-01-15 17:27               ` Ted Zlatanov
  2003-01-16 11:56                 ` Kai Großjohann
  0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-15 17:27 UTC (permalink / raw)
  Cc: ding

On Tue, 14 Jan 2003, kai.grossjohann@uni-duisburg.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> That's exactly how spam.el operates now, except for bogofilter -
>> the whole article is passed to the spam/ham processor as a string.
> 
> Your wording "as a string" sounds as if buffer-string or
> buffer-substring is involved.
> 
> What I meant is call-process-region...

The article is transferred between functions as a string currently,
but passed through call-process-region to the child process.  I'm
working on making the parameter-passing less string-bound and more
oriented towards buffer reuse.  This is my first mid-size Emacs Lisp
project, so bear with me as I learn to use buffers better.

> (I suspect that string processing in Emacs is slower than doing it
> with text in buffers.)

Probably, by a small margin (we're working with one article at a time
anyway).  The only extra penalty is that an article buffer is copied
into a string, and that string is copied into a temporary buffer.
Could this be a problem with large messages in general?  Should I set
a size limit on a message and only process up to that point with
ifile/bogofilter/etc.?

Ted




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-14 18:16         ` Lars Magne Ingebrigtsen
  2003-01-14 20:42           ` Kai Großjohann
@ 2003-01-15 19:21           ` Ted Zlatanov
  2003-01-15 20:10             ` Lars Magne Ingebrigtsen
  2003-01-17 17:46             ` Paul Jarc
  1 sibling, 2 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-15 19:21 UTC (permalink / raw)


On Tue, 14 Jan 2003, larsi@gnus.org wrote:
> In general, you have to call `nnxyx-request-article', but if you
> have complete control over which (virtual) server is active and
> stuff like that, you can use
> 
>    (nnml-possibly-change-directory group)
> 
> to set the current group, and
> 
>    (expand-file-name "number" nnml-current-directory)
> 
> to get the path name of article NUMBER.

Assuming I have to enter the group I want to process, would this work
to get the filename of an article once I've entered the group?

(defun spam-get-article-as-filename (article)
  ;; Return the nnml file name for ARTICLE, or nil if there is none.
  (let ((article-filename))
    (when (numberp article)
      (nnml-possibly-change-directory (gnus-group-real-name gnus-newsgroup-name))
      (setq article-filename (expand-file-name (int-to-string article) nnml-current-directory)))
    ;; Guard against a nil file name when ARTICLE is not a number.
    (if (and article-filename (file-exists-p article-filename))
	article-filename
      nil)))

nnml-possibly-change-directory doesn't seem to transform something
like "ding.info" into "ding/info" so I'm definitely missing
something.

More generally, can we add to Gnus the functionality of transforming a
group name + an article number into a filename, when it's possible?
Maybe in gnus-int.el?  I'd like to do something like this:

1. get list of articles

2. optionally filter list for spam/ham articles

3. if the group doesn't have 1 file per article, retrieve each article
   into a temporary file (nnimap, nntp, etc. without the agent)

4. get file names for the articles

5. send file names to bogofilter/ifile/etc. to be studied

This would be significantly faster than requesting the article and
then invoking call-process-region on the buffer, I think.  The function
should use the agent when possible (for nnimap groups, for instance).

I would do it myself, but I think it would be easier for someone who
understands the internals of the nn*.el code.  I looked at nnml.el and
others, but it would take me a while to write this code.  If no one is
interested I can try it, but it will take a while.

Ted




* Re: Problems with spam filtering and bogofilter.
  2003-01-10 17:24   ` Raja R Harinath
  2003-01-13 19:17     ` Ted Zlatanov
@ 2003-01-15 19:31     ` Ted Zlatanov
  2003-01-15 21:17       ` Raja R Harinath
  1 sibling, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-15 19:31 UTC (permalink / raw)
  Cc: Malcolm Purvis, ding

On Fri, 10 Jan 2003, harinath@cs.umn.edu wrote:
>   * ham marked and X-Bogosity: No        => do nothing
>   * ham marked and X-Bogosity: Yes       => | bogofilter -N
>   * ham marked and no X-Bogosity header  => | bogofilter -n
> 
>   * spam marked and X-Bogosity: No       => | bogofilter -S
>   * spam marked and X-Bogosity: Yes      => do nothing
>   * spam marked and no X-Bogosity header => | bogofilter -s

Hmm, I think I like the exit codes of bogofilter (1 for ham, 0 for
spam).  I'd rather use that than the procmail filtering, since IMAP
users often don't have procmail available.  

We can add a new check (spam-check-bogofilter-headers) if anyone is
interested in filtering on the Bogofilter X-Bogosity header and it
gets inserted elsewhere in the mail delivery chain, but
spam-check-bogofilter should not rely on that.  So speak up if you
want spam-check-bogofilter-headers added.

So the invocations will be:

ham-marked: bogofilter -n
spam-marked: bogofilter -s

Any last-minute protests? :)

I'll also add customization of the directory (-d DIRNAME).  Most of
the current bogofilter complexity in spam.el will go away, and IMO
that is a good thing.  The only thing missing will be training of
spam/ham backends, and I'll work on that next.
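
The planned invocations, with the optional database directory, could be
assembled along these lines.  This is a sketch only: the function name
my-bogofilter-register-args and its argument names are hypothetical, and only
the -s, -n and -d flags mentioned above are used.

```elisp
;; Hypothetical helper for illustration; not part of spam.el.
(defun my-bogofilter-register-args (mark &optional db-dir)
  "Return the bogofilter argument list for MARK, the symbol `spam' or `ham'.
DB-DIR, when non-nil, is passed to bogofilter with -d."
  (append (list (if (eq mark 'spam) "-s" "-n"))
          (and db-dir (list "-d" db-dir))))
```

The resulting list is meant to be handed to whatever function starts the
bogofilter process, with the article text supplied on standard input.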

Ted




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-15 19:21           ` Ted Zlatanov
@ 2003-01-15 20:10             ` Lars Magne Ingebrigtsen
  2003-01-15 22:14               ` Ted Zlatanov
  2003-01-17 17:46             ` Paul Jarc
  1 sibling, 1 reply; 26+ messages in thread
From: Lars Magne Ingebrigtsen @ 2003-01-15 20:10 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> (defun spam-get-article-as-filename (article)
>   ;; Return the nnml file name for ARTICLE, or nil if there is none.
>   (let ((article-filename))
>     (when (numberp article)
>       (nnml-possibly-change-directory (gnus-group-real-name gnus-newsgroup-name))
>       (setq article-filename (expand-file-name (int-to-string article) nnml-current-directory)))
>     ;; Guard against a nil file name when ARTICLE is not a number.
>     (if (and article-filename (file-exists-p article-filename))
> 	article-filename
>       nil)))
>
> nnml-possibly-change-directory doesn't seem to transform something
> like "ding.info" into "ding/info" so I'm definitely missing
> something.

`nnml-current-directory' should really be the actual directory after
selecting the group.

> More generally, can we add to Gnus the functionality of transforming a
> group name + an article number into a filename, when it's possible?

That's only possible for a very limited number of back ends.  nnml
and nnmh, basically, so I'm not sure how useful that would be.

-- 
(domestic pets only, the antidote for overdose, milk.)
   larsi@gnus.org * Lars Magne Ingebrigtsen




* Re: Problems with spam filtering and bogofilter.
  2003-01-15 19:31     ` Ted Zlatanov
@ 2003-01-15 21:17       ` Raja R Harinath
  2003-01-16  0:11         ` new Bogofilter functionality (was: Problems with spam filtering and bogofilter.) Ted Zlatanov
  0 siblings, 1 reply; 26+ messages in thread
From: Raja R Harinath @ 2003-01-15 21:17 UTC (permalink / raw)
  Cc: ding

Hi,

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Fri, 10 Jan 2003, harinath@cs.umn.edu wrote:
>>   * ham marked and X-Bogosity: No        => do nothing
>>   * ham marked and X-Bogosity: Yes       => | bogofilter -N
>>   * ham marked and no X-Bogosity header  => | bogofilter -n
>> 
>>   * spam marked and X-Bogosity: No       => | bogofilter -S
>>   * spam marked and X-Bogosity: Yes      => do nothing
>>   * spam marked and no X-Bogosity header => | bogofilter -s
>
> Hmm, I think I like the exit codes of bogofilter (1 for ham, 0 for
> spam).  I'd rather use that than the procmail filtering, since IMAP
> users often don't have procmail available.  

That's fine.  But, it would be nice to integrate with procmail for
people who can and do use procmail.

> We can add a new check (spam-check-bogofilter-headers) if anyone is
> interested in filtering on the Bogofilter X-Bogosity header and it
> gets inserted elsewhere in the mail delivery chain, 

That can be achieved with the regular gnus splitting machinery.  No
need for 'spam.el' to worry in this case, IMHO.

> but spam-check-bogofilter should not rely on that.  So speak up if
> you want spam-check-bogofilter-headers added.
>
> So the invocations will be:
>
> ham-marked: bogofilter -n
> spam-marked: bogofilter -s

This should be automatically covered in the "no X-Bogosity header"
case above.  You really want to check if 'bogofilter' already has
incorporated the counts for this message -- and the X-Bogosity header
is a good indicator.

Also I meant the above rules to be used in the post-processing routine
(spam-bogofilter-register-routine), not in the filtering part.  My
suggestion above will work if the mail was processed by any of:

  * procmail script with 'bogofilter -u -e -p'
  * spam-split with spam-check-bogofilter
  * any other splitting tool which may not even invoke bogofilter

- Hari
-- 
Raja R Harinath ------------------------------ harinath@cs.umn.edu




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-15 20:10             ` Lars Magne Ingebrigtsen
@ 2003-01-15 22:14               ` Ted Zlatanov
  0 siblings, 0 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-15 22:14 UTC (permalink / raw)


On Wed, 15 Jan 2003, larsi@gnus.org wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> nnml-possibly-change-directory doesn't seem to transform something
>> like "ding.info" into "ding/info" so I'm definitely missing
>> something.
> 
> `nnml-current-directory' should really be the actual directory after
> selecting the group.

OK, I *was* missing something :)

>> More generally, can we add to Gnus the functionality of
>> transforming a group name + an article number into a filename, when
>> it's possible?
> 
> That's only possible for a very limited number of back ends.  nnml
> and nnmh, basically, so I'm not sure how useful that would be.

Considering almost everyone using Gnus for mail has some sort of nnml
set up, I think it makes a lot of sense.  nnmaildir also has the
article <-> filename correspondence, now that I think about it.

It's fine if the function is in nnml.el/nnmaildir.el instead of
gnus-int.el, that's not important.  I am just trying to avoid writing
8 spam-article-to-filename special cases, when it might be something
better handled through the usual internal Gnus system of (1. is the
function supported by the backend?, 2. invoke the function).
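
The two-step idea (is the function supported by the backend, then invoke it)
might look roughly like this.  Everything here is an assumption made for
illustration: the "-article-file-name" naming convention, the dispatcher
my-backend-article-file-name, and the toy backend my-nnfoo are hypothetical
and do not correspond to an existing Gnus interface.

```elisp
;; Hypothetical dispatcher; the naming convention is assumed, not Gnus's own.
(defun my-backend-article-file-name (backend group article)
  "Ask BACKEND (a symbol) for the file holding ARTICLE in GROUP.
Return nil when the backend defines no such function."
  (let ((fn (intern (format "%s-article-file-name" backend))))
    ;; Step 1: is the function supported by the backend?
    (when (fboundp fn)
      ;; Step 2: invoke the function.
      (funcall fn group article))))

;; A toy backend that supports the operation (also hypothetical).
(defun my-nnfoo-article-file-name (group article)
  "Return a made-up file name for ARTICLE in GROUP."
  (format "/tmp/my-nnfoo/%s/%d" group article))
```

A caller would then need only one code path: a nil result means falling back
to fetching the article into a buffer.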

Ted





* new Bogofilter functionality (was: Problems with spam filtering and bogofilter.)
  2003-01-15 21:17       ` Raja R Harinath
@ 2003-01-16  0:11         ` Ted Zlatanov
  2003-01-16 11:00           ` new Bogofilter functionality Malcolm Purvis
  0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-16  0:11 UTC (permalink / raw)
  Cc: Malcolm Purvis, ding

OK, I've added the revamped Bogofilter functionality.

In gnus.el, I added the gnus-group-ham-exit-processor-bogofilter ham
processor; it's customizable as usual for each group.  So now
bogofilter can be used to process spam and ham on summary exit.

In spam.el, I added the spam-use-bogofilter-headers boolean for users
who just want to check for "X-Bogosity: Yes" in the headers when
invoking spam-split.  It invokes the spam-check-bogofilter-headers
function.  That function is also used by spam-check-bogofilter.

I also rewrote the spam-check-bogofilter and spam/ham bogofilter
registration.  It's much simpler now, which I think is a good thing.

The spam-bogofilter-score functionality is working, using the
"spamicity=%f" header parameter.

I couldn't find a clean way in spam-check-bogofilter to use
call-process-region and get the return value of the process, so I use
the "-v" output of bogofilter.  This works out OK, since it's the same
format as the one used in spam-check-bogofilter-headers, so we reuse
that function nicely.
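
For reference, in GNU Emacs a synchronous call-process-region returns the
exit status of the child process as its value; whether the XEmacs in use on
this thread behaved the same way is not established here.  A minimal sketch
under that assumption follows, with the hypothetical helper name
my-region-exit-status.

```elisp
;; Hypothetical helper for illustration; not part of spam.el.
(defun my-region-exit-status (program &rest args)
  "Pipe the current buffer to PROGRAM with ARGS and return its exit status.
The program's output is discarded."
  (apply #'call-process-region (point-min) (point-max) program
         nil   ; DELETE: leave the buffer text alone
         nil   ; BUFFER: discard output but wait for the process
         nil   ; DISPLAY: no redisplay
         args))
```

With bogofilter as the program, the exit codes mentioned earlier in the
thread (0 for spam, 1 for ham) could then be tested with a numeric
comparison on the returned value instead of parsing the -v output.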

Ted





* Re: new Bogofilter functionality
  2003-01-16  0:11         ` new Bogofilter functionality (was: Problems with spam filtering and bogofilter.) Ted Zlatanov
@ 2003-01-16 11:00           ` Malcolm Purvis
  2003-01-16 11:55             ` Ted Zlatanov
  0 siblings, 1 reply; 26+ messages in thread
From: Malcolm Purvis @ 2003-01-16 11:00 UTC (permalink / raw)


>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:

Ted> OK, I've added the revamped Bogofilter functionality.

Thanks!  I now have spam.el working with bogofilter.  Hooray!

However, I had two minor problems:

1) You might like to consider re-wording the texinfo entry on the variable
gnus-ham-process-destinations.  It implies that the variable can take a
string, while everywhere else says it needs a list.

2) The patch below enables blacklist exit processing when blacklist is
enabled, not bogofilter.

Malcolm

[malcolmp@c18072 gnus]$ cvs diff -c lisp/spam.el 
Index: lisp/spam.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/spam.el,v
retrieving revision 6.53
diff -c -r6.53 spam.el
*** lisp/spam.el	2003/01/16 00:41:08	6.53
--- lisp/spam.el	2003/01/16 10:56:10
***************
*** 309,315 ****
    (when (spam-group-spam-processor-stat-p gnus-newsgroup-name)
      (spam-stat-register-spam-routine))
  
!   (when (spam-group-spam-processor-bogofilter-p gnus-newsgroup-name)
      (spam-blacklist-register-routine))
  
    (if spam-move-spam-nonspam-groups-only      
--- 309,315 ----
    (when (spam-group-spam-processor-stat-p gnus-newsgroup-name)
      (spam-stat-register-spam-routine))
  
!   (when (spam-group-spam-processor-blacklist-p gnus-newsgroup-name)
      (spam-blacklist-register-routine))
  
    (if spam-move-spam-nonspam-groups-only      

-- 
	       Malcolm Purvis <malcolmpurvis@optushome.com.au>

The hidden, terrible cost of nuclear warfare is Really Bad Public Art.
			        - Angus McIntyre, alt.peeves, 13/3/02.





* Re: new Bogofilter functionality
  2003-01-16 11:00           ` new Bogofilter functionality Malcolm Purvis
@ 2003-01-16 11:55             ` Ted Zlatanov
  0 siblings, 0 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-16 11:55 UTC (permalink / raw)
  Cc: ding

On Thu, 16 Jan 2003, malcolmpurvis@optushome.com.au wrote:
> 1) You might like to consider re-wording the texinfo entry on the
> variable gnus-ham-process-destinations.  It implies that the
> variable can take a string, while everywhere else says it needs a
> list.

OK, done.

> 2) The patch below enables blacklist exit processing when blacklist
> is enabled, not bogofilter.

Yes, that was a bug.  Thanks for catching it - fixed.

Ted




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-15 17:27               ` Ted Zlatanov
@ 2003-01-16 11:56                 ` Kai Großjohann
  2003-01-16 12:30                   ` Ted Zlatanov
  0 siblings, 1 reply; 26+ messages in thread
From: Kai Großjohann @ 2003-01-16 11:56 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> Probably, by a small margin (we're working with one article at a time
> anyway).  The only extra penalty is that an article buffer is copied
> into a string, and that string is copied into a temporary buffer.
> Could this be a problem with large messages in general?  Should I set
> a size limit on a message and only process up to that point with
> ifile/bogofilter/etc.?

Why can't you call-process-region on the article buffer?
-- 
Ambibibentists unite!




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-16 11:56                 ` Kai Großjohann
@ 2003-01-16 12:30                   ` Ted Zlatanov
  2003-01-16 12:48                     ` Kai Großjohann
  0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-16 12:30 UTC (permalink / raw)
  Cc: ding

On Thu, 16 Jan 2003, kai.grossjohann@uni-duisburg.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> Probably, by a small margin (we're working with one article at a
>> time anyway).  The only extra penalty is that an article buffer is
>> copied into a string, and that string is copied into a temporary
>> buffer.  Could this be a problem with large messages in general?
>> Should I set a size limit on a message and only process up to that
>> point with ifile/bogofilter/etc.?
> 
> Why can't you call-process-region on the article buffer?

I was trying to make the functionality generic, based on strings and
buffers instead of article numbers.  Also, my knowledge of Emacs Lisp
was not quite sufficient :)

I just added spam-get-article-as-buffer, and
spam-get-article-as-string uses it now.  But
spam-get-article-as-string does a string copy; I will gradually
convert the places where it's called into calls to
spam-get-article-as-buffer.  I'll probably end up doing a generic
shell command interface, actually.  That will also deal with the
annoyance of having two cases for each command, depending on whether
a parameter is passed or not.
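
A minimal sketch of the buffer-based approach discussed above, assuming
the article text is already in a buffer.  The name
my-filter-buffer-through is hypothetical and is not part of spam.el; it
only illustrates handing the buffer's region straight to
call-process-region instead of copying it into a string first.

```elisp
;; Hypothetical helper, not part of spam.el.
(defun my-filter-buffer-through (buffer program &rest args)
  "Pipe the accessible text of BUFFER through PROGRAM called with ARGS.
Return the program's standard output as a string."
  (with-temp-buffer
    (let ((out (current-buffer)))
      ;; Send BUFFER's region to PROGRAM without an intermediate string.
      ;; DELETE is nil (keep the input), OUT collects the output, and
      ;; DISPLAY is nil (no redisplay while the process runs).
      (with-current-buffer buffer
        (apply 'call-process-region (point-min) (point-max)
               program nil out nil args)))
    (buffer-string)))
```

For example, (my-filter-buffer-through (current-buffer) "cat") should
hand back the buffer text unchanged on a system that has cat.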

Ted




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-16 12:30                   ` Ted Zlatanov
@ 2003-01-16 12:48                     ` Kai Großjohann
  2003-01-16 13:51                       ` Ted Zlatanov
  0 siblings, 1 reply; 26+ messages in thread
From: Kai Großjohann @ 2003-01-16 12:48 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> That will also deal with the annoyance of having two cases for each
> command, depending on whether a parameter is passed or not.

Was the problem that you need to call call-process (or
call-process-region) with a different number of parameters?

(apply 'call-process-region PARM1 PARM2 LIST-OF-OTHER-PARMS)

The LIST-OF-OTHER-PARMS can be nil if you need only two params.

I hope I'm not annoying you with this simple stuff.  You keep saying
your Lisp knowledge is not good enough -- I somehow doubt it :-)
-- 
Ambibibentists unite!




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-16 12:48                     ` Kai Großjohann
@ 2003-01-16 13:51                       ` Ted Zlatanov
  0 siblings, 0 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-16 13:51 UTC (permalink / raw)
  Cc: ding

On Thu, 16 Jan 2003, kai.grossjohann@uni-duisburg.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> That will also deal with the annoyance of having two cases for each
>> command, depending on whether a parameter is passed or not.
> 
> Was the problem that you need to call call-process (or
> call-process-region) with a different number of parameters?
> 
> (apply 'call-process-region PARM1 PARM2 LIST-OF-OTHER-PARMS)
> 
> The LIST-OF-OTHER-PARMS can be nil if you need only two params.

The problem is (a simple example):

(call-process-region (point-min) (point-max)
                     "/bin/cat" nil nil nil
                     "-switch1" "-switch2"
                     spam-bogofilter-database-directory)

so with apply:

(apply 'call-process-region (point-min) (point-max)
       "/bin/cat" nil nil nil
       '("-v" "-d" nil))

I don't think I can use apply on that, when
spam-bogofilter-database-directory is nil.  nil will still get passed
to call-process-region, which requires all command-line arguments to
pass stringp.  I just need a function to filter out nil arguments, and
I will roll that into the generic shell interface.
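
One way the nil-filtering step described above could look, as a sketch
only.  The names my-non-nil-args and my-call-process-region are
hypothetical and not part of spam.el; the only assumption is that every
optional argument is either a string or nil.

```elisp
;; Hypothetical helpers, not part of spam.el.
(defun my-non-nil-args (&rest args)
  "Return ARGS as a fresh list with every nil element removed."
  ;; `delq' modifies its list argument, so work on a copy.
  (delq nil (copy-sequence args)))

(defun my-call-process-region (start end program &rest args)
  "Run PROGRAM on the region START..END, ignoring nil entries in ARGS.
Output is discarded; the return value is that of `call-process-region'."
  (apply 'call-process-region start end program
         nil nil nil                    ; DELETE, BUFFER, DISPLAY
         (apply 'my-non-nil-args args)))
```

With this, (my-non-nil-args "-v" "-d" nil) yields ("-v" "-d"), so a nil
spam-bogofilter-database-directory never reaches the program's argument
list.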

Ted




* Re: requesting articles from a nnxyz backend: what's the fastest
  2003-01-15 19:21           ` Ted Zlatanov
  2003-01-15 20:10             ` Lars Magne Ingebrigtsen
@ 2003-01-17 17:46             ` Paul Jarc
  1 sibling, 0 replies; 26+ messages in thread
From: Paul Jarc @ 2003-01-17 17:46 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> wrote:
> More generally, can we add to Gnus the functionality of transforming a
> group name + an article number into a filename, when it's possible?

nnmaildir has:
nnmaildir-article-number-to-file-name
nnmaildir-article-number-to-base-name
nnmaildir-base-name-to-article-number

> 1. get list of articles

You can get the min and max numbers from gnus-active.


paul




end of thread, other threads:[~2003-01-17 17:46 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
2003-01-10 12:33 Problems with spam filtering and bogofilter Malcolm Purvis
2003-01-10 12:56 ` Ted Zlatanov
2003-01-10 17:24   ` Raja R Harinath
2003-01-13 19:17     ` Ted Zlatanov
2003-01-15 19:31     ` Ted Zlatanov
2003-01-15 21:17       ` Raja R Harinath
2003-01-16  0:11         ` new Bogofilter functionality (was: Problems with spam filtering and bogofilter.) Ted Zlatanov
2003-01-16 11:00           ` new Bogofilter functionality Malcolm Purvis
2003-01-16 11:55             ` Ted Zlatanov
2003-01-11 12:17   ` Problems with spam filtering and bogofilter Malcolm Purvis
2003-01-13 19:16     ` Ted Zlatanov
2003-01-13 21:24       ` requesting articles from a nnxyz backend: what's the fastest Ted Zlatanov
2003-01-14  7:22         ` Kai Großjohann
2003-01-14 16:42           ` Ted Zlatanov
2003-01-14 20:39             ` Kai Großjohann
2003-01-15 17:27               ` Ted Zlatanov
2003-01-16 11:56                 ` Kai Großjohann
2003-01-16 12:30                   ` Ted Zlatanov
2003-01-16 12:48                     ` Kai Großjohann
2003-01-16 13:51                       ` Ted Zlatanov
2003-01-14 18:16         ` Lars Magne Ingebrigtsen
2003-01-14 20:42           ` Kai Großjohann
2003-01-15 19:21           ` Ted Zlatanov
2003-01-15 20:10             ` Lars Magne Ingebrigtsen
2003-01-15 22:14               ` Ted Zlatanov
2003-01-17 17:46             ` Paul Jarc
