Gnus development mailing list
* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-27 19:03 UTC (permalink / raw)
  Cc: Ding Mailing List

On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> Send me the backtrace you'll get;
> 
> Note that the spamicity value differs from what `S t' returns --
> which is a value of 1.0. Moreover, the *Article* buffer of the
> respective article is deprived of its body after hitting `B r'
> (which is what I did to get the backtrace) and the score of the
> header alone (= 0.8477387640) is quite similar to the one below.
> 
> Debugger entered: ("Unsure, tests=bogofilter,
>   spamicity=0.8469313635, version=0.10.3.1.cvs.20030227")

(background for ding readers: Chris was having issues with bogofilter
classification in spam.el)

There's the problem, I think.  "Unsure" is not a recognized spam flag.
spam.el recognizes only "Yes" and "Spam" as spam-positive indicators.

I'm not sure what to do here.  Should I make "Unsure" an optional
positive (and filter on a spamicity threshold then), filter based on
the spamicity value alone, etc?  Please comment if you use bogofilter
with spam.el.
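
To make the threshold option concrete, here is a rough Emacs Lisp
sketch.  It is not the actual spam.el code: the function name
my-bogosity-spam-p and the default threshold of 0.8 are made up for
illustration.  Only the "Yes"/"Spam"/"Unsure" flags and the
spamicity= field come from the bogofilter output quoted above.

```elisp
;; Sketch only: `my-bogosity-spam-p' is a hypothetical helper,
;; not part of spam.el.
(defun my-bogosity-spam-p (bogosity &optional threshold)
  "Return non-nil if the bogofilter verdict string BOGOSITY means spam.
\"Yes\" and \"Spam\" always count as spam.  \"Unsure\" counts as spam
only when its spamicity is at least THRESHOLD (default 0.8, an
arbitrary value chosen for this example)."
  (let ((threshold (or threshold 0.8))
        ;; The flag is the first word, optionally after "X-Bogosity: ".
        (flag (and (string-match
                    "\\`\\(?:X-Bogosity:[ \t]*\\)?\\([A-Za-z]+\\),"
                    bogosity)
                   (match-string 1 bogosity)))
        ;; The score may sit on a continuation line, so search separately.
        (score (and (string-match "spamicity=\\([0-9.]+\\)" bogosity)
                    (string-to-number (match-string 1 bogosity)))))
    (cond ((member flag '("Yes" "Spam")) t)
          ((and (equal flag "Unsure") score) (>= score threshold))
          (t nil))))
```

With the verdict from the backtrace, the default threshold would
treat the message as spam, while a stricter call such as
(my-bogosity-spam-p "Unsure, spamicity=0.8469313635" 0.9) would not.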

Ted



* Re: (: spam-split) doesn't work
From: Christopher Splinter @ 2003-02-27 20:06 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>> Note that the spamicity value differs from what `S t' returns --
>> which is a value of 1.0. Moreover, the *Article* buffer of the
>> respective article is deprived of its body after hitting `B r'
>> (which is what I did to get the backtrace) and the score of the
>> header alone (= 0.8477387640) is quite similar to the one below.
>> 
>> Debugger entered: ("Unsure, tests=bogofilter,
>>   spamicity=0.8469313635, version=0.10.3.1.cvs.20030227")
>
> (background for ding readers: Chris was having issues with bogofilter
> classification in spam.el)
>
> There's the problem, I think.   "Unsure" is not a recognized spam flag.

I'm not sure about that. After all, the above value is not what
bogofilter returns when the message, for which bogofilter returns
that value when called via `B t' or `B r', is piped to bogofilter
manually -- in that case, this is returned:

X-Bogosity: Spam, tests=bogofilter, spamicity=1.0000000000, version=0.10.3.1.cvs.20030227

Therefore I suspect that bogofilter isn't fed the complete message.




* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-27 20:56 UTC (permalink / raw)


On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>>> Note that the spamicity value differs from what `S t' returns --
>>> which is a value of 1.0. Moreover, the *Article* buffer of the
>>> respective article is deprived of its body after hitting `B r'
>>> (which is what I did to get the backtrace) and the score of the
>>> header alone (= 0.8477387640) is quite similar to the one below.
>>> 
>>> Debugger entered: ("Unsure, tests=bogofilter,
>>>   spamicity=0.8469313635, version=0.10.3.1.cvs.20030227")
>>
>> (background for ding readers: Chris was having issues with
>> bogofilter classification in spam.el)
>>
>> There's the problem, I think.  "Unsure" is not a recognized spam
>> flag.
> 
> I'm not sure about that. After all, the above value is not what
> bogofilter returns when the message, for which bogofilter returns
> that value when called via `B t' or `B r', is piped to bogofilter
> manually -- in that case, this is returned:
> 
> X-Bogosity: Spam, tests=bogofilter, spamicity=1.0000000000,
> version=0.10.3.1.cvs.20030227
> 
> Therefore I suspect that bogofilter isn't fed the complete message.

Sorry for the extensive quoting, there's a lot of context here.

I remember we tested this, and the complete message was indeed fed to
bogofilter, according to a debug statement.  But maybe I was wrong.  I
remember some narrow/widen issues with IMAP, but I thought I had fixed
those.

Try replacing spam-bogofilter-path (which is normally the path to
bogofilter) with the path to something like this (save this script
anywhere and make it executable):

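#!/usr/bin/perl -w
use strict;

# Save everything received on stdin so it can be inspected afterwards.
open my $save, '>', '/tmp/bogofilter.input'
  or die "Could not open /tmp/bogofilter.input: $!";

# Read STDIN explicitly: <> would treat any switches passed by the
# caller as file names.
while (<STDIN>)
{
 print $save $_;
}

close $save or die "Could not close /tmp/bogofilter.input: $!";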

Then look at the /tmp/bogofilter.input file and see if it matches the
message we're trying to classify.  You can see what bogofilter would
have said about it, too, if you do

bogofilter -v /tmp/bogofilter.input
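
If you'd rather check from inside Emacs, something along these lines
could tell you whether the saved file has anything after the header.
This is only a sketch: my-message-file-has-body-p is a made-up name,
and it assumes the header ends at the first empty line, as in an
ordinary mail message.

```elisp
;; Sketch only: hypothetical helper, not part of Gnus or spam.el.
(defun my-message-file-has-body-p (file)
  "Return t if FILE has non-blank text after its first empty line.
The first empty line is taken to be the header/body separator."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (and (re-search-forward "\n\r?\n" nil t)    ; end of the header
         (re-search-forward "[^ \t\r\n]" nil t) ; any visible body text
         t)))

;; Example: evaluate after a test run with the wrapper script in place.
;; (my-message-file-has-body-p "/tmp/bogofilter.input")
```

A nil result would mean that only the header reached the program.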

Thanks
Ted



* Re: (: spam-split) doesn't work
From: Christopher Splinter @ 2003-02-27 22:05 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>> Ted Zlatanov <tzz@lifelogs.com> writes:
>>> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>>>> Note that the spamicity value differs from what `S t' returns --
>>>> which is a value of 1.0. Moreover, the *Article* buffer of the
>>>> respective article is deprived of its body after hitting `B r'
>>>> (which is what I did to get the backtrace) and the score of the
>>>> header alone (= 0.8477387640) is quite similar to the one below.
>>>> 
>>>> Debugger entered: ("Unsure, tests=bogofilter,
>>>>   spamicity=0.8469313635, version=0.10.3.1.cvs.20030227")
>> After all, the above value is not what bogofilter returns when
>> the message, for which bogofilter returns that value when
>> called via `B t' or `B r', is piped to bogofilter manually --
>> in that case, this is returned:
>> 
>> X-Bogosity: Spam, tests=bogofilter, spamicity=1.0000000000,
>> version=0.10.3.1.cvs.20030227
>> 
>> Therefore I suspect that bogofilter isn't fed the complete message.
>
[Problems might be related to narrowing issues]
> Try replacing spam-bogofilter-path (which is normally the path to
> bogofilter) with the path to something like this (save this script
> anywhere and make it executable):

[...]

> Then look at the /tmp/bogofilter.input file and see if it matches the
> message we're trying to classify.

As I had assumed, /tmp/bogofilter.input only contains the header,
whereas the message which is to be classified contains a
non-empty body.

> You can see what bogofilter would have said about it, too, if
> you do
>
> bogofilter -v /tmp/bogofilter.input

Well, almost :-) bogofilter only takes input from stdin.

X-Bogosity: Unsure, tests=bogofilter, spamicity=0.8524075477, version=0.10.3.1.cvs.20030227



* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-27 22:29 UTC (permalink / raw)


On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
> As I had assumed, /tmp/bogofilter.input only contains the header,
> whereas the message which is to be classified contains a
> non-empty body.
> 
>> You can see what bogofilter would have said about it, too, if
>> you do
>>
>> bogofilter -v /tmp/bogofilter.input
> 
> Well, almost :-) bogofilter only takes input from stdin.
> 
> X-Bogosity: Unsure, tests=bogofilter, spamicity=0.8524075477,
> version=0.10.3.1.cvs.20030227

I see.  Try setting nnimap-split-download-body to t.  Does that fix
it?  If yes, I'll make it the default setting if you enable
spam-use-bogofilter or any other statistical filter.  But it will slow
down IMAP mail retrieval, since the full message body will be
downloaded for analysis.  Does anyone object?

I remember us checking this, and we got the full message body, that's
why I didn't check this sooner.  Sorry!

Thanks
Ted



* Re: (: spam-split) doesn't work
From: Christopher Splinter @ 2003-02-27 23:45 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>> As I had assumed, /tmp/bogofilter.input only contains the header,
>> whereas the message which is to be classified contains a
>> non-empty body.
[...]
>
> I see.  Try setting nnimap-split-download-body to t.  Does that fix
> it? 

As I don't use IMAP, there's no effect.



* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-28 16:43 UTC (permalink / raw)
  Cc: Simon Josefsson

On Fri, 28 Feb 2003, chris@splinter.inka.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>>> As I had assumed, /tmp/bogofilter.input only contains the header,
>>> whereas the message which is to be classified contains a
>>> non-empty body.
> [...]
>>
>> I see.  Try setting nnimap-split-download-body to t.  Does that fix
>> it? 
> 
> As I don't use IMAP, there's no effect.

I'm pretty sure Simon Josefsson did the necessary work to have the
full message available in nnmail (nnml, specifically) splitting.  See
this message on the ding list and the surrounding discussion:

<ilu7kct1e0d.fsf@latte.josefsson.org>

Simon, am I supposed to do something special in spam-split to have the
full message body available?  I think (widen) wouldn't work, because
it will remove all narrowing restrictions and I'll be looking at the
full incoming mbox file, is that correct?  I had assumed that your fix
left the full message body in the buffer, but according to Chris'
experience he's getting only the header.

If it's not set up yet, it would be OK to have a variable analogous to
nnimap-split-download-body to trigger that behavior optionally.

Thanks
Ted



* Re: (: spam-split) doesn't work
From: Simon Josefsson @ 2003-02-28 16:55 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> On Fri, 28 Feb 2003, chris@splinter.inka.de wrote:
>> Ted Zlatanov <tzz@lifelogs.com> writes:
>> 
>>> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>>>> As I had assumed, /tmp/bogofilter.input only contains the header,
>>>> whereas the message which is to be classified contains a
>>>> non-empty body.
>> [...]
>>>
>>> I see.  Try setting nnimap-split-download-body to t.  Does that fix
>>> it? 
>> 
>> As I don't use IMAP, there's no effect.
>
> I'm pretty sure Simon Josefsson did the necessary work to have the
> full message available in nnmail (nnml, specifically) splitting.  See
> this message on the ding list and the surrounding discussion:
>
> <ilu7kct1e0d.fsf@latte.josefsson.org>
>
> Simon, am I supposed to do something special in spam-split to have the
> full message body available?  I think (widen) wouldn't work, because
> it will remove all narrowing restrictions and I'll be looking at the
> full incoming mbox file, is that correct?  I had assumed that your fix
> left the full message body in the buffer, but according to Chris'
> experience he's getting only the header.
>
> If it's not set up yet, it would be OK to have a variable analogous to
> nnimap-split-download-body to trigger that behavior optionally.

The behaviour enabled by n-s-d-b for nnmail should be the default.  I
think Gnus copies the message around, so (widen) should work.  Maybe
putting (progn (widen) (message (buffer-string))) in the split
function will tell for sure, I haven't used nnmail splitting recently.
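
For instance, a throwaway split function along these lines could
report how much text is visible before and after (widen).  This is a
sketch under the assumption that the split function is called in a
buffer narrowed to the header; my-split-debug and the "mail.misc"
group are invented for the example.

```elisp
;; Sketch only: `my-split-debug' is a hypothetical debugging helper.
(defun my-split-debug ()
  "Log the visible and the widened buffer sizes, then return nil.
Returning nil means this entry selects no group."
  (let ((visible (- (point-max) (point-min))))
    (save-restriction
      (widen)
      (message "split buffer: %d chars visible, %d chars after widen"
               visible (- (point-max) (point-min)))))
  nil)

;; Hypothetical wiring, assuming fancy splitting is in use:
;; (setq nnmail-split-fancy
;;       '(| (: my-split-debug)
;;           (: spam-split)
;;           "mail.misc"))
```

If both numbers come out the same and header-sized, the body is not
in the buffer at all and widening cannot help.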




* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-28 17:07 UTC (permalink / raw)


On Fri, 28 Feb 2003, jas@extundo.com wrote:
> The behaviour enabled by n-s-d-b for nnmail should be the default.
> I think Gnus copies the message around, so (widen) should work.
> Maybe putting (progn (widen) (message (buffer-string))) in the split
> function will tell for sure, I haven't used nnmail splitting
> recently.

OK, I see.  I thought that you did the (widen) by default before the
splitting was invoked in nnml somewhere, my fault for not checking.

So if the user has selected a statistical spam analyzer, I'm going to
call (widen) in all cases, and additionally set
nnimap-split-download-body to t when spam.el is loaded.  I think
that's sensible, because it avoids the extra download when the spam
splitter in use doesn't care about the message body, e.g. BBDB or
whitelists/blacklists.

Would a user ever want to use a statistical analyzer on the message
headers only?  Should I add a setting for that?

Thanks
Ted



* Re: (: spam-split) doesn't work
From: Simon Josefsson @ 2003-02-28 17:34 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> On Fri, 28 Feb 2003, jas@extundo.com wrote:
>> The behaviour enabled by n-s-d-b for nnmail should be the default.
>> I think Gnus copies the message around, so (widen) should work.
>> Maybe putting (progn (widen) (message (buffer-string))) in the split
>> function will tell for sure, I haven't used nnmail splitting
>> recently.
>
> OK, I see.  I thought that you did the (widen) by default before the
> splitting was invoked in nnml somewhere, my fault for not checking.

I don't think I had to change any part of nnmail, I remember that the
(widen) hack worked when I last used nnmail splitting seriously, which
probably was 5 years ago or so.

> So if the user has selected a statistical spam analyzer, I'm going to
> call (widen) in all cases, and additionally set
> nnimap-split-download-body to t when spam.el is loaded.

Sounds good.

> Would a user ever want to use a statistical analyzer on the message
> headers only?  Should I add a setting for that?

Wait until someone asks for it. :-)




* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-28 21:22 UTC (permalink / raw)


On Fri, 28 Feb 2003, jas@extundo.com wrote:
>> So if the user has selected a statistical spam analyzer, I'm going
>> to call (widen) in all cases, and additionally set
>> nnimap-split-download-body to t when spam.el is loaded.
> 
> Sounds good.

Done and committed; the manual was also updated to mention
spam-list-of-statistical-checks in the Extending spam.el section.

I hope I'm setting nnimap-split-download-body to t in the right place
(gnus-get-new-news-hook).
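
In outline, the hook-based approach looks something like the
following.  This is a simplified sketch and not the committed code:
the my- names are invented, and the list holds only
spam-use-bogofilter for brevity, whereas spam.el keeps its own list
in spam-list-of-statistical-checks.

```elisp
;; Sketch only: the `my-' names are hypothetical; this is not
;; spam.el's code.
(defvar nnimap-split-download-body)     ; owned by nnimap
(defvar my-statistical-checks '(spam-use-bogofilter)
  "Option symbols whose backends need the whole message, body included.")

(defun my-statistical-check-enabled-p ()
  "Return t if any option in `my-statistical-checks' is bound and non-nil."
  (let (enabled)
    (dolist (check my-statistical-checks enabled)
      (when (and (boundp check) (symbol-value check))
        (setq enabled t)))))

(defun my-maybe-download-body ()
  "Ask nnimap for message bodies only when a statistical check is enabled."
  (when (my-statistical-check-enabled-p)
    (setq nnimap-split-download-body t)))

(add-hook 'gnus-get-new-news-hook #'my-maybe-download-body)
```

Users who rely only on BBDB or whitelists/blacklists would leave the
variable untouched and keep the faster header-only retrieval.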

Chris, if you want to test, go ahead and let me know how it works.

Thanks
Ted



* Re: (: spam-split) doesn't work
From: Christopher Splinter @ 2003-02-28 22:17 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

>>> So if the user has selected a statistical spam analyzer, I'm going
>>> to call (widen) in all cases, and additionally set
>>> nnimap-split-download-body to t when spam.el is loaded.
> Done and committed; the manual was also updated to mention
> spam-list-of-statistical-checks in the Extending spam.el section.
>
> Chris, if you want to test, go ahead and let me know how it works.

It works properly now. Thanks!



* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-03-01 15:01 UTC (permalink / raw)


On Fri, 28 Feb 2003, chris@splinter.inka.de wrote:
> It works properly now. Thanks!

Great.  For anyone who uses spam.el with statistical analyzers
(ifile/spam-stat/bogofilter), this means the analyzer now sees the
full message rather than just the headers, so you should get far more
accurate results.  Please let me know if you have any problems or
notice that classification gets worse or better.

Also, as I mentioned, there could be a way to force statistical
analyzers to analyze only the headers if you are concerned about
download time over IMAP.  If anyone wants this feature, let me know.

Thanks
Ted



