* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-27 19:03 UTC
Cc: Ding Mailing List

On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:

> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> Send me the backtrace you'll get;
>
> Note that the spamicity value differs from what `S t' returns --
> which is a value of 1.0. Moreover, the *Article* buffer of the
> respective article is deprived of its body after hitting `B r'
> (which is what I did to get the backtrace) and the score of the
> header alone (= 0.8477387640) is quite similar to the one below.
>
> Debugger entered: ("Unsure, tests=bogofilter,
> spamicity=0.8469313635, version=0.10.3.1.cvs.20030227")

(background for ding readers: Chris was having issues with bogofilter
classification in spam.el)

There's the problem, I think. "Unsure" is not a recognized spam flag.
spam.el recognizes only "Yes" and "Spam" as spam-positive indicators.

I'm not sure what to do here. Should I make "Unsure" an optional
positive (and filter on a spamicity threshold then), filter based on
the spamicity value alone, etc? Please comment if you use bogofilter
with spam.el.

Ted
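For illustration only, the "optional positive with a threshold" idea raised above might look roughly like the sketch below. This is not code from spam.el: `my-bogosity-unsure-threshold` and `my-bogosity-spam-p` are hypothetical names, and the 0.8 cutoff is an arbitrary assumption chosen just to make the example concrete.

```elisp
;; Sketch only; not part of spam.el.  Both names below are hypothetical.
(defvar my-bogosity-unsure-threshold 0.8
  "Assumed cutoff: an \"Unsure\" verdict at or above this spamicity is spam.")

(defun my-bogosity-spam-p (header)
  "Return non-nil if HEADER, an X-Bogosity header string, counts as spam.
\"Yes\" and \"Spam\" always count; \"Unsure\" counts only when its
spamicity reaches `my-bogosity-unsure-threshold'."
  (let (verdict spamicity)
    ;; The verdict is the first word, optionally after the header name.
    (when (string-match "\\`\\(?:X-Bogosity:[ \t]*\\)?\\([A-Za-z]+\\)" header)
      (setq verdict (match-string 1 header)))
    ;; The header may be folded over two lines, so the spamicity field
    ;; is searched for separately rather than matched across newlines.
    (when (string-match "spamicity=\\([0-9.]+\\)" header)
      (setq spamicity (string-to-number (match-string 1 header))))
    (cond ((member verdict '("Yes" "Spam")) t)
          ((and (equal verdict "Unsure") spamicity)
           (>= spamicity my-bogosity-unsure-threshold))
          (t nil))))
```

With the assumed cutoff, the "Unsure" result from the backtrace (spamicity 0.8469313635) would count as spam, while an "Unsure" result below 0.8 would not. Whether a threshold, the flag, or the raw value is the right criterion is exactly the open question in the message above.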
* Re: (: spam-split) doesn't work
From: Christopher Splinter @ 2003-02-27 20:06 UTC

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>> Note that the spamicity value differs from what `S t' returns --
>> which is a value of 1.0. Moreover, the *Article* buffer of the
>> respective article is deprived of its body after hitting `B r'
>> (which is what I did to get the backtrace) and the score of the
>> header alone (= 0.8477387640) is quite similar to the one below.
>>
>> Debugger entered: ("Unsure, tests=bogofilter,
>> spamicity=0.8469313635, version=0.10.3.1.cvs.20030227")
>
> (background for ding readers: Chris was having issues with bogofilter
> classification in spam.el)
>
> There's the problem, I think. "Unsure" is not a recognized spam flag.

I'm not sure about that. After all, the above value is not what
bogofilter returns when the message, for which bogofilter returns that
value when called via `B t' or `B r', is piped to bogofilter manually --
in that case, this is returned:

X-Bogosity: Spam, tests=bogofilter, spamicity=1.0000000000,
  version=0.10.3.1.cvs.20030227

Therefore I suspect that bogofilter isn't fed the complete message.
* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-27 20:56 UTC

On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:

> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>>> Note that the spamicity value differs from what `S t' returns --
>>> which is a value of 1.0. Moreover, the *Article* buffer of the
>>> respective article is deprived of its body after hitting `B r'
>>> (which is what I did to get the backtrace) and the score of the
>>> header alone (= 0.8477387640) is quite similar to the one below.
>>>
>>> Debugger entered: ("Unsure, tests=bogofilter,
>>> spamicity=0.8469313635, version=0.10.3.1.cvs.20030227")
>>
>> (background for ding readers: Chris was having issues with
>> bogofilter classification in spam.el)
>>
>> There's the problem, I think. "Unsure" is not a recognized spam
>> flag.
>
> I'm not sure about that. After all, the above value is not what
> bogofilter returns when the message, for which bogofilter returns
> that value when called via `B t' or `B r', is piped to bogofilter
> manually -- in that case, this is returned:
>
> X-Bogosity: Spam, tests=bogofilter, spamicity=1.0000000000,
> version=0.10.3.1.cvs.20030227
>
> Therefore I suspect that bogofilter isn't fed the complete message.

Sorry for the extensive quoting, there's a lot of context here.

I remember we tested this, and the complete message was indeed fed to
bogofilter, according to a debug statement. But maybe I was wrong. I
remember some narrow/widen issues with IMAP, but I thought I had fixed
those.

Try replacing spam-bogofilter-path (which is normally the path to
bogofilter) with the path to something like this (save this script
anywhere and make it executable):

#!/usr/bin/perl -w

open SAVE,">/tmp/bogofilter.input" or die "Could not save output: $!";
while (<>)
{
 print SAVE $_;
}

Then look at the /tmp/bogofilter.input file and see if it matches the
message we're trying to classify. You can see what bogofilter would
have said about it, too, if you do

bogofilter -v /tmp/bogofilter.input

Thanks
Ted
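As a hedged illustration of the swap described above, the capture script could be wired in from a Gnus init file roughly as follows. The script location `~/bin/bogofilter-capture` is a placeholder, not a path from this thread, and the restore line assumes the real bogofilter is on your PATH.

```elisp
;; Point spam.el at the capture script instead of the real bogofilter.
;; "~/bin/bogofilter-capture" is a placeholder for wherever the script
;; was saved; it must be executable.
(setq spam-bogofilter-path (expand-file-name "~/bin/bogofilter-capture"))

;; After inspecting /tmp/bogofilter.input, switch back to the real
;; program (assumes bogofilter can be found on PATH):
;; (setq spam-bogofilter-path (executable-find "bogofilter"))
```

While the script is in place no classification happens at all, since it only records its input, so this is a diagnostic setting rather than something to leave enabled.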
* Re: (: spam-split) doesn't work
From: Christopher Splinter @ 2003-02-27 22:05 UTC

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>> Ted Zlatanov <tzz@lifelogs.com> writes:
>>> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>>>> Note that the spamicity value differs from what `S t' returns --
>>>> which is a value of 1.0. Moreover, the *Article* buffer of the
>>>> respective article is deprived of its body after hitting `B r'
>>>> (which is what I did to get the backtrace) and the score of the
>>>> header alone (= 0.8477387640) is quite similar to the one below.
>>>>
>>>> Debugger entered: ("Unsure, tests=bogofilter,
>>>> spamicity=0.8469313635, version=0.10.3.1.cvs.20030227")
>> After all, the above value is not what bogofilter returns when
>> the message, for which bogofilter returns that value when
>> called via `B t' or `B r', is piped to bogofilter manually --
>> in that case, this is returned:
>>
>> X-Bogosity: Spam, tests=bogofilter, spamicity=1.0000000000,
>> version=0.10.3.1.cvs.20030227
>>
>> Therefore I suspect that bogofilter isn't fed the complete message.
> [Problems might be related to narrowing issues]
> Try replacing spam-bogofilter-path (which is normally the path to
> bogofilter) with the path to something like this (save this script
> anywhere and make it executable):

[...]

> Then look at the /tmp/bogofilter.input file and see if it matches the
> message we're trying to classify.

As I had assumed, /tmp/bogofilter.input only contains the header,
whereas the message which is to be classified contains a non-empty
body.

> You can see what bogofilter would have said about it, too, if
> you do
>
> bogofilter -v /tmp/bogofilter.input

Well, almost :-) bogofilter only takes input from stdin.

X-Bogosity: Unsure, tests=bogofilter, spamicity=0.8524075477,
  version=0.10.3.1.cvs.20030227
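Since bogofilter here reads the message from standard input rather than from a file argument, the same check can be sketched from inside Emacs with `call-process`, whose second argument names the file to use as stdin. `my-bogofilter-verdict` is a hypothetical helper, the optional PROGRAM argument exists only so the plumbing can be tried with another command, and the sketch assumes bogofilter is on your `exec-path`.

```elisp
(defun my-bogofilter-verdict (file &optional program)
  "Return the output of PROGRAM -v when FILE is supplied on its stdin.
Hypothetical helper, not part of spam.el.  PROGRAM defaults to
\"bogofilter\", which is assumed to be on `exec-path'."
  (with-temp-buffer
    ;; Arguments of `call-process': program, file used as standard
    ;; input, destination (t = current buffer), display flag, then
    ;; the command-line arguments.
    (call-process (or program "bogofilter")
                  (expand-file-name file)
                  t nil "-v")
    (buffer-string)))
```

Evaluating `(my-bogofilter-verdict "/tmp/bogofilter.input")` should then return the same kind of X-Bogosity line as shown above, computed from exactly what the capture script recorded.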
* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-27 22:29 UTC

On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:

> As I had assumed, /tmp/bogofilter.input only contains the header,
> whereas the message which is to be classified contains a
> non-empty body.
>
>> You can see what bogofilter would have said about it, too, if
>> you do
>>
>> bogofilter -v /tmp/bogofilter.input
>
> Well, almost :-) bogofilter only takes input from stdin.
>
> X-Bogosity: Unsure, tests=bogofilter, spamicity=0.8524075477,
> version=0.10.3.1.cvs.20030227

I see. Try setting nnimap-split-download-body to t. Does that fix it?

If yes, I'll make it the default setting if you enable
spam-use-bogofilter or any other statistical filter. But it will slow
down IMAP mail retrieval, since the full message body will be
downloaded for analysis. Does anyone object?

I remember us checking this, and we got the full message body, that's
why I didn't check this sooner. Sorry!

Thanks
Ted
* Re: (: spam-split) doesn't work
From: Christopher Splinter @ 2003-02-27 23:45 UTC

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>> As I had assumed, /tmp/bogofilter.input only contains the header,
>> whereas the message which is to be classified contains a
>> non-empty body.

[...]

>
> I see. Try setting nnimap-split-download-body to t. Does that fix
> it?

As I don't use IMAP, there's no effect.
* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-28 16:43 UTC
Cc: Simon Josefsson

On Fri, 28 Feb 2003, chris@splinter.inka.de wrote:

> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>>> As I had assumed, /tmp/bogofilter.input only contains the header,
>>> whereas the message which is to be classified contains a
>>> non-empty body.
> [...]
>>
>> I see. Try setting nnimap-split-download-body to t. Does that fix
>> it?
>
> As I don't use IMAP, there's no effect.

I'm pretty sure Simon Josefsson did the necessary work to have the
full message available in nnmail (nnml, specifically) splitting. See
this message on the ding list and the surrounding discussion:

<ilu7kct1e0d.fsf@latte.josefsson.org>

Simon, am I supposed to do something special in spam-split to have the
full message body available? I think (widen) wouldn't work, because
it will remove all narrowing restrictions and I'll be looking at the
full incoming mbox file, is that correct? I had assumed that your fix
left the full message body in the buffer, but according to Chris'
experience he's getting only the header.

If it's not set up yet, it would be OK to have a variable analogous to
nnimap-split-download-body to trigger that behavior optionally.

Thanks
Ted
* Re: (: spam-split) doesn't work
From: Simon Josefsson @ 2003-02-28 16:55 UTC

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Fri, 28 Feb 2003, chris@splinter.inka.de wrote:
>> Ted Zlatanov <tzz@lifelogs.com> writes:
>>
>>> On Thu, 27 Feb 2003, chris@splinter.inka.de wrote:
>>>> As I had assumed, /tmp/bogofilter.input only contains the header,
>>>> whereas the message which is to be classified contains a
>>>> non-empty body.
>> [...]
>>>
>>> I see. Try setting nnimap-split-download-body to t. Does that fix
>>> it?
>>
>> As I don't use IMAP, there's no effect.
>
> I'm pretty sure Simon Josefsson did the necessary work to have the
> full message available in nnmail (nnml, specifically) splitting. See
> this message on the ding list and the surrounding discussion:
>
> <ilu7kct1e0d.fsf@latte.josefsson.org>
>
> Simon, am I supposed to do something special in spam-split to have the
> full message body available? I think (widen) wouldn't work, because
> it will remove all narrowing restrictions and I'll be looking at the
> full incoming mbox file, is that correct? I had assumed that your fix
> left the full message body in the buffer, but according to Chris'
> experience he's getting only the header.
>
> If it's not set up yet, it would be OK to have a variable analogous to
> nnimap-split-download-body to trigger that behavior optionally.

The behaviour enabled by n-s-d-b for nnmail should be the default. I
think Gnus copies the message around, so (widen) should work. Maybe
putting (progn (widen) (message (buffer-string))) in the split
function will tell for sure, I haven't used nnmail splitting recently.
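A slightly more defensive variant of that diagnostic is sketched below, under a few assumptions: `my-split-debug` is a hypothetical name, and it is meant to be called from `nnmail-split-fancy` with `nnmail-split-methods` set to `nnmail-split-fancy`. It reports sizes instead of echoing the whole buffer, which also sidesteps `message` treating any `%` in the article text as a format directive, and it restores the narrowing afterwards so later split rules see the buffer unchanged.

```elisp
(defun my-split-debug ()
  "Report what the split buffer holds once widened; always return nil.
Hypothetical helper intended for use as (: my-split-debug) inside
`nnmail-split-fancy'."
  (save-excursion
    (save-restriction
      (widen)
      (goto-char (point-min))
      ;; A blank line separates header from body; text after it is body.
      (let ((body-start (search-forward "\n\n" nil t)))
        (message "split buffer: %d chars, body %s"
                 (buffer-size)
                 (if (and body-start (< body-start (point-max)))
                     "present"
                   "missing")))))
  ;; nil is an empty split, so the remaining rules still apply.
  nil)

;; Possible use ("mail.misc" is just an example fallback group):
;; (setq nnmail-split-fancy
;;       '(| (: my-split-debug) (: spam-split) "mail.misc"))
```

If the reported size is far larger than a single article, that would suggest `(widen)` exposes the whole incoming file rather than one copied message, which is the concern raised earlier in the thread.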
* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-28 17:07 UTC

On Fri, 28 Feb 2003, jas@extundo.com wrote:

> The behaviour enabled by n-s-d-b for nnmail should be the default.
> I think Gnus copies the message around, so (widen) should work.
> Maybe putting (progn (widen) (message (buffer-string))) in the split
> function will tell for sure, I haven't used nnmail splitting
> recently.

OK, I see. I thought that you did the (widen) by default before the
splitting was invoked in nnml somewhere, my fault for not checking.

So if the user has selected a statistical spam analyzer, I'm going to
call (widen) in all cases, and additionally set
nnimap-split-download-body to t when spam.el is loaded. I think
that's sensible, so we avoid extra downloading when a spam splitter is
invoked that doesn't care about the message body, e.g. BBDB or
whitelists/blacklists.

Would a user ever want to use a statistical analyzer on the message
headers only? Should I add a setting for that?

Thanks
Ted
* Re: (: spam-split) doesn't work
From: Simon Josefsson @ 2003-02-28 17:34 UTC

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Fri, 28 Feb 2003, jas@extundo.com wrote:
>> The behaviour enabled by n-s-d-b for nnmail should be the default.
>> I think Gnus copies the message around, so (widen) should work.
>> Maybe putting (progn (widen) (message (buffer-string))) in the split
>> function will tell for sure, I haven't used nnmail splitting
>> recently.
>
> OK, I see. I thought that you did the (widen) by default before the
> splitting was invoked in nnml somewhere, my fault for not checking.

I don't think I had to change any part of nnmail, I remember that the
(widen) hack worked when I last used nnmail splitting seriously, which
probably was 5 years ago or so.

> So if the user has selected a statistical spam analyzer, I'm going to
> call (widen) in all cases, and additionally set
> nnimap-split-download-body to t when spam.el is loaded.

Sounds good.

> Would a user ever want to use a statistical analyzer on the message
> headers only? Should I add a setting for that?

Wait until someone asks for it. :-)
* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-02-28 21:22 UTC

On Fri, 28 Feb 2003, jas@extundo.com wrote:

>> So if the user has selected a statistical spam analyzer, I'm going
>> to call (widen) in all cases, and additionally set
>> nnimap-split-download-body to t when spam.el is loaded.
>
> Sounds good.

Done and committed; the manual was also updated to mention
spam-list-of-statistical-checks in the Extending spam.el section. I
hope I'm setting nnimap-split-download-body to t in the right place
(gnus-get-new-news-hook).

Chris, if you want to test, go ahead and let me know how it works.

Thanks
Ted
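The committed code is not shown in this thread, so the following is only a sketch of the behaviour described above, not the actual spam.el change. All `my-` names are hypothetical, and the contents of `my-statistical-checks` are an assumption about which variables a list such as spam-list-of-statistical-checks would hold; check your copy of spam.el for the real names.

```elisp
;; Sketch only; the real list and logic live in spam.el.
(defvar my-statistical-checks
  '(spam-use-ifile spam-use-stat spam-use-bogofilter)
  "Hypothetical stand-in for `spam-list-of-statistical-checks'.")

(defun my-statistical-check-enabled-p ()
  "Return non-nil if any variable in `my-statistical-checks' is set."
  (let (enabled)
    (dolist (check my-statistical-checks enabled)
      (when (and (boundp check) (symbol-value check))
        (setq enabled t)))))

(defun my-call-split-check (check-function)
  "Call CHECK-FUNCTION, widening first if a statistical check is enabled.
The previous narrowing is restored afterwards."
  (save-restriction
    (when (my-statistical-check-enabled-p)
      (widen))
    (funcall check-function)))

;; Only ask nnimap for full bodies when something will analyze them,
;; so header-only checks do not pay for the extra download.
(add-hook 'gnus-get-new-news-hook
          (lambda ()
            (when (my-statistical-check-enabled-p)
              (setq nnimap-split-download-body t))))
```

The design point from the thread is the conditional: body-blind checks such as BBDB or whitelists and blacklists keep the cheaper header-only path, and the widening and the IMAP body download are tied to the same test.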
* Re: (: spam-split) doesn't work
From: Christopher Splinter @ 2003-02-28 22:17 UTC

Ted Zlatanov <tzz@lifelogs.com> writes:

>>> So if the user has selected a statistical spam analyzer, I'm going
>>> to call (widen) in all cases, and additionally set
>>> nnimap-split-download-body to t when spam.el is loaded.

> Done and committed; the manual was also updated to mention
> spam-list-of-statistical-checks in the Extending spam.el section.
>
> Chris, if you want to test, go ahead and let me know how it works.

It works properly now. Thanks!
* Re: (: spam-split) doesn't work
From: Ted Zlatanov @ 2003-03-01 15:01 UTC

On Fri, 28 Feb 2003, chris@splinter.inka.de wrote:

> It works properly now. Thanks!

Great. For anyone that uses spam.el with statistical analyzers
(ifile/spam-stat/bogofilter), this means you'll get far more accurate
results. Please let me know if you have any problems or notice
classification is worse or improved.

Also, as I mentioned there could be a way to force statistical
analyzers to analyze only the headers if you are concerned about
download time over IMAP. If anyone wants this feature, let me know.

Thanks
Ted