* Problems with spam filtering and bogofilter.
@ 2003-01-10 12:33 Malcolm Purvis
2003-01-10 12:56 ` Ted Zlatanov
0 siblings, 1 reply; 26+ messages in thread
From: Malcolm Purvis @ 2003-01-10 12:33 UTC (permalink / raw)
All this discussion about spam filtering has made me try spam.el, in my case
with bogofilter and POP.
However, if I add (: spam-split) to the start of nnmail-split-fancy, errors
are produced during the split and all my mail gets sent to the bogus group.
I am using a fresh version of Gnus and the latest version of bogofilter from
SourceForge (0.9.1.2). I see that spam.el's documentation refers to version
0.4, so perhaps there is some incompatibility? In particular, the output of
bogofilter -v is:
X-Bogosity: No, tests=bogofilter, spamicity=0.355906, version=0.9.1.2
while spam-check-bogofilter is searching for
(re-search-forward "Spamicity: \\(0\\.9\\|1\\.0\\)" nil t)
This is all running under XEmacs 21.4.11 on PPC Linux.
Malcolm
--
Malcolm Purvis <malcolmpurvis@optushome.com.au>
The hidden, terrible cost of nuclear warfare is Really Bad Public Art.
- Angus McIntyre, alt.peeves, 13/3/02.
^ permalink raw reply [flat|nested] 26+ messages in thread
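The mismatch reported above is between the old "Spamicity: 0.9" style of output and the newer "X-Bogosity: ..., spamicity=..." header. A minimal sketch of a regexp for the newer format follows. The function name `my-bogosity-parse` is hypothetical and not part of spam.el, and the pattern assumes only the header layout quoted in the message (a Yes/No verdict followed by a `spamicity=` field); other bogofilter versions may print something different.

```elisp
;; Hypothetical helper for illustration; not part of spam.el.
(defun my-bogosity-parse (header)
  "Return (VERDICT . SPAMICITY) parsed from HEADER, or nil if it does not match.
VERDICT is the string \"Yes\" or \"No\"; SPAMICITY is a float."
  (when (string-match
         "^X-Bogosity: \\(Yes\\|No\\),.*spamicity=\\([0-9.]+\\)"
         header)
    (cons (match-string 1 header)
          (string-to-number (match-string 2 header)))))

;; Usage with the header line quoted in the report:
(my-bogosity-parse
 "X-Bogosity: No, tests=bogofilter, spamicity=0.355906, version=0.9.1.2")
```

Capturing the verdict and the score separately lets a caller decide on either one, rather than baking a threshold such as 0.9 into the regexp.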
* Re: Problems with spam filtering and bogofilter.
2003-01-10 12:33 Problems with spam filtering and bogofilter Malcolm Purvis
@ 2003-01-10 12:56 ` Ted Zlatanov
2003-01-10 17:24 ` Raja R Harinath
2003-01-11 12:17 ` Problems with spam filtering and bogofilter Malcolm Purvis
0 siblings, 2 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-10 12:56 UTC (permalink / raw)
Cc: ding
On Fri, 10 Jan 2003, malcolmpurvis@optushome.com.au wrote:
> All this discussion about spam filtering has made me try spam.el, in
> my case with bogofilter and POP.
>
> However, if I add (: spam-split) to the start of nnmail-split-fancy,
> errors are produced during the split and all my mail gets sent to
> the bogus group.
>
> I am using a fresh version of Gnus and the latest version of
> bogofilter from SourceForge (0.9.1.2). I see that spam.el's
> documentation refers to version 0.4, so perhaps there is some
> incompatibility? In particular, the output of bogofilter -v is:
>
> X-Bogosity: No, tests=bogofilter, spamicity=0.355906,
> version=0.9.1.2
>
> while spam-check-bogofilter is searching for
>
> (re-search-forward "Spamicity: \\(0\\.9\\|1\\.0\\)" nil t)
>
> This is all running under XEmacs 21.4.11 on PPC Linux.
Yes, you need 0.4 currently. The bogofilter functionality is older
than the more modular implementation I did for ifile and spam-stat, so
it's a little more outdated. The question is, should I expend the
effort to keep up with bogofilter? Can you check to see if 0.9.1.2
has the same flags as 0.4? If so, and all I need to fix is the regex,
no big deal, but if the interface has changed then maybe I need to
rewrite the bogofilter section of spam.el anyway.
This will have to wait until Monday so I hope you don't have urgent
spam meanwhile :)
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Problems with spam filtering and bogofilter.
2003-01-10 12:56 ` Ted Zlatanov
@ 2003-01-10 17:24 ` Raja R Harinath
2003-01-13 19:17 ` Ted Zlatanov
2003-01-15 19:31 ` Ted Zlatanov
2003-01-11 12:17 ` Problems with spam filtering and bogofilter Malcolm Purvis
1 sibling, 2 replies; 26+ messages in thread
From: Raja R Harinath @ 2003-01-10 17:24 UTC (permalink / raw)
Cc: ding
Ted Zlatanov <tzz@lifelogs.com> writes:
[snip]
> Yes, you need 0.4 currently. The bogofilter functionality is older
> than the more modular implementation I did for ifile and spam-stat, so
> it's a little more outdated. The question is, should I expend the
> effort to keep up with bogofilter? Can you check to see if 0.9.1.2
> has the same flags as 0.4? If so, and all I need to fix is the regex,
> no big deal, but if the interface has changed then maybe I need to
> rewrite the bogofilter section of spam.el anyway.
>
> This will have to wait until Monday so I hope you don't have urgent
> spam meanwhile :)
The recommended procmail script for bogofilter 0.9.1.2 seems to be:
The following recipe (a) spam-bins anything that bogofilter
rates as spam, (b) adds the words in messages rated as
spam to the spam wordlist, and (c) adds the words in messages
rated as non-spam to the non-spam wordlist. With
this in place, it will normally only be necessary for the
user to intervene (with -N or -S) when bogofilter
miscategorizes something.
# filter mail through bogofilter, tagging it as spam and
# updating the word lists
:0fw
| bogofilter -u -e -p
# if bogofilter failed, return the mail to the queue, the MTA will
# retry to deliver it later
# 75 is the value for EX_TEMPFAIL in /usr/include/sysexits.h
:0e
{ EXITCODE=75 HOST }
# file the mail to spam-bogofilter if it's spam.
:0:
* ^X-Bogosity: Yes, tests=bogofilter
spam-bogofilter
The '-u' however means that bogofilter has already integrated the
counts for the mail. So, the trick would be to do the following in
spam-process-bogofilter:
* ham marked and X-Bogosity: No => do nothing
* ham marked and X-Bogosity: Yes => | bogofilter -N
* ham marked and no X-Bogosity header => | bogofilter -n
* spam marked and X-Bogosity: No => | bogofilter -S
* spam marked and X-Bogosity: Yes => do nothing
* spam marked and no X-Bogosity header => | bogofilter -s
This should also handle the older version of bogofilter and the recipe
mentioned in 'spam.el'. (Also, the name of the X-Bogosity header
should be configurable, as it is configurable on the bogofilter side).
- Hari
--
Raja R Harinath ------------------------------ harinath@cs.umn.edu
^ permalink raw reply [flat|nested] 26+ messages in thread
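The six-case table in the message above can be sketched as a small lookup function. This is only an illustration of the proposed rules: the name `my-bogofilter-register-flag` is hypothetical, and the flags are the ones listed in the message (-n, -N, -s, -S), not checked against any particular bogofilter release.

```elisp
;; Hypothetical helper for illustration; not part of spam.el.
(defun my-bogofilter-register-flag (mark verdict)
  "Return the bogofilter flag to use on group exit, or nil for no action.
MARK is the symbol ham or spam.  VERDICT is the X-Bogosity value,
\"Yes\" or \"No\", or nil when the message has no such header."
  (cond
   ((eq mark 'ham)
    (cond ((null verdict) "-n")           ; never counted: register as ham
          ((string= verdict "Yes") "-N")  ; counted as spam: move to ham
          (t nil)))                       ; already counted as ham
   ((eq mark 'spam)
    (cond ((null verdict) "-s")           ; never counted: register as spam
          ((string= verdict "No") "-S")   ; counted as ham: move to spam
          (t nil)))))                     ; already counted as spam

;; Usage: a message marked as ham that bogofilter had tagged as spam.
(my-bogofilter-register-flag 'ham "Yes")
```

Keeping the decision in one pure function separates it from the code that actually pipes the article to bogofilter.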
* Re: Problems with spam filtering and bogofilter.
2003-01-10 17:24 ` Raja R Harinath
@ 2003-01-13 19:17 ` Ted Zlatanov
2003-01-15 19:31 ` Ted Zlatanov
1 sibling, 0 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-13 19:17 UTC (permalink / raw)
Cc: Malcolm Purvis, ding
On Fri, 10 Jan 2003, harinath@cs.umn.edu wrote:
> * ham marked and X-Bogosity: No => do nothing
> * ham marked and X-Bogosity: Yes => | bogofilter -N
> * ham marked and no X-Bogosity header => | bogofilter -n
>
> * spam marked and X-Bogosity: No => | bogofilter -S
> * spam marked and X-Bogosity: Yes => do nothing
> * spam marked and no X-Bogosity header => | bogofilter -s
Thanks for the summary, it was very helpful.
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Problems with spam filtering and bogofilter.
2003-01-10 17:24 ` Raja R Harinath
2003-01-13 19:17 ` Ted Zlatanov
@ 2003-01-15 19:31 ` Ted Zlatanov
2003-01-15 21:17 ` Raja R Harinath
1 sibling, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-15 19:31 UTC (permalink / raw)
Cc: Malcolm Purvis, ding
On Fri, 10 Jan 2003, harinath@cs.umn.edu wrote:
> * ham marked and X-Bogosity: No => do nothing
> * ham marked and X-Bogosity: Yes => | bogofilter -N
> * ham marked and no X-Bogosity header => | bogofilter -n
>
> * spam marked and X-Bogosity: No => | bogofilter -S
> * spam marked and X-Bogosity: Yes => do nothing
> * spam marked and no X-Bogosity header => | bogofilter -s
Hmm, I think I like the exit codes of bogofilter (1 for ham, 0 for
spam). I'd rather use that than the procmail filtering, since IMAP
users often don't have procmail available.
We can add a new check (spam-check-bogofilter-headers) if anyone is
interested in filtering on the Bogofilter X-Bogosity header and it
gets inserted elsewhere in the mail delivery chain, but
spam-check-bogofilter should not rely on that. So speak up if you
want spam-check-bogofilter-headers added.
So the invocations will be:
ham-marked: bogofilter -n
spam-marked: bogofilter -s
Any last-minute protests? :)
I'll also add customization of the directory (-d DIRNAME). Most of
the current bogofilter complexity in spam.el will go away, and IMO
that is a good thing. The only thing missing will be training of
spam/ham backends, and I'll work on that next.
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Problems with spam filtering and bogofilter.
2003-01-15 19:31 ` Ted Zlatanov
@ 2003-01-15 21:17 ` Raja R Harinath
2003-01-16 0:11 ` new Bogofilter functionality (was: Problems with spam filtering and bogofilter.) Ted Zlatanov
0 siblings, 1 reply; 26+ messages in thread
From: Raja R Harinath @ 2003-01-15 21:17 UTC (permalink / raw)
Cc: ding
Hi,
Ted Zlatanov <tzz@lifelogs.com> writes:
> On Fri, 10 Jan 2003, harinath@cs.umn.edu wrote:
>> * ham marked and X-Bogosity: No => do nothing
>> * ham marked and X-Bogosity: Yes => | bogofilter -N
>> * ham marked and no X-Bogosity header => | bogofilter -n
>>
>> * spam marked and X-Bogosity: No => | bogofilter -S
>> * spam marked and X-Bogosity: Yes => do nothing
>> * spam marked and no X-Bogosity header => | bogofilter -s
>
> Hmm, I think I like the exit codes of bogofilter (1 for ham, 0 for
> spam). I'd rather use that than the procmail filtering, since IMAP
> users often don't have procmail available.
That's fine. But, it would be nice to integrate with procmail for
people who can and do use procmail.
> We can add a new check (spam-check-bogofilter-headers) if anyone is
> interested in filtering on the Bogofilter X-Bogosity header and it
> gets inserted elsewhere in the mail delivery chain,
That can be achieved with the regular gnus splitting machinery. No
need for 'spam.el' to worry in this case, IMHO.
> but spam-check-bogofilter should not rely on that. So speak up if
> you want spam-check-bogofilter-headers added.
>
> So the invocations will be:
>
> ham-marked: bogofilter -n
> spam-marked: bogofilter -s
This should be automatically covered in the "no X-Bogosity header"
case above. You really want to check if 'bogofilter' already has
incorporated the counts for this message -- and the X-Bogosity header
is a good indicator.
Also I meant the above rules to be used in the post-processing routine
(spam-bogofilter-register-routine), not in the filtering part. My
suggestion above will work if the mail was processed by any of:
* procmail script with 'bogofilter -u -e -p'
* spam-split with spam-check-bogofilter
* any other splitting tool which may not even invoke bogofilter
- Hari
--
Raja R Harinath ------------------------------ harinath@cs.umn.edu
^ permalink raw reply [flat|nested] 26+ messages in thread
* new Bogofilter functionality (was: Problems with spam filtering and bogofilter.)
2003-01-15 21:17 ` Raja R Harinath
@ 2003-01-16 0:11 ` Ted Zlatanov
2003-01-16 11:00 ` new Bogofilter functionality Malcolm Purvis
0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-16 0:11 UTC (permalink / raw)
Cc: Malcolm Purvis, ding
OK, I've added the revamped Bogofilter functionality.
In gnus.el, I added the gnus-group-ham-exit-processor-bogofilter ham
processor; it's customizable as usual for each group. So now
bogofilter can be used to process spam and ham on summary exit.
In spam.el, I added the spam-use-bogofilter-headers boolean for users
who just want to check for "X-Bogosity: Yes" in the headers when
invoking spam-split. It invokes the spam-check-bogofilter-headers
function. That function is also used by spam-check-bogofilter.
I also rewrote the spam-check-bogofilter and spam/ham bogofilter
registration. It's much simpler now, which I think is a good thing.
The spam-bogofilter-score functionality is working, using the
"spamicity=%f" header parameter.
I couldn't find a clean way in spam-check-bogofilter to use
call-process-region and get the return value of the process, so I use
the "-v" output of bogofilter. This works out OK, since it's the same
format as the one used in spam-check-bogofilter-headers, so we reuse
that function nicely.
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
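On the question of reading a process return value: in GNU Emacs, a synchronous `call-process-region` returns the exit status of the program as its value. Whether that also held for the XEmacs build used in this thread is not assumed here. The sketch below uses a hypothetical wrapper, `my-region-exit-status`, and a POSIX `sh` command as a stand-in for bogofilter so that it runs without bogofilter installed.

```elisp
;; Hypothetical helper for illustration; not part of spam.el.
(defun my-region-exit-status (program &rest args)
  "Pipe the current buffer to PROGRAM with ARGS and return its exit status.
Output from PROGRAM is discarded."
  (apply #'call-process-region (point-min) (point-max)
         program
         nil   ; DELETE: keep the buffer text
         nil   ; BUFFER: discard program output
         nil   ; DISPLAY: no redisplay
         args))

;; Usage: "sh" stands in for bogofilter here (assumes a POSIX shell).
;; The command reads all of stdin, then exits with status 1.
(with-temp-buffer
  (insert "Subject: test\n\nbody\n")
  (my-region-exit-status "sh" "-c" "cat >/dev/null; exit 1"))
```

Parsing the "-v" output, as the message describes, has the advantage of sharing code with the header check; the exit status is simply another option where it is available.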
* Re: new Bogofilter functionality
2003-01-16 0:11 ` new Bogofilter functionality (was: Problems with spam filtering and bogofilter.) Ted Zlatanov
@ 2003-01-16 11:00 ` Malcolm Purvis
2003-01-16 11:55 ` Ted Zlatanov
0 siblings, 1 reply; 26+ messages in thread
From: Malcolm Purvis @ 2003-01-16 11:00 UTC (permalink / raw)
>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:
Ted> OK, I've added the revamped Bogofilter functionality.
Thanks! I now have spam.el working with bogofilter. Hooray!
However, I had two minor problems:
1) You might like to consider re-wording the texinfo entry on the variable
gnus-ham-process-destinations. It implies that the variable can take a
string, while everywhere else says it needs a list.
2) The patch below enables blacklist exit processing when blacklist is
enabled, not bogofilter.
Malcolm
[malcolmp@c18072 gnus]$ cvs diff -c lisp/spam.el
Index: lisp/spam.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/spam.el,v
retrieving revision 6.53
diff -c -r6.53 spam.el
*** lisp/spam.el 2003/01/16 00:41:08 6.53
--- lisp/spam.el 2003/01/16 10:56:10
***************
*** 309,315 ****
(when (spam-group-spam-processor-stat-p gnus-newsgroup-name)
(spam-stat-register-spam-routine))
! (when (spam-group-spam-processor-bogofilter-p gnus-newsgroup-name)
(spam-blacklist-register-routine))
(if spam-move-spam-nonspam-groups-only
--- 309,315 ----
(when (spam-group-spam-processor-stat-p gnus-newsgroup-name)
(spam-stat-register-spam-routine))
! (when (spam-group-spam-processor-blacklist-p gnus-newsgroup-name)
(spam-blacklist-register-routine))
(if spam-move-spam-nonspam-groups-only
--
Malcolm Purvis <malcolmpurvis@optushome.com.au>
The hidden, terrible cost of nuclear warfare is Really Bad Public Art.
- Angus McIntyre, alt.peeves, 13/3/02.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: new Bogofilter functionality
2003-01-16 11:00 ` new Bogofilter functionality Malcolm Purvis
@ 2003-01-16 11:55 ` Ted Zlatanov
0 siblings, 0 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-16 11:55 UTC (permalink / raw)
Cc: ding
On Thu, 16 Jan 2003, malcolmpurvis@optushome.com.au wrote:
> 1) You might like to consider re-wording the texinfo entry on the
> variable gnus-ham-process-destinations. It implies that the
> variable can take a string, while everywhere else says it needs a
> list.
OK, done.
> 2) The patch below enables blacklist exit processing when blacklist
> is enabled, not bogofilter.
Yes, that was a bug. Thanks for catching it - fixed.
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Problems with spam filtering and bogofilter.
2003-01-10 12:56 ` Ted Zlatanov
2003-01-10 17:24 ` Raja R Harinath
@ 2003-01-11 12:17 ` Malcolm Purvis
2003-01-13 19:16 ` Ted Zlatanov
1 sibling, 1 reply; 26+ messages in thread
From: Malcolm Purvis @ 2003-01-11 12:17 UTC (permalink / raw)
Cc: ding
>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:
Ted> Can you check to see if 0.9.1.2 has the same flags as 0.4?
Alas, I think that version 0.4 is no longer available (it's not kept in the
SourceForge project archives as far as I can see). However all the flags you
pass to bogofilter are still there so it should work (and indeed, the group
exit processing works fine).
One question: Within spam-check-bogofilter, the message being looked at is
referred to via gnus-summary-article-number. Is this valid within the context
of a fancy split? The description of what environment is available to
functions within a fancy split is, ahh, skimpy at best but since I get split
errors on startup, and there is no summary buffer then, perhaps this is the
cause?
Malcolm
--
Malcolm Purvis <malcolmpurvis@optushome.com.au>
The hidden, terrible cost of nuclear warfare is Really Bad Public Art.
- Angus McIntyre, alt.peeves, 13/3/02.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Problems with spam filtering and bogofilter.
2003-01-11 12:17 ` Problems with spam filtering and bogofilter Malcolm Purvis
@ 2003-01-13 19:16 ` Ted Zlatanov
2003-01-13 21:24 ` requesting articles from a nnxyz backend: what's the fastest Ted Zlatanov
0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-13 19:16 UTC (permalink / raw)
Cc: ding
On Sat, 11 Jan 2003, malcolmpurvis@optushome.com.au wrote:
>>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:
>
> Ted> Can you check to see if 0.9.1.2 has the same flags as 0.4?
>
> Alas, I think that version 0.4 is no longer available (it's not kept
> in the SourceForge project archives as far as I can see). However
> all the flags you pass to bogofilter are still there so it should
> work (and indeed, the group exit processing works fine).
I'm thinking of a rewrite of the bogofilter functionality into
something that looks more like the ifile functionality - cleaner,
takes articles as strings, etc.
> One question: Within spam-check-bogofilter, the message being looked
> at is referred to via gnus-summary-article-number. Is this valid
> within the context of a fancy split? The description of what
> environment is available to functions within a fancy split is, ahh,
> skimpy at best but since I get split errors on startup, and there is
> no summary buffer then, perhaps this is the cause?
I didn't write the bogofilter functionality, so I'm not familiar with
all the assumptions it makes. I think the rewrite will be better in
that it will only work with the current article buffer, like all split
functions are supposed to. It may have *other* bugs but this one will
be gone. Hurrah for progress :)
I'll probably get around to it today or tomorrow...
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
* requesting articles from a nnxyz backend: what's the fastest
2003-01-13 19:16 ` Ted Zlatanov
@ 2003-01-13 21:24 ` Ted Zlatanov
2003-01-14 7:22 ` Kai Großjohann
2003-01-14 18:16 ` Lars Magne Ingebrigtsen
0 siblings, 2 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-13 21:24 UTC (permalink / raw)
What's the fastest safe way to request an article from a nnxyz
backend? Special cases are OK - for instance, optimizing for
nnml/nnmaildir.
I'd like to optimize spam/ham processing, which currently takes an
article as a string out of the article buffer. This is slow, so I'd
like to map an article number to a file when possible.
This would also allow fast training of the spam/ham backends.
I don't need the article to be treated, washed, or prepared in any
way. That's why just the file name for file-based backends like
nnml/nnmaildir would be fine.
Thanks
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-13 21:24 ` requesting articles from a nnxyz backend: what's the fastest Ted Zlatanov
@ 2003-01-14 7:22 ` Kai Großjohann
2003-01-14 16:42 ` Ted Zlatanov
2003-01-14 18:16 ` Lars Magne Ingebrigtsen
1 sibling, 1 reply; 26+ messages in thread
From: Kai Großjohann @ 2003-01-14 7:22 UTC (permalink / raw)
Ted Zlatanov <tzz@lifelogs.com> writes:
> I'd like to optimize spam/ham processing, which currently takes an
> article as a string out of the article buffer. This is slow, so I'd
> like to map an article number to a file when possible.
I wonder if it is faster to fetch the article into a buffer and to
pipe it to the command? Would that work with the commands used by
spam.el?
If this is fast enough, it would avoid special-casing for
file-per-msg backends.
--
Ambibibentists unite!
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-14 7:22 ` Kai Großjohann
@ 2003-01-14 16:42 ` Ted Zlatanov
2003-01-14 20:39 ` Kai Großjohann
0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-14 16:42 UTC (permalink / raw)
Cc: ding
On Tue, 14 Jan 2003, kai.grossjohann@uni-duisburg.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> I'd like to optimize spam/ham processing, which currently takes an
>> article as a string out of the article buffer. This is slow, so
>> I'd like to map an article number to a file when possible.
>
> I wonder if it is faster to fetch the article into a buffer and to
> pipe it to the command? Would that work with the commands used by
> spam.el?
That's exactly how spam.el operates now, except for bogofilter - the
whole article is passed to the spam/ham processor as a string.
> If this is fast enough, it would avoid special-casing for
> file-per-msg backends.
I am sure it's faster to invoke ifile (for instance) once on a
directory full of files, than it is to invoke ifile repeatedly on each
file in that directory, fetching that file as an article through Gnus.
That's why I asked if there was an "approved" way to map articles to
files for backends that support it.
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-14 16:42 ` Ted Zlatanov
@ 2003-01-14 20:39 ` Kai Großjohann
2003-01-15 17:27 ` Ted Zlatanov
0 siblings, 1 reply; 26+ messages in thread
From: Kai Großjohann @ 2003-01-14 20:39 UTC (permalink / raw)
Ted Zlatanov <tzz@lifelogs.com> writes:
> That's exactly how spam.el operates now, except for bogofilter - the
> whole article is passed to the spam/ham processor as a string.
Your wording "as a string" sounds as if buffer-string or
buffer-substring is involved.
What I meant is call-process-region...
(I suspect that string processing in Emacs is slower than doing it
with text in buffers.)
--
Ambibibentists unite!
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-14 20:39 ` Kai Großjohann
@ 2003-01-15 17:27 ` Ted Zlatanov
2003-01-16 11:56 ` Kai Großjohann
0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-15 17:27 UTC (permalink / raw)
Cc: ding
On Tue, 14 Jan 2003, kai.grossjohann@uni-duisburg.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> That's exactly how spam.el operates now, except for bogofilter -
>> the whole article is passed to the spam/ham processor as a string.
>
> Your wording "as a string" sounds as if buffer-string or
> buffer-substring is involved.
>
> What I meant is call-process-region...
The article is transferred between functions as a string currently,
but passed through call-process-region to the child process. I'm
working on making the parameter-passing less string-bound and more
oriented towards buffer reuse. This is my first mid-size Emacs Lisp
project, so bear with me as I learn to use buffers better.
> (I suspect that string processing in Emacs is slower than doing it
> with text in buffers.)
Probably, by a small margin (we're working with one article at a time
anyway). The only extra penalty is that an article buffer is copied
into a string, and that string is copied into a temporary buffer.
Could this be a problem with large messages in general? Should I set
a size limit on a message and only process up to that point with
ifile/bogofilter/etc.?
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-15 17:27 ` Ted Zlatanov
@ 2003-01-16 11:56 ` Kai Großjohann
2003-01-16 12:30 ` Ted Zlatanov
0 siblings, 1 reply; 26+ messages in thread
From: Kai Großjohann @ 2003-01-16 11:56 UTC (permalink / raw)
Ted Zlatanov <tzz@lifelogs.com> writes:
> Probably, by a small margin (we're working with one article at a time
> anyway). The only extra penalty is that an article buffer is copied
> into a string, and that string is copied into a temporary buffer.
> Could this be a problem with large messages in general? Should I set
> a size limit on a message and only process up to that point with
> ifile/bogofilter/etc.?
Why can't you call-process-region on the article buffer?
--
Ambibibentists unite!
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-16 11:56 ` Kai Großjohann
@ 2003-01-16 12:30 ` Ted Zlatanov
2003-01-16 12:48 ` Kai Großjohann
0 siblings, 1 reply; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-16 12:30 UTC (permalink / raw)
Cc: ding
On Thu, 16 Jan 2003, kai.grossjohann@uni-duisburg.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> Probably, by a small margin (we're working with one article at a
>> time anyway). The only extra penalty is that an article buffer is
>> copied into a string, and that string is copied into a temporary
>> buffer. Could this be a problem with large messages in general?
>> Should I set a size limit on a message and only process up to that
>> point with ifile/bogofilter/etc.?
>
> Why can't you call-process-region on the article buffer?
I was trying to make the functionality generic, based on strings and
buffers instead of article numbers. Also, my knowledge of Emacs Lisp
was not quite sufficient :)
I just added spam-get-article-as-buffer, and
spam-get-article-as-string uses it now. But
spam-get-article-as-string does a string copy; I will gradually
convert the places where it's called into calls to
spam-get-article-as-buffer. I'll probably end up doing a generic
shell command interface, actually. That will also deal with the
annoyance of having two cases for each command, depending on whether
a parameter is passed or not.
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-16 12:30 ` Ted Zlatanov
@ 2003-01-16 12:48 ` Kai Großjohann
2003-01-16 13:51 ` Ted Zlatanov
0 siblings, 1 reply; 26+ messages in thread
From: Kai Großjohann @ 2003-01-16 12:48 UTC (permalink / raw)
Ted Zlatanov <tzz@lifelogs.com> writes:
> That will also deal with the annoyance of having two cases for each
> command, depending on whether a parameter is passed or not.
Was the problem that you need to call call-process (or
call-process-region) with a different number of parameters?
(apply 'call-process-region PARM1 PARM2 LIST-OF-OTHER-PARMS)
The LIST-OF-OTHER-PARMS can be nil if you need only two params.
I hope I'm not annoying you with this simple stuff. You keep saying
your Lisp knowledge is not good enough -- I somehow doubt it :-)
--
Ambibibentists unite!
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-16 12:48 ` Kai Großjohann
@ 2003-01-16 13:51 ` Ted Zlatanov
0 siblings, 0 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-16 13:51 UTC (permalink / raw)
Cc: ding
On Thu, 16 Jan 2003, kai.grossjohann@uni-duisburg.de wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> That will also deal with the annoyance of having two cases for each
>> command, depending on whether a parameter is passed or not.
>
> Was the problem that you need to call call-process (or
> call-process-region) with a different number of parameters?
>
> (apply 'call-process-region PARM1 PARM2 LIST-OF-OTHER-PARMS)
>
> The LIST-OF-OTHER-PARMS can be nil if you need only two params.
The problem is (a simple example):
(call-process-region (point-min) (point-max)
                     "/bin/cat" nil nil nil "-switch1" "-switch2" spam-bogofilter-database-directory)
so with apply:
(apply 'call-process-region (point-min) (point-max) "/bin/cat" nil nil nil
       '("-v" "-d" nil))
I don't think I can use apply on that, when
spam-bogofilter-database-directory is nil. nil will still get passed
to call-process-region, which requires all command-line arguments to
pass stringp. I just need a function to filter out nil arguments, and
I will roll that into the generic shell interface.
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
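The nil-filtering step mentioned in the message above can be sketched with `delq`. The function name `my-bogofilter-args` is hypothetical; the "-v" and "-d" switches are the ones used in the example in the message, and DIRECTORY plays the role of spam-bogofilter-database-directory.

```elisp
;; Hypothetical helper for illustration; not part of spam.el.
(defun my-bogofilter-args (directory)
  "Return the argument list for bogofilter as a list of strings.
When DIRECTORY is nil, both the \"-d\" switch and its value are left out."
  ;; `list' builds a fresh list, so the destructive `delq' is safe here.
  (delq nil (list "-v"
                  (and directory "-d")
                  directory)))

;; Usage: the result can be handed to `apply' as the final argument, e.g.
;;   (apply #'call-process-region (point-min) (point-max)
;;          "bogofilter" nil nil nil (my-bogofilter-args dir))
(my-bogofilter-args nil)
(my-bogofilter-args "/tmp/bogofilter-db")
```

Dropping the switch together with its value matters: filtering only the nil directory would leave a dangling "-d" on the command line.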
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-13 21:24 ` requesting articles from a nnxyz backend: what's the fastest Ted Zlatanov
2003-01-14 7:22 ` Kai Großjohann
@ 2003-01-14 18:16 ` Lars Magne Ingebrigtsen
2003-01-14 20:42 ` Kai Großjohann
2003-01-15 19:21 ` Ted Zlatanov
1 sibling, 2 replies; 26+ messages in thread
From: Lars Magne Ingebrigtsen @ 2003-01-14 18:16 UTC (permalink / raw)
Ted Zlatanov <tzz@lifelogs.com> writes:
> What's the fastest safe way to request an article from a nnxyz
> backend? Special cases are OK - for instance, optimizing for
> nnml/nnmaildir.
In general, you have to call `nnxyx-request-article', but if you have
complete control over which (virtual) server is active and stuff like
that, you can use
(nnml-possibly-change-directory group)
to set the current group, and
(expand-file-name "number" nnml-current-directory)
to get the path name of article NUMBER.
--
(domestic pets only, the antidote for overdose, milk.)
larsi@gnus.org * Lars Magne Ingebrigtsen
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-14 18:16 ` Lars Magne Ingebrigtsen
@ 2003-01-14 20:42 ` Kai Großjohann
2003-01-15 19:21 ` Ted Zlatanov
1 sibling, 0 replies; 26+ messages in thread
From: Kai Großjohann @ 2003-01-14 20:42 UTC (permalink / raw)
Lars Magne Ingebrigtsen <larsi@gnus.org> writes:
> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> What's the fastest safe way to request an article from a nnxyz
>> backend? Special cases are OK - for instance, optimizing for
>> nnml/nnmaildir.
>
> In general, you have to call `nnxyx-request-article',
nnxyz, nnxyx, what will we have next?
Sheesh. Watch your language! It's nnchoke!
--
Ambibibentists unite!
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-14 18:16 ` Lars Magne Ingebrigtsen
2003-01-14 20:42 ` Kai Großjohann
@ 2003-01-15 19:21 ` Ted Zlatanov
2003-01-15 20:10 ` Lars Magne Ingebrigtsen
2003-01-17 17:46 ` Paul Jarc
1 sibling, 2 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-15 19:21 UTC (permalink / raw)
On Tue, 14 Jan 2003, larsi@gnus.org wrote:
> In general, you have to call `nnxyx-request-article', but if you
> have complete control over which (virtual) server is active and
> stuff like that, you can use
>
> (nnml-possibly-change-directory group)
>
> to set the current group, and
>
> (expand-file-name "number" nnml-current-directory)
>
> to get the path name of article NUMBER.
Assuming I have to enter the group I want to process, would this work
to get the filename of an article once I've entered the group?
(defun spam-get-article-as-filename (article)
  (let ((article-filename))
    (when (numberp article)
      (nnml-possibly-change-directory (gnus-group-real-name gnus-newsgroup-name))
      (setq article-filename (expand-file-name (int-to-string article) nnml-current-directory)))
    (if (file-exists-p article-filename)
        article-filename
      nil)))
nnml-possibly-change-directory doesn't seem to transform something
like "ding.info" into "ding/info" so I'm definitely missing
something.
More generally, can we add to Gnus the functionality of transforming a
group name + an article number into a filename, when it's possible?
Maybe in gnus-int.el? I'd like to do something like this:
1. get list of articles
2. optionally filter list for spam/ham articles
3. if the group doesn't have 1 file per article, retrieve each article
   into a temporary file (nnimap, nntp, etc. without the agent)
4. get file names for the articles
5. send file names to bogofilter/ifile/etc. to be studied
This would be significantly faster than requesting the article and
then invoking call-process-region on the buffer, I think. The function
should use the agent when possible (for nnimap groups, for instance).
I would do it myself, but I think it would be easier for someone who
understands the internals of the nn*.el code. I looked at nnml.el and
others, but it would take me a while to write this code. If no one is
interested I can try it, but it will take a while.
Ted
^ permalink raw reply [flat|nested] 26+ messages in thread
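The number-to-file-name step in the message above can be sketched without any Gnus internals, assuming the group's directory is already known. The name `my-article-file-name` is hypothetical. Unlike the function in the message, this version checks for a non-number before calling `file-exists-p`, which signals an error when given nil.

```elisp
;; Hypothetical helper for illustration; not part of Gnus or spam.el.
(defun my-article-file-name (article directory)
  "Return the file name for ARTICLE (a number) in DIRECTORY, or nil.
The result is nil when ARTICLE is not a number or no such file exists."
  (when (numberp article)
    (let ((file (expand-file-name (number-to-string article) directory)))
      (when (file-exists-p file)
        file))))

;; Usage: create a scratch directory holding one article file named "7".
(let ((dir (make-temp-file "my-nnml-" t)))
  (with-temp-file (expand-file-name "7" dir)
    (insert "Subject: test\n\nbody\n"))
  (list (my-article-file-name 7 dir)        ; existing article
        (my-article-file-name 8 dir)        ; missing article
        (my-article-file-name "7" dir)))    ; not a number
```

Taking the directory as an argument keeps the lookup independent of how a given backend maps group names to directories.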
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-15 19:21 ` Ted Zlatanov
@ 2003-01-15 20:10 ` Lars Magne Ingebrigtsen
2003-01-15 22:14 ` Ted Zlatanov
2003-01-17 17:46 ` Paul Jarc
1 sibling, 1 reply; 26+ messages in thread
From: Lars Magne Ingebrigtsen @ 2003-01-15 20:10 UTC (permalink / raw)
Ted Zlatanov <tzz@lifelogs.com> writes:
> (defun spam-get-article-as-filename (article)
>   (let ((article-filename))
>     (when (numberp article)
>       (nnml-possibly-change-directory (gnus-group-real-name gnus-newsgroup-name))
>       (setq article-filename (expand-file-name (int-to-string article) nnml-current-directory)))
>     (if (file-exists-p article-filename)
>         article-filename
>       nil)))
>
> nnml-possibly-change-directory doesn't seem to transform something
> like "ding.info" into "ding/info" so I'm definitely missing
> something.
`nnml-current-directory' should really be the actual directory after
selecting the group.
> More generally, can we add to Gnus the functionality of transforming a
> group name + an article number into a filename, when it's possible?
That's only possible for a very limited number of back ends. nnml
and nnmh, basically, so I'm not sure how useful that would be.
--
(domestic pets only, the antidote for overdose, milk.)
larsi@gnus.org * Lars Magne Ingebrigtsen
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-15 20:10 ` Lars Magne Ingebrigtsen
@ 2003-01-15 22:14 ` Ted Zlatanov
0 siblings, 0 replies; 26+ messages in thread
From: Ted Zlatanov @ 2003-01-15 22:14 UTC (permalink / raw)
On Wed, 15 Jan 2003, larsi@gnus.org wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> nnml-possibly-change-directory doesn't seem to transform something
>> like "ding.info" into "ding/info" so I'm definitely missing
>> something.
>
> `nnml-current-directory' should really be the actual directory after
> selecting the group.
OK, I *was* missing something :)
>> More generally, can we add to Gnus the functionality of
>> transforming a group name + an article number into a filename, when
>> it's possible?
>
> That's only possible for a very limited number of back ends. nnml
> and nnmh, basically, so I'm not sure how useful that would be.
Considering almost everyone using Gnus for mail has some sort of nnml
set up, I think it makes a lot of sense. nnmaildir also has the
article <-> filename correspondence, now that I think about it.
It's fine if the function is in nnml.el/nnmaildir.el instead of
gnus-int.el, that's not important. I am just trying to avoid writing
8 spam-article-to-filename special cases, when it might be something
better handled through the usual internal Gnus system of (1. is the
function supported by the backend?, 2. invoke the function).
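The two-step pattern described above (check whether the back end supports the function, then invoke it) can be sketched generically. Gnus back end functions are conventionally named with the back end as a prefix, for example nnml-request-article; the sketch below relies only on that naming convention. The `example-' helpers and the `article-file-name' operation are hypothetical, not an existing Gnus interface.

```elisp
;; Sketch only: the `example-' names and the `article-file-name'
;; operation are hypothetical, not an existing Gnus interface.
(defun example-backend-function (backend operation)
  "Return the symbol BACKEND-OPERATION if it names a defined function.
Return nil when the back end does not provide OPERATION."
  (let ((symbol (intern-soft (format "%s-%s" backend operation))))
    (and symbol (fboundp symbol) symbol)))

(defun example-backend-call (backend operation &rest args)
  "Call BACKEND's OPERATION with ARGS, or return nil if unsupported."
  (let ((fn (example-backend-function backend operation)))
    (when fn
      (apply fn args))))
```

With this shape a caller needs one code path: back ends that define the function are used, and all others fall through to nil so the caller can fetch the article into a temporary file instead.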
Ted
* Re: requesting articles from a nnxyz backend: what's the fastest
2003-01-15 19:21 ` Ted Zlatanov
2003-01-15 20:10 ` Lars Magne Ingebrigtsen
@ 2003-01-17 17:46 ` Paul Jarc
1 sibling, 0 replies; 26+ messages in thread
From: Paul Jarc @ 2003-01-17 17:46 UTC (permalink / raw)
Ted Zlatanov <tzz@lifelogs.com> wrote:
> More generally, can we add to Gnus the functionality of transforming a
> group name + an article number into a filename, when it's possible?
nnmaildir has:
nnmaildir-article-number-to-file-name
nnmaildir-article-number-to-base-name
nnmaildir-base-name-to-article-number
> 1. get list of articles
You can get the min and max numbers from gnus-active.
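A minimal sketch of turning such a min/max pair into a list of candidate article numbers follows. It assumes the active entry is a cons of the form (MIN . MAX); the helper name is hypothetical. Not every number in the range necessarily corresponds to an article that still exists, so each one would still need to be checked.

```elisp
;; Sketch only: `example-active-to-articles' is a hypothetical helper.
;; ACTIVE is assumed to be a cons (MIN . MAX) of article numbers.
(defun example-active-to-articles (active)
  "Return the list of article numbers covered by ACTIVE, or nil.
Return nil when ACTIVE is not a valid (MIN . MAX) pair."
  (when (and (consp active)
             (integerp (car active))
             (integerp (cdr active))
             (<= (car active) (cdr active)))
    (number-sequence (car active) (cdr active))))
```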
paul
Thread overview: 26+ messages
2003-01-10 12:33 Problems with spam filtering and bogofilter Malcolm Purvis
2003-01-10 12:56 ` Ted Zlatanov
2003-01-10 17:24 ` Raja R Harinath
2003-01-13 19:17 ` Ted Zlatanov
2003-01-15 19:31 ` Ted Zlatanov
2003-01-15 21:17 ` Raja R Harinath
2003-01-16 0:11 ` new Bogofilter functionality (was: Problems with spam filtering and bogofilter.) Ted Zlatanov
2003-01-16 11:00 ` new Bogofilter functionality Malcolm Purvis
2003-01-16 11:55 ` Ted Zlatanov
2003-01-11 12:17 ` Problems with spam filtering and bogofilter Malcolm Purvis
2003-01-13 19:16 ` Ted Zlatanov
2003-01-13 21:24 ` requesting articles from a nnxyz backend: what's the fastest Ted Zlatanov
2003-01-14 7:22 ` Kai Großjohann
2003-01-14 16:42 ` Ted Zlatanov
2003-01-14 20:39 ` Kai Großjohann
2003-01-15 17:27 ` Ted Zlatanov
2003-01-16 11:56 ` Kai Großjohann
2003-01-16 12:30 ` Ted Zlatanov
2003-01-16 12:48 ` Kai Großjohann
2003-01-16 13:51 ` Ted Zlatanov
2003-01-14 18:16 ` Lars Magne Ingebrigtsen
2003-01-14 20:42 ` Kai Großjohann
2003-01-15 19:21 ` Ted Zlatanov
2003-01-15 20:10 ` Lars Magne Ingebrigtsen
2003-01-15 22:14 ` Ted Zlatanov
2003-01-17 17:46 ` Paul Jarc