From: Ted Zlatanov
Newsgroups: gmane.emacs.gnus.general
Subject: Re: spam-stat and base64 encoded messages
Date: Mon, 09 Jun 2003 16:06:42 -0400
Message-ID: <4n1xy32ey5.fsf@lockgroove.bwh.harvard.edu>
Cc: John Owens
To: ding@gnus.org

On Mon, 09 Jun 2003, harder@myrealbox.com wrote:

> I think that to make this work correctly, you'll need to parse the
> MIME structure of the message, and then apply the proper decoding to
> the appropriate parts.

Hmm, are you sure we need to do full MIME parsing?  That would slow
down incoming mail splitting a lot, I would think.  But I don't know
all the Gnus MIME parsing functionality, or how fast it is.  See below
for more questions.

John Owens (cc-ed on this) was asking about forwarded spam messages,
which are inside an envelope from SpamAssassin.  That's another case
where spam-split or spam-stat-split has to do a lot of parsing.

Maybe there's a better way?  We can invoke spam-split or
spam-stat-split on each part of the message: if they return t we know
it's ham, if they return a string it's spam, and nil means the part
was neither.

In other words, we don't care about the deep structure, for instance
one attachment inside another.  We just want to find MIME boundaries,
take the text up to the next MIME boundary (even if it includes other
MIME boundaries), decode if needed (no decoding should be done on
plain text!), and analyze the part.  Is that possible already?

On the decode-if-needed part: will gnus-article-decode-hook try to
decode content even if it's plain text, or is some detection done?
If not, spam.el and spam-stat.el, or gnus-art.el, should apply some
heuristics.

Thanks
Ted
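
The per-part scheme described above (decode each part only if its headers say it is encoded, classify it, and combine the t / string / nil results) can be sketched outside Gnus. What follows is only an illustration in Python using the standard `email` package, not Emacs Lisp and not the spam.el or spam-stat.el API. The names `decoded_text_parts`, `split_by_parts`, and the `classify_part` callback are hypothetical. Unlike the flat boundary scan suggested above, this sketch lets the library walk nested parts (including a forwarded message/rfc822 envelope), so it assumes that parsing cost is acceptable.

```python
from email import message_from_string
from email.message import Message
from typing import Callable, Iterator, Union

# True = ham, str = spam group name, None = part was neither
Verdict = Union[bool, str, None]


def decoded_text_parts(msg: Message) -> Iterator[str]:
    """Yield the decoded text of every text/* leaf part, however deeply nested."""
    for part in msg.walk():
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 or quoted-printable according to the
        # Content-Transfer-Encoding header; unencoded plain text is left alone.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "us-ascii"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:  # charset name Python does not know
            yield payload.decode("latin-1")


def split_by_parts(raw: str, classify_part: Callable[[str], Verdict]) -> Verdict:
    """Return the first spam group any part triggers, else True if some part
    looked like ham, else None."""
    saw_ham = False
    for text in decoded_text_parts(message_from_string(raw)):
        verdict = classify_part(text)
        if isinstance(verdict, str):
            return verdict
        if verdict is True:
            saw_ham = True
    return True if saw_ham else None
```

Here "detection" is nothing more than trusting each part's Content-Transfer-Encoding header, so a part with no such header is never decoded. Whether gnus-article-decode-hook behaves the same way is the open question in the message above; this sketch does not answer it.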