From: Paul Franklin
Newsgroups: gmane.emacs.gnus.general
Subject: Re: nnmail-split-it
Date: 03 Feb 1997 22:16:24 -0800
Message-ID: <19970204061632.24946.qmail@sunsite.auc.dk>
Organization: Computer Science, U of Washington, Seattle, WA, USA
X-Newsreader: Gnus v5.4.10/Emacs 19.34

Warning: I'm about to throw out some performance numbers from what I
remember from 6 months ago when my spool was on a local disk...

>>>>> David Moore writes:

> Lars Magne Ingebrigtsen writes:
>> Paul Franklin writes:
>> > Hmm.  I wrote some elisp code to do splitting like this.  I didn't
>> > distribute it because:
>> > * I realized that the bottleneck was disk access time (over NFS).

>> The box I'm sitting with now is a 486/slow without NFS, and splitting
>> is kinda slow here as well.

> Two different costs.
> There is a per message cost (like NFS and
> file stating).  There is also a per split cost (which is roughly O(n*m)
> where n is the number of splits, and m is the number of headers in the
> message).

I tried hard to lower the per split cost while not worrying as much
about the per message cost.  The significant per split costs are an
assq and a string-match.  But the per header line costs aren't small;
I'd be very surprised if the per header line cost were lower than the
per split cost.

>> > It generates a alist of headers, unwrapping lines within headers and
>> > separating values from duplicate headers with "\n".  You then match
>> > with a header or multiple ones concatenated (very useful, for me at
>> > least).  I never compared them with the default split rules, but I'm
>> > fairly sure that this code is tight enough that it's very unlikely to
>> > be a bottleneck.

> This is similar to what I suggested, but I wasn't going to
> bother to put the headers into concatenated strings, since that is quite
> slow itself.  But tracking the start/end position of those strings makes
> doing a buffer regexp search much much faster since it limits the scope
> of the search.

I did this for flexibility, not speed.  It allows searching, in
order, from, apparently-from, to, cc, apparently-to, ... with a single
rule.  I really wanted this, so if I was going to write my own split
function, it was going to have this feature.

I'm attaching my code with sample rules, in case people want to
experiment, run timing tests, or whatever.  Until I spend some effort
to clean it up for qgnus, if Lars wants to include it (at which point
it'll be GPL'd), or decide not to clean it up, please don't
redistribute it.  I suppose Lars will want me to do something other
than performing list surgery on a user-configurable variable.  (Yes,
this is truly evil code.)

Be warned, I'm likely to change the rule forms to

;;  (GROUP . REGEXP)
;;  (GROUP WORDS...)
where the second is converted to the first by inserting "\\<", "\\>",
and "\\|" as appropriate.

--Paul

;;Copyright 1996, 1997 Paul Franklin

(setq nnmail-split-methods 'pdf-nnmail-split-function)

(setq pdf-nnmail-split-abbrev-alist
      ;;Using these is particularly efficient
      ;;because their expansions are cached.
      ;; Elements are of the form
      ;;   (ABBREV . HEADER-LIST)
      ;;which is equivalent to
      ;;   (ABBREV HEADERS...)
      '(
        (f from sender)
        (l f to apparently-to)
        (t to apparently-to cc)
        (a f t)
        (s a subject)))

(setq pdf-nnmail-split-methods
      (list
       ;;Rule groups are of the form (HEADER-LIST RULES...)
       ;;Headers are specified with lowercase symbols, not strings.
       ;;Rules come in two forms:
       ;;  (GROUP . REGEXP)
       ;;  (GROUP REGEXPS...)
       ;; The second is converted to the first by list surgery (!);
       ;; "\\|" is inserted between regexps.
       ;;Rule groups are considered in order, a match terminates the
       ;;search.
       ;;Rules within a rule group are considered simultaneously,
       ;;with the one matching earlier in the specified headers
       ;;winning.
       '((gnus-warning)
         ("-mail.duplicates" . "\\"))
       '((a)
         ("-conf.cs.chi97.sv" "\\" "\\")
         ("-net.gnus.list" . "\\"))
       '((subject)
         ("-uw.cs.csl.dots" . "dot"))
       '((t)
         ("-seminar.uw.cs.systems" "\\" "\\" "\\")
         ("-seminar.uw.cs.ui" . "\\")
         ("-seminar.uw.cs.lis" "590m\\>" "\\<590f\\>" "\\")
         ("-seminar.uw.cs.arch" "\\" "590g\\>"))
       '((s)
         ("-class.uw.cse-568" "568\\>")
         ("-uw.cs.acm" "\\")
         ("-uw.cs.sports" "\\" "\\" "\\" "\\" "\\")
         ("-uw.cs.room.sieg-431" . "\\<431\\>")
         ("-uw.cs.csl.uns" . "\\