From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/9793
Path: main.gmane.org!not-for-mail
From: anonymous@sunsite.auc.dk
Newsgroups: gmane.emacs.gnus.general
Subject: Re: nnmail-split-it
Date: 4 Feb 1997 06:16:32 -0000
Sender: paul@fester.cs.washington.edu
Message-ID: <19970204061632.24946.qmail@sunsite.auc.dk>
References: <xofiv4aual5.fsf@blubb.pdc.kth.se> <rvraixqxkn.fsf@sdnp5.ucsd.edu> <r9q3evdpde3.fsf@snowking.cs.washington.edu> <m2raixpc6h.fsf@proletcult.slip.ifi.uio.no> <rvn2tlqjcs.fsf@sdnp5.ucsd.edu>
NNTP-Posting-Host: coloc-standby.netfonds.no
X-Trace: main.gmane.org 1035149763 20588 80.91.224.250 (20 Oct 2002 21:36:03 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Sun, 20 Oct 2002 21:36:03 +0000 (UTC)
Return-Path: <ding-request@ifi.uio.no>
Original-Received: from ifi.uio.no (0@ifi.uio.no [129.240.64.2])
	by deanna.miranova.com (8.8.5/8.8.5) with SMTP id WAA11594
	for <steve@miranova.com>; Mon, 3 Feb 1997 22:34:20 -0800
Original-Received: from sunsite.auc.dk (qmailr@sunsite.auc.dk [130.225.51.30]) by ifi.uio.no with SMTP (8.6.11/ifi2.4) 
	id <HAA01306@ifi.uio.no> for <ding@ifi.uio.no> ; Tue, 4 Feb 1997 07:16:34 +0100
Original-Received: (qmail 24948 invoked by uid 509); 4 Feb 1997 06:16:32 -0000
Original-To: ding@ifi.uio.no
Original-Newsgroups: emacs.ding
Xref: main.gmane.org gmane.emacs.gnus.general:9793
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:9793

From: Paul Franklin <paul@cs.washington.edu>
Date: 03 Feb 1997 22:16:24 -0800
Message-ID: <r9qpvyhayfb.fsf@fester.cs.washington.edu>
Organization: Computer Science, U of Washington, Seattle, WA, USA
Lines: 192
X-Newsreader: Gnus v5.4.10/Emacs 19.34
Path: fester.cs.washington.edu
NNTP-Posting-Host: fester.cs.washington.edu

Warning:  I'm about to throw out some performance numbers from what I
remember from 6 months ago when my spool was on a local disk...

>>>>> David Moore writes:

 > Lars Magne Ingebrigtsen <larsi@ifi.uio.no> writes:

 >> Paul Franklin <paul@cs.washington.edu> writes:

 >> > Hmm.  I wrote some elisp code to do splitting like this.  I didn't
 >> > distribute it because:

 >> > * I realized that the bottleneck was disk access time (over NFS).

 >> The box I'm sitting with now is a 486/slow without NFS, and splitting
 >> is kinda slow here as well.

 > 	Two different costs.  There is a per message cost (like NFS and
 > file stating).  There is also a per split cost (which is roughly O(n*m)
 > where n is the number of splits, and m is the number of headers in the
 > message).

I tried hard to lower the per split cost while not worrying as much
about the per message cost.  The significant per split costs are an
assq and a string-match.  But the per header line costs aren't small;
I'd be very surprised if the per header line cost were lower than the
per split cost.

 >> > It generates an alist of headers, unwrapping lines within headers and
 >> > separating values from duplicate headers with "\n".  You then match
 >> > with a header or multiple ones concatenated (very useful, for me at
 >> > least).  I never compared them with the default split rules, but I'm
 >> > fairly sure that this code is tight enough that it's very unlikely to
 >> > be a bottleneck.

 > 	This is similar to what I suggested, but I wasn't going to
 > bother to put the headers into concatenated strings, since that is quite
 > slow itself.  But tracking the start/end position of those strings makes
 > doing a buffer regexp search much much faster since it limits the scope
 > of the search.

I did this for flexibility, not speed.  It allows searching, in order,
from, apparently-from, to, cc, apparently-to, ... with a single rule.
I really wanted this, so if I was going to write my own split
function, it was going to have this feature.
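
A minimal sketch of the idea, separate from the attached code: given
an alist of already-extracted headers, join the wanted fields in order
with "\n" and run one string-match per rule; a hit at a smaller offset
comes from an earlier header.  The names my-join-headers and
my-first-match, and the sample addresses, are made up for illustration.

```elisp
;; Hypothetical helpers, for illustration only; not the attached code.
(defun my-join-headers (fields header-alist)
  "Concatenate the values of FIELDS found in HEADER-ALIST, in order.
Missing fields contribute an empty string; values are joined with newlines."
  (mapconcat (lambda (field)
               (or (cdr (assq field header-alist)) ""))
             fields "\n"))

(defun my-first-match (rules text)
  "Return the group of the rule in RULES that matches earliest in TEXT.
RULES is a list of (GROUP . REGEXP) pairs.  Return nil if none match."
  (let (best-loc best-group)
    (dolist (rule rules best-group)
      (let ((loc (string-match (cdr rule) text)))
        ;; Keep the rule whose match starts at the smallest offset.
        (when (and loc (or (null best-loc) (< loc best-loc)))
          (setq best-loc loc
                best-group (car rule)))))))

;; Sample data (assumed): From, To and Cc of one message.
(let* ((headers '((from . "alice@example.org")
                  (to . "ding@ifi.uio.no")
                  (cc . "bob@example.org")))
       (text (my-join-headers '(from to cc) headers)))
  (my-first-match '(("bob" . "\\<bob@example\\.org\\>")
                    ("list" . "\\<ding@ifi\\.uio\\.no\\>"))
                  text))
```

Here the "list" rule wins even though it is listed second, because To
comes before Cc in the joined string.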

I'm attaching my code with sample rules, in case people want to
experiment, run timing tests, or whatever.  Please don't redistribute
it until I either clean it up for qgnus (if Lars wants to include it,
at which point it'll be GPL'd) or decide not to clean it up.  I suppose
Lars will want me to do something other than performing list surgery
on a user-configurable variable.  (Yes, this is truly evil code.)

Be warned, I'm likely to change the rule forms to
	;;	(GROUP . REGEXP)
	;;	(GROUP WORDS...)
where the second is converted to the first by inserting "\\<", "\\>",
and "\\|" as appropriate.
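
A rough sketch of what that conversion might look like; nothing here is
final.  The name my-words-to-regexp is hypothetical, and running each
word through regexp-quote is an assumption on my part, not something
the rule format requires.

```elisp
;; Hypothetical helper, for illustration only; not the attached code.
(defun my-words-to-regexp (words)
  "Return a regexp that matches any of WORDS as a whole word."
  (mapconcat (lambda (word)
               ;; Quote the word (an assumption), then anchor it at
               ;; word boundaries on both sides.
               (concat "\\<" (regexp-quote word) "\\>"))
             words "\\|"))

(my-words-to-regexp '("acm" "vlsi"))
;; => "\\<acm\\>\\|\\<vlsi\\>"
```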

--Paul

;;Copyright 1996, 1997 Paul Franklin

(setq nnmail-split-methods 'pdf-nnmail-split-function)

(setq pdf-nnmail-split-abbrev-alist
	;;Using these is particularly efficient 
	;;because their expansions are cached.
	;; Elements are of the form
	;; (ABBREV . HEADER-LIST)
	;;which is equivalent to
	;; (ABBREV HEADERS...)
      '(
	(f from sender)
	(l f to apparently-to)
	(t to apparently-to cc)
	(a f t)
	(s a subject)))

(setq pdf-nnmail-split-methods
	(list

	;;Rule groups are of the form (HEADER-LIST RULES...)
	;;Headers are specified with lowercase symbols, not strings.
	;;Rules come in two forms:
	;;	(GROUP . REGEXP)
	;;	(GROUP REGEXPS...)
	;;  The second is converted to the first by list surgery (!);
	;;  "\\|" is inserted between regexps.

	;;Rule groups are considered in order; a match terminates the
	;;search.

	;;Rules within a rule group are considered simultaneously,
	;;with the one matching earlier in the specified headers
	;;winning.

	 '((gnus-warning)
	   ("-mail.duplicates" . "\\<duplicate\\>"))

	 '((a)
	   ("-conf.cs.chi97.sv" 
	    "\\<chi97-sv\\>" "\\<tutorial-chi97\\>")
	   ("-net.gnus.list" . "\\<ding@ifi\\.uio\\.no\\>"))

	 '((subject) 
	   ("-uw.cs.csl.dots" . "dot"))

	 '((t)
	   ("-seminar.uw.cs.systems"
	    "\\<cse590s\\>" "\\<cer-systems\\>" "\\<uw-systems\\>")
	   ("-seminar.uw.cs.ui" . "\\<ui-students\\>")
	   ("-seminar.uw.cs.lis" 
	    "590m\\>" "\\<590f\\>" "\\<vlsi\\>")
	   ("-seminar.uw.cs.arch" "\\<arch-lunch\\>" "590g\\>"))

	 '((s)
	   ("-class.uw.cse-568" "568\\>")
	   ("-uw.cs.acm" "\\<acm\\>")
	   ("-uw.cs.sports"
	    "\\<stp-riders\\>" "\\<soccer\\>" "\\<ultimate\\>"
	    "\\<cyclists\\>" "\\<stp-1dayers\\>")
	   ("-uw.cs.room.sieg-431" . "\\<431\\>")
	   ("-uw.cs.csl.uns" . "\\<uns")))))


(defun pdf-nnmail-extract-header-alist (&optional init-header-alist)
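  "Extract an alist of headers from the current buffer.
Elements are (HEADER-SYMBOL . VALUE) and are added to INIT-HEADER-ALIST.
Continuation lines are unwrapped; values of duplicate headers are
joined with newlines.  Headers whose downcased name is not already an
interned symbol are skipped."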
  (let ((header-alist init-header-alist))
    (goto-char (point-min))
    (while (re-search-forward
	    "^\\([^ \t\n]*\\):[ \t]*\\(\\([^\n]*\n[ \t]\\)*[^\n]*\\)\n"
	    nil t)
      (let ((header-sym (intern-soft (downcase (match-string 1)))))
	(if header-sym
	    (let ((header-alist-elt (assq header-sym header-alist))
		  (header-data (match-string 2)))
	      (string-match "" header-data) ; reset match-data
	      (while (string-match "\n" header-data (match-end 0))
		(setq header-data (replace-match "" t t header-data)))
	      (if header-alist-elt
		  (setcdr header-alist-elt
			  (concat header-data "\n" (cdr header-alist-elt)))
		(setq header-alist (cons (cons header-sym header-data)
					 header-alist)))))))
    header-alist))

(defun pdf-nnmail-header-list-lookup (field-list header-alist)
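  "Look up the fields in FIELD-LIST in HEADER-ALIST.
Return their values concatenated with newlines.  A field whose value
is a list is an abbreviation: it is expanded recursively and the
result is cached in HEADER-ALIST."
  (mapconcat
   (lambda (field)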
      (let* ((field-cons (assq field header-alist))
	     (field-cdr (cdr-safe field-cons)))
	(cond
	 ((atom field-cons)
	  "")
	 ((atom field-cdr)
	  field-cdr)
	 (t
	  (setcdr field-cons ;**new cdr is returned, not modified field-cons
		  (pdf-nnmail-header-list-lookup field-cdr header-alist))))))
   field-list "\n"))

(defun pdf-nnmail-split-function ()
  "Split the current message using an alist of its header fields.
Return a one-element list holding the matching group, or (nil) if no
rule matches."
  (interactive)
  (let ((header-alist (pdf-nnmail-extract-header-alist
		       (copy-alist pdf-nnmail-split-abbrev-alist)))
	(methods-walker pdf-nnmail-split-methods)
	dest)
    (while methods-walker
      (let* ((current-method (car methods-walker))
	     (wanted-headers (pdf-nnmail-header-list-lookup
			      (car current-method) header-alist))
	     (clauses-walker (cdr current-method))
	     loc)
	(while clauses-walker
	  (let ((current-clause (car clauses-walker)))
	    ;; Convert (GROUP REGEXPS...) to (GROUP . REGEXP) in place.
	    (if (listp (cdr current-clause))
		(setcdr current-clause (mapconcat 'identity
						  (cdr current-clause)
						  "\\|")))
	    (let ((cur-loc (string-match (cdr current-clause) wanted-headers)))
	      ;; The earliest match within this rule group wins.
	      (if (and cur-loc (or (not loc) (< cur-loc loc)))
		  (setq loc cur-loc
			dest (car current-clause)
			methods-walker nil)))) ; stop after this rule group
	  (setq clauses-walker (cdr clauses-walker))))
      (setq methods-walker (cdr-safe methods-walker)))
    (list dest)))