Gnus development mailing list
From: anonymous@sunsite.auc.dk
Subject: Re: nnmail-split-it
Date: 4 Feb 1997 06:16:32 -0000
Message-ID: <19970204061632.24946.qmail@sunsite.auc.dk>
In-Reply-To: <rvn2tlqjcs.fsf@sdnp5.ucsd.edu>

From: Paul Franklin <paul@cs.washington.edu>
Date: 03 Feb 1997 22:16:24 -0800
Message-ID: <r9qpvyhayfb.fsf@fester.cs.washington.edu>
Organization: Computer Science, U of Washington, Seattle, WA, USA
Lines: 192
X-Newsreader: Gnus v5.4.10/Emacs 19.34
Path: fester.cs.washington.edu
NNTP-Posting-Host: fester.cs.washington.edu

Warning:  I'm about to throw out some performance numbers from what I
remember from 6 months ago when my spool was on a local disk...

>>>>> David Moore writes:

 > Lars Magne Ingebrigtsen <larsi@ifi.uio.no> writes:

 >> Paul Franklin <paul@cs.washington.edu> writes:

 >> > Hmm.  I wrote some elisp code to do splitting like this.  I didn't
 >> > distribute it because:

 >> > * I realized that the bottleneck was disk access time (over NFS).

 >> The box I'm sitting with now is a 486/slow without NFS, and splitting
 >> is kinda slow here as well.

 > 	Two different costs.  There is a per message cost (like NFS and
 > file stating).  There is also a per split cost (which is roughly O(n*m)
 > where n is the number of splits, and m is the number of headers in the
 > message).

I tried hard to lower the per split cost while not worrying as much
about the per message cost.  The significant per split costs are an
assq and a string-match.  But the per header line costs aren't small;
I'd be very surprised if the per header line cost were lower than the
per split cost.
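
As a rough illustration of that per split cost, here is a minimal
sketch of a single rule check against an already-built header alist:
one assq to find the header, one string-match against its value.  The
name sketch-rule-match is hypothetical and not part of the attached
code, which does the lookup once per rule group rather than per rule.

```elisp
;; Hypothetical helper for illustration only; not in the attached code.
(defun sketch-rule-match (header regexp header-alist)
  "Return the position where REGEXP matches HEADER's value, or nil.
HEADER is a lowercase symbol and HEADER-ALIST maps such symbols to
strings."
  (let ((value (cdr (assq header header-alist))))  ; one assq
    (and value
         (string-match regexp value))))            ; one string-match

;; Usage: check the Subject header of a made-up message.
(sketch-rule-match 'subject "\\<dot\\>"
                   '((from . "paul@example.org")
                     (subject . "new dot files")))
```

Neither step touches the article buffer, which is why the cost per
rule stays small once the alist exists.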

 >> > It generates an alist of headers, unwrapping lines within headers and
 >> > separating values from duplicate headers with "\n".  You then match
 >> > with a header or multiple ones concatenated (very useful, for me at
 >> > least).  I never compared them with the default split rules, but I'm
 >> > fairly sure that this code is tight enough that it's very unlikely to
 >> > be a bottleneck.

 > 	This is similar to what I suggested, but I wasn't going to
 > bother to put the headers into concatenated strings, since that is quite
 > slow itself.  But tracking the start/end position of those strings makes
 > doing a buffer regexp search much much faster since it limits the scope
 > of the search.

I did this for flexibility, not speed.  It allows searching, in order,
from, apparently-from, to, cc, apparently-to, ... with a single rule.
I really wanted this, so if I was going to write my own split
function, it was going to have this feature.
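
A minimal sketch of the idea, assuming a header alist of
(SYMBOL . STRING) pairs: join the wanted headers in the listed order,
then run one string-match over the result.  A smaller match position
means the hit came from an earlier header.  The name
sketch-join-headers and the sample addresses are made up for
illustration.

```elisp
;; Hypothetical helper for illustration only; not in the attached code.
(defun sketch-join-headers (headers header-alist)
  "Concatenate the values of HEADERS from HEADER-ALIST, in order.
Values are separated by newlines; a missing header contributes an
empty string."
  (mapconcat (lambda (h) (or (cdr (assq h header-alist)) ""))
             headers "\n"))

;; Usage: one regexp searched across from, apparently-from, to, cc.
(let* ((alist '((to . "ding@ifi.uio.no")
                (from . "larsi@ifi.uio.no")))
       (joined (sketch-join-headers '(from apparently-from to cc) alist)))
  ;; The From value comes first in JOINED, so its match wins.
  (string-match "ifi\\.uio\\.no" joined))
```

The order of the alist itself does not matter here; only the order of
the header list decides which header is searched first.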

I'm attaching my code with sample rules, in case people want to
experiment, run timing tests, or whatever.  Please don't redistribute
it until I either clean it up for qgnus (if Lars wants to include it,
at which point it'll be GPL'd) or decide not to clean it up.  I
suppose Lars will want me to do something other than performing list
surgery on a user-configurable variable.  (Yes, this is truly evil
code.)

Be warned, I'm likely to change the rule forms to
	;;	(GROUP . REGEXP)
	;;	(GROUP WORDS...)
where the second is converted to the first by inserting "\\<", "\\>",
and "\\|" as appropriate.
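
A sketch of what that conversion could look like, under the assumption
that each word is meant literally (hence the regexp-quote, which the
plan above does not mention).  The name sketch-words-to-regexp is
hypothetical.

```elisp
;; Hypothetical helper for illustration only; not in the attached code.
(defun sketch-words-to-regexp (words)
  "Return a regexp matching any of WORDS as a whole word.
Each word is quoted, wrapped in \\=\\< and \\=\\>, and the alternatives
are joined with \\=\\|."
  (mapconcat (lambda (word)
               (concat "\\<" (regexp-quote word) "\\>"))
             words "\\|"))

;; Usage: build the regexp for a (GROUP WORDS...) rule.
(sketch-words-to-regexp '("cse590s" "uw-systems"))
;; => "\\<cse590s\\>\\|\\<uw-systems\\>"
```

Rules that need a one-sided boundary, like "590m\\>" in the sample
rules, would still have to use the (GROUP . REGEXP) form.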

--Paul

;;Copyright 1996, 1997 Paul Franklin

(setq nnmail-split-methods 'pdf-nnmail-split-function)

(setq pdf-nnmail-split-abbrev-alist
	;;Using these is particularly efficient 
	;;because their expansions are cached.
	;; Elements are of the form
	;; (ABBREV . HEADER-LIST)
	;;which is equivalent to
	;; (ABBREV HEADERS...)
      '(
	(f from sender)
	(l f to apparently-to)
	(t to apparently-to cc)
	(a f t)
	(s a subject)))

(setq pdf-nnmail-split-methods
	(list

	;;Rule groups are of the form (HEADER-LIST RULES...)
	;;Headers are specified with lowercase symbols, not strings.
	;;Rules come in two forms:
	;;	(GROUP . REGEXP)
	;;	(GROUP REGEXPS...)
	;;  The second is converted to the first by list surgery (!);
	;;  "\\|" is inserted between regexps.

	;;Rule groups are considered in order, a match terminates the
	;;search.

	;;Rules within a rule group are considered simultaneously,
	;;with the one matching earlier in the specified headers
	;;winning.

	 '((gnus-warning)
	   ("-mail.duplicates" . "\\<duplicate\\>"))

	 '((a)
	   ("-conf.cs.chi97.sv" 
	    "\\<chi97-sv\\>" "\\<tutorial-chi97\\>")
	   ("-net.gnus.list" . "\\<ding@ifi\\.uio\\.no\\>"))

	 '((subject) 
	   ("-uw.cs.csl.dots" . "dot"))

	 '((t)
	   ("-seminar.uw.cs.systems"
	    "\\<cse590s\\>" "\\<cer-systems\\>" "\\<uw-systems\\>")
	   ("-seminar.uw.cs.ui" . "\\<ui-students\\>")
	   ("-seminar.uw.cs.lis" 
	    "590m\\>" "\\<590f\\>" "\\<vlsi\\>")
	   ("-seminar.uw.cs.arch" "\\<arch-lunch\\>" "590g\\>"))

	 '((s)
	   ("-class.uw.cse-568" "568\\>")
	   ("-uw.cs.acm" "\\<acm\\>")
	   ("-uw.cs.sports"
	    "\\<stp-riders\\>" "\\<soccer\\>" "\\<ultimate\\>"
	    "\\<cyclists\\>" "\\<stp-1dayers\\>")
	   ("-uw.cs.room.sieg-431" . "\\<431\\>")
	   ("-uw.cs.csl.uns" . "\\<uns")))))


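(defun pdf-nnmail-extract-header-alist (&optional init-header-alist)
  "Return an alist of (HEADER-SYMBOL . VALUE) for headers in the buffer.
Folded lines are unwrapped, and values of duplicate headers are
joined with newlines.  Only headers whose lowercased name is already
an interned symbol are recorded.  INIT-HEADER-ALIST seeds the result."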
  (let ((header-alist init-header-alist))
    (goto-char (point-min))
    (while (re-search-forward
	    "^\\([^ \t\n]*\\):[ \t]*\\(\\([^\n]*\n[ \t]\\)*[^\n]*\\)\n"
	    nil t)
      (let ((header-sym (intern-soft (downcase (match-string 1)))))
	(if header-sym
	    (let ((header-alist-elt (assq header-sym header-alist))
		  (header-data (match-string 2)))
	      (string-match "" header-data) ; reset match-data
	      (while (string-match "\n" header-data (match-end 0))
		(setq header-data (replace-match "" t t header-data)))
	      (if header-alist-elt
		  (setcdr header-alist-elt
			  (concat header-data "\n" (cdr header-alist-elt)))
		(setq header-alist (cons (cons header-sym header-data)
					 header-alist)))))))
    header-alist))

(defun pdf-nnmail-header-list-lookup (field-list header-alist)
  "Lookup fields in an alist.
Returns results, concatenated with newlines."
  (mapconcat
   (lambda (field)
      (let* ((field-cons (assq field header-alist))
	     (field-cdr (cdr-safe field-cons)))
	(cond
	 ((atom field-cons)
	  "")
	 ((atom field-cdr)
	  field-cdr)
	 (t
	  (setcdr field-cons ;**new cdr is returned, not modified field-cons
		  (pdf-nnmail-header-list-lookup field-cdr header-alist))))))
   field-list "\n"))

(defun pdf-nnmail-split-function nil
  "Do splitting based on generated alist of header fields"
  (interactive)
  (let ((header-alist (pdf-nnmail-extract-header-alist
		       (copy-alist pdf-nnmail-split-abbrev-alist)))
	(methods-walker pdf-nnmail-split-methods)
	dest)
    (while methods-walker
      (let* ((current-method (car methods-walker))
	     (wanted-headers (pdf-nnmail-header-list-lookup
			      (car current-method) header-alist))
	     (clauses-walker (cdr current-method))
	     loc)
	(while clauses-walker
          (let ((current-clause (car clauses-walker)))
            (if (listp (cdr current-clause))
                (setcdr current-clause (mapconcat 'identity
                                                  (cdr current-clause)
                                                  "\\|")))
            (let ((cur-loc (string-match (cdr current-clause) wanted-headers)))
              (if (and cur-loc (or (not loc) (< cur-loc loc)))
                  (setq loc cur-loc
                        dest (car current-clause)
                        methods-walker nil))))
	  (setq clauses-walker (cdr clauses-walker))))
      (setq methods-walker (cdr-safe methods-walker)))
    ;; With no matching rule, return nil instead of the list (nil).
    (and dest (list dest))))


