From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/23550
Path: main.gmane.org!not-for-mail
From: Alexandre Oliva
Newsgroups: gmane.emacs.gnus.general
Subject: Re: `nnmail-split-fancy' regexp
Date: 25 Jun 1999 06:42:34 -0300
Sender: owner-ding@hpc.uh.edu
Message-ID:
References:
NNTP-Posting-Host: coloc-standby.netfonds.no
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Trace: main.gmane.org 1035161262 2626 80.91.224.250 (21 Oct 2002 00:47:42 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 21 Oct 2002 00:47:42 +0000 (UTC)
Cc: "Petersen Jens-Ulrik (NRC/Tokyo)" , ding@gnus.org
Return-Path:
Original-Received: from farabi.math.uh.edu (farabi.math.uh.edu [129.7.128.57])
        by sclp3.sclp.com (8.8.5/8.8.5) with ESMTP id FAA09969
        for ; Fri, 25 Jun 1999 05:47:55 -0400 (EDT)
Original-Received: from sina.hpc.uh.edu (lists@Sina.HPC.UH.EDU [129.7.3.5])
        by farabi.math.uh.edu (8.9.1/8.9.1) with ESMTP id EAB06257;
        Fri, 25 Jun 1999 04:43:54 -0500 (CDT)
Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07));
        Fri, 25 Jun 1999 04:44:24 -0500 (CDT)
Original-Received: from sclp3.sclp.com (root@sclp3.sclp.com [204.252.123.139])
        by sina.hpc.uh.edu (8.9.3/8.9.3) with ESMTP id EAA27473
        for ; Fri, 25 Jun 1999 04:44:08 -0500 (CDT)
Original-Received: from grande.dcc.unicamp.br (grande.dcc.unicamp.br [143.106.1.11])
        by sclp3.sclp.com (8.8.5/8.8.5) with ESMTP id FAA09940
        for ; Fri, 25 Jun 1999 05:43:08 -0400 (EDT)
Original-Received: from amazonas.dcc.unicamp.br (amazonas.dcc.unicamp.br [143.106.7.11])
        by grande.dcc.unicamp.br (8.9.1/8.9.1) with ESMTP id GAA28308;
        Fri, 25 Jun 1999 06:38:55 -0300 (EST)
Original-Received: from saci.lsd.dcc.unicamp.br (oliva@saci.lsd.dcc.unicamp.br [143.106.23.3])
        by amazonas.dcc.unicamp.br (8.8.5/8.8.5) with SMTP id GAA15867;
        Fri, 25 Jun 1999 06:38:54 -0300 (EST)
Original-To: Per Abrahamsen
In-Reply-To: Per Abrahamsen's message of "24 Jun 1999 20:26:38 +0200"
Original-Lines: 62
User-Agent: Gnus/5.070088 (Pterodactyl Gnus v0.88) XEmacs/20.4 (Emerald)
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:23550
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:23550

--=-=-=

On Jun 24, 1999, Per Abrahamsen wrote:

> "Petersen Jens-Ulrik (NRC/Tokyo)" writes:
>> I found the restriction that string VALUE regexp's should match to
>> whole words too restrictive...

> There are zillions of nnmail-split-fancy rules written using that
> assumption, and many of them represent a huge work effort.  Mine has
> approximately 300 rules.

> If you want the rules to be interpreted differently, invent a new
> variable and a new split method for it.

How about using a magic meaningless prefix to select non-full-word
matches, say, `[partial]'.  I doubt anyone starts their VALUE splits
with `[partial]', particularly because of the duplicate `a', which
would be totally useless in terms of regular expressions.

I've just implemented it; patch attached.

While I was at it, I took the time to introduce a new feature that I
had requested some weeks ago.  As a memory refresher, the problem was
that I am subscribed to the "libtool" and "bug-libtool" mailing lists,
and there was no way to keep messages posted only to "bug-libtool"
from being cross-posted to "libtool" without also preventing
cross-posting when the message was *really* posted to both lists.  The
syntax I've come up with is:

(FIELD VALUE [- RESTRICT [- RESTRICT [...]]] SPLIT)

So that now I can write:

(| (& (any "libtool@gnu\\.org" - "bug-libtool" "libtool")
      (any "bug-libtool@gnu\\.org" "bug-libtool")
      ;; ...
      )
   "misc")

The construction above is not equivalent to:

(| (& (any "libtool@gnu\\.org"
           (| (any "bug-libtool@gnu\\.org" nil)
              "libtool"))
      (any "bug-libtool@gnu\\.org" "bug-libtool")
      ;; ...

because the latter would not cross-post in case a message is posted to
both mailing lists, whereas the former will.
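
To make the two additions concrete, a configuration using both might
look like the sketch below.  It assumes the attached patch is applied.
The "Subject" rule, the string "pgnus" and the group name "gnus.misc"
are made up for illustration; only the `[partial]' prefix and the `-'
RESTRICT syntax come from the patch itself.

```elisp
;; Illustrative sketch only: the "Subject" rule and the "gnus.misc"
;; group are hypothetical; the syntax follows the attached patch.
(setq nnmail-split-fancy
      '(| (& ;; Whole-word match (the default), restricted so that mail
             ;; sent only to bug-libtool is not also filed in "libtool".
             (any "libtool@gnu\\.org" - "bug-libtool" "libtool")
             (any "bug-libtool@gnu\\.org" "bug-libtool")
             ;; `[partial]' drops the implicit \< ... \> around VALUE,
             ;; so "gnus" can also match inside a longer word such as
             ;; "pgnus".
             ("Subject" "[partial]gnus" "gnus.misc"))
          "misc"))
```

Without the prefix, the last rule would only fire when "gnus" appears
as a separate word in the Subject header.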
A RESTRICT clause will only be considered if there is a match that
starts after the end of the FIELD match and ends after the beginning
of the VALUE match.

And since I was rewriting that code anyway, I took the time to fix the
searching mechanism so that it looks for multiple matches of FIELD
VALUE, so that SPLITs that involve \N substitutions are properly
handled, causing cross-posting without introducing duplicates.

-- 
Alexandre Oliva http://www.dcc.unicamp.br/~oliva IC-Unicamp, Bra[sz]il
{oliva,Alexandre.Oliva}@dcc.unicamp.br  aoliva@{acm.org,computer.org}
oliva@{gnu.org,kaffe.org,{egcs,sourceware}.cygnus.com,samba.org}
*** E-mail about software projects will be forwarded to mailing lists

--=-=-=
Content-Type: application/x-patch
Content-Disposition: attachment; filename=partial&restrict.patch
Content-Transfer-Encoding: 8bit

Index: lisp/ChangeLog
from  Alexandre Oliva

        * nnmail.el (nnmail-split-it): Support `[partial]', i.e.,
        non-full-word matches.  Search for regexp multiple times, so that
        backslash substitutions are performed for all matches.  Add
        RESTRICT support for FIELD VALUE splits.
        (nnmail-split-fancy): Document the new features.  Document `!'
        split, that was missing.

Index: texi/ChangeLog
from  Alexandre Oliva

        * gnus.texi (Fancy Mail Splitting): Document `[partial]' and
        RESTRICT.

Index: lisp/nnmail.el
--- lisp/nnmail.el      Tue Jun 15 01:12:03 1999
+++ lisp/nnmail.el      Fri Jun 25 06:14:35 1999
@@ -293,8 +293,12 @@

 GROUP: Mail will be stored in GROUP (a string).

-\(FIELD VALUE SPLIT): If the message field FIELD (a regexp) contains
-  VALUE (a regexp), store the messages as specified by SPLIT.
+\(FIELD VALUE [- RESTRICT [- RESTRICT [...]]] SPLIT): If the message
+  field FIELD (a regexp) contains VALUE (a regexp), store the messages
+  as specified by SPLIT.  If RESTRICT (a regexp) matches some string
+  after FIELD and before the end of the matched VALUE, return NIL,
+  otherwise process SPLIT.
Multiple RESTRICTs add up, further
+  restricting the possibility of processing SPLIT.

 \(| SPLIT...): Process each SPLIT expression until one of them matches.
   A SPLIT expression is said to match if it will cause the mail
@@ -306,9 +310,14 @@
   the buffer containing the message headers.  The return value FUNCTION
   should be a split, which is then recursively processed.

+\(! FUNCTION SPLIT): Call FUNCTION with the result of SPLIT.  The
+  return value FUNCTION should be a split, which is then recursively
+  processed.
+
 FIELD must match a complete field name.  VALUE must match a complete
-word according to the `nnmail-split-fancy-syntax-table' syntax table.
-You can use \".*\" in the regexps to match partial field names or words.
+word according to the `nnmail-split-fancy-syntax-table' syntax table,
+unless it starts with \"[partial]\".  You can use \".*\" in the
+regexps to match partial field names or words.

 FIELD and VALUE can also be lisp symbols, in that case they are
 expanded as specified in `nnmail-split-abbrev-alist'.
@@ -333,6 +342,13 @@
         ;; Other mailing lists...
         (any \"procmail@informatik\\\\.rwth-aachen\\\\.de\" \"procmail.list\")
         (any \"SmartList@informatik\\\\.rwth-aachen\\\\.de\" \"SmartList.list\")
+        ;; Both lists below have the same suffix, so prevent
+        ;; cross-posting to mypkg.list of messages posted only to
+        ;; the bugs- list, but allow cross-posting when the
+        ;; message was really cross-posted.
+        (any \"bugs-mypackage@somewhere\" \"mypkg.bugs\")
+        (any \"mypackage@somewhere\" - \"bugs-mypackage\" \"mypkg.list\")
+        ;;
         ;; People...
         (any \"larsi@ifi\\\\.uio\\\\.no\" \"people.Lars Magne Ingebrigtsen\"))
         ;; Unmatched mail goes to the catch all group.
@@ -1111,47 +1127,79 @@

      ;; Check the cache for the regexp for this split.
      ((setq cached-pair (assq split nnmail-split-cache))
-      (goto-char (point-max))
-      ;; FIX FIX FIX problem with re-search-backward is that if you have
-      ;; a split: (from "foo-\\(bar\\|baz\\)@gnus.org "mail.foo.\\1")
-      ;; and someone mails a message with 'To: foo-bar@gnus.org' and
-      ;; 'CC: foo-baz@gnus.org', we'll pick 'mail.foo.baz' as the group
-      ;; if the cc line is a later header, even though the other choice
-      ;; is probably better.  Also, this routine won't do a crosspost
-      ;; when there are two different matches.
-      ;; I guess you could just make this more determined, and it could
-      ;; look for still more matches prior to this one, and recurse
-      ;; on each of the multiple matches hit.  Of course, then you'd
-      ;; want to make sure that nnmail-article-group or nnmail-split-fancy
-      ;; removed duplicates, since there might be more of those.
-      ;; I guess we could also remove duplicates in the & split case, since
-      ;; that's the only thing that can introduce them.
-      (when (re-search-backward (cdr cached-pair) nil t)
-        (when nnmail-split-tracing
-          (push (cdr cached-pair) nnmail-split-trace))
-        ;; Someone might want to do a \N sub on this match, so get the
-        ;; correct match positions.
-        (goto-char (match-end 0))
-        (let ((value (nth 1 split)))
-          (re-search-backward (if (symbolp value)
-                                  (cdr (assq value nnmail-split-abbrev-alist))
-                                value)
-                              (match-end 1)))
-        (nnmail-split-it (nth 2 split))))
+      (let (split-result
+            (end-point (point-max))
+            (value (nth 1 split)))
+        (if (and (stringp value)
+                 (string-match "\\`\\[partial\\]" value))
+            (setq value (substring value 9))
+          (if (symbolp value)
+              (setq value (cdr (assq value nnmail-split-abbrev-alist)))))
+        (goto-char end-point)
+        (while (re-search-backward (cdr cached-pair) nil t)
+          (when nnmail-split-tracing
+            (push (cdr cached-pair) nnmail-split-trace))
+          ;; Start the next search just before the beginning of the
+          ;; VALUE match.
+          (setq end-point (1- (match-end 1)))
+          (goto-char (match-end 0))
+          (let ((split-rest (cddr split))
+                (start-of-value (match-end 1))
+                (after-header-name (match-end 2)))
+            ;; Someone might want to do a \N sub on this match, so get the
+            ;; correct match positions.
+            (re-search-backward value start-of-value)
+            (if (eq (car split-rest) '-)
+                (let* ((end (match-end 0)))
+                  ;; Handle - RESTRICTs
+                  (while (eq (car split-rest) '-)
+                    (goto-char end)
+                    (setq split-rest
+                          ;; RESTRICT must start after-header-name and
+                          ;; end after start-of-value, so that, for
+                          ;; (any "foo" - "x-foo" "foo.list")
+                          ;; we do not exclude foo.list just because
+                          ;; the header is: ``To: x-foo, foo''
+                          (if (and (re-search-backward (cadr split-rest)
+                                                       after-header-name t)
+                                   (> (match-end 0) start-of-value))
+                              nil
+                            (cddr split-rest))))
+                  (when split-rest
+                    ;; Restore the correct match positions again for
+                    ;; \N substitutions
+                    (goto-char end)
+                    (re-search-backward value start-of-value))))
+            (if split-rest
+                ;; Prevent multiple matches from generating duplicates
+                (dolist (sp (nnmail-split-it (car split-rest)))
+                  (unless (memq sp split-result)
+                    (push sp split-result))))))
+        split-result))

      ;; Not in cache, compute a regexp for the field/value pair.
      (t
       (let* ((field (nth 0 split))
              (value (nth 1 split))
+             (fullword (if (and (stringp value)
+                                (string-match "\\`\\[partial\\]" value))
+                           (progn
+                             (setq value (substring value 9))
+                             (cons "" ""))
+                         (cons "\\<" "\\>")))
              (regexp (concat "^\\(\\("
                              (if (symbolp field)
                                  (cdr (assq field nnmail-split-abbrev-alist))
                                field)
-                             "\\):.*\\)\\<\\("
+                             "\\):.*\\)"
+                             (car fullword)
+                             "\\("
                              (if (symbolp value)
                                  (cdr (assq value nnmail-split-abbrev-alist))
                                value)
-                             "\\)\\>")))
+                             "\\)"
+                             (cdr fullword)
+                             )))
         (push (cons split regexp) nnmail-split-cache)
         ;; Now that it's in the cache, just call nnmail-split-it again
         ;; on the same split, which will find it immediately in the cache.
Index: texi/gnus.texi
--- texi/gnus.texi      Tue Jun 15 01:12:06 1999
+++ texi/gnus.texi      Fri Jun 25 06:20:53 1999
@@ -10441,6 +10441,12 @@
         ;; Other mailing lists...
         (any "procmail@@informatik\\.rwth-aachen\\.de" "procmail.list")
         (any "SmartList@@informatik\\.rwth-aachen\\.de" "SmartList.list")
+        ;; Both lists below have the same suffix, so prevent
+        ;; cross-posting to mypkg.list of messages posted only to
+        ;; the bugs- list, but allow cross-posting when the
+        ;; message was really cross-posted.
+        (any "bugs-mypackage@@somewhere" "mypkg.bugs")
+        (any "mypackage@@somewhere" - "bugs-mypackage" "mypkg.list")
         ;; People...
         (any "larsi@@ifi\\.uio\\.no" "people.Lars_Magne_Ingebrigtsen"))
         ;; Unmatched mail goes to the catch all group.
@@ -10459,9 +10465,12 @@
 examples.

 @item
-@var{(FIELD VALUE SPLIT)}: If the split is a list, the first element of
-which is a string, then store the message as specified by SPLIT, if
-header FIELD (a regexp) contains VALUE (also a regexp).
+@var{(FIELD VALUE [- RESTRICT [- RESTRICT [...]]] SPLIT)}: If the split
+is a list, the first element of which is a string, then store the
+message as specified by SPLIT, if header FIELD (a regexp) contains VALUE
+(also a regexp).  If RESTRICT (yet another regexp) matches some string
+after FIELD and before the end of the matched VALUE, the SPLIT is
+ignored.  If none of the RESTRICT clauses match, SPLIT is processed.

 @item
 @var{(| SPLIT...)}: If the split is a list, and the first element is
@@ -10495,16 +10504,20 @@

 In these splits, @var{FIELD} must match a complete field name.
 @var{VALUE} must match a complete word according to the fundamental mode
-syntax table.  You can use @code{.*} in the regexps to match partial
-field names or words.  In other words, all @var{VALUE}'s are wrapped in
-@samp{\<} and @samp{\>} pairs.
+syntax table, unless it starts with @code{[partial]}.  You can use
+@code{.*} in the regexps to match partial field names or words.
In
+other words, by default, @var{VALUE}'s are wrapped in @samp{\<} and
+@samp{\>} pairs, but the default can be overridden with the magic prefix
+@code{[partial]}.  The prefix is obviously eliminated before @var{VALUE}
+is used.

 @vindex nnmail-split-abbrev-alist
 @var{FIELD} and @var{VALUE} can also be lisp symbols, in that case they
 are expanded as specified by the variable
 @code{nnmail-split-abbrev-alist}.  This is an alist of cons cells, where
-the @code{car} of a cell contains the key, and the @code{cdr} contains the associated
-value.
+the @code{car} of a cell contains the key, and the @code{cdr} contains
+the associated value.  Note that @code{[partial]} is not supported in
+this case.

 @vindex nnmail-split-fancy-syntax-table
 @code{nnmail-split-fancy-syntax-table} is the syntax table in effect

--=-=-=--