From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/23550
Path: main.gmane.org!not-for-mail
From: Alexandre Oliva <oliva@dcc.unicamp.br>
Newsgroups: gmane.emacs.gnus.general
Subject: Re: `nnmail-split-fancy' regexp
Date: 25 Jun 1999 06:42:34 -0300
Sender: owner-ding@hpc.uh.edu
Message-ID: <ord7ykaa5x.fsf@saci.lsd.dcc.unicamp.br>
References: <tfz3dzikntv.fsf@toedar02.europe.nokia.com> <rj4sjxo3oh.fsf@feller.dina.kvl.dk>
NNTP-Posting-Host: coloc-standby.netfonds.no
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Trace: main.gmane.org 1035161262 2626 80.91.224.250 (21 Oct 2002 00:47:42 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 21 Oct 2002 00:47:42 +0000 (UTC)
Cc: "Petersen Jens-Ulrik (NRC/Tokyo)" <jens-ulrik.petersen@nokia.com>,
        ding@gnus.org
Return-Path: <owner-ding@hpc.uh.edu>
Original-Received: from farabi.math.uh.edu (farabi.math.uh.edu [129.7.128.57])
	by sclp3.sclp.com (8.8.5/8.8.5) with ESMTP id FAA09969
	for <jason@mailhost.sclp.com>; Fri, 25 Jun 1999 05:47:55 -0400 (EDT)
Original-Received: from sina.hpc.uh.edu (lists@Sina.HPC.UH.EDU [129.7.3.5])
	by farabi.math.uh.edu (8.9.1/8.9.1) with ESMTP id EAB06257;
	Fri, 25 Jun 1999 04:43:54 -0500 (CDT)
Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07)); Fri, 25 Jun 1999 04:44:24 -0500 (CDT)
Original-Received: from sclp3.sclp.com (root@sclp3.sclp.com [204.252.123.139])
	by sina.hpc.uh.edu (8.9.3/8.9.3) with ESMTP id EAA27473
	for <ding@hpc.uh.edu>; Fri, 25 Jun 1999 04:44:08 -0500 (CDT)
Original-Received: from grande.dcc.unicamp.br (grande.dcc.unicamp.br [143.106.1.11])
	by sclp3.sclp.com (8.8.5/8.8.5) with ESMTP id FAA09940
	for <ding@gnus.org>; Fri, 25 Jun 1999 05:43:08 -0400 (EDT)
Original-Received: from amazonas.dcc.unicamp.br (amazonas.dcc.unicamp.br [143.106.7.11])
	by grande.dcc.unicamp.br (8.9.1/8.9.1) with ESMTP id GAA28308;
	Fri, 25 Jun 1999 06:38:55 -0300 (EST)
Original-Received: from saci.lsd.dcc.unicamp.br (oliva@saci.lsd.dcc.unicamp.br [143.106.23.3])
	by amazonas.dcc.unicamp.br (8.8.5/8.8.5) with SMTP id GAA15867;
	Fri, 25 Jun 1999 06:38:54 -0300 (EST)
Original-To: Per Abrahamsen <abraham@dina.kvl.dk>
In-Reply-To: Per Abrahamsen's message of "24 Jun 1999 20:26:38 +0200"
Original-Lines: 62
User-Agent: Gnus/5.070088 (Pterodactyl Gnus v0.88) XEmacs/20.4 (Emerald)
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:23550
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:23550


--=-=-=

On Jun 24, 1999, Per Abrahamsen <abraham@dina.kvl.dk> wrote:

> "Petersen Jens-Ulrik (NRC/Tokyo)" <jens-ulrik.petersen@nokia.com> writes:
>> I found the restriction that string VALUE regexp's should match to
>> whole words too restrictive...

> There are zillions of nnmail-split-fancy rules written using that
> assumption, and many of them represent a huge work effort.  Mine has
> approximately 300 rules.  

> If you want the rules to be interpreted differently, invent a new
> variable and a new split method for it.

How about using a magic, otherwise meaningless prefix to select
non-full-word matches, say, `[partial]'?  I doubt anyone starts their
VALUE regexps with `[partial]', particularly because of the duplicate
`a', which would be totally useless in a regular expression character
class.  I've just implemented it; patch attached.
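
A minimal sketch of the effect of the prefix, assuming the regexp is
assembled roughly as in the attached patch.  `my-split-regexp' is a
hypothetical helper written for illustration; it is not part of Gnus.

```elisp
;; Hypothetical helper, not part of Gnus: builds a header regexp the
;; way the patched `nnmail-split-it' does for a (FIELD VALUE SPLIT) rule.
(defun my-split-regexp (field value)
  "Return the regexp used to look for VALUE in header FIELD.
VALUE is wrapped in word boundaries unless it starts with \"[partial]\"."
  (let* ((prefix "[partial]")
         (partial (and (>= (length value) (length prefix))
                       (string= prefix
                                (substring value 0 (length prefix)))))
         (body (if partial
                   (substring value (length prefix))
                 value)))
    (concat "^\\(\\(" field "\\):.*\\)"
            (if partial "" "\\<")
            "\\(" body "\\)"
            (if partial "" "\\>"))))

;; "tool" is not a whole word inside "libtool", so only the
;; `[partial]' form is expected to match this header.
(let ((case-fold-search t)
      (header "To: libtool@gnu.org"))
  (list (string-match (my-split-regexp "to" "tool") header)
        (string-match (my-split-regexp "to" "[partial]tool") header)))
```

With the default syntax table the last form should evaluate to
(nil 0): the word-bounded regexp finds nothing, the partial one matches.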

While I was at it, I took the time to introduce a new feature that I
had requested some weeks ago.  As a refresher, the problem was that I
subscribe to the "libtool" and "bug-libtool" mailing lists, and there
was no way to keep messages posted only to "bug-libtool" from being
cross-posted to "libtool" without also preventing cross-posting when
the message *really* was posted to both lists.  The syntax I've come
up with is:

(FIELD VALUE [- RESTRICT [- RESTRICT [...]]] SPLIT)

So that now I can write:

(| (& (any "libtool@gnu\\.org" - "bug-libtool" "libtool")
      (any "bug-libtool@gnu\\.org" "bug-libtool")
      ;; ...
      )
   "misc")

The construction above is not equivalent to:

(| (& (any "libtool@gnu\\.org" (| (any "bug-libtool@gnu\\.org" nil)
                                  "libtool"))
      (any "bug-libtool@gnu\\.org" "bug-libtool") ;; ...

because the latter would not cross-post when a message is posted to
both mailing lists, whereas the former will.  A RESTRICT clause only
takes effect if its match starts after the end of the FIELD match and
ends after the beginning of the VALUE match.
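
The RESTRICT check can be sketched outside Gnus as follows.  This is a
simplified illustration of the rule described above, not the patch
itself: `my-split-groups' is a hypothetical function, it handles a
single RESTRICT, and it assumes VALUE never matches the empty string.

```elisp
;; Hypothetical stand-alone model of one (FIELD VALUE - RESTRICT GROUP)
;; rule; it is not part of Gnus.
(defun my-split-groups (header field value restrict group)
  "Return (GROUP) if HEADER passes the FIELD/VALUE rule, else nil.
A VALUE match is dropped when RESTRICT matches after the field name
and that RESTRICT match ends after the start of the VALUE match."
  (with-temp-buffer
    (insert header)
    (let ((case-fold-search t)
          (regexp (concat "^\\(\\(" field "\\):.*\\)\\<\\("
                          value "\\)\\>"))
          result)
      (goto-char (point-max))
      (while (re-search-backward regexp nil t)
        (let ((value-start (match-end 1))
              (name-end (match-end 2))
              (value-end (match-end 0)))
          (goto-char value-end)
          ;; Keep this VALUE match unless RESTRICT overlaps it.
          (unless (and (re-search-backward restrict name-end t)
                       (> (match-end 0) value-start))
            (setq result (list group)))
          ;; Continue with VALUE matches that end before this one.
          (goto-char value-start)))
      result)))

(my-split-groups "To: bug-libtool@gnu.org"
                 "to" "libtool@gnu\\.org" "bug-libtool" "libtool")
(my-split-groups "To: bug-libtool@gnu.org, libtool@gnu.org"
                 "to" "libtool@gnu\\.org" "bug-libtool" "libtool")
```

The first call should return nil, since the only VALUE match is part
of "bug-libtool"; the second should return ("libtool"), because the
second address is not covered by the RESTRICT match.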


And since I was rewriting that code anyway, I took the time to fix the
searching mechanism: it now looks for multiple matches of FIELD VALUE,
so SPLITs that involve \N substitutions are properly handled, causing
cross-posting without introducing duplicates.
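
A rough sketch of that idea, again outside Gnus: collect the group for
every match of FIELD VALUE, expanding \N against the VALUE match and
skipping groups already seen.  `my-expand-all' is a hypothetical name,
and the sketch assumes VALUE never matches the empty string.

```elisp
;; Hypothetical stand-alone model of the multiple-match search; it is
;; not part of Gnus.
(defun my-expand-all (headers field value template)
  "Return the groups TEMPLATE yields for each FIELD/VALUE match in HEADERS.
TEMPLATE may use \\N to refer to groups of VALUE.  No group is
returned twice."
  (with-temp-buffer
    (insert headers)
    (let ((case-fold-search t)
          (regexp (concat "^\\(\\(" field "\\):.*\\)\\<\\("
                          value "\\)\\>"))
          groups)
      (goto-char (point-max))
      (while (re-search-backward regexp nil t)
        (let ((value-start (match-end 1)))
          (goto-char (match-end 0))
          ;; Re-match VALUE alone so that \N refers to its own groups.
          (re-search-backward value value-start)
          (let ((group (match-substitute-replacement template t)))
            ;; Group names are strings, so compare with `member'.
            (unless (member group groups)
              (push group groups)))
          ;; Continue with matches that end before this one.
          (goto-char value-start)))
      groups)))

(my-expand-all "To: foo-bar@gnus.org\nCc: foo-baz@gnus.org, foo-bar@gnus.org\n"
               "to\\|cc"
               "foo-\\(bar\\|baz\\)@gnus\\.org"
               "mail.foo.\\1")
```

For these headers the result should contain "mail.foo.bar" and
"mail.foo.baz" exactly once each, even though foo-bar appears twice.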


-- 
Alexandre Oliva http://www.dcc.unicamp.br/~oliva IC-Unicamp, Bra[sz]il
{oliva,Alexandre.Oliva}@dcc.unicamp.br  aoliva@{acm.org,computer.org}
oliva@{gnu.org,kaffe.org,{egcs,sourceware}.cygnus.com,samba.org}
*** E-mail about software projects will be forwarded to mailing lists

--=-=-=
Content-Type: application/x-patch
Content-Disposition: attachment; filename=partial&restrict.patch
Content-Transfer-Encoding: 8bit

Index: lisp/ChangeLog
from Alexandre Oliva  <oliva@dcc.unicamp.br>

	* nnmail.el (nnmail-split-it): Support `[partial]', i.e.,
	non-full-word matches.  Search for regexp multiple times, so that
	backslash substitutions are performed for all matches.  Add
	RESTRICT support for FIELD VALUE splits.
	(nnmail-split-fancy): Document the new features.  Document `!'
	split, that was missing.

Index: texi/ChangeLog
from Alexandre Oliva  <oliva@dcc.unicamp.br>

	* gnus.texi (Fancy Mail Splitting): Document `[partial]' and
	RESTRICT.

Index: lisp/nnmail.el
--- lisp/nnmail.el	Tue Jun 15 01:12:03 1999
+++ lisp/nnmail.el	Fri Jun 25 06:14:35 1999
@@ -293,8 +293,12 @@
 
 GROUP: Mail will be stored in GROUP (a string).
 
-\(FIELD VALUE SPLIT): If the message field FIELD (a regexp) contains
-  VALUE (a regexp), store the messages as specified by SPLIT.
+\(FIELD VALUE [- RESTRICT [- RESTRICT [...]]] SPLIT): If the message
+  field FIELD (a regexp) contains VALUE (a regexp), store the messages 
+  as specified by SPLIT.  If RESTRICT (a regexp) matches some string
+  after FIELD and before the end of the matched VALUE, return NIL,
+  otherwise process SPLIT.  Multiple RESTRICTs add up, further
+  restricting the possibility of processing SPLIT.
 
 \(| SPLIT...): Process each SPLIT expression until one of them matches.
   A SPLIT expression is said to match if it will cause the mail
@@ -306,9 +310,14 @@
   the buffer containing the message headers.  The return value FUNCTION
   should be a split, which is then recursively processed.
 
+\(! FUNCTION SPLIT): Call FUNCTION with the result of SPLIT.  The
+  return value FUNCTION should be a split, which is then recursively
+  processed.
+
 FIELD must match a complete field name.  VALUE must match a complete
-word according to the `nnmail-split-fancy-syntax-table' syntax table.
-You can use \".*\" in the regexps to match partial field names or words.
+word according to the `nnmail-split-fancy-syntax-table' syntax table,
+unless it starts with \"[partial]\".  You can use \".*\" in the
+regexps to match partial field names or words.
 
 FIELD and VALUE can also be lisp symbols, in that case they are expanded
 as specified in `nnmail-split-abbrev-alist'.
@@ -333,6 +342,13 @@
 	     ;; Other mailing lists...
 	     (any \"procmail@informatik\\\\.rwth-aachen\\\\.de\" \"procmail.list\")
 	     (any \"SmartList@informatik\\\\.rwth-aachen\\\\.de\" \"SmartList.list\")
+             ;; Both lists below have the same suffix, so prevent
+             ;; cross-posting to mypkg.list of messages posted only to
+             ;; the bugs- list, but allow cross-posting when the
+             ;; message was really cross-posted.
+             (any \"bugs-mypackage@somewhere\" \"mypkg.bugs\")
+             (any \"mypackage@somewhere\" - \"bugs-mypackage\" \"mypkg.list\")
+             ;; 
 	     ;; People...
 	     (any \"larsi@ifi\\\\.uio\\\\.no\" \"people.Lars Magne Ingebrigtsen\"))
 	  ;; Unmatched mail goes to the catch all group.
@@ -1111,47 +1127,79 @@
 
      ;; Check the cache for the regexp for this split.
      ((setq cached-pair (assq split nnmail-split-cache))
-      (goto-char (point-max))
-      ;; FIX FIX FIX problem with re-search-backward is that if you have
-      ;; a split: (from "foo-\\(bar\\|baz\\)@gnus.org "mail.foo.\\1")
-      ;; and someone mails a message with 'To: foo-bar@gnus.org' and
-      ;; 'CC: foo-baz@gnus.org', we'll pick 'mail.foo.baz' as the group
-      ;; if the cc line is a later header, even though the other choice
-      ;; is probably better.  Also, this routine won't do a crosspost
-      ;; when there are two different matches.
-      ;; I guess you could just make this more determined, and it could
-      ;; look for still more matches prior to this one, and recurse
-      ;; on each of the multiple matches hit.  Of course, then you'd
-      ;; want to make sure that nnmail-article-group or nnmail-split-fancy
-      ;; removed duplicates, since there might be more of those.
-      ;; I guess we could also remove duplicates in the & split case, since
-      ;; that's the only thing that can introduce them.
-      (when (re-search-backward (cdr cached-pair) nil t)
-	(when nnmail-split-tracing
-	  (push (cdr cached-pair) nnmail-split-trace))
-	;; Someone might want to do a \N sub on this match, so get the
-	;; correct match positions.
-	(goto-char (match-end 0))
-	(let ((value (nth 1 split)))
-	  (re-search-backward (if (symbolp value)
-				  (cdr (assq value nnmail-split-abbrev-alist))
-				value)
-			      (match-end 1)))
-	(nnmail-split-it (nth 2 split))))
+      (let (split-result
+	    (end-point (point-max))
+	    (value (nth 1 split)))
+	(if (and (stringp value)
+		 (string= "[partial]" (substring value 0 (min 9 (length value)))))
+	    (setq value (substring value 9))
+	  (if (symbolp value)
+	      (setq value (cdr (assq value nnmail-split-abbrev-alist)))))
+	(goto-char end-point)
+	(while (re-search-backward (cdr cached-pair) nil t)
+	  (when nnmail-split-tracing
+	    (push (cdr cached-pair) nnmail-split-trace))
+	  ;; Start the next search just before the beginning of the
+	  ;; VALUE match.
+	  (setq end-point (1- (match-end 1)))
+	  (goto-char (match-end 0))
+	  (let ((split-rest (cddr split))
+		(start-of-value (match-end 1))
+		(after-header-name (match-end 2)))
+	    ;; Someone might want to do a \N sub on this match, so get the
+	    ;; correct match positions.
+	    (re-search-backward value start-of-value)
+	    (if (eq (car split-rest) '-)
+		(let* ((end (match-end 0)))
+		  ;; Handle - RESTRICTs
+		  (while (eq (car split-rest) '-)
+		    (goto-char end)
+		    (setq split-rest
+			  ;; RESTRICT must start after-header-name and
+			  ;; end after start-of-value, so that, for
+			  ;; (any "foo" - "x-foo" "foo.list")
+			  ;; we do not exclude foo.list just because
+			  ;; the header is: ``To: x-foo, foo''
+			  (if (and (re-search-backward (cadr split-rest)
+						       after-header-name t)
+				   (> (match-end 0) start-of-value))
+			      nil
+			    (cddr split-rest))))
+		  (when split-rest
+		    ;; Restore the correct match positions again for
+		    ;; \N substitutions
+		    (goto-char end)
+		    (re-search-backward value start-of-value))))
+	    (if split-rest
+		;; Prevent multiple matches from generating duplicates 
+		(dolist (sp (nnmail-split-it (car split-rest)))
+		  (unless (member sp split-result)
+		    (push sp split-result))))))
+	split-result))
 
      ;; Not in cache, compute a regexp for the field/value pair.
      (t
       (let* ((field (nth 0 split))
 	     (value (nth 1 split))
+	     (fullword (if (and (stringp value)
+				(string= "[partial]" (substring value 0 (min 9 (length value)))))
+			   (progn
+			     (setq value (substring value 9))
+			     (cons "" ""))
+			 (cons "\\<" "\\>")))
 	     (regexp (concat "^\\(\\("
 			     (if (symbolp field)
 				 (cdr (assq field nnmail-split-abbrev-alist))
 			       field)
-			     "\\):.*\\)\\<\\("
+			     "\\):.*\\)"
+			     (car fullword)
+			     "\\("
 			     (if (symbolp value)
 				 (cdr (assq value nnmail-split-abbrev-alist))
 			       value)
-			     "\\)\\>")))
+			     "\\)"
+			     (cdr fullword)
+			     )))
 	(push (cons split regexp) nnmail-split-cache)
 	;; Now that it's in the cache, just call nnmail-split-it again
 	;; on the same split, which will find it immediately in the cache.
Index: texi/gnus.texi
--- texi/gnus.texi	Tue Jun 15 01:12:06 1999
+++ texi/gnus.texi	Fri Jun 25 06:20:53 1999
@@ -10441,6 +10441,12 @@
       ;; Other mailing lists...
       (any "procmail@@informatik\\.rwth-aachen\\.de" "procmail.list")
       (any "SmartList@@informatik\\.rwth-aachen\\.de" "SmartList.list")
+      ;; Both lists below have the same suffix, so prevent
+      ;; cross-posting to mypkg.list of messages posted only to
+      ;; the bugs- list, but allow cross-posting when the
+      ;; message was really cross-posted.
+      (any "bugs-mypackage@@somewhere" "mypkg.bugs")
+      (any "mypackage@@somewhere" - "bugs-mypackage" "mypkg.list")
       ;; People...
       (any "larsi@@ifi\\.uio\\.no" "people.Lars_Magne_Ingebrigtsen"))
    ;; Unmatched mail goes to the catch all group.
@@ -10459,9 +10465,12 @@
 examples.
 
 @item
-@var{(FIELD VALUE SPLIT)}: If the split is a list, the first element of
-which is a string, then store the message as specified by SPLIT, if
-header FIELD (a regexp) contains VALUE (also a regexp).
+@var{(FIELD VALUE [- RESTRICT [- RESTRICT [...]]] SPLIT)}: If the split
+is a list, the first element of which is a string, then store the
+message as specified by SPLIT, if header FIELD (a regexp) contains VALUE
+(also a regexp).  If RESTRICT (yet another regexp) matches some string
+after FIELD and before the end of the matched VALUE, the SPLIT is
+ignored.  If none of the RESTRICT clauses match, SPLIT is processed.
 
 @item
 @var{(| SPLIT...)}: If the split is a list, and the first element is
@@ -10495,16 +10504,20 @@
 
 In these splits, @var{FIELD} must match a complete field name.
 @var{VALUE} must match a complete word according to the fundamental mode
-syntax table.  You can use @code{.*} in the regexps to match partial
-field names or words.  In other words, all @var{VALUE}'s are wrapped in
-@samp{\<} and @samp{\>} pairs.
+syntax table, unless it starts with @code{[partial]}.  You can use
+@code{.*} in the regexps to match partial field names or words.  In
+other words, by default, @var{VALUE}'s are wrapped in @samp{\<} and
+@samp{\>} pairs, but the default can be overridden with the magic prefix 
+@code{[partial]}.  The prefix is obviously eliminated before @var{VALUE} 
+is used.
 
 @vindex nnmail-split-abbrev-alist
 @var{FIELD} and @var{VALUE} can also be lisp symbols, in that case they
 are expanded as specified by the variable
 @code{nnmail-split-abbrev-alist}.  This is an alist of cons cells, where
-the @code{car} of a cell contains the key, and the @code{cdr} contains the associated
-value.
+the @code{car} of a cell contains the key, and the @code{cdr} contains
+the associated value.  Note that @code{[partial]} is not supported in
+this case.
 
 @vindex nnmail-split-fancy-syntax-table
 @code{nnmail-split-fancy-syntax-table} is the syntax table in effect

--=-=-=--