Gnus development mailing list
From: Daniel Pittman <daniel@rimspace.net>
To: ding@gnus.org
Cc: "Kai Großjohann" <grossjohann@ls6.cs.uni-dortmund.de>,
	"Simon Josefsson" <jas@pdc.kth.se>
Subject: NNIR, IMAP SEARCH, and the infinite pain of search terms.
Date: Mon, 10 Dec 2007 22:37:38 +1100
Message-ID: <87r6hu3hml.fsf@enki.rimspace.net>

[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]

G'day.

Gnus contains, in contrib/, the nnir.el interface to search engines.
This is a nice tool and, pleasantly, supports IMAP SEARCH to give me
efficient searching of my IMAP mail ... or so I thought.

The biggest problem I had was that it never seemed to find the mail I
expected, so I didn't use it much.

After inspecting the code, the reason became clear: the whole query was
sent as a single "exact substring" match, not the logical sort of search
that I have come to expect from Google and other search engines.


So... the IMAP SEARCH command doesn't do any clever parsing of the query
string; the front-end software has to do that.
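The difference can be shown with a minimal Emacs Lisp sketch. The helper names are hypothetical and are not part of nnir.el or the patch, and quoting of special characters is omitted for brevity. The first helper mirrors the pre-patch `(concat criteria " \"" qstring "\"")' construction. The second emits one search key per word and relies on the IMAP rule that adjacent SEARCH keys are combined with AND.

```elisp
;; Hypothetical helpers for illustration only; not part of nnir.el.

(defun my-nnir-old-query (criteria qstring)
  "Pre-patch style: the whole of QSTRING becomes one substring key."
  (concat criteria " \"" qstring "\""))

(defun my-nnir-and-query (criteria qstring)
  "Emit one CRITERIA key per whitespace-separated word in QSTRING.
IMAP SEARCH ANDs adjacent keys, so every word must match, but the
words need not appear next to each other in the message."
  (mapconcat (lambda (word) (format "%s \"%s\"" criteria word))
             (split-string qstring)
             " "))

;; (my-nnir-old-query "TEXT" "nnir imap") => "TEXT \"nnir imap\""
;; (my-nnir-and-query "TEXT" "nnir imap") => "TEXT \"nnir\" TEXT \"imap\""
```

The first form only finds messages that contain the exact text "nnir imap". The second finds any message that mentions both words anywhere.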

Attached is my first "draft" of a more complex search front-end for NNIR
and IMAP SEARCH: it parses the query, translates it into a suitable IMAP
SEARCH command, and returns the results.
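Here is a sketch of the translation step. It assumes the parsed query is a tree built from strings, (or A B), (not A) and nested lists of terms. This is a simplified illustration with hypothetical names, not the code in the patch, which uses `imap-quote-specials' for quoting. The sketch leans on the IMAP SEARCH grammar: adjacent keys are ANDed, OR and NOT are prefix operators, and a parenthesised list counts as a single key.

```elisp
;; Hypothetical sketch; names and tree shape are illustrative only.
;; A query is a list of ANDed expressions; each expression is a
;; string, (or A B), (not A), or a nested list of expressions.

(defun my-nnir-quote (string)
  "Return STRING as an IMAP quoted string, escaping \\ and \"."
  (concat "\""
          ;; Prefix every backslash or double quote with a backslash.
          (replace-regexp-in-string "[\\\"]" "\\\\\\&" string t)
          "\""))

(defun my-nnir-expr-to-imap (criteria expr)
  "Translate EXPR into IMAP SEARCH syntax, matching text with CRITERIA."
  (cond
   ((stringp expr)
    (format "%s %s" criteria (my-nnir-quote expr)))
   ((eq (car-safe expr) 'or)
    ;; OR is a prefix operator taking exactly two search keys.
    (format "OR %s %s"
            (my-nnir-expr-to-imap criteria (nth 1 expr))
            (my-nnir-expr-to-imap criteria (nth 2 expr))))
   ((eq (car-safe expr) 'not)
    ;; NOT takes one key; the parentheses keep a compound key intact.
    (format "NOT (%s)" (my-nnir-expr-to-imap criteria (nth 1 expr))))
   ((consp expr)
    ;; A parenthesised list is a single key whose members are ANDed.
    (format "(%s)" (my-nnir-query-to-imap criteria expr)))
   (t (error "Unhandled query expression: %S" expr))))

(defun my-nnir-query-to-imap (criteria query)
  "Translate QUERY, a list of ANDed expressions, into IMAP SEARCH syntax."
  (mapconcat (lambda (expr) (my-nnir-expr-to-imap criteria expr))
             query " "))
```

With this shape, the query ("gnus" (or "imap" "nntp") (not "spam")) searched on TEXT becomes:
TEXT "gnus" OR TEXT "imap" TEXT "nntp" NOT (TEXT "spam")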


This is *much* less surprising to me: it returns what I expect, most of
the time, and takes the sort of input I would expect as well.


At the moment it only handles basic searching, as documented in the
`nnir-imap-make-query' function in the patch.
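The documented features (automatic "and", phrases, "-" negation and OR) can be sketched as a small string parser that produces a tree of that kind. The names are again hypothetical, and this is not the patch's buffer-based parser. There is one deliberate difference. The patch binds `case-fold-search' to t, so its parser also treats lower-case "or", "and" and "not" as keywords. This sketch recognises only an upper-case OR, as the docstring describes. An unmatched double quote is ignored, and the words after it are read as ordinary terms.

```elisp
;; Hypothetical sketch; not the buffer-based parser in the patch.
;; Output: a list of ANDed terms, each a string, (not TERM) or
;; (or LEFT RIGHT).

(defconst my-nnir-token-regexp
  ;; Group 1: optional leading "-" (negation).
  ;; Group 2: contents of a "quoted phrase".
  ;; Group 3: a bare word (no blanks or double quotes).
  "\\(-?\\)\\(?:\"\\([^\"]*\\)\"\\|\\([^ \t\"]+\\)\\)")

(defun my-nnir-parse (string)
  "Parse STRING into a list of implicitly ANDed query terms.
Supports bare words, \"quoted phrases\", a leading - for negation
and an upper-case OR between two terms."
  (let ((pos 0) out pending-or)
    (while (string-match my-nnir-token-regexp string pos)
      (setq pos (match-end 0))
      (let ((negated (> (length (match-string 1 string)) 0))
            (phrase (match-string 2 string))
            (word (match-string 3 string)))
        (if (and (not negated) (equal word "OR") out)
            ;; OR combines the previous term with the next one.  A
            ;; leading OR falls through and is kept as a plain word.
            (setq pending-or t)
          (let ((term (or phrase word)))
            (when negated
              (setq term (list 'not term)))
            (if pending-or
                (setq out (cons (list 'or (car out) term) (cdr out))
                      pending-or nil)
              (push term out))))))
    ;; A trailing OR with nothing after it is silently dropped.
    (nreverse out)))

;; (my-nnir-parse "gnus \"imap search\" -spam nntp OR news")
;;   => ("gnus" "imap search" (not "spam") (or "nntp" "news"))
```

Chained ORs group from the left, so "x OR y OR z" parses as ((or (or "x" "y") "z")). The patch's recursive parser nests them the other way, which is equivalent for a search.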

I plan to extend this to support the full range of operators that IMAP
SEARCH supports, but wanted to seek feedback on the initial
implementation first.
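One way the planned header support ("from/to/etc headers" in the docstring) could work is to map prefixed terms onto IMAP's header search keys. The prefix syntax (from:daniel) and the table below are assumptions for illustration only. FROM, TO, CC, SUBJECT and BODY are standard IMAP SEARCH keys, but none of this code is part of nnir.el, and quoting of special characters is again omitted.

```elisp
;; Hypothetical sketch of prefix-to-header mapping; not part of nnir.el.

(defconst my-nnir-field-keys
  '(("from" . "FROM") ("to" . "TO") ("cc" . "CC")
    ("subject" . "SUBJECT") ("body" . "BODY"))
  "Hypothetical map from query prefixes to IMAP SEARCH keys.")

(defun my-nnir-field-term (term default-criteria)
  "Return TERM as an IMAP search key.
A TERM such as \"from:daniel\" uses the matching key from
`my-nnir-field-keys' (prefix compared case-insensitively); any
other TERM, including one with an unknown prefix, is searched
with DEFAULT-CRITERIA."
  (let* ((colon (string-match ":" term))
         (key (and colon
                   (cdr (assoc-string (substring term 0 colon)
                                      my-nnir-field-keys t)))))
    (if key
        (format "%s \"%s\"" key (substring term (1+ colon)))
      (format "%s \"%s\"" default-criteria term))))
```

Unknown prefixes fall back to the default criteria. As a result, a term such as a URL containing a colon is still searched as plain text.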


I have already signed papers assigning my Gnus changes, so there should
be no legal reason to reject this.

Regards,
        Daniel
-- 
Daniel Pittman <daniel@cybersource.com.au>           Phone: 03 9621 2377
Level 4, 10 Queen St, Melbourne             Web: http://www.cyber.com.au
Cybersource: Australia's Leading Linux and Open Source Solutions Company


[-- Attachment #2: imap-search.patch --]
[-- Type: text/x-diff, Size: 7341 bytes --]

Index: contrib/nnir.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/contrib/nnir.el,v
retrieving revision 7.22
diff -u -r7.22 nnir.el
--- contrib/nnir.el	4 Oct 2007 18:51:27 -0000	7.22
+++ contrib/nnir.el	10 Dec 2007 11:18:15 -0000
@@ -886,6 +886,9 @@
 ;; handle errors
 
 (defun nnir-run-imap (query srv &optional group-option)
+  "Run a search against an IMAP back-end server.
+This uses a custom query language parser; see `nnir-imap-make-query' for
+details on the language and supported extensions."
   (require 'imap)
   (require 'nnimap)
   (save-excursion
@@ -908,11 +911,182 @@
                  (lambda (artnum)
                    (push (vector group artnum 1) artlist)
                    (setq arts (1+ arts)))
-                 (imap-search (concat criteria " \"" qstring "\"") buf))
+                 (imap-search (nnir-imap-make-query criteria qstring) buf))
                 (message "Searching %s... %d matches" mbx arts)))
             (message "Searching %s...done" group))
         (quit nil))
       (reverse artlist))))
+
+(defun nnir-imap-make-query (criteria qstring)
+  "Parse the query string and criteria into an appropriate IMAP search
+expression, returning the string query to make.
+
+This implements a little language designed to return the expected results
+to an arbitrary query string to the end user.
+
+The search is always case-insensitive, as defined by RFC2060, and supports
+the following features (inspired by the Google search input language): 
+
+Automatic \"and\" queries
+    If you specify multiple words then they will be treated as an \"and\"
+    expression intended to match all components.
+
+Phrase searches
+    If you wrap your query in double-quotes then it will be treated as a
+    literal string.
+
+Negative terms
+    If you precede a term with \"-\" then it will negate that.
+
+\"OR\" queries
+    If you include an upper-case \"OR\" in your search it will cause the
+    term before it and the term after it to be treated as alternatives.
+
+In future the following will be added to the language:
+ * support for date matches
+ * support for location of text matching within the query
+ * from/to/etc headers
+ * additional search terms
+ * flag based searching
+ * anything else that the RFC supports, basically."
+  ;; Walk through the query and turn it into an IMAP query string.
+  (nnir-imap-query-to-imap criteria (nnir-imap-parse-query qstring)))
+
+
+(defun nnir-imap-query-to-imap (criteria query)
+  "Turn a s-expression format query into IMAP."
+  (mapconcat
+   ;; Turn the expressions into IMAP text
+   (lambda (item)
+     (nnir-imap-expr-to-imap criteria item))
+   ;; The query, already in s-expr format.
+   query
+   ;; Append a space between each expression
+   " "))
+
+
+(defun nnir-imap-expr-to-imap (criteria expr)
+  "Convert EXPR into an IMAP search expression on CRITERIA"
+  ;; What sort of expression is this, eh?
+  (cond
+   ;; Simple string term
+   ((stringp expr)
+    (format "%s \"%s\"" criteria (imap-quote-specials expr)))
+   ;; Trivial term: and
+   ((eq expr 'and) nil)
+   ;; Composite term: or expression
+   ((eq (car-safe expr) 'or)
+    (format "OR %s %s"
+	    (nnir-imap-expr-to-imap criteria (second expr))
+	    (nnir-imap-expr-to-imap criteria (third expr))))
+   ;; Composite term: just the fax, mam
+   ((eq (car-safe expr) 'not)
+    (format "NOT (%s)" (nnir-imap-query-to-imap criteria (rest expr))))
+   ;; Composite term: just expand it all.
+   ((and (not (null expr)) (listp expr))
+    (format "(%s)" (nnir-imap-query-to-imap criteria expr)))
+   ;; Complex value, give up for now.
+   (t (error "Unhandled input: %S" expr))))
+
+
+(defun nnir-imap-parse-query (string)
+  "Turn STRING into an s-expression based query based on the IMAP
+query language as defined in `nnir-imap-make-query'.
+
+This involves turning individual tokens into higher level terms
+that the search language can then understand and use."
+  (with-temp-buffer
+    ;; Set up the parsing environment.
+    (insert string)
+    (goto-char (point-min))
+    ;; Now, collect the output terms and return them.
+    (let (out)
+      (while (not (nnir-imap-end-of-input))
+	(push (nnir-imap-next-expr) out))
+      (reverse out))))
+
+
+(defun nnir-imap-next-expr (&optional count)
+  "Return the next expression from the current buffer."
+  (let ((term (nnir-imap-next-term count))
+	(next (nnir-imap-peek-symbol)))
+    ;; Are we looking at an 'or' expression?
+    (cond
+     ;; Handle 'expr or expr'
+     ((eq next 'or)
+      (list 'or term (nnir-imap-next-expr 2)))
+     ;; Anything else
+     (t term))))
+
+
+(defun nnir-imap-next-term (&optional count)
+  "Return the next TERM from the current buffer."
+  (let ((term (nnir-imap-next-symbol count)))
+    ;; What sort of term is this?
+    (cond
+     ;; and -- just ignore it
+     ((eq term 'and) 'and)
+     ;; negated term
+     ((eq term 'not) (list 'not (nnir-imap-next-expr)))
+     ;; generic term
+     (t term))))
+
+
+(defun nnir-imap-peek-symbol ()
+  "Return the next symbol from the current buffer, but don't consume it."
+  (save-excursion
+    (nnir-imap-next-symbol)))
+
+(defun nnir-imap-next-symbol (&optional count)
+  "Return the next symbol from the current buffer, or nil if we are
+at the end of the buffer.  If supplied COUNT skips some symbols before
+returning the one at the supplied position."
+  (when (and (numberp count) (> count 1))
+    (nnir-imap-next-symbol (1- count)))
+  (let ((case-fold-search t))
+    ;; end of input stream?
+    (unless (nnir-imap-end-of-input)
+      ;; No, return the next symbol from the stream.
+      (cond
+       ;; negated expression -- return it and advance one char.
+       ((looking-at "-") (forward-char 1) 'not)
+       ;; quoted string
+       ((looking-at "\"") (nnir-imap-delimited-string "\""))
+       ;; list expression -- we parse the content and return this as a list.
+       ((looking-at "(")
+	(nnir-imap-parse-query (nnir-imap-delimited-string ")")))
+       ;; keyword input -- return a symbol version
+       ((looking-at "\\band\\b") (forward-char 3) 'and)
+       ((looking-at "\\bor\\b")  (forward-char 2) 'or)
+       ((looking-at "\\bnot\\b") (forward-char 3) 'not)
+       ;; Simple, boring keyword
+       (t (let ((start (point))
+		(end (if (search-forward-regexp "[[:blank:]]" nil t)
+			 (prog1
+			     (match-beginning 0)
+			   ;; unskip if we hit a non-blank terminal character.
+			   (when (string-match "[^[:blank:]]" (match-string 0))
+			     (backward-char 1)))
+		       (goto-char (point-max)))))
+	    (buffer-substring start end)))))))
+
+(defun nnir-imap-delimited-string (delimiter)
+  "Return a delimited string from the current buffer."
+  (let ((start (point)) end)
+    (forward-char 1)			; skip the first delimiter.
+    (while (not end)
+      (unless (search-forward delimiter nil t)
+	(error "Unmatched delimited input with %s in query" delimiter))
+      (let ((here (point)))
+	(unless (equal (buffer-substring (- here 2) (- here 1)) "\\")
+	  (setq end (point)))))
+    (buffer-substring (1+ start) (1- end))))
+
+(defun nnir-imap-end-of-input ()
+  "Are we at the end of input?"
+  (skip-chars-forward "[:blank:]")
+  (looking-at "$"))
+  
 
 ;; Swish++ interface.
 ;; -cc- Todo


Thread overview: 4+ messages
2007-12-10 11:37 Daniel Pittman [this message]
2007-12-11 10:39 ` Vegard Vesterheim
2008-04-13 14:05 ` Reiner Steib
2008-04-14 11:54   ` Daniel Pittman
