From: Daniel Pittman <daniel@rimspace.net>
To: ding@gnus.org
Cc: "Kai Großjohann" <grossjohann@ls6.cs.uni-dortmund.de>,
"Simon Josefsson" <jas@pdc.kth.se>
Subject: NNIR, IMAP SEARCH, and the infinite pain of search terms.
Date: Mon, 10 Dec 2007 22:37:38 +1100
Message-ID: <87r6hu3hml.fsf@enki.rimspace.net>
[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]
G'day.
Gnus contains, in contrib/, the nnir.el interface to search engines.
This is a nice tool and, pleasantly, supports IMAP SEARCH to give me
efficient searching of my IMAP mail ... or so I thought.
The biggest problem I had was that it never seemed to find the mail
I expected, so I didn't use it much.
After inspecting the code the reason became clear: the search was
performed as a single "exact substring" match, not the logical sort of
search that I have come to expect from Google and other search engines.
So... the IMAP SEARCH command doesn't do any clever parsing of the query
string; the front-end software has to do that.
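A rough Python sketch makes the difference concrete. The patch itself is Emacs Lisp, `old_query' and `word_query' are hypothetical names, and BODY is just an example criterion. The old code sent (concat criteria " \"" qstring "\""), a single quoted string, so the server looks for that exact substring. Splitting on whitespace gives one search key per word instead, and IMAP SEARCH intersects (ANDs) multiple keys. Escaping of " and \ is left out to keep the sketch short.

```python
def old_query(criteria: str, qstring: str) -> str:
    # Mirrors the original nnir code: the whole input becomes one quoted
    # string, so the server only matches that exact (case-insensitive) substring.
    return criteria + ' "' + qstring + '"'


def word_query(criteria: str, qstring: str) -> str:
    # One search key per word; IMAP SEARCH ANDs multiple keys together,
    # so each word may appear anywhere in the field.
    # NOTE: no escaping of " or \ here, to keep the sketch short.
    return " ".join(f'{criteria} "{word}"' for word in qstring.split())


print(old_query("BODY", "gnus imap"))   # BODY "gnus imap"
print(word_query("BODY", "gnus imap"))  # BODY "gnus" BODY "imap"
```

With the old form, a message that mentions "IMAP support in Gnus" is missed. With one key per word, it matches.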
Attached is my first "draft" of a more complex search front-end for NNIR
and IMAP SEARCH -- it parses the query, translates that into a suitable
IMAP SEARCH command and returns the results.
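The same parse-then-translate approach can also be sketched in Python. This is an illustration under assumptions, not the patch: every name is hypothetical and BODY is only an example criterion.

The sketch first turns the query into tokens: words, "quoted phrases", "-", parentheses, and the keywords and/or/not (matched in any case). A small recursive-descent parser then emits IMAP SEARCH keys:

* adjacent keys are implicitly ANDed;
* a OR b becomes the prefix form OR a b;
* a negation becomes NOT (...).

Unlike the patch, it accepts nested parentheses. It does not handle backslash escapes inside phrases.

```python
import re

# One token per match: "-", a "quoted phrase" (no escapes), "(" or ")", or a bare word.
TOKEN_RE = re.compile(r'\s*(?:(-)|"([^"]*)"|([()])|([^\s()"]+))')
KEYWORDS = {"and", "or", "not"}


def quote_imap(text):
    # IMAP quoted strings need \ and " escaped with a backslash.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def tokenize(text):
    """Split a query into (kind, value) pairs; kind is 'op', 'paren' or 'str'."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse query near {text[pos:]!r}")
        minus, phrase, paren, word = m.groups()
        if minus:
            tokens.append(("op", "not"))
        elif phrase is not None:
            tokens.append(("str", phrase))
        elif paren:
            tokens.append(("paren", paren))
        elif word.lower() in KEYWORDS:  # keywords match in any case, as in the patch
            tokens.append(("op", word.lower()))
        else:
            tokens.append(("str", word))
        pos = m.end()
    return tokens


def to_imap(criteria, query):
    """Translate QUERY into IMAP SEARCH keys applied to CRITERIA (e.g. "BODY")."""
    tokens = [t for t in tokenize(query) if t != ("op", "and")]  # AND is implicit
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():
        # expr := term [OR expr]   (OR is right-associative)
        nonlocal pos
        left = term()
        if peek() == ("op", "or"):
            pos += 1
            return f"OR {left} {expr()}"
        return left

    def term():
        # term := NOT expr | "(" expr... ")" | string
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("query ended where a term was expected")
        pos += 1
        if tok == ("op", "not"):
            return f"NOT ({expr()})"
        if tok == ("paren", "("):
            parts = []
            while peek() != ("paren", ")"):
                if peek() is None:
                    raise ValueError("unmatched '(' in query")
                parts.append(expr())
            pos += 1  # consume ")"
            return "(" + " ".join(parts) + ")"
        if tok[0] == "str":
            return f'{criteria} "{quote_imap(tok[1])}"'
        raise ValueError(f"unexpected {tok[1]!r} in query")

    parts = []
    while peek() is not None:
        parts.append(expr())
    return " ".join(parts)


print(to_imap("BODY", 'gnus -spam OR "mailing list"'))
# BODY "gnus" NOT (OR BODY "spam" BODY "mailing list")
```

As in the patch's `nnir-imap-next-term', a leading "-" here negates the whole OR expression that follows it. That is why the output reads NOT (OR ...).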
This is *much* less surprising to me: it returns what I expect, most of
the time, and takes the sort of input I would expect as well.
At the moment it only handles basic searching, as documented in the
`nnir-imap-make-query' function in the patch.
I plan to extend this to support the full range of operators that IMAP
SEARCH supports, but wanted to seek feedback on the initial
implementation first.
I have already signed papers assigning my Gnus changes, so there should
be no legal reason for this to be rejected.
Regards,
Daniel
--
Daniel Pittman <daniel@cybersource.com.au> Phone: 03 9621 2377
Level 4, 10 Queen St, Melbourne Web: http://www.cyber.com.au
Cybersource: Australia's Leading Linux and Open Source Solutions Company
[-- Attachment #2: imap-search.patch --]
[-- Type: text/x-diff, Size: 7341 bytes --]
? imap-search.patch
Index: contrib/nnir.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/contrib/nnir.el,v
retrieving revision 7.22
diff -u -r7.22 nnir.el
--- contrib/nnir.el 4 Oct 2007 18:51:27 -0000 7.22
+++ contrib/nnir.el 10 Dec 2007 11:18:15 -0000
@@ -886,6 +886,9 @@
;; handle errors
(defun nnir-run-imap (query srv &optional group-option)
+  "Run a search against an IMAP back-end server.
+This uses a custom query language parser; see `nnir-imap-make-query' for
+details on the language and supported extensions."
(require 'imap)
(require 'nnimap)
(save-excursion
@@ -908,11 +911,182 @@
(lambda (artnum)
(push (vector group artnum 1) artlist)
(setq arts (1+ arts)))
- (imap-search (concat criteria " \"" qstring "\"") buf))
+ (imap-search (nnir-imap-make-query criteria qstring) buf))
(message "Searching %s... %d matches" mbx arts)))
(message "Searching %s...done" group))
(quit nil))
(reverse artlist))))
+
+(defun nnir-imap-make-query (criteria qstring)
+  "Parse the query string QSTRING and CRITERIA into an appropriate IMAP
+search expression, returning the query string to send.
+
+This implements a little language designed to return the results an
+end user would expect from an arbitrary query string.
+
+The search is always case-insensitive, as defined by RFC2060, and supports
+the following features (inspired by the Google search input language):
+
+Automatic \"and\" queries
+  If you specify multiple words then they will be treated as an \"and\"
+  expression intended to match all components.
+
+Phrase searches
+  If you wrap a phrase in double-quotes then it will be treated as a
+  literal string.
+
+Negative terms
+  If you precede a term with \"-\" then that term is negated.
+
+\"OR\" queries
+  If you include \"OR\" (in any letter case) in your search, the term
+  before it and the term after it are treated as alternatives.
+
+In future the following will be added to the language:
+  * support for date matches
+  * support for location of text matching within the query
+  * from/to/etc headers
+  * additional search terms
+  * flag based searching
+  * anything else that the RFC supports, basically."
+  ;; Walk through the query and turn it into an IMAP query string.
+  (nnir-imap-query-to-imap criteria (nnir-imap-parse-query qstring)))
+
+
+(defun nnir-imap-query-to-imap (criteria query)
+  "Turn an s-expression format QUERY into an IMAP search on CRITERIA."
+  (mapconcat
+   ;; Turn the expressions into IMAP text
+   (lambda (item)
+     (nnir-imap-expr-to-imap criteria item))
+   ;; The query, already in s-expr format, minus the implicit `and' markers.
+   (delq 'and (copy-sequence query))
+   ;; Append a space between each expression
+   " "))
+
+
+(defun nnir-imap-expr-to-imap (criteria expr)
+  "Convert EXPR into an IMAP search expression on CRITERIA."
+  ;; What sort of expression is this, eh?
+  (cond
+   ;; Simple string term
+   ((stringp expr)
+    (format "%s \"%s\"" criteria (imap-quote-specials expr)))
+   ;; Trivial term: and
+   ((eq expr 'and) nil)
+   ;; Composite term: or expression
+   ((eq (car-safe expr) 'or)
+    (format "OR %s %s"
+            (nnir-imap-expr-to-imap criteria (nth 1 expr))
+            (nnir-imap-expr-to-imap criteria (nth 2 expr))))
+   ;; Composite term: negated expression
+   ((eq (car-safe expr) 'not)
+    (format "NOT (%s)" (nnir-imap-query-to-imap criteria (cdr expr))))
+   ;; Composite term: just expand it all.
+   ((and (not (null expr)) (listp expr))
+    (format "(%s)" (nnir-imap-query-to-imap criteria expr)))
+   ;; Complex value, give up for now.
+   (t (error "Unhandled input: %S" expr))))
+
+
+(defun nnir-imap-parse-query (string)
+  "Turn STRING into an s-expression query, following the IMAP query
+language described in `nnir-imap-make-query'.
+
+This involves turning individual tokens into higher level terms
+that the search language can then understand and use."
+  (with-temp-buffer
+    ;; Set up the parsing environment.
+    (insert string)
+    (goto-char (point-min))
+    ;; Now, collect the output terms and return them.
+    (let (out)
+      (while (not (nnir-imap-end-of-input))
+        (push (nnir-imap-next-expr) out))
+      (reverse out))))
+
+
+(defun nnir-imap-next-expr (&optional count)
+  "Return the next expression from the current buffer."
+  (let ((term (nnir-imap-next-term count))
+        (next (nnir-imap-peek-symbol)))
+    ;; Are we looking at an 'or' expression?
+    (cond
+     ;; Handle 'expr or expr'
+     ((eq next 'or)
+      (list 'or term (nnir-imap-next-expr 2)))
+     ;; Anything else
+     (t term))))
+
+
+(defun nnir-imap-next-term (&optional count)
+  "Return the next term from the current buffer."
+  (let ((term (nnir-imap-next-symbol count)))
+    ;; What sort of term is this?
+    (cond
+     ;; and -- pass it through; it is dropped when generating IMAP.
+     ((eq term 'and) 'and)
+     ;; negated term
+     ((eq term 'not) (list 'not (nnir-imap-next-expr)))
+     ;; generic term
+     (t term))))
+
+
+(defun nnir-imap-peek-symbol ()
+  "Return the next symbol from the current buffer, but don't consume it."
+  (save-excursion
+    (nnir-imap-next-symbol)))
+
+(defun nnir-imap-next-symbol (&optional count)
+  "Return the next symbol from the current buffer, or nil if we are
+at the end of the buffer.  If COUNT is supplied, skip COUNT - 1
+symbols first and return the one that follows them."
+  (when (and (numberp count) (> count 1))
+    (nnir-imap-next-symbol (1- count)))
+  (let ((case-fold-search t))
+    ;; end of input stream?
+    (unless (nnir-imap-end-of-input)
+      ;; No, return the next symbol from the stream.
+      (cond
+       ;; negated expression -- return it and advance one char.
+       ((looking-at "-") (forward-char 1) 'not)
+       ;; quoted string
+       ((looking-at "\"") (nnir-imap-delimited-string "\""))
+       ;; list expression -- we parse the content and return this as a list.
+       ((looking-at "(")
+        (nnir-imap-parse-query (nnir-imap-delimited-string ")")))
+       ;; keyword input -- return a symbol version
+       ((looking-at "\\band\\b") (forward-char 3) 'and)
+       ((looking-at "\\bor\\b") (forward-char 2) 'or)
+       ((looking-at "\\bnot\\b") (forward-char 3) 'not)
+       ;; Simple, boring keyword
+       (t (let ((start (point))
+                (end (if (search-forward-regexp "[[:blank:]]" nil t)
+                         (prog1
+                             (match-beginning 0)
+                           ;; unskip if we hit a non-blank terminal character.
+                           (when (string-match "[^[:blank:]]" (match-string 0))
+                             (backward-char 1)))
+                       (goto-char (point-max)))))
+            (buffer-substring start end)))))))
+
+(defun nnir-imap-delimited-string (delimiter)
+  "Return a delimited string from the current buffer."
+  (let ((start (point)) end)
+    (forward-char 1)                    ; skip the first delimiter.
+    (while (not end)
+      (unless (search-forward delimiter nil t)
+        (error "Unmatched delimited input with %s in query" delimiter))
+      (let ((here (point)))
+        (unless (equal (buffer-substring (- here 2) (- here 1)) "\\")
+          (setq end (point)))))
+    (buffer-substring (1+ start) (1- end))))
+
+(defun nnir-imap-end-of-input ()
+  "Skip any blanks and return non-nil if we are at the end of input."
+  (skip-chars-forward " \t")
+  (looking-at "$"))
+
;; Swish++ interface.
;; -cc- Todo