From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/65932
Path: news.gmane.org!not-for-mail
From: Daniel Pittman
Newsgroups: gmane.emacs.gnus.general
Subject: NNIR, IMAP SEARCH, and the infinite pain of search terms.
Date: Mon, 10 Dec 2007 22:37:38 +1100
Organization: Cybersource: Australia's Leading Linux and Open Source Solutions Company
Message-ID: <87r6hu3hml.fsf@enki.rimspace.net>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Trace: ger.gmane.org 1197288462 24051 80.91.229.12 (10 Dec 2007 12:07:42 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 10 Dec 2007 12:07:42 +0000 (UTC)
Cc: Kai Großjohann, Simon Josefsson
To: ding@gnus.org
Original-X-From: ding-owner+M14426=ding+2Daccount=gmane.org@lists.math.uh.edu Mon Dec 10 13:07:52 2007
Return-path:
Envelope-to: ding-account@gmane.org
Original-Received: from util0.math.uh.edu ([129.7.128.18])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1J1hQH-0004Gj-K8
	for ding-account@gmane.org; Mon, 10 Dec 2007 13:07:46 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu)
	by util0.math.uh.edu with smtp (Exim 4.63)
	(envelope-from )
	id 1J1hPz-0004CP-Ef
	for ding-account@gmane.org; Mon, 10 Dec 2007 06:07:27 -0600
Original-Received: from mx2.math.uh.edu ([129.7.128.33])
	by util0.math.uh.edu with esmtps (TLSv1:AES256-SHA:256)
	(Exim 4.63)
	(envelope-from )
	id 1J1hPx-0004CJ-O3
	for ding@lists.math.uh.edu; Mon, 10 Dec 2007 06:07:25 -0600
Original-Received: from quimby.gnus.org ([80.91.231.51])
	by mx2.math.uh.edu with esmtp (Exim 4.67)
	(envelope-from )
	id 1J1hPp-000823-Sn
	for ding@lists.math.uh.edu; Mon, 10 Dec 2007 06:07:25 -0600
Original-Received: from main.gmane.org ([80.91.229.2] helo=ciao.gmane.org)
	by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian))
	id 1J1hPm-0002kU-00
	for ; Mon, 10 Dec 2007 13:07:14 +0100
Original-Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1J1hHC-0006Ca-5z
	for ding@gnus.org; Mon, 10 Dec 2007 11:58:22 +0000
Original-Received: from 203-217-31-70.perm.iinet.net.au ([203.217.31.70])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for ; Mon, 10 Dec 2007 11:58:22 +0000
Original-Received: from daniel by 203-217-31-70.perm.iinet.net.au with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for ; Mon, 10 Dec 2007 11:58:22 +0000
X-Injected-Via-Gmane: http://gmane.org/
Original-Lines: 255
Original-X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: 203-217-31-70.perm.iinet.net.au
User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/23.0.60 (gnu/linux)
Cancel-Lock: sha1:6bOal/D603AKmxm4woLeqbD5Ai8=
X-Spam-Score: -2.6 (--)
List-ID:
Precedence: bulk
Xref: news.gmane.org gmane.emacs.gnus.general:65932
Archived-At:

--=-=-=

G'day.

Gnus contains, in contrib/, the nnir.el interface to search engines.
This is a nice tool and, pleasantly, supports IMAP SEARCH to give me
efficient searching of my IMAP mail ... or so I thought.

The biggest problem I had was that it never seemed to find the mail I
expected, so I didn't use it much.

After inspecting the code the reason became clear: the search performed
a single "exact substring" match, not the logical sort of search that I
have come to expect from Google and other search engines.

So... the IMAP SEARCH command doesn't do any clever parsing or anything;
the front-end software has to do that.

Attached is my first "draft" of a more complex search front-end for NNIR
and IMAP SEARCH -- it parses the query, translates that into a suitable
IMAP SEARCH command and returns the results.

This is *much* less surprising to me: it returns what I expect, most of
the time, and takes the sort of input I would expect as well.

At the moment it only handles basic searching, as documented in the
`nnir-imap-make-query' function in the patch.
I plan to extend this to support the full range of operators that IMAP
SEARCH supports, but wanted to seek feedback on the initial
implementation first.

I have signed papers assigning Gnus changes already, so there should be
no legal reason that this is rejected.

Regards,
        Daniel
-- 
Daniel Pittman                              Phone: 03 9621 2377
Level 4, 10 Queen St, Melbourne             Web: http://www.cyber.com.au
Cybersource: Australia's Leading Linux and Open Source Solutions Company

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=imap-search.patch

? imap-search.patch
Index: contrib/nnir.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/contrib/nnir.el,v
retrieving revision 7.22
diff -u -r7.22 nnir.el
--- contrib/nnir.el	4 Oct 2007 18:51:27 -0000	7.22
+++ contrib/nnir.el	10 Dec 2007 11:18:15 -0000
@@ -886,6 +886,9 @@
 ;; handle errors
 
 (defun nnir-run-imap (query srv &optional group-option)
+  "Run a search against an IMAP back-end server.
+This uses a custom query language parser; see `nnir-imap-make-query' for
+details on the language and supported extensions"
   (require 'imap)
   (require 'nnimap)
   (save-excursion
@@ -908,11 +911,182 @@
                  (lambda (artnum)
                    (push (vector group artnum 1) artlist)
                    (setq arts (1+ arts)))
-                 (imap-search (concat criteria " \"" qstring "\"") buf))
+                 (imap-search (nnir-imap-make-query criteria qstring) buf))
                 (message "Searching %s... %d matches" mbx arts)))
             (message "Searching %s...done" group))
         (quit nil))
       (reverse artlist))))
+
+(defun nnir-imap-make-query (criteria qstring)
+  "Parse the query string and criteria into an appropriate IMAP search
+expression, returning the string query to make.
+
+This implements a little language designed to return the expected results
+to an arbitrary query string to the end user.
+
+The search is always case-insensitive, as defined by RFC2060, and supports
+the following features (inspired by the Google search input language):
+
+Automatic \"and\" queries
+    If you specify multiple words then they will be treated as an \"and\"
+    expression intended to match all components.
+
+Phrase searches
+    If you wrap your query in double-quotes then it will be treated as a
+    literal string.
+
+Negative terms
+    If you precede a term with \"-\" then it will negate that.
+
+\"OR\" queries
+    If you include an upper-case \"OR\" in your search it will cause the
+    term before it and the term after it to be treated as alternatives.
+
+In future the following will be added to the language:
+ * support for date matches
+ * support for location of text matching within the query
+ * from/to/etc headers
+ * additional search terms
+ * flag based searching
+ * anything else that the RFC supports, basically."
+  ;; Walk through the query and turn it into an IMAP query string.
+  (nnir-imap-query-to-imap criteria (nnir-imap-parse-query qstring)))
+
+
+(defun nnir-imap-query-to-imap (criteria query)
+  "Turn a s-expression format query into IMAP."
+  (mapconcat
+   ;; Turn the expressions into IMAP text
+   (lambda (item)
+     (nnir-imap-expr-to-imap criteria item))
+   ;; The query, already in s-expr format.
+   query
+   ;; Append a space between each expression
+   " "))
+
+
+(defun nnir-imap-expr-to-imap (criteria expr)
+  "Convert EXPR into an IMAP search expression on CRITERIA"
+  ;; What sort of expression is this, eh?
+  (cond
+   ;; Simple string term
+   ((stringp expr)
+    (format "%s \"%s\"" criteria (imap-quote-specials expr)))
+   ;; Trivial term: and
+   ((eq expr 'and) nil)
+   ;; Composite term: or expression
+   ((eq (car-safe expr) 'or)
+    (format "OR %s %s"
+            (nnir-imap-expr-to-imap criteria (second expr))
+            (nnir-imap-expr-to-imap criteria (third expr))))
+   ;; Composite term: just the fax, mam
+   ((eq (car-safe expr) 'not)
+    (format "NOT (%s)" (nnir-imap-query-to-imap criteria (rest expr))))
+   ;; Composite term: just expand it all.
+   ((and (not (null expr)) (listp expr))
+    (format "(%s)" (nnir-imap-query-to-imap criteria expr)))
+   ;; Complex value, give up for now.
+   (t (error "Unhandled input: %S" expr))))
+
+
+(defun nnir-imap-parse-query (string)
+  "Turn STRING into an s-expression based query based on the IMAP
+query language as defined in `nnir-imap-make-query'.
+
+This involves turning individual tokens into higher level terms
+that the search language can then understand and use."
+  (with-temp-buffer
+    ;; Set up the parsing environment.
+    (insert string)
+    (goto-char (point-min))
+    ;; Now, collect the output terms and return them.
+    (let (out)
+      (while (not (nnir-imap-end-of-input))
+        (push (nnir-imap-next-expr) out))
+      (reverse out))))
+
+
+(defun nnir-imap-next-expr (&optional count)
+  "Return the next expression from the current buffer."
+  (let ((term (nnir-imap-next-term count))
+        (next (nnir-imap-peek-symbol)))
+    ;; Are we looking at an 'or' expression?
+    (cond
+     ;; Handle 'expr or expr'
+     ((eq next 'or)
+      (list 'or term (nnir-imap-next-expr 2)))
+     ;; Anything else
+     (t term))))
+
+
+(defun nnir-imap-next-term (&optional count)
+  "Return the next TERM from the current buffer."
+  (let ((term (nnir-imap-next-symbol count)))
+    ;; What sort of term is this?
+    (cond
+     ;; and -- just ignore it
+     ((eq term 'and) 'and)
+     ;; negated term
+     ((eq term 'not) (list 'not (nnir-imap-next-expr)))
+     ;; generic term
+     (t term))))
+
+
+(defun nnir-imap-peek-symbol ()
+  "Return the next symbol from the current buffer, but don't consume it."
+  (save-excursion
+    (nnir-imap-next-symbol)))
+
+(defun nnir-imap-next-symbol (&optional count)
+  "Return the next symbol from the current buffer, or nil if we are
+at the end of the buffer.  If supplied COUNT skips some symbols before
+returning the one at the supplied position."
+  (when (and (numberp count) (> count 1))
+    (nnir-imap-next-symbol (1- count)))
+  (let ((case-fold-search t))
+    ;; end of input stream?
+    (unless (nnir-imap-end-of-input)
+      ;; No, return the next symbol from the stream.
+      (cond
+       ;; negated expression -- return it and advance one char.
+       ((looking-at "-") (forward-char 1) 'not)
+       ;; quoted string
+       ((looking-at "\"") (nnir-imap-delimited-string "\""))
+       ;; list expression -- we parse the content and return this as a list.
+       ((looking-at "(")
+        (nnir-imap-parse-query (nnir-imap-delimited-string ")")))
+       ;; keyword input -- return a symbol version
+       ((looking-at "\\band\\b") (forward-char 3) 'and)
+       ((looking-at "\\bor\\b") (forward-char 2) 'or)
+       ((looking-at "\\bnot\\b") (forward-char 3) 'not)
+       ;; Simple, boring keyword
+       (t (let ((start (point))
+                (end (if (search-forward-regexp "[[:blank:]]" nil t)
+                         (prog1
+                             (match-beginning 0)
+                           ;; unskip if we hit a non-blank terminal character.
+                           (when (string-match "[^[:blank:]]" (match-string 0))
+                             (backward-char 1)))
+                       (goto-char (point-max)))))
+            (buffer-substring start end)))))))
+
+(defun nnir-imap-delimited-string (delimiter)
+  "Return a delimited string from the current buffer."
+  (let ((start (point)) end)
+    (forward-char 1)                    ; skip the first delimiter.
+    (while (not end)
+      (unless (search-forward delimiter nil t)
+        (error "Unmatched delimited input with %s in query" delimiter))
+      (let ((here (point)))
+        (unless (equal (buffer-substring (- here 2) (- here 1)) "\\")
+          (setq end (point)))))
+    (buffer-substring (1+ start) (1- end))))
+
+(defun nnir-imap-end-of-input ()
+  "Are we at the end of input?"
+  (skip-chars-forward "[[:blank:]]")
+  (looking-at "$"))
+
 
 ;; Swish++ interface.
 ;; -cc- Todo

--=-=-=--