Gnus development mailing list
* NNIR, IMAP SEARCH, and the infinite pain of search terms.
@ 2007-12-10 11:37 Daniel Pittman
  2007-12-11 10:39 ` Vegard Vesterheim
  2008-04-13 14:05 ` Reiner Steib
  0 siblings, 2 replies; 4+ messages in thread
From: Daniel Pittman @ 2007-12-10 11:37 UTC
  To: ding; +Cc: Kai Großjohann, Simon Josefsson

[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]

G'day.

Gnus contains, in contrib/, the nnir.el interface to search engines.
This is a nice tool and, pleasantly, supports IMAP SEARCH to give me
efficient searching of my IMAP mail ... or so I thought.

The biggest problem I had was that it would never seem to find the mail
I expected, so I didn't use it much.  

After inspecting the code, the reason became clear: the search performed
a single "exact substring" match, not the logical sort of search that I
have come to expect from Google and other search engines.


So...  the IMAP SEARCH command doesn't do any clever parsing or
anything; the front end software has to do that.  
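
As a rough illustration of the difference (a sketch only, not code from
nnir.el; the names `my-old-imap-query' and `my-anded-imap-query' are made
up for this example), the unpatched code wraps the whole input in one
quoted string, whereas an "and" search needs one criterion per word:

```elisp
;; Hypothetical helper names.  Only `concat', `split-string',
;; `mapconcat' and `format' are real Emacs Lisp functions here.
(defun my-old-imap-query (criteria qstring)
  "Build the query the way the unpatched `nnir-run-imap' does.
The whole of QSTRING becomes a single quoted substring to match."
  (concat criteria " \"" qstring "\""))

(defun my-anded-imap-query (criteria qstring)
  "Build one CRITERIA test per whitespace-separated word in QSTRING.
IMAP SEARCH treats adjacent search keys as an implicit AND."
  (mapconcat (lambda (word) (format "%s \"%s\"" criteria word))
             (split-string qstring)
             " "))

(my-old-imap-query "TEXT" "gnus imap")    ; => "TEXT \"gnus imap\""
(my-anded-imap-query "TEXT" "gnus imap")  ; => "TEXT \"gnus\" TEXT \"imap\""
```

The second form matches messages containing both words anywhere, which
is closer to what a search-engine user would expect.  Note that this
sketch does no escaping of special characters inside the words.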

Attached is my first "draft" of a more complex search front-end for NNIR
and IMAP SEARCH -- it parses the query, translates that into a suitable
IMAP SEARCH command and returns the results.


This is *much* less surprising to me: it returns what I expect, most of
the time, and takes the sort of input I would expect as well.


At the moment it only handles basic searching, as documented in the
`nnir-imap-make-query' function in the patch.
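
To give a feel for the translation step without reading the whole patch,
here is a simplified, standalone sketch.  It assumes the query has
already been parsed into a list of strings, `(or A B)' forms and
`(not A)' forms; the function names are hypothetical and, unlike the
patch, it does not escape special characters inside terms.

```elisp
;; Standalone sketch: `my-imap-expr-to-string' and
;; `my-imap-query-to-string' are illustrative names, not nnir.el API.
(defun my-imap-expr-to-string (criteria expr)
  "Render the parsed expression EXPR as IMAP SEARCH text on CRITERIA."
  (cond
   ;; A plain word or phrase becomes one quoted criterion.
   ((stringp expr)
    (format "%s \"%s\"" criteria expr))
   ;; (or A B) becomes the IMAP prefix form: OR A B.
   ((eq (car-safe expr) 'or)
    (format "OR %s %s"
            (my-imap-expr-to-string criteria (nth 1 expr))
            (my-imap-expr-to-string criteria (nth 2 expr))))
   ;; (not A) becomes: NOT (A).
   ((eq (car-safe expr) 'not)
    (format "NOT (%s)" (my-imap-expr-to-string criteria (nth 1 expr))))
   ;; Any other non-empty list is a parenthesised group of terms.
   ((consp expr)
    (format "(%s)" (my-imap-query-to-string criteria expr)))
   (t (error "Unhandled input: %S" expr))))

(defun my-imap-query-to-string (criteria query)
  "Render each expression in the list QUERY and join them with spaces."
  (mapconcat (lambda (expr) (my-imap-expr-to-string criteria expr))
             query
             " "))

(my-imap-query-to-string "TEXT" '("gnus" (or "imap" "nnir") (not "spam")))
;; => "TEXT \"gnus\" OR TEXT \"imap\" TEXT \"nnir\" NOT (TEXT \"spam\")"
```

The list in the final call is roughly what I would expect the parser to
produce for the input `gnus imap OR nnir -spam'.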

I plan to extend this to support the full range of operators that IMAP
SEARCH supports, but wanted to seek feedback on the initial
implementation first.


I have signed papers assigning Gnus changes already, so there should be
no legal reason for this to be rejected.

Regards,
        Daniel
-- 
Daniel Pittman <daniel@cybersource.com.au>           Phone: 03 9621 2377
Level 4, 10 Queen St, Melbourne             Web: http://www.cyber.com.au
Cybersource: Australia's Leading Linux and Open Source Solutions Company


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: imap-search.patch --]
[-- Type: text/x-diff, Size: 7341 bytes --]

? imap-search.patch
Index: contrib/nnir.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/contrib/nnir.el,v
retrieving revision 7.22
diff -u -r7.22 nnir.el
--- contrib/nnir.el	4 Oct 2007 18:51:27 -0000	7.22
+++ contrib/nnir.el	10 Dec 2007 11:18:15 -0000
@@ -886,6 +886,9 @@
 ;; handle errors
 
 (defun nnir-run-imap (query srv &optional group-option)
+  "Run a search against an IMAP back-end server.
+This uses a custom query language parser; see `nnir-imap-make-query' for
+details on the language and supported extensions."
   (require 'imap)
   (require 'nnimap)
   (save-excursion
@@ -908,11 +911,182 @@
                  (lambda (artnum)
                    (push (vector group artnum 1) artlist)
                    (setq arts (1+ arts)))
-                 (imap-search (concat criteria " \"" qstring "\"") buf))
+                 (imap-search (nnir-imap-make-query criteria qstring) buf))
                 (message "Searching %s... %d matches" mbx arts)))
             (message "Searching %s...done" group))
         (quit nil))
       (reverse artlist))))
+
+(defun nnir-imap-make-query (criteria qstring)
+  "Parse the query string and criteria into an appropriate IMAP search
+expression, returning the string query to make.
+
+This implements a little language designed to return the expected results
+to an arbitrary query string to the end user.
+
+The search is always case-insensitive, as defined by RFC2060, and supports
+the following features (inspired by the Google search input language): 
+
+Automatic \"and\" queries
+    If you specify multiple words then they will be treated as an \"and\"
+    expression intended to match all components.
+
+Phrase searches
+    If you wrap your query in double-quotes then it will be treated as a
+    literal string.
+
+Negative terms
+    If you precede a term with \"-\" then it will negate that.
+
+\"OR\" queries
+    If you include an upper-case \"OR\" in your search it will cause the
+    term before it and the term after it to be treated as alternatives.
+
+In future the following will be added to the language:
+ * support for date matches
+ * support for location of text matching within the query
+ * from/to/etc headers
+ * additional search terms
+ * flag based searching
+ * anything else that the RFC supports, basically."
+  ;; Walk through the query and turn it into an IMAP query string.
+  (nnir-imap-query-to-imap criteria (nnir-imap-parse-query qstring)))
+
+
+(defun nnir-imap-query-to-imap (criteria query)
+  "Turn an s-expression format query into IMAP."
+  (mapconcat
+   ;; Turn the expressions into IMAP text
+   (lambda (item)
+     (nnir-imap-expr-to-imap criteria item))
+   ;; The query, already in s-expr format.
+   query
+   ;; Append a space between each expression
+   " "))
+
+
+(defun nnir-imap-expr-to-imap (criteria expr)
+  "Convert EXPR into an IMAP search expression on CRITERIA."
+  ;; What sort of expression is this, eh?
+  (cond
+   ;; Simple string term
+   ((stringp expr)
+    (format "%s \"%s\"" criteria (imap-quote-specials expr)))
+   ;; Trivial term: and
+   ((eq expr 'and) nil)
+   ;; Composite term: or expression
+   ((eq (car-safe expr) 'or)
+    (format "OR %s %s"
+	    (nnir-imap-expr-to-imap criteria (second expr))
+	    (nnir-imap-expr-to-imap criteria (third expr))))
+   ;; Composite term: not expression
+   ((eq (car-safe expr) 'not)
+    (format "NOT (%s)" (nnir-imap-query-to-imap criteria (rest expr))))
+   ;; Composite term: just expand it all.
+   ((and (not (null expr)) (listp expr))
+    (format "(%s)" (nnir-imap-query-to-imap criteria expr)))
+   ;; Complex value, give up for now.
+   (t (error "Unhandled input: %S" expr))))
+
+
+(defun nnir-imap-parse-query (string)
+  "Turn STRING into an s-expression based query based on the IMAP
+query language as defined in `nnir-imap-make-query'.
+
+This involves turning individual tokens into higher level terms
+that the search language can then understand and use."
+  (with-temp-buffer
+    ;; Set up the parsing environment.
+    (insert string)
+    (goto-char (point-min))
+    ;; Now, collect the output terms and return them.
+    (let (out)
+      (while (not (nnir-imap-end-of-input))
+	(push (nnir-imap-next-expr) out))
+      (reverse out))))
+
+
+(defun nnir-imap-next-expr (&optional count)
+  "Return the next expression from the current buffer."
+  (let ((term (nnir-imap-next-term count))
+	(next (nnir-imap-peek-symbol)))
+    ;; Are we looking at an 'or' expression?
+    (cond
+     ;; Handle 'expr or expr'
+     ((eq next 'or)
+      (list 'or term (nnir-imap-next-expr 2)))
+     ;; Anything else
+     (t term))))
+
+
+(defun nnir-imap-next-term (&optional count)
+  "Return the next TERM from the current buffer."
+  (let ((term (nnir-imap-next-symbol count)))
+    ;; What sort of term is this?
+    (cond
+     ;; and -- just ignore it
+     ((eq term 'and) 'and)
+     ;; negated term
+     ((eq term 'not) (list 'not (nnir-imap-next-expr)))
+     ;; generic term
+     (t term))))
+
+
+(defun nnir-imap-peek-symbol ()
+  "Return the next symbol from the current buffer, but don't consume it."
+  (save-excursion
+    (nnir-imap-next-symbol)))
+
+(defun nnir-imap-next-symbol (&optional count)
+  "Return the next symbol from the current buffer, or nil if we are
+at the end of the buffer.  If supplied COUNT skips some symbols before
+returning the one at the supplied position."
+  (when (and (numberp count) (> count 1))
+    (nnir-imap-next-symbol (1- count)))
+  (let ((case-fold-search t))
+    ;; end of input stream?
+    (unless (nnir-imap-end-of-input)
+      ;; No, return the next symbol from the stream.
+      (cond
+       ;; negated expression -- return it and advance one char.
+       ((looking-at "-") (forward-char 1) 'not)
+       ;; quoted string
+       ((looking-at "\"") (nnir-imap-delimited-string "\""))
+       ;; list expression -- we parse the content and return this as a list.
+       ((looking-at "(")
+	(nnir-imap-parse-query (nnir-imap-delimited-string ")")))
+       ;; keyword input -- return a symbol version
+       ((looking-at "\\band\\b") (forward-char 3) 'and)
+       ((looking-at "\\bor\\b")  (forward-char 2) 'or)
+       ((looking-at "\\bnot\\b") (forward-char 3) 'not)
+       ;; Simple, boring keyword
+       (t (let ((start (point))
+		(end (if (search-forward-regexp "[[:blank:]]" nil t)
+			 (prog1
+			     (match-beginning 0)
+			   ;; unskip if we hit a non-blank terminal character.
+			   (when (string-match "[^[:blank:]]" (match-string 0))
+			     (backward-char 1)))
+		       (goto-char (point-max)))))
+	    (buffer-substring start end)))))))
+
+(defun nnir-imap-delimited-string (delimiter)
+  "Return a delimited string from the current buffer."
+  (let ((start (point)) end)
+    (forward-char 1)			; skip the first delimiter.
+    (while (not end)
+      (unless (search-forward delimiter nil t)
+	(error "Unmatched delimited input with %s in query" delimiter))
+      (let ((here (point)))
+	(unless (equal (buffer-substring (- here 2) (- here 1)) "\\")
+	  (setq end (point)))))
+    (buffer-substring (1+ start) (1- end))))
+
+(defun nnir-imap-end-of-input ()
+  "Are we at the end of input?"
+  (skip-chars-forward "[:blank:]")
+  (looking-at "$"))
+  
 
 ;; Swish++ interface.
 ;; -cc- Todo


* Re: NNIR, IMAP SEARCH, and the infinite pain of search terms.
  2007-12-10 11:37 NNIR, IMAP SEARCH, and the infinite pain of search terms Daniel Pittman
@ 2007-12-11 10:39 ` Vegard Vesterheim
  2008-04-13 14:05 ` Reiner Steib
  1 sibling, 0 replies; 4+ messages in thread
From: Vegard Vesterheim @ 2007-12-11 10:39 UTC
  To: Daniel Pittman; +Cc: ding, Kai Großjohann, Simon Josefsson

On Mon, 10 Dec 2007 22:37:38 +1100 Daniel Pittman <daniel@rimspace.net> wrote:

> G'day.
>
> Gnus contains, in contrib/, the nnir.el interface to search engines.
> This is a nice tool and, pleasantly, supports IMAP SEARCH to give me
> efficient searching of my IMAP mail ... or so I thought.
>
> The biggest problem I had was that it would never seem to find the mail
> I expected, so I didn't use it much.  
>
> After inspecting the code, the reason became clear: the search performed
> a single "exact substring" match, not the logical sort of search that I
> have come to expect from Google and other search engines.
>
>
> So...  the IMAP SEARCH command doesn't do any clever parsing or
> anything; the front end software has to do that.  
>
> Attached is my first "draft" of a more complex search front-end for NNIR
> and IMAP SEARCH -- it parses the query, translates that into a suitable
> IMAP SEARCH command and returns the results.
>
>
> This is *much* less surprising to me: it returns what I expect, most of
> the time, and takes the sort of input I would expect as well.
>
>
> At the moment it only handles basic searching, as documented in the
> `nnir-imap-make-query' function in the patch.
>
> I plan to extend this to support the full range of operators that IMAP
> SEARCH supports, but wanted to seek feedback on the initial
> implementation first.

I have been bothered by the same limitation in nnir. Glad to see
someone tackling this issue.  My elisp skills are limited, so my take
on this was simply to replace this:

   (imap-search (concat "TEXT \"" qstring "\"") buf))

with this:

   (imap-search (concat "CHARSET iso-8859-1 " qstring) buf))

Now I can write IMAP SEARCH commands directly myself.  For example, I
can do searches like:

   (OR FROM foo FROM bar) BEFORE Jul-18-2007

I find this very useful, although it means I have to spell out the IMAP
SEARCH command myself. 
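
One caveat for anyone copying the raw query above: as far as I know,
RFC 3501 writes search dates as day-month-year (for example
18-Jul-2007), so not every server may accept the month-first form.  A
small sketch of building such a query follows; `my-imap-date' is a
made-up helper name, not part of Gnus or imap.el.

```elisp
;; Hypothetical helper; the month table avoids any dependence on the
;; current locale's month names.
(defun my-imap-date (day month year)
  "Format DAY, MONTH (1-12) and YEAR as an IMAP date such as 18-Jul-2007."
  (format "%d-%s-%d"
          day
          (aref ["Jan" "Feb" "Mar" "Apr" "May" "Jun"
                 "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"]
                (1- month))
          year))

(concat "(OR FROM foo FROM bar) BEFORE " (my-imap-date 18 7 2007))
;; => "(OR FROM foo FROM bar) BEFORE 18-Jul-2007"
```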

Looking forward to trying your solution.

 - Vegard -





* Re: NNIR, IMAP SEARCH, and the infinite pain of search terms.
  2007-12-10 11:37 NNIR, IMAP SEARCH, and the infinite pain of search terms Daniel Pittman
  2007-12-11 10:39 ` Vegard Vesterheim
@ 2008-04-13 14:05 ` Reiner Steib
  2008-04-14 11:54   ` Daniel Pittman
  1 sibling, 1 reply; 4+ messages in thread
From: Reiner Steib @ 2008-04-13 14:05 UTC
  To: Daniel Pittman; +Cc: ding

On Mon, Dec 10 2007, Daniel Pittman wrote:

> Gnus contains, in contrib/, the nnir.el interface to search engines.
> This is a nice tool and, pleasantly, supports IMAP SEARCH to give me
> efficient searching of my IMAP mail ... or so I thought.
>
> The biggest problem I had was that it would never seem to find the mail
> I expected, so I didn't use it much.  
>
> After inspecting the code, the reason became clear: the search performed
> a single "exact substring" match, not the logical sort of search that I
> have come to expect from Google and other search engines.
>
> So...  the IMAP SEARCH command doesn't do any clever parsing or
> anything; the front end software has to do that.  
>
> Attached is my first "draft" of a more complex search front-end for NNIR
> and IMAP SEARCH -- it parses the query, translates that into a suitable
> IMAP SEARCH command and returns the results.
>
> This is *much* less surprising to me: it returns what I expect, most of
> the time, and takes the sort of input I would expect as well.
>
> At the moment it only handles basic searching, as documented in the
> `nnir-imap-make-query' function in the patch.
>
> I plan to extend this to support the full range of operators that IMAP
> SEARCH supports, but wanted to seek feedback on the initial
> implementation first.
>
> I have signed papers assigning Gnus changes already, so there should be
> no legal reason for this to be rejected.

Installed.  Thanks for your contribution.  Sorry for the delay.

Could you please suggest an improved variant of my minimal ChangeLog
entry?

	* nnir.el (nnir-run-imap): Add doc string.  Use `nnir-imap-make-query'.
	(nnir-imap-make-query, nnir-imap-query-to-imap)
	(nnir-imap-expr-to-imap, nnir-imap-parse-query, nnir-imap-next-expr)
	(nnir-imap-peek-symbol, nnir-imap-next-symbol)
	(nnir-imap-delimited-string, nnir-imap-end-of-input): New functions.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/




* Re: NNIR, IMAP SEARCH, and the infinite pain of search terms.
  2008-04-13 14:05 ` Reiner Steib
@ 2008-04-14 11:54   ` Daniel Pittman
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel Pittman @ 2008-04-14 11:54 UTC
  To: ding

Reiner Steib <reinersteib+gmane@imap.cc> writes:
> On Mon, Dec 10 2007, Daniel Pittman wrote:
>
>> Gnus contains, in contrib/, the nnir.el interface to search engines.
>> This is a nice tool and, pleasantly, supports IMAP SEARCH to give me
>> efficient searching of my IMAP mail ... or so I thought.

[...]

> Installed.  Thanks for your contribution.  Sorry for the delay.

No problem.  I still have, on my task list, implementing a generic query
language for nnir, then passing that to the appropriate back-ends, so
/hopefully/ this will not be the last you see of that.

> Could you please suggest an improved variant of my minimal ChangeLog
> entry?
>
> 	* nnir.el (nnir-run-imap): Add doc string.  Use `nnir-imap-make-query'.
> 	(nnir-imap-make-query, nnir-imap-query-to-imap)
> 	(nnir-imap-expr-to-imap, nnir-imap-parse-query, nnir-imap-next-expr)
> 	(nnir-imap-peek-symbol, nnir-imap-next-symbol)
> 	(nnir-imap-delimited-string, nnir-imap-end-of-input): New
> 	functions.

Hrm.  Well, the purpose of the change is:

      * nnir.el (nnir-run-imap): Add doc string.  Use `nnir-imap-make-query'.
 	(nnir-imap-make-query, nnir-imap-query-to-imap)
 	(nnir-imap-expr-to-imap, nnir-imap-parse-query, nnir-imap-next-expr)
 	(nnir-imap-peek-symbol, nnir-imap-next-symbol)
 	(nnir-imap-delimited-string, nnir-imap-end-of-input): 
        New functions.  Implement a query language for IMAP search,
        parse that and compose the back-end query from it.  This allows
        searches with AND, OR and fixed strings, not just a single
        substring.

I don't think that meets the GNU coding standards, though, which I
confess have never made much sense to me for changes like this.[1]

For something in news that summary would definitely fit, or something
adapted from my comments above.

Regards,
        Daniel

Footnotes: 
[1]  I can't see how anything meaningful can be recorded in the style
     requested, and welcome pointers to a better guide.




