Gnus development mailing list
* nnir and fast email searches
From: Max Froumentin @ 2004-07-04 22:37 UTC



Hi, 

As my email folders grow bigger than my computer gets faster, I'm
looking for a fast way to search through them. It seems that a simple
solution is to use grepmail with nnir-grepmail; however, Nevin Kapur
hasn't been working on nnir-grepmail since 2001 and the code doesn't
work well with the newer nnir.el.

So I'm wondering if the idea of using grepmail with gnus has simply
been abandoned for some better nnir backend.

Anyone?

Max.



* Re: nnir and fast email searches
From: James Leifer @ 2004-07-04 23:09 UTC


Max Froumentin <max@lapin-bleu.net> writes:

> Hi, 
>
> As my email folders grow bigger than my computer gets faster, I'm
> looking for a fast way to search through them. It seems that a simple

[Here's an edited reply I sent to the list a couple of weeks ago,
reporting good performance with swish++.  It assumes that you have one
message per file, which may well not be your case since you're
interested in grepmail.]

Hi,

I run a nightly cron job that indexes all my mail, which consists of
about 30K messages stored in maildir or mh format (one file per
message).  The indexing is performed by swish++ which is easy to get
and available prepackaged under Debian woody and many other distros.

When I want to search, I populate a directory with symlinks to the
messages swish++ returns.
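
The symlink idea can be sketched in a few lines of bash. This is a minimal illustration, not the script James actually uses (he posts that later in the thread). It assumes the search hits arrive on stdin as paths relative to the parent of the results directory (for example `inbox/42` for `~/Mail/inbox/42`), and the function name `link_results` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: fill RESULT_DIR with numbered symlinks (1, 2, 3, ...)
# pointing at the message files whose paths are read from stdin.
# Assumption: each path is relative to RESULT_DIR's parent directory.
link_results() {
  local result_dir=$1
  local n=1 path
  mkdir -p "$result_dir"
  # Clear out the previous results; -delete removes the symlinks
  # themselves, never the messages they point to.
  find "$result_dir" -mindepth 1 -delete
  # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue           # skip blank lines
    ln -s "../$path" "$result_dir/$n"    # MH-style numeric file names
    n=$((n + 1))
  done
}

# Example (hypothetical paths):
#   printf '%s\n' inbox/42 lists/ding/7 | link_results ~/Mail/g
```

The numeric names are the important design choice: MH-style folders, and as far as I know the Gnus backends that read them, treat the file name as the article number.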

Advantages: this gives me nearly instantaneous searching (a fraction of
a second) through all my mail.

Disadvantages (which I'd love to solve):

* the folder of search results does not have the marks associated with
  the real messages;

* marking a message in the search results doesn't propagate the marks
  to the real messages;

* refiling a message in the search results doesn't affect the real
  message.
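
All three problems have the same root: Gnus only sees the symlinks in the results folder, not the original articles. A possible first step towards working around them is to map each result back to the file it points at, so the real message can be found and handled in its own group. A minimal bash sketch; the function name `show_real_messages` is hypothetical, and it relies on `readlink -f`, which GNU coreutils provides but some older BSD systems lack.

```bash
#!/bin/bash
# Hypothetical helper: for every symlink in RESULT_DIR, print
# "NAME -> ABSOLUTE-PATH-OF-TARGET", i.e. which real message each
# search result refers to.
show_real_messages() {
  local result_dir=$1 link
  for link in "$result_dir"/*; do
    # Skip non-links, including the unexpanded glob of an empty directory.
    [ -L "$link" ] || continue
    printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
  done
}

# Example (hypothetical path):
#   show_real_messages ~/Mail/g
```

Results come out in glob order (so 10 sorts before 2), and nothing here touches marks; it only tells you where each hit really lives.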

I believe nnir.el can use swish++, though I don't know nnir.el well.
I've always built the folder myself (with my own scripts).

Cheers,
-James





* Re: nnir and fast email searches
From: Niklas Morberg @ 2004-07-05  6:03 UTC
  Cc: ding

James Leifer <James.Leifer@inria.fr> writes:

> I run a nightly cron job that indexes all my mail, which consists of
> about 30K messages stored in maildir or mh format (one file per
> message).  The indexing is performed by swish++ which is easy to get
> and available prepackaged under Debian woody and many other distros.
>
> When I want to search, I populate a directory with symlinks to the
> messages swish++ returns.

Would you care to share the code you have to achieve this?
Also, how do you read the messages from within gnus?

> I believe nnir.el can use swish++, though I don't know nnir.el well.

I tried to get this working, but failed. See [tedious
searching at gmane.org] thread
<URL:http://thread.gmane.org/gmane.emacs.gnus.general/57947>

I started using the agent to fetch messages (I'm using
imap), but that was not as smooth as I wanted (too slow and
somewhat inconsistent behaviour -- I needed to delete the
agent hierarchy and restart a couple of times). Instead I'm
planning to use offlineimap and then index the messages
with a search engine.

The remaining problem is to integrate the search results
with gnus.

By the way, Zoe <URL:http://www.zoe.nu> is really nice.
Works like a charm and without much hassle. Its main
problems are the lack of integration with gnus and slowness
when a search yields many results.

Niklas





* Re: nnir and fast email searches
From: James Leifer @ 2004-07-05  7:43 UTC


Niklas Morberg <niklas.morberg@axis.com> writes:

> James Leifer <James.Leifer@inria.fr> writes:
>
>> I run a nightly cron job that indexes all my mail, which consists of
>> about 30K messages stored in maildir or mh format (one file per
>> message).  The indexing is performed by swish++ which is easy to get
>> and available prepackaged under Debian woody and many other distros.
>>
>> When I want to search, I populate a directory with symlinks to the
>> messages swish++ returns.
>
> Would you care to share the code you have to achieve this?
> Also, how do you read the messages from within gnus?

Here it is...

*Warning!* This is all something I hacked up in a couple of minutes at
some point: it most likely requires lots of changes to suit your
system.  It certainly is a dirty hack.  If everything blows up, you've
been warned!


In my .newsrc.eld I have a directory group called "g" for "grep" that
always holds the latest search results:

====================================================================
("nndir:~/Mail/g" 3
 () ()
 (nndir "~/Mail/g"
        (nndir-directory "~/Mail/g"))
 ((large-newsgroup-initial . 200)
  (expiry-wait . immediate)
  (display . all)
  (visible . t)))
====================================================================

My search script, called gm++ ("grep mail using swish++"), looks
like this:

====================================================================
#!/bin/bash
set -e  # quit on the first error
set -u  # quit on undefined variables

GREP_RESULT_DIR=~/Mail/g  #where we put the search results
SWISH_INDEX=~/Mail/xnfsMail/search-index  #the swish++ index
SWISH_RESULTS_RELATIVE=".."
# path, relative to GREP_RESULT_DIR, used when interpreting swish++'s
# response, i.e. foo/3 means ~/Mail/foo/3.

if test $# -eq 0; then
   echo "Usage:"
   echo "gm++ searchphrase ..."
   exit 0
fi

#check that GREP_RESULT_DIR is reasonable
GREP_RESULT_DIR_BASENAME=`basename "$GREP_RESULT_DIR"`
if test z"$GREP_RESULT_DIR_BASENAME" = z; then
  echo "error: GREP_RESULT_DIR empty"
  exit 1
fi

# do the search
matches=`search++ -m 5000 -i "$SWISH_INDEX" "$1"`

# output "# results: 14"
echo "$matches" | grep '#'

# clean and uniquify the matches
matches=`echo "$matches" \
         | grep -v '#' \
         | perl -p -e 's/[^ ]+ ([^ ]+) [^ ]+ [^ ]+/$1\n/g' \
         | sort -u`

# calculate the number of matches
numbermatches=0
for match in $matches; do
  numbermatches=$(( numbermatches + 1))
done

# print the number of unique matches "Matches: 7"
echo Matches: $numbermatches

cd "$GREP_RESULT_DIR"

# danger: the find below is why GREP_RESULT_DIR is checked above
# delete all the old search results
find . -not -name "." -print0 | xargs -0 rm -rf 

# add the new search results, calling the files 1, 2, 3,...
matchcount=1
echo  "Linking:[          ]"
echo -n "         "
numbermatchesdivten=$(( $numbermatches / 10 + 1 ))
dotsprinted=0
for match in $matches; do
  ln -s "$SWISH_RESULTS_RELATIVE"/$match $matchcount &&
  matchcount=$(( $matchcount + 1 ))
  percentagedot=$(( $matchcount % $numbermatchesdivten ))
  # advance the progress bar to show the linking
  if test $percentagedot = 0 ; then
    echo -n "."
    dotsprinted=$(( dotsprinted + 1 ))
  fi
done
while test $dotsprinted -le 9; do
    echo -n "."
    dotsprinted=$(( dotsprinted + 1 ))
done
echo

# re-sort the search results by date using MH's sortm (from nmh)
echo sorting by date
sortm +"$GREP_RESULT_DIR"
====================================================================
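
The perl one-liner in the script pulls the second space-separated field (the path) out of each search++ result line, and the loop after it just counts lines. Assuming the result layout the script already relies on (`rank path size title`, with comment lines starting with `#`), both steps could be done with standard tools as sketched below. The function names are hypothetical and this has not been checked against a real swish++ install; one possible benefit is that awk still picks out the path if a title ever contains spaces, which the perl pattern does not handle.

```bash
#!/bin/bash
# Hypothetical helper: read search++ output on stdin and print the unique
# message paths, one per line, sorted.
# Assumption: result lines are "rank path size title..." and comment
# lines (such as "# results: 14") start with '#'.
extract_paths() {
  awk '!/^#/ && NF >= 2 { print $2 }' | sort -u
}

# Hypothetical helper: count the non-empty lines on stdin.
count_lines() {
  # grep -c prints 0 but exits with status 1 when nothing matches;
  # "|| true" keeps that from aborting a script running under set -e.
  grep -c . || true
}

# Possible use inside gm++ (same search++ flags as the original):
#   matches=$(search++ -m 5000 -i "$SWISH_INDEX" "$1" | extract_paths)
#   echo "Matches: $(printf '%s\n' "$matches" | count_lines)"
```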

A typical invocation from the shell is
   gm++ 'juggling tomorrow'
which (almost instantly) produces the following output; doing M-g
on the nndir:g group in gnus then lets me see the messages.

====================================================================
# results: 7
Matches: 4
Linking:[          ]
         ..........
sorting by date
====================================================================

Finally, the indexing is done nightly with:

====================================================================
cd ~/Mail &&

# FOLDERLIST (not shown) names the mail folders to index
find $FOLDERLIST | index++ --verbosity=2 -i search-index-tmp - &&
mv search-index-tmp search-index
====================================================================

Several points:

* this lets me search across several groups almost instantly; the
  display of each message is also essentially instantaneous in gnus

* swish++ supports incremental indexing: it would be better to do that
  rather than the current nightly complete reindexing;

* the marks in nndir:g are rather meaningless (after one search run
  they apply to the next one, even though the messages have changed
  completely).
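
For the incremental indexing mentioned in the second point, the first step is working out which message files changed since the last run. A minimal sketch using a timestamp file and `find -newer`; the function name and stamp file are hypothetical, and how the resulting list should be handed to index++ for an incremental update is deliberately left out, since the exact option should be checked in the swish++ documentation.

```bash
#!/bin/bash
# Hypothetical helper: print the files under the given folders that were
# modified after STAMP, then advance STAMP for the next run.
# Usage: changed_since_stamp STAMP FOLDER...
changed_since_stamp() {
  local stamp=$1
  shift
  local new_stamp="$stamp.new"
  # Take the new timestamp *before* scanning, so a message that arrives
  # during the scan is picked up again next time rather than missed.
  touch "$new_stamp"
  if [ -e "$stamp" ]; then
    find "$@" -type f -newer "$stamp"
  else
    find "$@" -type f    # first run: treat every file as changed
  fi
  mv "$new_stamp" "$stamp"
}

# Example (hypothetical paths):
#   changed_since_stamp ~/Mail/.index-stamp ~/Mail/inbox ~/Mail/lists
```

The stamp file should live outside the indexed folders, otherwise it shows up in its own results.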

All of this needs to be cleaned up and thought through carefully.
If you do so, please share the results with the list.

Best,
-James


