Gnus development mailing list
* searching multiple groups from the Group buffer
@ 2001-02-17 21:00 Colin Walters
  2001-02-17 21:53 ` Kai Großjohann
  0 siblings, 1 reply; 6+ messages in thread
From: Colin Walters @ 2001-02-17 21:00 UTC (permalink / raw)


Is there a way I can search for a regexp in multiple groups?  For
example, I archive my mail in nnfolder archives by month, like
mail-2001-02, mail-2001-01, etc.

I tried marking those groups, and doing 'M-& RET', which did visit
them all, but I wasn't quite sure how to go about actually searching
all of them from there.




* Re: searching multiple groups from the Group buffer
  2001-02-17 21:00 searching multiple groups from the Group buffer Colin Walters
@ 2001-02-17 21:53 ` Kai Großjohann
  2001-02-20  1:11   ` Harry Putnam
  0 siblings, 1 reply; 6+ messages in thread
From: Kai Großjohann @ 2001-02-17 21:53 UTC (permalink / raw)


For searching multiple groups, you can create an nnvirtual group that
contains them all, then search normally.  Or you create an nnkiboze
group which is similar to nnvirtual, but has the searching built-in.

And then, there's always nnir.el which requires a search engine.
Several people have volunteered to make it work with find/grep, but so
far no go.  However, Glimpse is sufficiently simple to set up so it
might do the trick for you.

kai
-- 
Be indiscrete.  Do it continuously.




* Re: searching multiple groups from the Group buffer
  2001-02-17 21:53 ` Kai Großjohann
@ 2001-02-20  1:11   ` Harry Putnam
  2001-02-20  9:40     ` Kai Großjohann
  2001-02-20 17:52     ` Harry Putnam
  0 siblings, 2 replies; 6+ messages in thread
From: Harry Putnam @ 2001-02-20  1:11 UTC (permalink / raw)



Colin W. asks:
>>	Is there a way I can search for a regexp in multiple groups?  For
>>	example, I archive my mail in nnfolder archives by month, like
>>	mail-2001-02, mail-2001-01, etc.


Kai Answers:
> For searching multiple groups, you can create an nnvirtual group that
> contains them all, then search normally.  Or you create an nnkiboze
> group which is similar to nnvirtual, but has the searching built-in.
 
> And then, there's always nnir.el which requires a search engine.
> Several people have volunteered to make it work with find/grep, but so
> far no go.  However, Glimpse is sufficiently simple to set up so it
> might do the trick for you.

Not sure about nnkiboze but the nnvirtual technique is the only one
mentioned here that really uses regexp.  From what I know, nnkiboze
isn't really working well either.

None of the currently supported search engines for nnir.el support
real regexps, as far as I know.  (Is that correct, Kai?)  Glimpse comes
closest, but doesn't allow full regexps either.

FreeWAIS is really a pain to set up and get working on Linux.  I
finally got it working on my FreeBSD box, but it was such a pain (as
Kai will remember, since I pestered him to death with it) that I
wouldn't wish it on anybody.  It is very fast, though.  It uses regexps
in the *.fmt file, but its command-line queries do not.  It does not
seem to be nearly as robust as the Glimpse software.

Colin, 

I've written a small shell/awk script that I use a lot, but I more or
less stopped improving it once I got it working.  It is *NOT* integrated
into Gnus or nnir yet, since I'm working (sporadically) on a more
elaborate version.

To view

http://www.ptw.com/~reader/exp/search2.html

To download

http://www.ptw.com/~reader/exp/download/search2.tar.gz

It takes 3 regexps: two are searched for in the headers and one in the
body.  In the default mode you supply only the body regexp, on the
command line with the `-b' flag, and the program supplies `From: ' and
`Subject: ' for the headers.  The directories to search are given with
the `-d' flag.

WARNING: The only documentation is what the script itself contains, in
the form of `here documents' and comments.  The script may be somewhat
unconventional, since I'm not really very experienced as a shell/awk
programmer.

The script knows when it's in the headers and when it's not, and
returns the hits accordingly.  All 3 regexps must hit in a message for
anything to be returned.
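
The header/body logic described above can be sketched roughly as
follows.  This is not search2.sh itself, only a minimal illustration:
the function name `match_message' and the output layout (line number,
`|', matching line) are assumptions made for the example.

```shell
#!/bin/sh
# match_message FILE HEADER_RE1 HEADER_RE2 BODY_RE
# Hypothetical helper: prints the file name and the matching lines only
# when both header regexps and the body regexp hit in the same message.
# Note: awk -v interprets backslash escapes, so a regexp containing
# backslashes needs them doubled.
match_message() {
    awk -v h1="$2" -v h2="$3" -v body="$4" '
        # The first blank line ends the header block.
        !in_body && /^$/    { in_body = 1; next }
        # Header regexps are only tried while still in the headers.
        !in_body && $0 ~ h1 { hit1 = FNR "|" $0 }
        !in_body && $0 ~ h2 { hit2 = FNR "|" $0 }
        # The body regexp is only tried after the headers; keep every hit.
        in_body && $0 ~ body { hits = hits FNR "|" $0 "\n"; nbody++ }
        END {
            # Report nothing unless all three regexps matched.
            if (hit1 != "" && hit2 != "" && nbody > 0) {
                print FILENAME
                print hit1
                print hit2
                printf "%s", hits
            }
        }
    ' "$1"
}
```

It could be called as `match_message ~/Mail/prinb/30271 '^Subject: '
'^From: ' 'marking.*M-&'', looping over the files in a directory to
search a whole group.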

A command line would look like:

 $ search2.sh -b "marking.*M-&" -d ~/Mail/prinb

  (This will find your message in my incoming preview group and
   return the line matching the regexp plus the `From' and `Subject'
   headers.)

 (output would look like)

	/home/reader/Mail/prinb/30271
	30271|Subject: searching multiple groups from the Group buffer
	30271|From: Colin Walters <walters@cis.ohio-state.edu>
	30271|50|I tried marking those groups, and doing 'M-& RET', which did visit
	-- 

The first section above shows the name of the file where the hit
occurred and the line numbers of the actual hits, for quick viewing
with command-line tools like less or vi.  Had there been more instances
of the regexp in one file, they would be listed too.

If more files contain the hits then they are listed in the same way
with a separating `-- '.
	
	Regular expressions used:
	Header1 = ^Subject: 
	Header2 = ^From: 
	Body    = marking.*M-&
	Searched:    60 files 4803 lines
	under directory /home/reader/Mail/prinb
	  Finish = Feb 17 16:16:47 2001
	  Start  = Feb 17 16:16:47 2001


A final `stats' output (above) is displayed that may be of interest or
helpful in adjusting the regexps.

If the search seems to merit it, I sometimes use a filter which writes
copies of the files with hits to a spool-style file that Gnus can
access with nndoc (or mutt with the `-f' flag).

   http://www.ptw.com/~reader/exp/download/nnml-nntp2mbox.sh

A very crude way to bring the Gnus tools into the fray.

I use the filter (nnml-nntp2mbox.sh) on the command line like:

   nnml-nntp2mbox.sh `search2.sh -b  "marking.*M-&" -d ~/Mail/prinb\
   |egrep '^/[^/]+/'` >FILE

Adjust the egrep regexp as needed.  This should generate a spool
(mbox) file that gnus can open with nndoc or mutt with `mutt -f FILE'.
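
For illustration, the spool-building step might look roughly like the
sketch below.  It is not nnml-nntp2mbox.sh: the function name `to_mbox'
and the placeholder sender and date on the separator line are
assumptions, and a real filter would take them from each message.

```shell
#!/bin/sh
# to_mbox FILE...
# Hypothetical helper: concatenates one-message-per-file mail into a
# single mbox stream on standard output.
to_mbox() {
    for f in "$@"; do
        # Each message in an mbox starts with a "From " separator line.
        # Sender and date here are placeholders, not taken from the mail.
        printf 'From nobody Thu Jan  1 00:00:00 1970\n'
        # Quote lines in the message that begin with "From " so they are
        # not mistaken for separators ("From:" headers are unaffected).
        sed 's/^From />From /' "$f"
        # A blank line ends the message.
        printf '\n'
    done
}
```

Something like `to_mbox ~/Mail/prinb/30271 >FILE' would then give a
file to try with nndoc or `mutt -f FILE'.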

NOTE -- about the shabby filter
There is no guarantee that the filter will handle all NNTP messages.
Some that are produced by list-to-news gateways have abnormal first
lines.

I would like to get some feedback from an experienced person if you
are willing to try this homemade script out.  It should work OK on a
Linux box with `gawk' installed, but I'm not sure about other OSs with
different awks (nawk, mawk, tawk, etc.).


NOTE:  A slick awk technique (manipulating ARGV) supplied by Arnold
       Robbins (of awk fame) allows just about any number of
       files to be searched without resorting to xargs.

That does not apply to the `filter', which receives the files on the
command line via "$@" and so can run into the `too many args' problem,
but only if there are really a lot of hits.

Please try this out and give feedback if you have time.  At some
point, after I get a newer, better search.sh written, I want to
integrate this into nnir as one of the engines.







* Re: searching multiple groups from the Group buffer
  2001-02-20  1:11   ` Harry Putnam
@ 2001-02-20  9:40     ` Kai Großjohann
  2001-02-20 14:53       ` Harry Putnam
  2001-02-20 17:52     ` Harry Putnam
  1 sibling, 1 reply; 6+ messages in thread
From: Kai Großjohann @ 2001-02-20  9:40 UTC (permalink / raw)
  Cc: ding

On 19 Feb 2001, Harry Putnam wrote:

> None of the currently supported search engines for nnir.el support
> real regexps, as far as I know.  (Is that correct, Kai?)  Glimpse comes
> closest, but doesn't allow full regexps either.

I thought Glimpse did support full regexes.  But I might be wrong.

But it should be fairly easy to copy the Glimpse code into a new
function and change the Glimpse call to something like this:

    cd ~/Mail; find . -type f -print | xargs egrep REGEX

This should produce a list of matched files, in a format similar to
the Glimpse output format, so the Glimpse parsing code could probably
be reused.

I was too lazy to try this myself.  Also, I'm afraid of the time this
find command takes.
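
A minimal sketch of the find/egrep idea, outside of nnir and not tested
against the Glimpse parsing code: `search_tree' is a hypothetical
function name, and it uses `find ... -exec ... +' in place of xargs so
that file names containing spaces are handled.

```shell
#!/bin/sh
# search_tree DIR REGEX
# Hypothetical helper: prints "file:line-number:line" for every line
# under DIR that matches the extended regular expression REGEX.
search_tree() {
    dir=$1
    regex=$2
    # "{} +" batches file names onto each grep command line, much as
    # xargs would.  The extra /dev/null argument makes grep print the
    # file name even when a batch contains only one file.
    find "$dir" -type f -exec grep -n -E -e "$regex" /dev/null {} +
}
```

For example, `search_tree ~/Mail 'marking.*M-&'' walks the whole mail
tree, so the concern about how long the find takes applies here too.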

kai
-- 
Be indiscrete.  Do it continuously.




* Re: searching multiple groups from the Group buffer
  2001-02-20  9:40     ` Kai Großjohann
@ 2001-02-20 14:53       ` Harry Putnam
  0 siblings, 0 replies; 6+ messages in thread
From: Harry Putnam @ 2001-02-20 14:53 UTC (permalink / raw)


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> > None of the currently supported search engines for nnir.el support
> > real regexps, as far as I know.  (Is that correct, Kai?)  Glimpse comes
> > closest, but doesn't allow full regexps either.
> 
> I thought Glimpse did support full regexes.  But I might be wrong.

From `man glimpse':

[...]
              . . . . . Regular expressions are currently limited
              to approximately 30 characters (generally excluding
              meta characters).  Some options (-d, -w, -t, -x,
              -D, -I, -S) do not currently work with regular
              expressions.  The maximal number of errors for
              regular expressions that use '*' or '|' is 4.  (See
              LIMITATIONS.)
[...]

For example, the regexp `From:.*(Kai\.Gross|Kleinp|ShengHuo)' is rejected:

	 $ glimpse-m -F ding3 'From:.*(Kai\.Gross|Kleinp|ShengHuo)'
	glimpse: regular expression too long
	glimpse: error in options or arguments to `agrep'

Shortening ShengHuo to ShengH makes it work, but this gives an idea
of a major limitation.  Regexps can become quite long very easily.

This is one of the easiest limits to run into, but there are more.

Regexps can't be used with the boolean operators at all, such as ; (and).

One place where a regexp would be handy is with the -d option, used
to set a record separator.  It's set to newline by default.

[...]
       -d 'delim'
              Define delim to be the separator between two
              records.  The default value is '$', namely a record
              is by default a line.  delim can be a string of
              size at most 8 (with possible use of ^ and $), but
              not a regular expression.  Text between two
              ....
[...]




* Re: searching multiple groups from the Group buffer
  2001-02-20  1:11   ` Harry Putnam
  2001-02-20  9:40     ` Kai Großjohann
@ 2001-02-20 17:52     ` Harry Putnam
  1 sibling, 0 replies; 6+ messages in thread
From: Harry Putnam @ 2001-02-20 17:52 UTC (permalink / raw)


Harry Putnam <reader@newsguy.com> writes:

> 
>    http://www.ptw.com/~reader/exp/download/nnml-nntp2mbox.sh

There is a slightly more robust version at the site above now.




