Gnus development mailing list
* Seeking suggestions for pet project
@ 2000-10-07 14:09 Harry Putnam
  2000-10-08 13:01 ` Harry Putnam
  2000-10-08 21:36 ` Kai Großjohann
  0 siblings, 2 replies; 4+ messages in thread
From: Harry Putnam @ 2000-10-07 14:09 UTC
  Cc: ding

[cc'ed to ding list]

Kai,
I've written a shell/awk search script that one might call a targeted
multi-grep search.  As you know, I'm not very experienced in
programming, so it is quite unsophisticated.  What sophistication it
has is mainly due to help from posters on comp.lang.awk.

And with your knowledge of `information retrieval' you'll probably
think this flies in the face of current IR wisdom.  Even so, I find
this approach more exacting than other techniques I've tried, and it
produces better results.  It does require some familiarity with
regexps, but not necessarily thorough knowledge.

The code is aimed at `one message per file' mail or `usenet' messages,
so it works well against nnml or nntp messages.

Currently the code is too cumbersome and inflexible for popular use,
so I am planning two major revamps.

At present the user must give three REs and a filename on the
command line, like this (line wrapped for clarity):

`search.sh "^From: someone" "^Newsgroups: comp\.lang\.awk" \
 "if?\\\(" /home/reader/news-archive'

That might find a bunch of examples of author `someone' using `if'
clauses in awk.

$1, $2 and $3 hard-wire the three REs and pin down their targets; $4
pins down the files to search.

All three REs must match to produce a hit.  The script expects the
first two to target the headers and will only find them there; the
third is expected in the body and will only be found there.

The script is a small state machine that knows where it is in a
message and which REs have hit so far.
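For anyone who wants to try the idea without the full script, here is a minimal sketch of such a state machine in sh and POSIX awk. It is a hypothetical reimplementation, not the code behind search.sh: the function name `search_sketch', the `SK_*' environment variables, and the assumption that the headers end at the first empty line are all choices made for this illustration.

```sh
# search_sketch: hypothetical illustration, NOT the author's search.sh.
# Usage: search_sketch HEADER_RE1 HEADER_RE2 BODY_RE FILE...
# Assumes one message per file (nnml style), headers ending at the first
# empty line. Prints each file in which both header REs match header
# lines and the body RE matches a body line.
search_sketch() {
    h1=$1 h2=$2 b=$3
    shift 3
    # The REs go through the environment because awk -v processes
    # backslash escapes, which can mangle an RE such as comp\.lang.
    SK_H1=$h1 SK_H2=$h2 SK_B=$b awk '
        BEGIN { h1 = ENVIRON["SK_H1"]; h2 = ENVIRON["SK_H2"]; b = ENVIRON["SK_B"] }
        # First line of a file means a new message: reset the state.
        FNR == 1 { body = 0; got1 = 0; got2 = 0; done = 0 }
        # This file is already decided; skip the rest of it.
        done { next }
        # The first empty line switches from header state to body state.
        !body && $0 == "" { body = 1; next }
        # Header state: record which header REs have hit so far.
        !body { if ($0 ~ h1) got1 = 1; if ($0 ~ h2) got2 = 1; next }
        # Body state, but a header RE never hit: this message cannot qualify.
        !(got1 && got2) { done = 1; next }
        # Body state with both header hits: the body RE completes the match.
        $0 ~ b { print FILENAME; done = 1 }
    ' "$@"
}
```

Since the sketch takes files rather than a directory, the earlier example would become something like `search_sketch '^From: someone' '^Newsgroups: comp\.lang\.awk' 'if ?\(' /home/reader/news-archive/*', with the body RE written as a plain ERE.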

I've stripped the lengthy documentation comments from the code below,
but for further `in depth' commented documentation please go here:

http://www.ptw.com/~reader/exp/awk-search.html

But be warned that I have revised the script several times, and the
comments may be inaccurate in places, since I haven't cleaned that
part up thoroughly yet.

The procedure described can be tedious, so I am working on a revision
that will query the user for `header RE' and `body RE' and will work
with any combination of up to two header REs and two body REs:
no header RE but one body RE, one header RE but no body RE, and
anything in between.

So the command line would only be `search.sh filenames' and the script
would query the user for the rest.
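The interactive version could look roughly like the following sketch, again in sh and POSIX awk and again hypothetical: the name `query_search', the prompt wording, and the rule that an empty answer skips that RE are assumptions for illustration only. Prompts go to standard error so that standard output carries nothing but the matching file names.

```sh
# query_search: hypothetical sketch of the interactive revamp, NOT the author's code.
# Usage: query_search FILE...   (the REs are read from standard input)
# Asks for up to two header REs and two body REs; an empty answer skips
# that RE. A file is reported once every given header RE has matched a
# header line and every given body RE has matched a body line.
query_search() {
    printf 'Header RE 1 (empty to skip): ' >&2; IFS= read -r qh1
    printf 'Header RE 2 (empty to skip): ' >&2; IFS= read -r qh2
    printf 'Body RE 1 (empty to skip): ' >&2;   IFS= read -r qb1
    printf 'Body RE 2 (empty to skip): ' >&2;   IFS= read -r qb2
    if [ -z "$qh1$qh2$qb1$qb2" ]; then
        echo 'query_search: at least one RE is required' >&2
        return 1
    fi
    QS_H1=$qh1 QS_H2=$qh2 QS_B1=$qb1 QS_B2=$qb2 awk '
        BEGIN {
            # Keep only the REs that were given; where[i] says which part
            # of the message re[i] must hit ("h" = headers, "b" = body).
            n = 0
            if (ENVIRON["QS_H1"] != "") { re[++n] = ENVIRON["QS_H1"]; where[n] = "h" }
            if (ENVIRON["QS_H2"] != "") { re[++n] = ENVIRON["QS_H2"]; where[n] = "h" }
            if (ENVIRON["QS_B1"] != "") { re[++n] = ENVIRON["QS_B1"]; where[n] = "b" }
            if (ENVIRON["QS_B2"] != "") { re[++n] = ENVIRON["QS_B2"]; where[n] = "b" }
        }
        # New file, new message: clear the hit table (split is the portable way).
        FNR == 1 { body = 0; done = 0; need = n; split("", got) }
        done { next }
        # The first empty line ends the headers.
        !body && $0 == "" { body = 1; next }
        {
            part = body ? "b" : "h"
            for (i = 1; i <= n; i++)
                if (!got[i] && where[i] == part && $0 ~ re[i]) { got[i] = 1; need-- }
            # All requested REs have hit: report the file once.
            if (need == 0) { print FILENAME; done = 1 }
        }
    ' "$@"
}
```

Run as `query_search /home/reader/news-archive/*', answering `^From: someone', nothing, `while', nothing would list messages from `someone' whose body mentions `while'.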

If you have time to try this script out, you'll see that the output
has all the ingredients Gnus needs to display the results in an nnir
buffer.  Once I get the script cleaned up and revamped, I'll probably
want to add it as an nnir engine.

In addition to having Gnus display the found messages, I'd really
like to figure out how to put the grep-style hits (the single line
containing the RE) in a separate buffer, in the style of `M-x occur'
or `M-x grep' output.  My feeling is that the single line is often all
that's needed, so having that information in a separate buffer with
hyperlinks to the files would be a very nice addition.