Gnus development mailing list
From: Harry Putnam <reader@newsguy.com>
Cc: ding@gnus.org
Subject: Seeking suggestions for pet project
Date: 07 Oct 2000 07:09:47 -0700
Message-ID: <m2g0m8lp3o.fsf_-_@pgnus-5.8.8-cvs.now.playing>
In-Reply-To: Kai.Grossjohann@CS.Uni-Dortmund.DE's message of "03 Oct 2000 15:53:26 +0200"

[cc'ed to ding list]

Kai,
I've written a shell/awk script search engine that one might call a
targeted multi-grep search.  As you know, I'm not very experienced
in programming, so it is quite unsophisticated.  What sophistication it
has is mainly due to help from posters on comp.lang.awk.

And with your knowledge of `information retrieval' you'll probably
think this flies in the face of current IR wisdom.  Even so, I find
this approach more exacting, and it produces better results than
other techniques I've tried.  It does require some familiarity with
regexps, but not necessarily a thorough knowledge of them.

The code is aimed at `one message per file' mail or `usenet' messages,
so it works well against nnml or nntp messages.

Currently the code is too cumbersome and inflexible for popular use,
so I am planning two major revamps.

Currently the user is required to give three REs and a filename on the
command line, like (line wrapped for clarity):

`search.sh "^From: someone" "^Newsgroups: comp\.lang\.awk" \
 "if?\\\(" /home/reader/news-archive'

That might find a bunch of examples of author `someone' using `if' clauses in awk.

$1, $2 and $3 hardwire the REs and nail down their targets; $4
nails down the files to search.

All three REs must be present to produce a hit.  The script expects
the first two to be aimed at the headers and will only find them
there.  The third is expected to be in the body and will only be found there.

The script is a small state machine that knows where it is in a message
and the state of hits on the REs so far.
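A minimal sketch of that state machine follows, assuming one message per file and that the header block ends at the first empty line. It is not the search.sh described at the URL below; the function name `search3`, its environment variables and its file:line:text output are hypothetical choices made for illustration.

```sh
#!/bin/sh
# search3 HDR_RE1 HDR_RE2 BODY_RE FILE...
# Hypothetical sketch of the three-RE state machine, not the real search.sh.
# Prints file:line:text for each body line matching BODY_RE, but only for
# files (one message each) whose headers match both HDR_RE1 and HDR_RE2.
search3() {
  h1=$1 h2=$2 body=$3
  shift 3
  # Patterns go through the environment so awk sees backslashes unchanged
  # (awk -v would first process escape sequences such as \( in them).
  S3_H1=$h1 S3_H2=$h2 S3_B=$body awk '
    BEGIN { h1 = ENVIRON["S3_H1"]; h2 = ENVIRON["S3_H2"]; b = ENVIRON["S3_B"] }
    # Print the saved body hits of the current file if all three REs matched.
    function report(   i) {
      if (f1 && f2)
        for (i = 1; i <= nb; i++) print file ":" hit[i]
    }
    # New file: flush the previous one, then reset the state.
    FNR == 1 {
      if (NR > 1) report()
      file = FILENAME; inhdr = 1; f1 = 0; f2 = 0; nb = 0
    }
    # The first empty line ends the header block.
    inhdr && /^$/ { inhdr = 0; next }
    # Header state: only the two header REs are tried.
    inhdr { if ($0 ~ h1) f1 = 1; if ($0 ~ h2) f2 = 1; next }
    # Body state: only the body RE is tried; remember each hit.
    $0 ~ b { hit[++nb] = FNR ":" $0 }
    END { if (NR > 0) report() }
  ' "$@"
}
```

Usage would be along the lines of `search3 '^From: someone' '^Newsgroups: comp\.lang\.awk' 'if *\(' dir/*`, where `dir` stands for a directory of message files. Because the REs reach awk unchanged, plain single quotes are enough. Each hit has the same file:line:text shape that `grep -n` prints, so if the sketch were saved as a script, Emacs's `M-x grep` should be able to run it in place of grep and link each line back to its file.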

I've stripped the lengthy documentation comments from the code below,
but for further `in depth' commented documentation please go here:

http://www.ptw.com/~reader/exp/awk-search.html

But be warned that I have made multiple revisions to the script and
the comments may be inaccurate or wrong in places, since I haven't
cleaned that part up thoroughly yet.

The procedure described can be tedious, so I am working on revisions
that will query the user for a `header RE' and a `body RE' and will work
with any combination of up to 2 header REs and 2 body REs: no header RE
but 1 body RE, no body RE but 1 header RE, and anything in between.

So the command line would only be `search.sh filenames' and the script
would query the user for the rest.
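The planned query-driven version could look roughly like the hypothetical `search_ask` function below. It prompts for up to two header REs and two body REs, treats an empty answer as `not used', and reports a message only when every RE that was given matches in its own part of the message. It is a sketch under the same one-message-per-file assumption, not the revamped script itself.

```sh
#!/bin/sh
# search_ask FILE...   (hypothetical name; sketch of the planned revamp)
# Prompts on stderr for up to 2 header REs and 2 body REs; an empty
# answer means "not used". A file (one message) is reported only if every
# RE that was given matches in its own part: header REs in the header,
# body REs in the body. Each matching line is printed as file:line:text.
search_ask() {
  printf 'Header RE 1: ' >&2; read -r h1
  printf 'Header RE 2: ' >&2; read -r h2
  printf 'Body RE 1: ' >&2;   read -r b1
  printf 'Body RE 2: ' >&2;   read -r b2
  if [ -z "$h1$h2$b1$b2" ]; then
    echo 'search_ask: give at least one RE' >&2
    return 1
  fi
  # Environment, not awk -v, so backslashes in the REs reach awk as typed.
  SA_H1=$h1 SA_H2=$h2 SA_B1=$b1 SA_B2=$b2 awk '
    BEGIN {
      # Keep only the REs that were given; part[i] says where RE i applies.
      split("SA_H1 SA_H2 SA_B1 SA_B2", name, " ")
      for (k = 1; k <= 4; k++)
        if (ENVIRON[name[k]] != "") {
          pat[++n] = ENVIRON[name[k]]
          part[n] = (k <= 2) ? "h" : "b"
        }
    }
    # Print the saved hits of the current file if every given RE matched.
    function report(   i) {
      for (i = 1; i <= n; i++) if (!seen[i]) return
      for (i = 1; i <= nhit; i++) print file ":" hit[i]
    }
    # New file: flush the previous one, then reset the state.
    FNR == 1 {
      if (NR > 1) report()
      file = FILENAME; inhdr = 1; nhit = 0
      for (i = 1; i <= n; i++) seen[i] = 0
    }
    # The first empty line ends the header block.
    inhdr && /^$/ { inhdr = 0; next }
    # Try each given RE that applies to the current part of the message.
    {
      where = inhdr ? "h" : "b"
      matched = 0
      for (i = 1; i <= n; i++)
        if (part[i] == where && $0 ~ pat[i]) { seen[i] = 1; matched = 1 }
      if (matched) hit[++nhit] = FNR ":" $0
    }
    END { if (NR > 0) report() }
  ' "$@"
}
```

For example, answering the prompts with `^From: someone', two empty lines and `if *\(' gives a one-header, one-body search; the matching header line and body lines of each hit are all printed.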

If you have time to try this script out you'll see that the output has
all the ingredients necessary for gnus to display the results in an
nnir buffer.  Once I get the script cleaned up and revamped, I'll
probably want to add it as an nnir engine.

In addition to having gnus display the found messages, I'd really
like to figure out how to include the grep-style hits (the single line
containing the RE) in a separate buffer, in the style of `M-x occur' or
`M-x grep' output.  My feeling is that often the single line is all
that's needed.  So having that information in a separate buffer with
hyperlinks to the files would be a very nice addition, I think.
