Gnus development mailing list
* Seeking suggestions for pet project
@ 2000-10-07 14:09 Harry Putnam
  2000-10-08 13:01 ` Harry Putnam
  2000-10-08 21:36 ` Kai Großjohann
  0 siblings, 2 replies; 4+ messages in thread
From: Harry Putnam @ 2000-10-07 14:09 UTC
  Cc: ding

[cc'ed to ding list]

Kai,
I've written a shell/awk script search engine that one might call a
targeted multi-grep search.  As you know, I'm not very experienced
in programming, so it is quite unsophisticated.  What sophistication it
has is mainly due to help from posters on comp.lang.awk.

And with your knowledge of `information retrieval' you'll probably
think this flies in the face of current IR wisdom.  Even so, I find
this approach more exacting, and it produces better results than
other techniques I've tried.  It does require some familiarity with
regexps, but not necessarily thorough knowledge.

The code is aimed at `one message per file' mail or `usenet' messages,
so works well against nnml or nntp messages.

Currently the code is too cumbersome and inflexible for popular usage,
so I am planning two major revamps.

Currently the user is required to give three REs and a filename on the
command line, like (line wrapped for clarity):

`search.sh "^From: someone" "^Newsgroups: comp\.lang\.awk" \
 "if?\\\(" /home/reader/news-archive'

Might find a bunch of examples of author `someone' using `if' clauses in awk.

$1, $2 and $3 are used to hardwire the REs and nail down their targets.  $4 is
used to nail down the files to search.

All three REs must be present to produce a hit.  The script expects
the first two to be aimed at the headers and will only find them
there.  The third is expected to be in the body and will only be found there.

The script is a small state machine that knows where it is in a message
and the state of hits on the REs so far.

I've stripped the lengthy documentation comments from the code below,
but for further `in depth' commented documentation please go here:

http://www.ptw.com/~reader/exp/awk-search.html

But be warned that I have made multiple revisions in the script and
the comments may be inaccurate or wrong in places, since I haven't
cleaned that part up thoroughly yet.

The procedure described can be tedious, so I am working on revisions
that will query the user for `header RE' and `body RE' and will work
with any combination of up to 2 header and 2 body REs: no header
but 1 body, no body but 1 header, and anything in between.

So the command line would only be `search.sh filenames' and the script
would query the user for the rest.
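
A minimal sketch of how the `any combination' matching might work,
leaving the interactive query aside.  It assumes that every RE that is
given must hit (as with the current three) and that an empty string
means `not used'.  The function name search_any and the environment
variable names H1, H2, B1 and B2 are made up for illustration.  The REs
are handed to awk through the environment so that backslashes reach the
regexp engine untouched.  It only prints the names of matching files,
not the individual hits.

```shell
#!/bin/sh
# Sketch only: search_any HEADER_RE1 HEADER_RE2 BODY_RE1 BODY_RE2 FILE...
# Pass an empty string for any RE that should not be required.
search_any() {
    sa_h1=$1 sa_h2=$2 sa_b1=$3 sa_b2=$4
    shift 4
    H1="$sa_h1" H2="$sa_h2" B1="$sa_b1" B2="$sa_b2" awk '
        function report() {
            # Called at each file boundary: print the previous file if
            # every requested RE was seen in it.
            if (prev != "" && h1 && h2 && b1 && b2) print prev
        }
        FNR == 1 {
            report()
            prev = FILENAME
            in_header = 1
            # An RE that was not given counts as already satisfied.
            h1 = (ENVIRON["H1"] == ""); h2 = (ENVIRON["H2"] == "")
            b1 = (ENVIRON["B1"] == ""); b2 = (ENVIRON["B2"] == "")
        }
        # The first blank line ends the header section.
        in_header && /^$/ { in_header = 0; next }
        in_header {
            if (!h1 && $0 ~ ENVIRON["H1"]) h1 = 1
            if (!h2 && $0 ~ ENVIRON["H2"]) h2 = 1
            next
        }
        {
            # Everything after the blank line is body.
            if (!b1 && $0 ~ ENVIRON["B1"]) b1 = 1
            if (!b2 && $0 ~ ENVIRON["B2"]) b2 = 1
        }
        END { report() }
    ' "$@"
}
```

For instance, one header RE and one body RE would be given as
`search_any '^From: someone' '' 'if ?\(' '' /home/reader/news-archive/*'.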

If you have time to try this script out you'll see that the output has
all the ingredients necessary for gnus to display the results in an
nnir buffer.  Once I get the script cleaned up and revamped, I'll
probably want to add it as an nnir engine.

In addition to having gnus display the found messages, I'd really
like to figure out how to include the grep style hits (single line
containing RE) in a separate buffer, in the style of `M-x occur' or
`M-x grep' output.  My feeling is that often the single line is all
that's needed.  So having that information in a separate buffer with
hypertext to the files would be a very nice addition I think.




* Re: Seeking suggestions for pet project
  2000-10-07 14:09 Seeking suggestions for pet project Harry Putnam
@ 2000-10-08 13:01 ` Harry Putnam
  2000-10-08 21:36 ` Kai Großjohann
  1 sibling, 0 replies; 4+ messages in thread
From: Harry Putnam @ 2000-10-08 13:01 UTC


Harry Putnam <reader@newsguy.com> writes:

> [cc'ed to ding list]
> 
> Kai,
> I've written a shell/awk script search engine that one might call a
> targeted multi-grep search.  As you know, I'm not very experienced

Apparently I left out the stripped down script:

find "$4" -type f -name '[0-9]*' | xargs awk 'BEGIN { # set the REs once, from the command line

     RE1_p = '"\"$1\""'
     RE2_p = '"\"$2\""'
     RE3_p = '"\"$3\""'
   }

   {
      if (FNR == 1) {
          in_header  = 1
          in_matched = 0
          RE1_l  = ""
          RE2_l  = ""
          RE3_l  = ""
      } else if (in_header && /^$/) {
          in_header  = 0
          if (RE1_l && RE2_l) {
              in_matched = 1
          }
      }
      if (in_header) {  # in header section
          if ($0 ~ RE1_p) {
              RE1_l = $0
          } else if ($0 ~ RE2_p) {
              RE2_l = $0
          }
      } else if (in_matched) {  # in body of matched message
          if ($0 ~ RE3_p) {
              if (! RE3_l) {
                  filename2 = FILENAME
                  sub(/^.*\//, "", filename2)
                  print "-- \n" FILENAME
                  print filename2"|" RE1_l
                  print filename2"|" RE2_l
              }
              RE3_l = $0
              print filename2"|"FNR"|" RE3_l
          }
      } else {  # in body of unmatched message
            nextfile  # GNU awk extension to skip to next file
      }
  }'
#    END {
#    print "RE1="RE1_p "\nRE2="RE2_p"\nRE3="RE3_p
#}'
echo "RE1=$1"
echo "RE2=$2"
echo "RE3=$3"





* Re: Seeking suggestions for pet project
  2000-10-07 14:09 Seeking suggestions for pet project Harry Putnam
  2000-10-08 13:01 ` Harry Putnam
@ 2000-10-08 21:36 ` Kai Großjohann
  2000-10-09  2:20   ` Harry Putnam
  1 sibling, 1 reply; 4+ messages in thread
From: Kai Großjohann @ 2000-10-08 21:36 UTC
  Cc: ding

On 07 Oct 2000, Harry Putnam wrote:

> In addition to having gnus display the found messages, I'd really
> like to figure out how to include the grep style hits (single line
> containing RE) in a separate buffer, in the style of `M-x occur' or
> `M-x grep' output.  My feeling is that often the single line is all
> that's needed.  So having that information in a separate buffer with
> hypertext to the files would be a very nice addition I think.

Your description of the awk script sounds cool.  I haven't tried it
yet, though.  I haven't made up my mind how to display the list of
occurrences.  It's a good idea to do that, but the user interface
isn't clear to me.  Hm.

One approach would be to just have the summary buffer contain
different information -- the matched lines.  That ought to be fairly
easy to do.  But this way, we can't have both the normal list of
messages and the list of occurrences.  Hm.

When you need help integrating your script with nnir.el, please
holler.

kai
-- 
I like BOTH kinds of music.




* Re: Seeking suggestions for pet project
  2000-10-08 21:36 ` Kai Großjohann
@ 2000-10-09  2:20   ` Harry Putnam
  0 siblings, 0 replies; 4+ messages in thread
From: Harry Putnam @ 2000-10-09  2:20 UTC


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Your description of the awk script sounds cool.  I haven't tried it
> yet, though.  I haven't made up my mind how to display the list of
> occurrences.  It's a good idea to do that, but the user interface
> isn't clear to me.  Hm.

What I keep thinking is that it could be very similar to the interface
involved in `M-x grep'.  A buffer pops up containing the hits, with
hypertext links leading to each hit in the actual file.  It can't
really get much better than that.  And like you say, that might be
enough.

However, that misses out on the very nice functions of nnir, where the
whole article is available, and the ability to transport to the
thread.  Putting those two together would really be dynamite.

I don't know enough to look at the code for M-x grep and see the
mechanics, but I'm thinking, for this search engine's purposes, that at
the same time nnir is generating the summary buffer, a very similar
(read identical) buffer to the one created with M-x grep could be
generated, but with no popup.  The user would call it forth when desired
with C-x b.  Otherwise it would die a natural death when the
ephemeral nnir buffer is closed.
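
As a rough sketch of one possible first step, the search script's output
could be rewritten into the `file:line:text' shape that `grep -n' prints
when given several files.  The filter below assumes the output format of
the script posted earlier in this thread: a `-- ' line, then the full
path, then `name|header line' for the two header hits and
`name|linenumber|text' for each body hit.  The function name
to_grep_format is made up for illustration, and how the result would
actually be fed to Emacs is left open.

```shell
#!/bin/sh
# Sketch only: read the search script output on stdin and print one
# path:line:text entry per body hit.
to_grep_format() {
    awk '
        # A "-- " line announces a new message; the next line is its full path.
        $0 == "-- " { want_path = 1; next }
        want_path   { path = $0; want_path = 0; next }
        {
            # Body hits look like  name|linenumber|text
            first = index($0, "|")
            if (first == 0) next
            rest = substr($0, first + 1)
            second = index(rest, "|")
            if (second == 0) next
            num = substr(rest, 1, second - 1)
            # Header hits have no numeric second field, so skip them.
            if (num !~ /^[0-9]+$/) next
            # Keep the rest of the line intact, even if it contains "|".
            print path ":" num ":" substr(rest, second + 1)
        }
    '
}
```

It would sit at the end of the pipeline, for example
`search.sh RE1 RE2 RE3 dir | to_grep_format'.  Header hits are dropped
here because the script does not record their line numbers.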

Another possibility would be to somehow highlight the hits within the
nnir generated messages.  Not as useful as the first approach, though.

I'm probably getting carried away here but I'm also thinking how this
search engine could be used to pull other kinds of files into a fake
summary buffer.  Something similar to the way one can `import' any
text file into a gnus group by letting gnus generate headers for it,
and view it as a message. (gnus-summary-import-article).  Only
into an ephemeral nnir summary buffer.


