Gnus development mailing list
* `World Class' break through in software dev
From: Harry Putnam @ 2000-12-15  3:21 UTC


[-- Attachment #1: Type: text/plain, Size: 2839 bytes --]


Announcing the prerelease of the  AMMMAAAZZZINGG....... `agentsch'

There is a general stirring in the software dev community with this 
prerelease of the AMMMAZZZING first part of the ASSSTOUNDING:

`AgentHelperKit' suite 

This first precursor model, just one of a coming suite, is designed to
help `agent' users search the files they've accumulated under
News/agent.

Please send lots of money to my numbered Swiss Bank accounts.  I will
be publishing addresses throughout usenet for the next two weeks...


On the real side.....

If I don't get horse whipped or horse laughed off the list for posting 
an attachment and a non-lisp one at that.....

I hope someone finds this search tool useful.  I use it constantly
and find it very precise.  It is very unpolished, of course, and
lots of things remain to be done, such as some kind of basic
integration into nnir.

This tool was built under bash-2.04 and RedHat Linux 6.2.  It may
break on other machines and shells.

Hope it doesn't blow up and throw files on anyone.
Please send feedback if you do decide to try it out.

The output is formatted in what seems like a generally useful way
making it easy to retrieve messages if so desired.  A few stats are
printed at the end that will be of interest, including start and end
time.

`agentsch' is based on shell scripting and awk with the heavy work
being done by awk.  It is designed to search the numbered messages
under News/agent  or in `nnml' groups.

This tool requires three regexps to work and only returns hits if all
three are found in one message.  Two are aimed at the headers and one
at the body.
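
A minimal sketch of the all-three-must-match rule, kept separate from the
attached script.  The function name `match_message' and the sample message
are made up for illustration.  It reads one message on standard input and
prints "hit" only when both header regexps match in the header section and
the body regexp matches somewhere after the first blank line.

```shell
#!/bin/sh
# Hypothetical helper (not part of agentsch): report whether one message
# on stdin matches two header regexps and one body regexp.
match_message () {
    # awk -v processes backslash escapes in the values; fine for this sketch.
    awk -v h1="$1" -v h2="$2" -v b="$3" '
        in_body == 0 && /^$/ { in_body = 1; next }   # first blank line ends the header
        in_body == 0 { if ($0 ~ h1) got1 = 1; if ($0 ~ h2) got2 = 1; next }
        $0 ~ b       { gotb = 1 }
        END { if (got1 && got2 && gotb) print "hit"; else print "miss" }
    '
}

# Made-up sample message.
sample='From: someone@example.com
Subject: syslog-ng and security
Message-ID: <1@example.com>

Has anyone tried syslog-ng on RedHat?'

printf '%s\n' "$sample" | match_message "^Subject: " "^From: " "syslog-ng"
printf '%s\n' "$sample" | match_message "^Subject: " "^From: " "no-such-word"
```

The first call prints "hit" and the second prints "miss", because the body
regexp fails even though both header regexps match.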

Any users please read the documentation included.  
You can access it by typing `agentsch -H <RET>'

I tried to explain as well as I could to make it easy to use.  It's a
bit clunky and uses a tmpfile to avoid the `too many command line args'
problem that arises when thousands of files are searched.  I had a
version that used `xargs' for that, but that presented other problems
concerning awk's use of BEGIN and END clauses.
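
The xargs problem can be seen in miniature.  When xargs splits the file
list across several awk runs, each run executes its own BEGIN and END, so
totals come out per batch instead of once.  The sketch below forces the
split with `-n 2'; the directory layout, file names and function names are
invented for the example, and the second function only imitates the
list-file approach rather than reproducing agentsch.

```shell
#!/bin/sh
# Sketch only: the directory layout and file names are invented.
demo=$(mktemp -d) || exit 1
for n in 1 2 3 4; do echo "line in message $n" > "$demo/$n"; done

# xargs -n 2 forces two awk runs here; with thousands of files xargs
# splits the list on its own. Each run prints its own END total.
count_with_xargs () {
    find "$1" -type f -name '[0-9]*' | xargs -n 2 awk 'END { print NR " lines" }'
}

# One awk run: read the file names from a list file into ARGV in BEGIN,
# so END fires once for the whole set.
count_with_list () {
    list=$(mktemp) || return 1
    find "$1" -type f -name '[0-9]*' > "$list"
    awk 'BEGIN {
             listfile = ARGV[1]
             ARGC = 1
             while ((getline name < listfile) > 0) {
                 ARGV[ARGC] = name
                 ARGC++
             }
             close(listfile)
         }
         END { print NR " lines" }' "$list"
    rm -f "$list"
}

count_with_xargs "$demo"   # two totals, one per batch
count_with_list  "$demo"   # a single total
rm -rf "$demo"
```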

There is a more sophisticated model underway that will handle up to 6
RE (3 head 3 body) and in any combination.  That will be a very exact
search tool.  No boolean fuzziness, just straight regular expression
searching.

This tool is not fast and relies on no database, but even against as
many as 60,000 messages it isn't all that slow either.

There are two ways to control where the search is directed.  The
obvious one is using the "-d" flag (required) to name the parent
directory searched.  The tool will find numbered files at any
depth of nesting under that parent directory.

The other is to match on `Xref:' in mail or `Newsgroups:' in news
using the "-t" flag, which lets the user set two custom regexps for the
headers.
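
The nested lookup amounts to a recursive find.  The script's pattern
`[0-9]*' accepts any name that begins with a digit; the sketch below adds
a second test so that only all-digit names are kept.  That stricter test
is an assumption about what "numbered files" should mean, not something
agentsch does, and the function name `numbered_files' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list files whose names consist only of digits,
# at any depth under the directory given as $1.
numbered_files () {
    # First test: name starts with a digit (as in agentsch).
    # Second test (an addition): name contains no non-digit character.
    find "$1" -type f -name '[0-9]*' ! -name '*[!0-9]*'
}

# Made-up tree that mimics a couple of nested groups.
root=$(mktemp -d) || exit 1
mkdir -p "$root/comp/unix/questions" "$root/gnu/emacs/gnus"
touch "$root/comp/unix/questions/101" "$root/gnu/emacs/gnus/7" \
      "$root/gnu/emacs/gnus/.overview" "$root/gnu/emacs/gnus/7.bak"

numbered_files "$root" | sort   # lists .../101 and .../7 only
rm -rf "$root"
```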
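
According to the script's usage text, the two header regexps given to "-t"
are joined by a percent sign.  A minimal sketch of that split, using plain
shell parameter expansion rather than the awk split() call the script
uses; the function name `split_header_res' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split "RE1%RE2" at the first percent sign and
# print the two halves on separate lines.
split_header_res () {
    case $1 in
        *%*) ;;                                  # separator present
        *)   echo "missing % separator" >&2; return 1 ;;
    esac
    printf '%s\n' "${1%%\%*}"   # everything before the first %
    printf '%s\n' "${1#*%}"     # everything after the first %
}

split_header_res '^Subject:.*security%^Newsgroups: comp.unix.questions'
```

This prints `^Subject:.*security' and `^Newsgroups: comp.unix.questions'
on separate lines.  Either way, a regexp that itself contains a percent
sign cannot be passed through "-t" as it stands.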

To get full usage details type:

`agentsch -H <RET>'


[-- Attachment #2: shell/awk script --]
[-- Type: application/octet-stream, Size: 6105 bytes --]

#!/bin/sh
START=`date`
RHS="^Subject: "
RHM="^Message-ID: "
RHF="^From: "
STRING=""
REH1=""
REH2=""
REB=""

usage1 () {
cat >&2 <<EOM1

Usage:  $(basename $0) [-h | -H] [-m] [-b RE | -B RE] [-t "RE1%RE2"] -d DIRECTORY
          (-d /path/DIRECTORY is required)

 Type \`$(basename $0) -h' [no args] for a minimal usage statement
 Type \`$(basename $0) -H' [no args] for a detailed usage statement 

NOTES: $(basename $0) creates a \`FILES.tmp' file and later deletes it.
      This is one way to avoid having too many arguments when searching
      many files.  If you prefer this file be created somewhere besides
      \`\$PWD' you can adjust the name and path wherever FILES.tmp
      appears in the script.

EOM1
} #end usage1

usage2 () {
cat >&2 <<EOM2
Purpose:
   Search tool for searching Mail and news in numbered files

   $(basename $0) works by searching for 3 regular expressions
   supplied on the command line with the various flags, along
   with the directory to search (using the "-d" flag).  There must
   be three regexp supplied unless using the "-b" flag, in which case
   the program supplies the other two.  Two of the RE are expected
   to be aimed at the headers and one at the body.  Using the "-b"
   flag the user must supply the argument for "-b".

  -b "bodyRE"
  Using the semi-default command line (-b flag):

 Example1: $(basename $0) -b "bodyRE" -d /path/DIRECTORY 	

  In the above  example the program supplies the header regexp:
	"^Subject: "  and "^From: "
  If any flags besides -b and -m are used you must supply
  all three regexp.

  -m (no args allowed)
  Using the "-m" and "-B"  flag causes the program to supply 
  two header RE. Message-ID and Subject.

 Example2: $(basename $0) -m -B "bodyRE" -d DIRECTORY 
			(uppercase -B and arg  required)

  In the above example the "-m" flag will cause the program to 
  supply "^Subject: " and "^Message-ID: " (note the uppercase -B)

  -t 
  This flag lets the user construct two custom RE aimed at the
  header section.
  Using the "-t" flag the user must supply two RE separated by a 
  percent sign (%).  The program splits them and feeds them to awk.
  Make sure to use double quotes at the beginning and end.

 Example: 
 $(basename $0) -t "HeaderRE1%HeaderRE2" -B"bodyRE" -d DIRECTORY

 Real
 Examples:
 $(basename $0) -b "syslog-ng" -d /path/to/archive
 $(basename $0) -m -B"syslog-ng" -d  /path/to/archive
 $(basename $0)\
  -t"^Subject:.*security%^Newsgroups: comp.unix.questions" \
  -B"syslog-ng" -d /path/to/archives

	IMPORTANT NOTE:  Don't forget the "%" sign when using the "-t"
					 option. Don't forget the "-d" to designate the 
					 directory name. (see full usage for details)

Output:  The formatted output contains one long line at the top of 
each "-- " delimited group that is the full path and file name.
The next two lines contain the last component of the file name and the
hits on the Header regexp.
Line 4 and any more lines up to the "-- " contain the last component of
the file name|linenumber| and the hit on the Body regexp.

The end of output contains a reiteration of the regexp used, and some
data on how many files were searched and how long it took. 

EOM2
} #end usage

[ -z "$1" ] && {
	usage1
	exit 1
}
while getopts "hHmb:B:d:t:" opt; do
    case $opt in
## -m: Subject and Message-ID hard wired
	m) REH1="$RHS" REH2="$RHM"                ;;
## -b: default, Subject and From hard wired
	b) REH1="$RHS" REH2="$RHF" REB="$OPTARG"  ;;
## -B flag required with -m or -t
	B) REB="$OPTARG"                          ;;
## -t must be used with -B
	t) STRING="$OPTARG"                       ;;
## -d passes the root directory to search
	d) if [ -d "$OPTARG" ]; then
		DIRECTORY="$OPTARG"
	   else
		printf '\n%s.. No such directory\n\n' "$OPTARG" >&2
		exit 1
	   fi                                     ;;
	h) usage1
	   exit                                   ;;
	H) usage1
	   usage2
	   exit                                   ;;
	*) usage1
	   exit 1                                 ;;
    esac
done
## -d is required; stop here rather than let find search $PWD
[ -n "$DIRECTORY" ] || { usage1; exit 1; }
find "$DIRECTORY" -type f -name '[0-9]*' > FILES.tmp

awk 'BEGIN {
        list_of_files = ARGV[1]
        delete ARGV[1]
        ARGC = 1

        while ((getline fname < list_of_files) > 0) {
                ARGV[ARGC] = fname
                ARGC++
        }
}
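# Choose the three patterns once, before any input is read.
BEGIN {
	if ("'"$STRING"'") {
		# -t was given: split "RE1%RE2" into the two header patterns
		split("'"$STRING"'", HR, "%")
		RE1_p = HR[1]
		RE2_p = HR[2]
	} else {
		RE1_p = '"\"$REH1\""'
		RE2_p = '"\"$REH2\""'
	}
	RE3_p = '"\"$REB\""'
}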
{
	# A new file starts: close the previous group if it had a body hit
	if (FNR == 1 && RE3_l) {
		print "-- \n"
	}

	if (FNR == 1) {
		in_header  = 1
		in_matched = 0
		RE1_l = ""
		RE2_l = ""
		RE3_l = ""
	} else if (in_header && /^$/) {
		# first blank line ends the header section
		in_header = 0
		if (RE1_l && RE2_l) {
			in_matched = 1
		}
	}

	if (in_header) {  # in header section
		# two separate tests so one line may satisfy both patterns
		if ($0 ~ RE1_p) {
			RE1_l = $0
		}
		if ($0 ~ RE2_p) {
			RE2_l = $0
		}
	} else if (in_matched) {  # in body of matched message
		if ($0 ~ RE3_p) {
			if (! RE3_l) {
				# first body hit: print full path and both header hits
				filename2 = FILENAME
				sub(/^.*\//, "", filename2)
				print FILENAME
				print filename2 "|" RE1_l
				print filename2 "|" RE2_l
			}
			RE3_l = $0
			print filename2 "|" FNR "|" RE3_l
		}
	} else {  # in body of unmatched message
		nextfile  # GNU awk extension to skip to next file
	}
}
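END {
	# close the last group if the final file searched had a body hit
	if (RE3_l) {
		print "-- "
	}
	print "Regular expressions used:"
	print "Header1 = " RE1_p "\nHeader2 = " RE2_p "\nBody    = " RE3_p
	# ARGV[0] is awk itself, so the file count is ARGC - 1
	print "Searched: \t" (ARGC - 1) " files\t" NR " lines"
	print "under directory"'"\" $DIRECTORY"\"'
}' FILES.tmp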

rm -f FILES.tmp
FINISH=`date`
echo "  Finish = $FINISH"
echo "  Start  = $START"
