* `World Class' break through in software dev
@ 2000-12-15 3:21 Harry Putnam
From: Harry Putnam @ 2000-12-15 3:21 UTC
[-- Attachment #1: Type: text/plain, Size: 2839 bytes --]
Announcing the prerelease of the AMMMAAAZZZINGG....... `agentsch'
There is a general stirring in the software dev community with this
prerelease of the AMMMAZZZING first part of the ASSSTOUNDING:
`AgentHelperKit' suite
This first precursor model, just one of a coming suite, is designed to
help `agent' users search the files they've accumulated under
News/agent.
Please send lots of money to my numbered Swiss Bank accounts. I will
be publishing addresses throughout usenet for the next two weeks...
On the real side.....
If I don't get horse whipped or horse laughed off the list for posting
an attachment and a non-lisp one at that.....
I hope someone finds this search tool useful. I use it constantly
and find it very precise. It is very unpolished, of course, and
lots of things remain to be done, like some kind of basic
integration into nnir.
This tool was built under bash-2.04 and RedHat Linux 6.2. It may
break on other machines and shells.
Hope it doesn't blow up and throw files on anyone.
Please send feedback if you do decide to try it out.
The output is formatted in what seems like a generally useful way
making it easy to retrieve messages if so desired. A few stats are
printed at the end that will be of interest, including start and end
time.
`agentsch' is based on shell scripting and awk with the heavy work
being done by awk. It is designed to search the numbered messages
under News/agent or in `nnml' groups.
This tool requires three regexps to work and only returns a hit if all
three are found in one message. Two are aimed at the headers and one at
the body.
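To make the all-three-must-match rule concrete, here is a minimal sketch
in sh and awk. It is not the attached script: the helper name `match3'
and the sample message are made up for illustration, and it only reports
whether a single file matches, without any of the formatted output.

```sh
#!/bin/sh
# match3 HEADER_RE1 HEADER_RE2 BODY_RE FILE
# Hypothetical helper for illustration; not part of agentsch.
# Exit status 0 only if both header regexps hit in the header section
# and the body regexp hits on some line after the first blank line.
match3 () {
    H1=$1 H2=$2 B=$3 awk '
        BEGIN { h1 = ENVIRON["H1"]; h2 = ENVIRON["H2"]; b = ENVIRON["B"]; in_header = 1 }
        in_header && /^$/ { in_header = 0; next }     # blank line ends the headers
        in_header { if ($0 ~ h1) got1 = 1; if ($0 ~ h2) got2 = 1; next }
        got1 && got2 && $0 ~ b { found = 1; exit }    # body hit in a matched message
        END { exit(found ? 0 : 1) }
    ' "$4"
}

# Sample message (invented for the demonstration)
msg=$(mktemp)
printf '%s\n' 'From: someone@example.com' 'Subject: syslog-ng config' \
    '' 'I installed syslog-ng today.' > "$msg"

match3 '^Subject: ' '^From: ' 'syslog-ng' "$msg" && echo "hit"
match3 '^Subject: ' '^Message-ID: ' 'syslog-ng' "$msg" || echo "no hit: second header regexp not found"
rm -f "$msg"
```

The second call fails even though the body matches, because the sample
message has no Message-ID header.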
Any users please read the documentation included.
You can access it by typing `agentsch -H <RET>'
I tried to explain as well as I could to make it easy to use. It's a bit
clunky and uses a tmpfile to avoid the `too many command line args'
problem that arises when thousands of files are searched. I had a
version that used `xargs' for that, but that presented other problems
concerning awk's use of BEGIN and END clauses.
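The `xargs' problem is easy to reproduce. In the sketch below (the
function names, the four sample files and the `-n 2' batch size are only
there for the demonstration) xargs starts awk more than once, so every
run gets its own BEGIN and END and no single END sees the grand total.
Handing one awk a file list that it loads in BEGIN gives exactly one END.

```sh
#!/bin/sh
# Both helpers are hypothetical and take the path of a file that lists
# one file name per line.

# One awk per batch of two files: END runs once per batch.
count_with_xargs () {
    xargs -n 2 awk 'END { print NR }' < "$1"
}

# A single awk: the list is read in BEGIN and appended to ARGV.
count_with_list () {
    awk 'BEGIN {
        list = ARGV[1]; ARGC = 1
        while ((getline f < list) > 0) ARGV[ARGC++] = f
    }
    END { print NR }' "$1"
}

dir=$(mktemp -d)
list=$(mktemp)
for n in 1 2 3 4; do
    echo "message $n" > "$dir/$n"
    echo "$dir/$n" >> "$list"
done

echo "xargs:"; count_with_xargs "$list"      # one count per awk invocation
echo "file list:"; count_with_list "$list"   # one count for all files
rm -rf "$dir" "$list"
```

With four one-line files the xargs variant prints two separate counts of
2, while the file-list variant prints a single 4.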
There is a more sophisticated model underway that will handle up to 6
RE (3 head 3 body) and in any combination. That will be a very exact
search tool. No boolean fuzziness, just straight regular expression
searching.
This tool is not fast and relies on no database, but even against as
many as 60,000 messages it isn't all that slow either.
There are two ways to control where the search is directed. The
obvious one is the "-d" flag (required), which names the parent
directory to search. The tool finds numbered files at any depth of
nesting below that directory.
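As a rough illustration of "numbered files at any depth", the sketch
below uses find on a throwaway directory tree. The helper name
`numbered_files' and the sample paths are invented here. Note one
assumption: this variant adds a second test so that only names made up
entirely of digits are listed, which is stricter than matching names
that merely begin with a digit.

```sh
#!/bin/sh
# numbered_files DIR  (hypothetical helper, not part of agentsch)
# Lists regular files at any depth whose names consist of digits only.
numbered_files () {
    find "$1" -type f -name '[0-9]*' ! -name '*[!0-9]*'
}

dir=$(mktemp -d)
mkdir -p "$dir/comp.unix.questions/old"
: > "$dir/7"
: > "$dir/comp.unix.questions/12"
: > "$dir/comp.unix.questions/old/3"
: > "$dir/comp.unix.questions/7up.txt"   # starts with a digit, not a message number
numbered_files "$dir" | sort
rm -rf "$dir"
```

The three all-digit files are listed wherever they sit in the tree, and
`7up.txt' is skipped.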
The other is to match on `Xref:' in mail or `Newsgroups:' in news with
the "-t" flag, which lets the user set two custom regexps for the
headers.
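The "-t" argument is a single string holding both header regexps with a
percent sign between them. A small sketch of that split, using awk's
split() on "%", follows. The helper name `split_pair' is hypothetical,
and the check for a missing "%" is an addition made for this example.

```sh
#!/bin/sh
# split_pair "RE1%RE2"  (hypothetical helper, not part of agentsch)
# Prints the two header regexps on separate lines; fails if there is no "%".
split_pair () {
    PAIR=$1 awk 'BEGIN {
        n = split(ENVIRON["PAIR"], hr, "%")
        if (n < 2) exit 1
        print hr[1]
        print hr[2]
    }'
}

split_pair '^Subject:.*security%^Newsgroups: comp.unix.questions'
split_pair '^Subject:.*security' || echo 'missing "%" separator' >&2
```

The first call prints `^Subject:.*security' and
`^Newsgroups: comp.unix.questions' on separate lines; the second call
fails and prints the warning on stderr.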
To get full usage details type:
`agentsch -H' <RET>
[-- Attachment #2: shell/awk script --]
[-- Type: application/octet-stream, Size: 6105 bytes --]
#!/bin/sh
# Record the start time for the stats printed at the end
START=`date`
RHS="^Subject: "
RHM="^Message-ID: "
RHF="^From: "
STRING=""
REH1=""
REH2=""
REB=""
usage1 () {
cat >&2 <<EOM1
Usage: $(basename $0) [-hH] [-m] [-b bodyRE] [-B bodyRE] [-t "RE1%RE2"] -d DIRECTORY
(-d /path/DIRECTORY is required)
Type \`$(basename $0) -h' [no args] for a minimal usage statement
Type \`$(basename $0) -H' [no args] for a detailed usage statement
NOTES: $(basename $0) creates a \`FILES.tmp' file and later deletes it.
This is one way to avoid having too many arguments when searching many files.
If you prefer this file be created somewhere besides \`\$PWD'
you can adjust the name and path in the find, awk and rm commands
near the end of the script.
EOM1
} #end usage1
usage2 () {
cat >&2 <<EOM2
Purpose:
Search tool for searching mail and news in numbered files.
$(basename $0) works by searching for 3 regular expressions
supplied on the command line with the various flags, along
with the directory to search (using the "-d" flag). There must
be three regexps supplied unless using the "-b" flag, in which case
the program supplies the other two. Two of the REs are expected
to be aimed at the headers and one at the body. Using the "-b"
flag the user must supply the argument for "-b"
-b "bodyRE"
Using the semi-default command line (-b flag):
Example1: $(basename $0) -b "bodyRE" -d /path/DIRECTORY
In the above example the program supplies the header regexp:
"^Subject: " and "^From: "
If any flags besides -b and -m are used you must supply
all three regexps.
-m (no args allowed)
Using the "-m" and "-B" flags together causes the program to supply
two header REs: Message-ID and Subject.
Example2: $(basename $0) -m -B "bodyRE" -d DIRECTORY
(uppercase -B and arg required)
In the above example the "-m" flag will cause the program to
supply "^Subject: " and "^Message-ID: " (note the uppercase -B)
-t
This flag lets the user construct two custom REs aimed at the
header section.
Using the "-t" flag the user must supply two REs separated by a
percent sign (%). The program splits them and feeds them to awk.
Make sure to use double quotes at the beginning and end.
Example:
$(basename $0) -t "HeaderRE1%HeaderRE2" -B"bodyRE" -d DIRECTORY
Real
Examples:
$(basename $0) -b "syslog-ng" -d /path/to/archive
$(basename $0) -m -B"syslog-ng" -d /path/to/archive
$(basename $0) \\
-t"^Subject:.*security%^Newsgroups: comp.unix.questions" \\
-B"syslog-ng" -d /path/to/archives
IMPORTANT NOTE: Don't forget the "%" sign when using the "-t"
option. Don't forget the "-d" to designate the
directory name. (see full usage for details)
Output: The formatted output contains one long line at the top of
each "-- " delimited group that is the full path and file name.
The next two lines contain the last component of the file name and the hits
on the header regexps.
The remaining lines up to the "-- " contain the last component of the
filename|linenumber| and the hit on the body regexp.
The end of the output contains a reiteration of the regexps used, and some
data on how many files were searched and how long it took.
EOM2
} #end usage
[ -z "$1" ] && {
usage1
exit 1
}
while getopts "hHmb:B:d:t:" opt; do
case $opt in
## Subject and Message-ID hard wired
m) REH1="$RHS" REH2="$RHM" ;;
## Default Subject and From hardwired
b) REH1="$RHS" REH2="$RHF" REB="$OPTARG" ;;
## -B flag required with -m or -t
B) REB="$OPTARG" ;;
## -t Must be used with -B
t) STRING="$OPTARG" ;;
## -d passes the root directory to search
d) if [ -d "$OPTARG" ] ;then
DIRECTORY="$OPTARG"
else
printf '\n%s.. No such directory\n\n' "$OPTARG" >&2
exit 1
fi ;;
h) usage1
exit ;;
H) usage1
usage2
exit ;;
*) usage1
exit 1 ;;
esac
done
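## -d is required; without it find would silently search the current directory
[ -n "$DIRECTORY" ] || { usage1; exit 1; }
## Overwrite (not append to) the file list so a stale list is never reused
find "$DIRECTORY" -type f -name '[0-9]*' > FILES.tmp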
awk 'BEGIN {
list_of_files = ARGV[1]
delete ARGV[1]
ARGC = 1
while ((getline fname < list_of_files) > 0) {
ARGV[ARGC] = fname
ARGC++
}
# Still inside BEGIN: set the three patterns once, not once per input line
if ("'"$STRING"'") {
split("'"$STRING"'", HR,"%")
RE1_p = HR[1]
RE2_p = HR[2]
RE3_p = '"\"$REB\""'
}else {
RE1_p = '"\"$REH1\""'
RE2_p = '"\"$REH2\""'
RE3_p = '"\"$REB\""'
}
}
{
if (FNR == 1 && RE3_l) {
print "-- \n"
a = ""
}
if (FNR == 1 ) {
in_header = 1
in_matched = 0
RE1_l = ""
RE2_l = ""
RE3_l = ""
}else if (in_header && /^$/) {
in_header = 0
if (RE1_l && RE2_l) {
in_matched = 1
}
}
if (in_header) { # in header section
if ($0 ~ RE1_p) {
RE1_l = $0
} else if ($0 ~ RE2_p) {
RE2_l = $0
}
} else if (in_matched) { # in body of matched message
if ($0 ~ RE3_p) {
if (! RE3_l) {
a = 1
filename2=FILENAME
sub(/^.*\//,"",filename2)
print FILENAME
print filename2"|" RE1_l
print filename2"|" RE2_l
}
RE3_l = $0
print filename2"|"FNR"|" RE3_l
# filename2 = ""
}
} else { # in body of unmatched message
nextfile # GNU awk extension to skip to next file
}
}
END {
# Close the last hit group if there was one, then always print the summary
if (RE3_l) {
print "-- "
}
print "Regular expressions used:"
print "Header1 = "RE1_p"\nHeader2 = "RE2_p"\nBody = "RE3_p
print "Searched: \t" (ARGC - 1) " files\t" NR " lines"
print "under directory "'"\"$DIRECTORY\""'
}' FILES.tmp
rm -f FILES.tmp
FINISH=`date`
echo " Finish = $FINISH"
echo " Start = $START"