zsh-users
* clobber, diff, and any other suggestions...
@ 1998-03-26 17:04 Sweth Chandramouli
  1998-03-26 18:04 ` Andrew Main
  0 siblings, 1 reply; 2+ messages in thread
From: Sweth Chandramouli @ 1998-03-26 17:04 UTC (permalink / raw)
  To: ZSH Users

	some of the users at my site have been complaining about a very spotty 
newsfeed, so i complained in turn to our upstream provider, who responded that 
in order to do anything, i would need to provide them with a list of missing 
articles.  i whipped up this script to climb the directory tree, create a list 
of referenced articles and a list of message-ids for present articles, and find 
out which articles that are being referenced aren't actually present; i was 
wondering if any kind souls could help me tweak it some (and, in the spirit of 
tjl's function archive, let me and others learn by example).
	the first little block of script is, obviously, to create zero-length 
files (or zero out existing files), because the current settings on zsh on my 
machine don't let >> create a file, only append, and don't let > clobber a file, 
only create.  i could swear that the last zsh install i used had the opposite 
behavior for both, and that the option to set it was clobber/noclobber, but 
doing a search on noclobber in the man page returns nothing, while searching for 
clobber only returns HIST_ALLOW_CLOBBER.  so first, what is that switch?  is 
there some one-time only version of > and >> to toggle that behavior (akin to 
${=variable}, which turns on word splitting for that one substitution) that i 
could use here, since i have become used to not being able to clobber files with 
>, and now sometimes even use it to test for a file's existence?
	the second block should first make a list of every directory under the 
one in which the script is run, and then check for lines beginning with 
'Message-ID', 'Message-Id', or 'References' in all files in those directories, 
parsing out the appropriate field from such lines if they exist.  i stuck in the 
if...then loop because the script would hang if there was a directory that only 
contained other dirs, not files, and the 2> /dev/null in that test was to 
eliminate the error that ls would return when the test failed; the loop _does_ 
stop the script from hanging, but the error text still appears on-screen (which 
isn't a big problem, but which i wouldn't mind fixing somehow).  also, i'm sure 
that that loop could be cleaned up in a lot of different ways--maybe an awk 
statement to do all of the parsing?
	the third block creates a new list of references, one-per-line, and 
sorts it and gets rid of duplicates.  i thought that enclosing the entire while 
loop in curly braces, getting rid of the >> for the output of the echo, and 
piping the output of the entire brace-enclosed block into the sort command would 
let me get rid of those temp files, but that didn't seem to work.  what was i 
missing there?
	finally, and this is the main reason why i'm writing, the last block 
should, as i see it, find any message-ids that occur in both the references list 
and the message-ids list, and output them to the file $matches.  the next line 
diffs $matches against the references list; since the only differences should be 
lines that appear in the references list but not in $matches, i would think that 
the diff output would be a list of message-ids for all of the missing articles, 
each prepended with a '>' (no quotes).  the actual output, however, has a few 
ids that are prepended with a '<', which would mean that these were articles 
that appeared in the match-list but not the original reference list; since the 
match-list can only contain lines that are in the original reference list, i 
have a feeling that something is seriously awry here.
	ideas?  suggestions?  comments?  a one-line obscure zsh command that 
would do all of this much more cleanly :) ?

	tia,
	sweth.
	
#!/usr/local/bin/zsh

outdir=$HOME/newses
idfile=$outdir/mesgids
reffile=$outdir/refids
matches=$outdir/matchids
misses=$outdir/missids

echo 'zeroing files'
cp /dev/null $idfile
cp /dev/null $reffile
cp /dev/null $misses
cp /dev/null $matches
cp /dev/null ${reffile}.list
cp /dev/null ${reffile}.list2

for dir in `find . -type d -name '*' -print` ; do
   if [[ -n `ls $dir/*(.) 2> /dev/null` ]] ; then
      echo "grepping mesg-ids in $dir..."
      grep '^Message-I[dD]: <.*>' `ls ${dir}/*(.)` | cut -d':' -f3- \
        | tr -d ' ' | sort >> $idfile
      echo "grepping refs in $dir..."
      grep '^References: <.*>' `ls ${dir}/*(.)` | cut -d':' -f3- \
        | sed 's/^ //' >> $reffile
   fi;
done 

echo "splitting words..."
while read line ; do
   items=(${=line})
   for item in $items ; do
      echo $item >> ${reffile}.list
   done ;
done < $reffile
sort -u ${reffile}.list >> ${reffile}.list2

echo 'finding matches...'
while read line ; do
   grep $line $idfile >> $matches
done < ${reffile}.list2

diff $matches ${reffile}.list2 >> $misses


-- 
"Countin' on a remedy I've counted on before
Goin' with a cure that's never failed me
What you call the disease
I call the remedy"  -- The Mighty Mighty Bosstones



* Re: clobber, diff, and any other suggestions...
  1998-03-26 17:04 clobber, diff, and any other suggestions Sweth Chandramouli
@ 1998-03-26 18:04 ` Andrew Main
  0 siblings, 0 replies; 2+ messages in thread
From: Andrew Main @ 1998-03-26 18:04 UTC (permalink / raw)
  To: Sweth Chandramouli; +Cc: zsh-users

Sweth Chandramouli wrote:
>	the first little block of script is, obviously, to create zero-length 
>files (or zero out existing files), because the current settings on zsh on my 
>machine don't let >> create a file, only append, and don't let > clobber a file, 
>only create.  i could swear that the last zsh install i used had the opposite 
>behavior for both, and that the option to set it was clobber/noclobber, but 
>doing a search on noclobber in the man page returns nothing, while searching for 
>clobber only returns HIST_ALLOW_CLOBBER.  so first, what is that switch?

It is (NO_)CLOBBER, and it is in the man page.  The default is for
CLOBBER to be on; something in your setup must be turning it off.

>                                                                          is 
>there some one-time only version of > and >> to toggle that behavior

>! and >>!
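
A minimal sketch of the one-off override, assuming a made-up scratch file. It
uses >| rather than >! because zsh documents the two as equivalent and >| also
works in other shells that support noclobber; nothing here comes from the
original script.

```shell
# Hypothetical scratch file, used only for this illustration.
tmp=$(mktemp -d)
f=$tmp/demo

set -o noclobber          # same effect as zsh's "setopt NO_CLOBBER"

echo first > "$f"         # the file does not exist yet, so plain > creates it

# With noclobber set, plain > refuses to overwrite an existing file.
if ! { echo second > "$f"; } 2>/dev/null; then
    echo "plain > refused to overwrite"
fi

echo third >| "$f"        # one-off override; in zsh, >! is an equivalent spelling
content=$(cat "$f")
echo "$content"

# zsh likewise has >>! (or >>|) to let an append create a missing file.

set +o noclobber          # back to the default (CLOBBER on)
```

The file ends up holding only the line written with the override, and the
option itself is left untouched for every other redirection.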

>	the second block should first make a list of every directory under the 
>one in which the script is run, and then check for lines beginning with 
>'Message-ID', 'Message-Id', or 'References' in all files in those directories, 

	find . -type f -print0 | xargs -0 ./process_files

(if you have GNU tools -- otherwise change "-print0" to "-print" and drop
"-0").  This is much more efficient and scalable than anything you can
do in zsh builtins.  The process_files script should be something like

	#!/usr/local/bin/zsh -f
	for f; do
		sed -n '/^$/q
			/^References:/{
				s/^[^:]*: *//
				s/  *$//
				s/  */ /g
				p
			}' $f
		sed -n '/^$/q
			/^Message-I[dD]:/{
				s/^[^:]*: *//
				s/  *$//
				p
			}' $f >&3
	done 3>> messages | tr ' ' '\012' >> referenced
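
To see what the References half of that script emits, here is a small sketch
run against a made-up article (the file name, addresses and message-IDs are
all hypothetical). The sed program is the same as above; only the input is
invented.

```shell
# Made-up article; header and body contents are illustrative only.
tmp=$(mktemp -d)
cat > "$tmp/article" <<'EOF'
From: someone@example.com
Message-ID: <c@example.com>
References: <a@example.com>   <b@example.com>
Subject: Re: test

References: <not-a-header@example.com>
EOF

# /^$/q stops at the first blank line, so the look-alike line in the
# body is never examined.  The three substitutions strip the field name,
# drop trailing spaces, and squeeze runs of spaces to a single space.
refs=$(sed -n '/^$/q
	/^References:/{
		s/^[^:]*: *//
		s/  *$//
		s/  */ /g
		p
	}' "$tmp/article")

printf '%s\n' "$refs"

# One ID per line, as the tr at the end of the loop does.
printf '%s\n' "$refs" | tr ' ' '\012'
```

The first printf shows both IDs on one line separated by a single space; the
tr then puts each ID on its own line.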

Or you can do all that in Perl for greater efficiency -- those forks
don't come cheap, you know.  When all that has finished, you'll have
files containing unsorted message-IDs.  Run

	comm -13 <(sort messages) <(sort -u referenced)

to get a list of unique message-IDs of messages that are referenced but
not available.
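
As a rough illustration of what comm -13 selects, here is a sketch with
made-up message-IDs. It writes the sorted lists to temporary files instead of
using <(...) so that it also runs under a plain POSIX sh; the comparison itself
is the same. Unlike diff, which compares lines in order, comm on sorted input
behaves like a set comparison.

```shell
tmp=$(mktemp -d)

# Made-up IDs of articles that are actually present...
printf '%s\n' '<b@example.com>' '<a@example.com>' > "$tmp/messages"
# ...and IDs named in References headers (note the duplicate).
printf '%s\n' '<c@example.com>' '<a@example.com>' \
              '<c@example.com>' '<d@example.com>' > "$tmp/referenced"

sort    "$tmp/messages"   > "$tmp/messages.sorted"
sort -u "$tmp/referenced" > "$tmp/referenced.sorted"

# -1 drops lines found only in the first input, -3 drops lines common
# to both, leaving the lines found only in the second input.
missing=$(comm -13 "$tmp/messages.sorted" "$tmp/referenced.sorted")
printf '%s\n' "$missing"
```

With this input the two IDs that are referenced but have no matching article
are printed, each once.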

zsh's clever features aren't really helpful on large scale jobs like this,
but its flexible plumbing does come in handy.

-zefram


