From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from gatech.edu ([128.61.1.1]) by hawkwind.utcs.toronto.edu with SMTP id <2701>; Wed, 30 Sep 1992 14:34:02 -0400
Received: from burdell.cc.gatech.edu by gatech.edu (4.1/Gatech-9.1) id AA02617 for rc@hawkwind.utcs.toronto.edu; Wed, 30 Sep 92 14:33:53 EDT
Received: from penfold.cc.gatech.edu by burdell.cc.gatech.edu (4.1/SMI-4.1) id AA22350; for byron@netapp.com; Wed, 30 Sep 92 14:33:50 EDT
Received: by penfold.cc.gatech.edu (4.1/SMI-4.1) id AA01662; Wed, 30 Sep 92 14:33:16 EDT
From: arnold@cc.gatech.edu (Arnold Robbins)
Message-Id: <9209301833.AA01662@penfold.cc.gatech.edu>
Date: Wed, 30 Sep 1992 14:33:15 -0400
In-Reply-To: Byron Rakitzis's 150-line message on Sep 28, 4:42pm
X-Ultrix: Just Say NO!
X-Important-Saying: Premature Optimization Is The Root Of All Evil.
X-Mailer: Mail User's Shell (7.2.3 5/22/91)
To: byron@netapp.com (Byron Rakitzis)
Subject: Re: recent rc hacks
Cc: rc@hawkwind.utcs.toronto.edu, david@cs.dal.ca

> Date: Mon, 28 Sep 1992 16:42:10 -0400
> From: byron@netapp.com (Byron Rakitzis)
> To: rc@hawkwind.utcs.toronto.edu
> Subject: recent rc hacks
>
> Anyway, before I talk more about "tac", let me quote the "onlyfirst"
> filter here:
>
> #!/usr/local/bin/gawk -f
> {
>     if (hash[$0] == 0) {
>         hash[$0] = 1
>         strings[total++] = $0
>     }
> }
>
> END {
>     for (i = 0; i < total; i++)
>         print strings[i]
> }

This would probably be better written as

    {
        if (! ($0 in strings))
            strings[$0]++
    }
    END {
        for (i in strings)
            print i
    }

You could modify the print statement in the END block to print a count
of commands or whatever other info you might want.

> Anyway, I didn't realize tail -r was broken until very recently (it's
> actually documented under BUGS in the Sun man pages --- oops). My
> first pass was an awk program, but the performance was terrible.
> (I wonder if anyone maintaining gawk might care to see why it's
> so expensive to append elements to an array?)

Arrays are hash tables.
The current version uses a fixed-size set of buckets. You probably
used something like

    { a[NR] = $0 }
    END { for (i = NR; i >= 1; i--) print a[i] }

to do tac -- gawk has to malloc storage to hold your whole file, and
also has to do lots of hash chain following. I also expect that the
hash function isn't great on purely numeric strings.

I'd be curious what the performance of this program was compared to
SunOS nawk or mawk.

Arnold
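Both awk programs from this thread can be tried side by side from a shell. The sketch below is illustrative only. Everything in it is a hypothetical choice, not something from the original messages:

- the function names `tac_awk`, `onlyfirst_count` and `compare_tac`;
- the `AWK` variable;
- the list of awk implementations it probes.

`onlyfirst_count` is one way to do the "print a count" variation while keeping first-seen order, since a `for (i in array)` loop visits keys in an unspecified order. `compare_tac` assumes bash, because it relies on bash's `time` keyword.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for experimenting with the awk programs discussed above.
# AWK selects the implementation; it defaults to whatever `awk` is on PATH.

# tac: store every line keyed by its record number, then print them backwards.
TAC_PROG='{ a[NR] = $0 } END { for (i = NR; i >= 1; i--) print a[i] }'

# onlyfirst with counts: remember first-seen order in a separate array,
# because "for (key in array)" does not promise any particular order.
COUNT_PROG='
{
    if (!($0 in count))
        order[n++] = $0      # first time this line has been seen
    count[$0]++              # tally every occurrence
}
END {
    for (i = 0; i < n; i++)
        printf "%d\t%s\n", count[order[i]], order[i]
}'

tac_awk()         { "${AWK:-awk}" "$TAC_PROG" "$@"; }
onlyfirst_count() { "${AWK:-awk}" "$COUNT_PROG" "$@"; }

# Time the tac program under each awk that is actually installed.
# Usage: compare_tac FILE   (use a large FILE so differences show up)
compare_tac() {
    local impl
    for impl in gawk nawk mawk; do
        command -v "$impl" >/dev/null 2>&1 || continue   # skip missing awks
        echo "== $impl" >&2
        time "$impl" "$TAC_PROG" "$1" >/dev/null         # report goes to stderr
    done
}
```

To try it, first create a test file with `awk 'BEGIN { for (i = 1; i <= 200000; i++) print "line " i }' > big.txt`. Then `compare_tac big.txt` prints one timing report per installed awk. The 200000-line size is an arbitrary choice.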