From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from netcomsv.netcom.com ([192.100.81.101]) by hawkwind.utcs.toronto.edu with SMTP id <2701>; Mon, 28 Sep 1992 16:56:12 -0400
Received: from netapp.UUCP by netcomsv.netcom.com with UUCP (4.1/SMI-4.1)
	id AA00765; Mon, 28 Sep 92 13:55:24 PDT
Received: by netapp.netapp.com (4.1/SMI-4.1)
	id AA07458; Mon, 28 Sep 92 13:42:10 PDT
Date: Mon, 28 Sep 1992 16:42:10 -0400
From: byron@netapp.com (Byron Rakitzis)
Message-Id: <9209282042.AA07458@netapp.netapp.com>
To: rc@hawkwind.utcs.toronto.edu
Subject: recent rc hacks

I've been working on a convenient way to keep my history file trimmed
down to a particular size, and yet make the most of the bytes in the
history file. I've now got a little history file "compressor" (more
like "redundancy eliminator" --- which is what a compressor is, so why
am I quibbling?) which I'd like to share with the list.

Here's what I run out of my crontab every 4am or so. It's a script I
call trimhist:

	#!/bin/rc -l
	if (test -f $history) {
		egrep -v '^(-|; )' $history | tac | onlyfirst | sed 5000q | tac > $history.$pid
		mv $history.$pid $history
		echo `{d} `{wc -l < $history} >> $home/lib/histlog
	}

First it prunes the completely uninteresting lines from $history, then
reverses the file, then filters out all but the first instance of each
command (more later), trims to 5000 lines, reverses and stores.

The "d" command is a customized "date" script: I keep a log of the
length of my history file. It turns out that it's an interesting number
to look at; if you buy the concept of a "working set" of commands, then
a trimhisted history file should not change much in size over time. I
have not had the chance to test this much, since my line reverser,
"tail -r", turns out to be broken.

Anyway, before I talk more about "tac", let me quote the "onlyfirst"
filter here:

	#!/usr/local/bin/gawk -f
	{
		if (hash[$0] == 0) {
			hash[$0] = 1
			strings[total++] = $0
		}
	}
	END {
		for (i = 0; i < total; i++)
			print strings[i]
	}

The pipeline "tac|onlyfirst|tac" has the property that it keeps only
the last instance of whatever command you typed, but preserves the
ordering of the commands overall. (In trimhist, putting the "sed q"
before the second tac makes sure you keep the *last* n lines!)

Anyway, I didn't realize tail -r was broken until very recently (it's
actually documented under BUGS in the Sun man pages --- oops). My first
pass was an awk program, but the performance was terrible. (I wonder if
anyone maintaining gawk might care to see why it's so expensive to
append elements to an array?)

So to wrap it up, here's my C implementation of tac. It's not very
pretty, but it works:

	/* tac: a tail -r that really works */

	#include <stdio.h>	/* perror, fwrite, putchar */
	#include <stdlib.h>	/* malloc, realloc, exit */
	#include <fcntl.h>	/* open */
	#include <unistd.h>	/* read, close */

	enum {CHUNKSIZE=8192};

	/* Read all of fd into a malloc'd buffer; store the byte count in *size. */
	static char *readfile(int fd, size_t *size) {
		size_t nread = 0;
		char *file;
		int r;
		if ((file = malloc(CHUNKSIZE)) == NULL) {
			perror("malloc");
			exit(1);
		}
		while (1) {
			switch (r = read(fd, file + nread, CHUNKSIZE)) {
			case -1:
				perror("read");
				exit(1);
			case 0:
				if (size != NULL)
					*size = nread;
				return file;
			default:
				/* always keep room for one more full chunk */
				nread += r;
				if ((file = realloc(file, nread + CHUNKSIZE)) == NULL) {
					perror("realloc");
					exit(1);
				}
				break;
			}
		}
	}

	/* Print the lines of file[0..size-1] in reverse order. */
	static void tac(char *file, size_t size) {
		char *end, *last;
		int nlterm;
		if (size == 0)	/* empty input: file[size-1] would be out of bounds */
			return;
		nlterm = (file[size-1] == '\n');
		/* scan backwards; "last" marks the final byte of the current line */
		for (last = end = &file[size-1-nlterm]; end != &file[-1]; --end)
			if (*end == '\n') {
				fwrite(end + 1, 1, last - end, stdout);
				putchar('\n');
				last = end - 1;
			}
		/* first line of the input, which has no newline before it */
		fwrite(file, 1, last - end, stdout);
		if (nlterm)
			putchar('\n');
	}

	extern int main(int argc, char *argv[]) {
		size_t size;
		char *file;
		if (argc == 1) {
			file = readfile(0, &size);
			tac(file, size);
		} else {
			while (*++argv != NULL) {
				int fd = open(*argv, 0);
				if (fd < 0)
					perror(*argv);
				else {
					file = readfile(fd, &size);
					close(fd);
					tac(file, size);
				}
			}
		}
		return 0;
	}