From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from netcomsv.netcom.com ([192.100.81.101]) by hawkwind.utcs.toronto.edu with SMTP id <2701>; Mon, 28 Sep 1992 16:56:12 -0400
Received: from netapp.UUCP by netcomsv.netcom.com with UUCP (4.1/SMI-4.1)
	id AA00765; Mon, 28 Sep 92 13:55:24 PDT
Received: by netapp.netapp.com (4.1/SMI-4.1)
	id AA07458; Mon, 28 Sep 92 13:42:10 PDT
Date:	Mon, 28 Sep 1992 16:42:10 -0400
From:	byron@netapp.com (Byron Rakitzis)
Message-Id: <9209282042.AA07458@netapp.netapp.com>
To:	rc@hawkwind.utcs.toronto.edu
Subject: recent rc hacks

I've been working on a convenient way to keep my history file trimmed
down to a particular size, and yet make the most of the bytes in the
history file. I've now got a little history file "compressor" (more
like "redundancy eliminator" --- which is what a compressor is, so why
am I quibbling?) which I'd like to share with the list.

Here's what I run out of my crontab at 4am or so every day. It's a
script I call trimhist:

#!/bin/rc -l

if (test -f $history) {
	egrep -v '^(-|; )' $history |
		tac | onlyfirst | sed 5000q | tac > $history.$pid
	mv $history.$pid $history
	echo `{d} `{wc -l < $history} >> $home/lib/histlog
}

First it prunes the completely uninteresting lines from $history (the
egrep -v drops anything starting with "-" or "; "), then reverses the
file, then filters out all but the first instance of each command (more
later), trims to 5000 lines, reverses again and stores the result. The "d"
command is a customized "date" script: I keep a log of the length of my
history file. It turns out that it's an interesting number to look at;
if you buy the concept of a "working set" of commands, then a
trimhisted history file should not change much in size over time.  I
have not had the chance to test this much, since my line reverser,
"tail -r", turns out to be broken.
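
For anyone who doesn't read egrep patterns fluently, the pruning test
in trimhist amounts to roughly the following. This is only a sketch in
C; the function name "uninteresting" is made up for illustration and is
not part of trimhist:

```c
#include <string.h>

/*
 * uninteresting: hypothetical helper mirroring egrep -v '^(-|; )'.
 * Returns 1 for a history line that trimhist would discard (it starts
 * with "-" or with "; "), and 0 for a line that it keeps.
 */
int uninteresting(const char *line) {
	return line[0] == '-' || strncmp(line, "; ", 2) == 0;
}
```

Note that a "-" anywhere but the first column does not count, so a
command like "ls -l" survives.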

Anyway, before I talk more about "tac", let me quote the "onlyfirst"
filter here:

#!/usr/local/bin/gawk -f
{
	if (hash[$0] == 0) {
		hash[$0] = 1
		strings[total++] = $0
	}
}

END {
	for (i = 0; i < total; i++)
		print strings[i]
}

The pipeline "tac|onlyfirst|tac" has the property that it keeps only
the last instance of whatever command you typed, but preserves the
ordering of the commands overall. (In trimhist, putting the "sed <n>q"
before the second tac makes sure you keep the *last* n lines!)
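
To make that property concrete, here is a rough in-memory sketch in C
of what "tac | onlyfirst | sed <n>q | tac" computes. It is not how
trimhist works (that is the pipeline above); the name keep_last and its
interface are assumptions made for the example, and the quadratic scan
is only meant to be readable:

```c
#include <stddef.h>
#include <string.h>

/*
 * keep_last: hypothetical helper, not part of trimhist.
 * Compacts lines[0..n-1] in place so that only the last occurrence of
 * each distinct string survives, in the original relative order, and
 * at most max of them (the most recent ones) are kept.
 * Returns the number of entries kept.
 */
size_t keep_last(const char **lines, size_t n, size_t max) {
	size_t i, j, kept = 0, out = n;

	/* Walk backwards (the first "tac"), packing survivors at the end. */
	for (i = n; i-- > 0 && kept < max; ) {
		int seen = 0;

		/* "onlyfirst": skip a line already kept from later on */
		for (j = out; j < n; j++)
			if (strcmp(lines[i], lines[j]) == 0) {
				seen = 1;
				break;
			}
		if (!seen) {
			lines[--out] = lines[i];
			kept++;
		}
	}

	/* Survivors are already in file order (the second "tac");
	   slide them to the front of the array. */
	for (i = 0; i < kept; i++)
		lines[i] = lines[out + i];
	return kept;
}
```

Given the five lines "ls", "make", "ls", "vi foo", "make" and a large
max, this keeps "ls", "vi foo", "make": the later "ls" and the later
"make" win, and their relative order is unchanged. With max set to 2
it keeps just "vi foo" and "make", i.e. the last two survivors.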

Anyway, I didn't realize tail -r was broken until very recently (it's
actually documented under BUGS in the Sun man pages --- oops). My
first pass at a replacement was an awk program, but the performance was terrible.
(I wonder if anyone maintaining gawk might care to see why it's
so expensive to append elements to an array?)

So to wrap it up, here's my C implementation of tac. It's not very
pretty, but it works:

/* tac: a tail -r that really works */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>	/* open */
#include <unistd.h>	/* read, close */

enum {CHUNKSIZE=8192};

static char *readfile(int fd, size_t *size) {
	size_t nread = 0;
	char *file;
	int r;

	if ((file = malloc(CHUNKSIZE)) == NULL) {
		perror("malloc");
		exit(1);
	}

	while (1) {
		switch (r = read(fd, file + nread, CHUNKSIZE)) {
		case -1:
			perror("read");
			exit(1);
		case 0:
			if (size != NULL)
				*size = nread;
			return file;
		default:
			nread += r;
			if ((file = realloc(file, nread + CHUNKSIZE)) == NULL) {
				perror("realloc");
				exit(1);
			}
			break;
		}
	}
}

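static void tac(char *file, size_t size) {
	size_t end, last;	/* last: one past the final char of the pending line */
	int nlterm;

	if (size == 0)	/* empty input: file[size-1] would be out of bounds */
		return;
	nlterm = (file[size-1] == '\n');

	/* scan backwards; each '\n' found marks the start of a line to emit */
	last = size - nlterm;
	for (end = last; end > 0; --end)
		if (file[end-1] == '\n') {
			fwrite(file + end, 1, last - end, stdout);
			putchar('\n');
			last = end - 1;
		}

	/* the first line of the file comes out last */
	fwrite(file, 1, last, stdout);
	if (nlterm)
		putchar('\n');
}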

extern int main(int argc, char *argv[]) {
	size_t size;
	char *file;

	if (argc == 1) {
		file = readfile(0, &size);
		tac(file, size);
	} else {
		while (*++argv != NULL) {
			int fd = open(*argv, 0);
			if (fd < 0)
				perror(*argv);
			else {
				file = readfile(fd, &size);
				close(fd);
				tac(file, size);
				free(file);	/* don't hold every file in memory at once */
			}
		}
	}
	return 0;
}