rc-list - mailing list for the rc(1) shell
From: byron@netapp.com (Byron Rakitzis)
To: rc@hawkwind.utcs.toronto.edu
Subject: recent rc hacks
Date: Mon, 28 Sep 1992 16:42:10 -0400
Message-ID: <9209282042.AA07458@netapp.netapp.com>

I've been working on a convenient way to keep my history file trimmed
down to a particular size, and yet make the most of the bytes in the
history file. I've now got a little history file "compressor" (more
like "redundancy eliminator" --- which is what a compressor is, so why
am I quibbling?) which I'd like to share with the list.

Here's what I run out of my crontab every day at 4am or so. It's a script
I call trimhist:

#!/bin/rc -l

if (test -f $history) {
	egrep -v '^(-|; )' $history |
		tac | onlyfirst | sed 5000q | tac > $history.$pid
	mv $history.$pid $history
	echo `{d} `{wc -l < $history} >> $home/lib/histlog
}

First it prunes the completely uninteresting lines from $history, then
reverses the file, then filters out all but the first instance of each
command (more later), trims to 5000 lines, reverses and stores. The "d"
command is a customized "date" script: I keep a log of the length of my
history file. It turns out that it's an interesting number to look at;
if you buy the concept of a "working set" of commands, then a
trimhisted history file should not change much in size over time.  I
have not had the chance to test this much, since my line reverser,
"tail -r", turns out to be broken.

Anyway, before I talk more about "tac", let me quote the "onlyfirst"
filter here:

#!/usr/local/bin/gawk -f
{
	if (hash[$0] == 0) {
		hash[$0] = 1
		strings[total++] = $0
	}
}

END {
	for (i = 0; i < total; i++)
		print strings[i]
}

The pipeline "tac|onlyfirst|tac" has the property that it keeps only
the last instance of each command you typed, but preserves the
ordering of the commands overall. (In trimhist, putting the "sed <n>q"
before the second tac means the trim happens while the file is still
reversed, which makes sure you keep the *last* n lines!)

Anyway, I didn't realize tail -r was broken until very recently (it's
actually documented under BUGS in the Sun man pages --- oops). My
first pass at a replacement was an awk program, but the performance
was terrible. (I wonder if anyone maintaining gawk might care to see
why it's so expensive to append elements to an array?)

So to wrap it up, here's my C implementation of tac. It's not very
pretty, but it works:

/* tac: a tail -r that really works */

#include <fcntl.h>	/* open */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>	/* read, close */

enum {CHUNKSIZE=8192};

static char *readfile(int fd, size_t *size) {
	size_t nread = 0;
	char *file;
	int r;

	if ((file = malloc(CHUNKSIZE)) == NULL) {
		perror("malloc");
		exit(1);
	}

	while (1) {
		switch (r = read(fd, file + nread, CHUNKSIZE)) {
		case -1:
			perror("read");
			exit(1);
		case 0:
			if (size != NULL)
				*size = nread;
			return file;
		default:
			nread += r;
			if ((file = realloc(file, nread + CHUNKSIZE)) == NULL) {
				perror("realloc");
				exit(1);
			}
			break;
		}
	}
}

static void tac(char *file, size_t size) {
	char *end, *last;
	int nlterm;

	if (size == 0)		/* empty input: avoid reading file[-1] */
		return;
	nlterm = (file[size-1] == '\n');

	for (last = end = &file[size-1-nlterm]; end != &file[-1]; --end)
		if (*end == '\n') {
			fwrite(end + 1, 1, last - end, stdout);
			putchar('\n');
			last = end - 1;
		}

	fwrite(file, 1, last - end, stdout);
	if (nlterm)
		putchar('\n');
}

int main(int argc, char *argv[]) {
	size_t size;
	char *file;
	int status = 0;

	if (argc == 1) {
		file = readfile(0, &size);
		tac(file, size);
		free(file);
	} else {
		while (*++argv != NULL) {
			int fd = open(*argv, 0);
			if (fd < 0) {
				perror(*argv);
				status = 1;	/* remember the failure, keep going */
			} else {
				file = readfile(fd, &size);
				close(fd);
				tac(file, size);
				free(file);
			}
		}
	}
	return status;
}

