rc-list - mailing list for the rc(1) shell
* recent rc hacks
@ 1992-09-28 20:42 Byron Rakitzis
From: Byron Rakitzis @ 1992-09-28 20:42 UTC
  To: rc

I've been working on a convenient way to keep my history file trimmed
down to a particular size, and yet make the most of the bytes in the
history file. I've now got a little history file "compressor" (more
like "redundancy eliminator" --- which is what a compressor is, so why
am I quibbling?) which I'd like to share with the list.

Here's what I run out of my crontab every 4am or so. It's a script
I call trimhist:

#!/bin/rc -l

if (test -f $history) {
	egrep -v '^(-|; )' $history |
		tac | onlyfirst | sed 5000q | tac > $history.$pid
	mv $history.$pid $history
	echo `{d} `{wc -l < $history} >> $home/lib/histlog
}

First it prunes the uninteresting lines (those beginning with "-" or
"; ") from $history, then reverses the file, then filters out all but
the first instance of each command (more later), trims to 5000 lines,
reverses and stores. The "d"
command is a customized "date" script: I keep a log of the length of my
history file. It turns out that it's an interesting number to look at;
if you buy the concept of a "working set" of commands, then a
trimhisted history file should not change much in size over time.  I
have not had the chance to test this much, since my line reverser,
"tail -r", turns out to be broken.
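
The pruning step is a single egrep: a line is dropped if it starts with "-" or with "; ". For anyone who wants the same test without egrep, here is a minimal C sketch; the names `uninteresting` and `prune`, and the 4K line limit, are assumptions for illustration and not part of trimhist.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true for the lines egrep '^(-|; )' matches,
   i.e. lines that start with "-" or with "; ". */
static int uninteresting(const char *line) {
	return line[0] == '-' || strncmp(line, "; ", 2) == 0;
}

/* Like egrep -v '^(-|; )': copy in to out, dropping uninteresting lines.
   Assumption: no line exceeds the buffer; a longer line would be read
   in pieces and each piece tested on its own. */
static void prune(FILE *in, FILE *out) {
	char buf[4096];

	while (fgets(buf, sizeof buf, in) != NULL)
		if (!uninteresting(buf))
			fputs(buf, out);
}
```

Note that ";ls" (no space) survives, exactly as it does with the regular expression.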

Anyway, before I talk more about "tac", let me quote the "onlyfirst"
filter here:

#!/usr/local/bin/gawk -f
{
	if (hash[$0] == 0) {
		hash[$0] = 1
		strings[total++] = $0
	}
}

END {
	for (i = 0; i < total; i++)
		print strings[i]
}
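
For comparison, a rough C version of the onlyfirst idea: a fixed-size open-addressing hash set remembers which lines have been seen, and a line is kept only the first time it appears, so the original order survives. The table size, the djb2 hash and the function names are assumptions; the table never grows, and it stores pointers rather than copies, so the caller's strings must stay alive.

```c
#include <stddef.h>
#include <string.h>

enum { NSLOTS = 1 << 16 };	/* assumption: far fewer distinct lines than this */

static const char *slots[NSLOTS];	/* open-addressing set of lines seen so far */

static unsigned long djb2(const char *s) {
	unsigned long h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Add s to the set; return 1 if it was not there before, 0 if it was.
   The table never grows: with NSLOTS distinct lines this would loop forever. */
static int set_add(const char *s) {
	unsigned long i = djb2(s) & (NSLOTS - 1);

	while (slots[i] != NULL) {
		if (strcmp(slots[i], s) == 0)
			return 0;
		i = (i + 1) & (NSLOTS - 1);	/* linear probing */
	}
	slots[i] = s;
	return 1;
}

/* onlyfirst: copy the first instance of each line of in[0..n) to out[],
   keeping their order; return how many lines were kept. */
static size_t onlyfirst(const char **in, size_t n, const char **out) {
	size_t kept = 0;

	for (size_t i = 0; i < NSLOTS; i++)	/* start from an empty set */
		slots[i] = NULL;
	for (size_t i = 0; i < n; i++)
		if (set_add(in[i]))
			out[kept++] = in[i];
	return kept;
}
```

Unlike the gawk script, nothing is deferred to an END block: a line can be emitted the moment it is known to be new.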

The pipeline "tac|onlyfirst|tac" has the property that it keeps only
the last instance of whatever command you typed, but preserves the
ordering of the commands overall. (In trimhist, putting the "sed <n>q"
before the second tac makes sure you keep the *last* n lines!)
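
The whole of tac | onlyfirst | sed <n>q | tac can be collapsed into one pass over an in-memory array of lines: walk backwards, keep each line the first time it is met, stop after n kept lines, then reverse what was kept. A minimal C sketch under those assumptions; `trim_history` and `member` are hypothetical names, and the linear-search membership test is a simplification (a hash set would scale better).

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if s is among the first k entries of v.
   Linear search: simple, but O(k) per call. */
static int member(const char *s, const char **v, size_t k) {
	for (size_t i = 0; i < k; i++)
		if (strcmp(v[i], s) == 0)
			return 1;
	return 0;
}

/* Equivalent of tac | onlyfirst | sed <limit>q | tac on lines[0..n):
   keeps the last instance of each line, then the last `limit` of those,
   in their original order. Writes them to out[] and returns the count. */
static size_t trim_history(const char **lines, size_t n, size_t limit,
                           const char **out) {
	size_t k = 0;

	/* first tac + onlyfirst + sed: walk backwards, keep unseen lines */
	for (size_t i = n; i > 0 && k < limit; i--)
		if (!member(lines[i-1], out, k))
			out[k++] = lines[i-1];

	/* second tac: reverse out[0..k) in place */
	for (size_t a = 0, b = k; a + 1 < b; a++, b--) {
		const char *t = out[a];
		out[a] = out[b-1];
		out[b-1] = t;
	}
	return k;
}
```

With the lines a b a c b d and a limit of 10 this yields a c b d: each command appears once, at the position of its last use. With a limit of 2 it yields b d.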

Anyway, I didn't realize tail -r was broken until very recently (it's
actually documented under BUGS in the Sun man pages --- oops). My
first pass was an awk program, but the performance was terrible.
(I wonder if anyone maintaining gawk might care to see why it's
so expensive to append elements to an array?)

So to wrap it up, here's my C implementation of tac. It's not very
pretty, but it works:

/* tac: a tail -r that really works */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>	/* open */
#include <unistd.h>	/* read, close */

enum {CHUNKSIZE=8192};

/* Read all of fd into a malloc'd buffer and store its length in *size. */
static char *readfile(int fd, size_t *size) {
	size_t nread = 0;
	char *file;
	ssize_t r;

	if ((file = malloc(CHUNKSIZE)) == NULL) {
		perror("malloc");
		exit(1);
	}

	while (1) {
		/* the buffer always has room for CHUNKSIZE more bytes */
		switch (r = read(fd, file + nread, CHUNKSIZE)) {
		case -1:
			perror("read");
			exit(1);
		case 0:
			if (size != NULL)
				*size = nread;
			return file;
		default:
			nread += r;
			if ((file = realloc(file, nread + CHUNKSIZE)) == NULL) {
				perror("realloc");
				exit(1);
			}
			break;
		}
	}
}

/* Write the lines of file[0..size) to stdout, last line first. */
static void tac(char *file, size_t size) {
	size_t end, last;	/* last: one past the end of the current line */
	int nlterm;

	if (size == 0)		/* empty input; file[size-1] would be out of bounds */
		return;
	nlterm = (file[size-1] == '\n');

	/* scan backwards with indices rather than forming &file[-1] */
	for (last = end = size - nlterm; end > 0; --end)
		if (file[end-1] == '\n') {
			fwrite(file + end, 1, last - end, stdout);
			putchar('\n');
			last = end - 1;
		}

	fwrite(file, 1, last, stdout);	/* the first line has no '\n' before it */
	if (nlterm)
		putchar('\n');
}

int main(int argc, char *argv[]) {
	size_t size;
	char *file;
	int status = 0;

	if (argc == 1) {
		file = readfile(0, &size);
		tac(file, size);
		free(file);
	} else {
		while (*++argv != NULL) {
			int fd = open(*argv, O_RDONLY);
			if (fd < 0) {
				perror(*argv);
				status = 1;
			} else {
				file = readfile(fd, &size);
				close(fd);
				tac(file, size);
				free(file);
			}
		}
	}
	return status;
}



* Re: recent rc hacks
@ 1992-09-30 18:33 Arnold Robbins
From: Arnold Robbins @ 1992-09-30 18:33 UTC
  To: Byron Rakitzis; +Cc: rc, david

> Date: 	Mon, 28 Sep 1992 16:42:10 -0400
> From: byron@netapp.com (Byron Rakitzis)
> To: rc@hawkwind.utcs.toronto.edu
> Subject: recent rc hacks
> 
> Anyway, before I talk more about "tac", let me quote the "onlyfirst"
> filter here:
> 
> #!/usr/local/bin/gawk -f
> {
> 	if (hash[$0] == 0) {
> 		hash[$0] = 1
> 		strings[total++] = $0
> 	}
> }
> 
> END {
> 	for (i = 0; i < total; i++)
> 		print strings[i]
> }

This would probably be better written as

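	{
		if (! ($0 in strings))
			order[n++] = $0		# remember first-seen order
		strings[$0]++			# count every occurrence
	}

	END {
		for (i = 0; i < n; i++)		# "for (i in strings)" would scramble the order
			print order[i]
	}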

You could modify the print statement in the END block to print a count
of commands or whatever other info you might want.
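
The counting idea can also be done the sort | uniq -c way, without a hash table: sort the lines, then count runs of equal neighbours. A minimal C sketch with hypothetical names; the result comes out in sorted order, not history order, so it suits inspecting the "working set" of commands rather than rewriting $history.

```c
#include <stdlib.h>
#include <string.h>

static int cmpstr(const void *a, const void *b) {
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort lines[0..n) in place, then collapse runs of equal lines:
   uniq[j] gets each distinct line and counts[j] how often it occurred.
   Returns the number of distinct lines, in sorted order. */
static size_t count_commands(const char **lines, size_t n,
                             const char **uniq, size_t *counts) {
	size_t d = 0;

	qsort(lines, n, sizeof *lines, cmpstr);
	for (size_t i = 0; i < n; i++) {
		if (d > 0 && strcmp(uniq[d-1], lines[i]) == 0)
			counts[d-1]++;		/* same as the previous line: extend the run */
		else {
			uniq[d] = lines[i];	/* start a new run */
			counts[d++] = 1;
		}
	}
	return d;
}
```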

> Anyway, I didn't realize tail -r was broken until very recently (it's
> actually documented under BUGS in the Sun man pages --- oops). My
> first pass was an awk program, but the performance was terrible.
> (I wonder if anyone maintaining gawk might care to see why it's
> so expensive to append elements to an array?)

Arrays are hash tables.  The current version uses a fixed size set
of buckets.  You probably used something like

	{ a[NR] = $0 }
	END { for (i = NR; i >= 1; i--) print a[i] }

to do tac -- gawk has to malloc storage to hold your whole file, and
also has to do lots of hash chain following.  I also expect that the
hash function isn't great on purely numeric strings.  I'd be curious how
this program's performance compares under SunOS nawk or mawk.
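
The cost of a fixed bucket count can be made concrete with a small simulation. With B buckets and n distinct keys, each insert walks the whole chain it lands in, so the total is at least (n*n/B - n)/2 chain steps however good the hash is (equality when the keys spread perfectly evenly): quadratic in n once n is well above B. The sketch counts chain entries visited while inserting the subscripts "1".."n", as { a[NR] = $0 } would. It is not gawk's code; the bucket count is whatever you pass in, and the byte-sum hash is a deliberately weak strawman set next to djb2 to probe the remark about numeric strings.

```c
#include <stdio.h>
#include <stdlib.h>

typedef unsigned long (*hashfn)(const char *);

/* A weak hash: sum of the bytes. Decimal strings have small digit sums,
   and "12" and "21" always collide. */
static unsigned long additive(const char *s) {
	unsigned long h = 0;
	while (*s)
		h += (unsigned char)*s++;
	return h;
}

/* djb2 (h * 33 + c): digit positions matter, so "12" and "21" differ. */
static unsigned long djb2(const char *s) {
	unsigned long h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Simulate { a[NR] = $0 } for NR = 1..n on a chained hash table with a
   fixed number of buckets. The keys are distinct, so each insert walks
   its whole chain before appending; return the total entries visited. */
static unsigned long insert_cost(unsigned long n, unsigned long nbuckets, hashfn h) {
	unsigned long *len = calloc(nbuckets, sizeof *len);	/* chain lengths */
	unsigned long cost = 0;
	char key[32];

	if (len == NULL) {
		perror("calloc");
		exit(1);
	}
	for (unsigned long nr = 1; nr <= n; nr++) {
		snprintf(key, sizeof key, "%lu", nr);	/* awk subscripts are strings */
		unsigned long b = h(key) % nbuckets;
		cost += len[b];		/* compare against every key already there */
		len[b]++;
	}
	free(len);
	return cost;
}
```

Printing insert_cost(20000, 64, djb2) next to insert_cost(20000, 64, additive) shows how much a poor hash on numeric strings adds on top of the fixed-bucket cost.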

Arnold



* Re:  recent rc hacks
@ 1992-09-29 14:38 rsalz
From: rsalz @ 1992-09-29 14:38 UTC
  To: byron, rc

I like Byron's idea, but I like even more the idea of not needing these
little scripts or programs lying around.  And even more perversely, I like
the idea of using big monstrous perl to tame the history of small elegant
rc.  So here is a 45-line perl script.  It uses $histlog to say
where to write the history log file.

I don't know why, but if I invoked this directly my shell stopped
adding to the history (actually, I do sort of know why; I'd call it a bug).
Using this function helps:
	fn trimhist { perl $h/bin/trimhist && history=$history }

#! /usr/bin/perl --

$size = 50;

$history = $ENV{'history'}
    || die "No \$history environment variable, stopped";
$old = $history . '~';
rename($history, $old) || die "Can't rename $!, stopped";
open(IN, $old) || die "Can't open $old $!, stopped";

##  Parse $history, filling in @lines with the last unique occurrence of each
##  command.
%commands = ();
@lines = ();
$count = 0;
line: while ( <IN> ) {
    chop;
    next line if /^-/ || /^; / || /^$/;
    $lines[$commands{$_}] = "" if defined $commands{$_};
    $commands{$_} = $count++;
    push(@lines, $_);
}
close(IN) || die "Can't close $old $!, stopped";
@lines = grep($_ ne "", @lines);

##  Open new output
open(OUT, ">$history") || die "Can't open $history $!, stopped";

##  Print the last $size lines ($count tallies them for the log).
$start = @lines - $size;
$start = 0 if $start < 0;
$count = 0;
foreach ( @lines[$start .. $#lines] ) {
    print OUT $_, "\n";
    $count++;
}
close(OUT) || die "Can't close $history $!, stopped";
unlink($old) || die "Can't remove $old $!, stopped";

##  Update $histlog (unset it to not do that).
$log = $ENV{'histlog'} || exit(0);
open(OUT, ">>$log") || die "Can't open $log $!, stopped";
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
printf OUT "%4d/%2.2d/%2.2d %d\n", $year + 1900, $mon + 1, $mday, $count;
close(OUT) || die "Can't close $log $!, stopped";
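
The Perl script avoids the two reversals: it reads forward, and whenever a command repeats it blanks the earlier copy, then squeezes the blanks out and keeps the tail. A C sketch of that tombstone approach, with hypothetical names; the Perl finds the earlier copy through %commands in constant time, whereas this sketch searches backwards, which is quadratic but short.

```c
#include <stddef.h>
#include <string.h>

/* Forward pass: when lines[i] repeats, set its earlier live copy to NULL,
   so only the last instance of each line survives. Then compact the
   survivors to the front and return how many there are. */
static size_t last_unique(const char **lines, size_t n) {
	size_t i, j, k = 0;

	for (i = 0; i < n; i++)
		for (j = i; j-- > 0; )
			if (lines[j] != NULL && strcmp(lines[j], lines[i]) == 0) {
				lines[j] = NULL;	/* like $lines[...] = "" */
				break;			/* at most one live earlier copy */
			}

	/* like grep($_ ne "", @lines): squeeze out the blanks */
	for (i = 0; i < n; i++)
		if (lines[i] != NULL)
			lines[k++] = lines[i];
	return k;
}

/* First index of the last `size` of k lines, clamped at 0,
   so that lines[start .. k-1] holds at most `size` lines. */
static size_t tail_start(size_t k, size_t size) {
	return k > size ? k - size : 0;
}
```

For a b a c b d, last_unique leaves a c b d, the same result as tac | onlyfirst | tac.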



Thread overview: 3 messages
1992-09-28 20:42 recent rc hacks Byron Rakitzis
1992-09-29 14:38 rsalz
1992-09-30 18:33 Arnold Robbins
