rc-list - mailing list for the rc(1) shell
* Re: recent rc hacks
@ 1992-09-29 14:38 rsalz
  1992-09-30 21:54 ` recent rc hacks (+ source for revfile) John Mackin
  0 siblings, 1 reply; 4+ messages in thread
From: rsalz @ 1992-09-29 14:38 UTC
  To: byron, rc

I like Byron's idea, but I like even more the idea of not needing these
little scripts or programs lying around.  And even more perversely, I like
the idea of using big monstrous perl to tame the history of small elegant
rc.  So, the following 45-line perl script.  It uses $histlog to say
where to write the history log file.

I don't know why, but if I invoked this directly then my shell stopped
adding to the history (actually, I do sort of know why; I'd call it a bug).
Using this function helps:
	fn trimhist { perl $h/bin/trimhist && history=$history }

#! /usr/bin/perl --

$size = 50;

$history = $ENV{'history'}
    || die "No \$history environment variable, stopped";
$old = $history . '~';
rename($history, $old) || die "Can't rename $!, stopped";
open(IN, $old) || die "Can't open $old $!, stopped";

##  Parse $history, filling in @lines with last unique occurrence of each
##  command.
%commands = ();
@lines = ();
$count = 0;
line: while ( <IN> ) {
    chop;
    next line if /^-/ || /^; / || /^$/;
    $lines[$commands{$_}] = "" if defined $commands{$_};
    $commands{$_} = $count++;
    push(@lines, $_);
}
close(IN) || die "Can't close $old $!, stopped";
@lines = grep($_ ne "", @lines);

##  Open new output
open(OUT, ">$history") || die "Can't open $history $!, stopped";

##  Print last "$size" lines.
$start = $#lines - $size + 1;
$start = 0 if $start < 0;
$count = 0;
foreach ( @lines[$start .. $#lines] ) {
    print OUT $_, "\n";
    $count++;
}
close(OUT) || die "Can't close $history $!, stopped";
unlink($old) || die "Can't remove $old $!, stopped";

##  Update $histlog (unset it to not do that).
$log = $ENV{'histlog'} || exit(0);
open(OUT, ">>$log") || die "Can't open $log $!, stopped";
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
printf OUT "%4d/%2.2d/%2.2d %d\n", $year + 1900, $mon + 1, $mday, $count;
close(OUT) || die "Can't close $log $!, stopped";



* Re: recent rc hacks (+ source for revfile)
  1992-09-29 14:38 recent rc hacks rsalz
@ 1992-09-30 21:54 ` John Mackin
  0 siblings, 0 replies; 4+ messages in thread
From: John Mackin @ 1992-09-30 21:54 UTC
  To: The rc Mailing List

Heavy sighs.

I should have posted my history compactor to the list ages ago, it seems.
And now when I go to check it before posting it, it seems to have bugs.

I just about give in, but thought I should point out a few things regarding
linewise file reversal (a very useful function in general).

#1: Seventh Edition didn't have "tail -r".  (It was adopted between Seventh
and Ninth -- probably in V8, but I don't have my V8 manual in the office.
I don't think that excuses anything.)  Hence, in general, one can't rely
on "tail -r" being present on your machine.

#2: I don't think that the comment in the BUGS section of SunOS 4.1.1
tail(1) either excuses or explains tail -r's behaviour.  It is referring
to things like "tail -400b", which have never worked.  (Although, if
you are going to give tail a -r option, which in my view is a stupid
plan, there's no reason not to make them work.)

I don't know the exact provenance of either the option or the bug,
but I do know that on DECstations running Ultrix 4.2, "tail -r file"
doesn't exhibit the bug.  I'm not really interested in chasing it
further since I contend (as stated above) that tail -r should never
be used at all in any case.

#3: "tac" is a reinvention of the wheel.  In 1985, there was a huge
debate on net.unix-wizards about the best way to reverse a file line-wise.
It lasted for months.  The final outcome was a utility called "revfile",
written by Stephen J. Muir, Computing Dept., Lancaster University.
Check archie for it; also, a copy is appended to this mail.

I don't want to seem to be slagging off at you in public,
Byron, but _really_, reading the whole file into memory!  I hardly
think that's a reasonable thing to do.  Especially since the kinds
of files we're likely to want to deal with in this way are precisely
those that are really big.

The code in revfile is not what I would call pretty, either; but it's
fast, it works, and it doesn't read the whole file before writing anything.
(Unless, of course, the input isn't seekable in which case there is no
choice: but in that case it copies it to /tmp, clearly the correct
solution.  We should _never_ assume that an arbitrary file fits in
memory.)

#4: Line-wise file reversal clearly calls for a purpose-built C program.
(One of the other conclusions of the debate.)  Things like awk are so
general that they're bound to be very slow at this.  [perl hackers:
Yes, I know perl is ideal for this task, as it is for all others, so
please don't bother telling me.]
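
For reference, the core idea in #3 (walk the file backwards in fixed-size
blocks so that memory use stays bounded) can be sketched in a few lines of C.
This is a simplified illustration and not revfile's method: where revfile
keeps partial lines in a linked list, this sketch re-reads each line from the
file once its start is known.  The names reverse_lines, copy_range and BLOCK
are invented for the example, and the input is assumed to be a seekable stdio
stream.

```c
#include <stdio.h>

#define BLOCK 4096	/* bytes examined per backward step (arbitrary choice) */

/* Copy bytes [from, to) of `in` to `out`; returns 0 on success, -1 on error. */
static int copy_range(FILE *in, long from, long to, FILE *out)
{
	char buf[BLOCK];

	if (fseek(in, from, SEEK_SET) != 0)
		return -1;
	while (from < to) {
		size_t want = (to - from) < BLOCK ? (size_t)(to - from) : BLOCK;
		size_t got = fread(buf, 1, want, in);

		if (got == 0 || fwrite(buf, 1, got, out) != got)
			return -1;
		from += (long)got;
	}
	return 0;
}

/* Write the lines of seekable stream `in` to `out` in reverse order.
 * Only two BLOCK-sized buffers are ever held in memory. */
int reverse_lines(FILE *in, FILE *out)
{
	char buf[BLOCK];
	long pos, end, block_start;

	if (fseek(in, 0L, SEEK_END) != 0 || (pos = ftell(in)) < 0)
		return -1;
	end = pos;	/* one past the last byte of the line not yet written */
	while (pos > 0) {
		long i;
		size_t n;

		block_start = pos > BLOCK ? pos - BLOCK : 0;
		n = (size_t)(pos - block_start);
		if (fseek(in, block_start, SEEK_SET) != 0 ||
		    fread(buf, 1, n, in) != n)
			return -1;
		for (i = (long)n - 1; i >= 0; i--) {
			long nl = block_start + i;	/* file offset of buf[i] */

			/* A newline that is not the file's final byte ends the
			 * previous line, so [nl + 1, end) is a complete line. */
			if (buf[i] == '\n' && nl + 1 < end) {
				if (copy_range(in, nl + 1, end, out) != 0)
					return -1;
				end = nl + 1;
			}
		}
		pos = block_start;
	}
	return copy_range(in, 0L, end, out);	/* first line of the file */
}
```

Like revfile, the sketch does not add a newline to an unterminated last
line, so input "a\nb" comes out as "ba\n".  Non-seekable input would first
have to be copied to a temporary file, as described above.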

OK,
John.

# To unbundle, sh this file
echo revfile.1 1>&2
sed 's/.//' >revfile.1 <<'//GO.SYSIN DD revfile.1'
-.TH REVFILE 1 local
-.SH NAME
-revfile \- reverse order of lines in file
-.SH SYNOPSIS
-.B revfile
-[
-.I file
-\&.\|.\|. ]
-.SH DESCRIPTION
-.I revfile
-copies the named files, or standard input if none are named,
-to standard output, reversing the order of lines in each file.
-The filename
-.B \-
-stands for the standard input.
-.SH AUTHOR
-Stephen J. Muir, Computing Dept., Lancaster University
-.SH "SEE ALSO"
-rev(1)
//GO.SYSIN DD revfile.1
echo revfile.c 1>&2
sed 's/.//' >revfile.c <<'//GO.SYSIN DD revfile.c'
-/* Written by Stephen J. Muir, Computing Dept., Lancaster University
- *  stephen@uk.ac.lancs.comp
- *  stephen@uk.ac.lancaster.computing
- *  dcl-cs!stephen
- *
-
- *  revfile(1) - reverse order of lines in files
- *
-
- */
-
-# include <stdio.h>
-# include <sys/types.h>
-# include <sys/stat.h>
-# include <sys/file.h>
-# include <fcntl.h>
-
-# define BUFSIZE	4096
-
-extern char	*malloc ();
-
-char	*standin = "-", *my_tmpfile = "/tmp/revfileXXXXXX";
-
-struct stat	mystat;
-
-struct list
-{ 
-	char		l_buf [BUFSIZE];
-	short		l_cnt;
-	struct list	*l_next;
-}	
-*head, *pool;
-
-/* insert data at beginning of list */
-linsert (buf, size)
-char	*buf;
-{ 
-	register struct list	*lp;
-	if (size == 0)
-		return;
-	if (lp = pool)	/* try to reuse a list element */
-		pool = pool->l_next;
-	else if ((lp = (struct list *)malloc (sizeof (struct list))) == 0)
-	{ 
-		fprintf (stderr, "Out of memory\n");
-		exit (1);
-	}
-	bcopy (buf, lp->l_buf, size);
-	lp->l_cnt = size;
-	lp->l_next = head;
-	head = lp;	/* insert at head of list */
-}
-
-lflush (buf, size)
-char	*buf;
-{ 
-	register struct list	*lp;
-	if (size && fwrite (buf, 1, size, stdout) != size)
-	{ 
-		perror ("stdout");
-		exit (1);
-	}
-	while (head)		/* flush list */
-	{ 
-		if (fwrite (head->l_buf, 1, head->l_cnt, stdout) != head->l_cnt)
-		{ 
-			perror ("stdout");
-			exit (1);
-		}
-		head = (lp = head)->l_next;
-		lp->l_next = pool;
-		pool = lp;		/* add to list of old elements */
-	}
-}
-
-revfile (name)
-char	*name;
-{ 
-	static char	buf [BUFSIZE];
-	register char	*cp, *ep;
-	register int	ofd, nfd, i, pos;
-	long newpos;
-	long lseek();
-	if (strcmp (name, standin))	/* open the file */
-	{ 
-		if ((ofd = open (name, 0)) == -1)
-		{ 
-			perror (name);
-			return (1);
-		}
-	}
-	else
-		ofd = 0;
-	/* attempt to use original file */
-	if (fstat (ofd, &mystat) == -1 ||
-	    (mystat.st_mode & S_IFMT) != S_IFREG ||	/* regular file? */
-	(pos = lseek (ofd, 0L, 2)) == -1	/* go to EOF? */
-	)
-	{ 
-		pos = 0;				/* failed - copy file */
-		if ((nfd = open (my_tmpfile, O_RDWR|O_CREAT, 0)) == -1 ||
-		    unlink (my_tmpfile) == -1
-		    )
-		{ 
-			perror (my_tmpfile);
-			goto erroro;
-		}
-		while ((i = read (ofd, buf, BUFSIZE)) > 0)
-		{ 
-			if (write (nfd, buf, i) != i)
-			{ 
-				perror (my_tmpfile);
-				goto errorn;
-			}
-			pos += i;
-		}
-		if (i == -1)
-		{ 
-			perror (name);
-			goto errorn;
-		}
-		close (ofd);
-		ofd = nfd;
-		name = my_tmpfile;
-	}
-	while (pos)
-	{ 
-		if ((newpos = pos - BUFSIZE) < 0)
-			newpos = 0;
-		i = pos - newpos;
-		if (lseek (ofd, newpos, 0) != newpos || read (ofd, buf, i) != i)
-		{ 
-			perror (name);
-			goto erroro;
-		}
-		for (cp = ep = &buf [i]; cp > &buf [0]; )
-			if (*--cp == '\n')
-			{ 
-				lflush (cp + 1, ep - (cp + 1));
-				ep = cp + 1;
-			}
-		linsert (cp, ep - cp);
-		pos = newpos;
-	}
-	lflush (0, 0);
-	if (ofd)
-		close (ofd);
-	return (0);
-errorn:	  
-	close (nfd);
-erroro:	  
-	if (ofd)
-		close (ofd);
-	return (1);
-}
-
-/*ARGSUSED*/
-main (argc, argv, envp)
-char	*argv [], *envp [];
-{ 
-	register short	exitstat = 0;
-	if (--argc <= 0)
-	{ 
-		argv = &standin;
-		argc = 1;
-	}
-	else
-		++argv;
-	mktemp (my_tmpfile);
-	while (argc--)
-		if (revfile (*argv++))
-			exitstat = 1;
-	if (fclose (stdout) == EOF)
-	{ 
-		perror ("stdout");
-		exit (1);
-	}
-	exit (exitstat);
-}
-bcopy(from, to, size)
-char *		from;
-char *		to;
-int		size;
-{
-	register char *	cp1 = from;
-	register char *	cp2 = to;
-	register int	n = size;
-
-	while ( n-- )
-		*cp2++ = *cp1++;
-}
//GO.SYSIN DD revfile.c



* Re: recent rc hacks
@ 1992-09-30 18:33 Arnold Robbins
  0 siblings, 0 replies; 4+ messages in thread
From: Arnold Robbins @ 1992-09-30 18:33 UTC
  To: Byron Rakitzis; +Cc: rc, david

> Date: 	Mon, 28 Sep 1992 16:42:10 -0400
> From: byron@netapp.com (Byron Rakitzis)
> To: rc@hawkwind.utcs.toronto.edu
> Subject: recent rc hacks
> 
> Anyway, before I talk more about "tac", let me quote the "onlyfirst"
> filter here:
> 
> #!/usr/local/bin/gawk -f
> {
> 	if (hash[$0] == 0) {
> 		hash[$0] = 1
> 		strings[total++] = $0
> 	}
> }
> 
> END {
> 	for (i = 0; i < total; i++)
> 		print strings[i]
> }

This would probably be better written as

	{
		if (! ($0 in strings))
			strings[$0]++
	}

	END {
		for (i in strings)
			print i
	}

You could modify the print statement in the END block to print a count
of commands or whatever other info you might want.

> Anyway, I didn't realize tail -r was broken until very recently (it's
> actually documented under BUGS in the Sun man pages --- oops). My
> first pass was an awk program, but the performance was terrible.
> (I wonder if anyone maintaining gawk might care to see why it's
> so expensive to append elements to an array?)

Arrays are hash tables.  The current version uses a fixed size set
of buckets.  You probably used something like

	{ a[NR] = $0 }
	END { for (i = NR; i >= 1; i--) print a[i] }

to do tac -- gawk has to malloc storage to hold your whole file, and
also has to do lots of hash chain following.  I also expect that the
hash function isn't great on purely numeric strings.  I'd be curious what
the performance of this program was compared to SunOS nawk or mawk.
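
To make the cost of "hash chain following" concrete, here is a small C sketch
of a chained hash table with a fixed number of buckets.  It is an assumption
for illustration only and is not gawk's code: NBUCKETS, hash_str,
table_insert and insert_numeric_keys are all invented names, and the hash
function is a generic multiplicative one.  Each insert has to walk its whole
chain to check for a duplicate, so the work per insert grows with the number
of keys already stored.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64	/* fixed bucket count; hypothetical, not gawk's value */

struct node {
	char *key;
	struct node *next;
};

struct table {
	struct node *bucket[NBUCKETS];	/* must start out zeroed */
};

static unsigned long hash_str(const char *s)
{
	unsigned long h = 0;

	while (*s)
		h = h * 31 + (unsigned char)*s++;
	return h;
}

/* Insert key if absent.  Returns the number of chain nodes visited,
 * or -1 if memory runs out. */
long table_insert(struct table *t, const char *key)
{
	struct node **pp = &t->bucket[hash_str(key) % NBUCKETS];
	struct node *n;
	long steps = 0;

	for (; *pp != NULL; pp = &(*pp)->next) {
		steps++;
		if (strcmp((*pp)->key, key) == 0)
			return steps;	/* already present */
	}
	if ((n = malloc(sizeof *n)) == NULL)
		return -1;
	if ((n->key = malloc(strlen(key) + 1)) == NULL) {
		free(n);
		return -1;
	}
	strcpy(n->key, key);
	n->next = NULL;
	*pp = n;		/* append at the end of the chain */
	return steps;
}

/* Insert the keys "1" .. "n", as a[NR] = $0 would for an n-line file.
 * Returns the total number of chain nodes visited, or -1 on error. */
long insert_numeric_keys(struct table *t, long n)
{
	char key[32];
	long i, s, total = 0;

	for (i = 1; i <= n; i++) {
		sprintf(key, "%ld", i);
		if ((s = table_insert(t, key)) < 0)
			return -1;
		total += s;
	}
	return total;
}
```

With the bucket count fixed, even a perfectly even hash gives a total of
about n*n/(2*NBUCKETS) node visits for n distinct keys, so doubling the file
size roughly quadruples the work.  A poor hash on numeric strings only makes
the chains more uneven and the total larger.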

Arnold



* recent rc hacks
@ 1992-09-28 20:42 Byron Rakitzis
  0 siblings, 0 replies; 4+ messages in thread
From: Byron Rakitzis @ 1992-09-28 20:42 UTC
  To: rc

I've been working on a convenient way to keep my history file trimmed
down to a particular size, and yet make the most of the bytes in the
history file. I've now got a little history file "compressor" (more
like "redundancy eliminator" --- which is what a compressor is, so why
am I quibbling?) which I'd like to share with the list.

Here's what I run out of my crontab every 4am or so. It's a script
I call trimhist:

#!/bin/rc -l

if (test -f $history) {
	egrep -v '^(-|; )' $history |
		tac | onlyfirst | sed 5000q | tac > $history.$pid
	mv $history.$pid $history
	echo `{d} `{wc -l < $history} >> $home/lib/histlog
}

First it prunes the completely uninteresting lines from $history, then
reverses the file, then filters out all but the first instance of each
command (more later), trims to 5000 lines, reverses and stores. The "d"
command is a customized "date" script: I keep a log of the length of my
history file. It turns out that it's an interesting number to look at;
if you buy the concept of a "working set" of commands, then a
trimhisted history file should not change much in size over time.  I
have not had the chance to test this much, since my line reverser,
"tail -r", turns out to be broken.

Anyway, before I talk more about "tac", let me quote the "onlyfirst"
filter here:

#!/usr/local/bin/gawk -f
{
	if (hash[$0] == 0) {
		hash[$0] = 1
		strings[total++] = $0
	}
}

END {
	for (i = 0; i < total; i++)
		print strings[i]
}

The pipeline "tac|onlyfirst|tac" has the property that it keeps only
the last instance of whatever command you typed, but preserves the
ordering of the commands overall. (In trimhist, putting the "sed <n>q"
before the second tac makes sure you keep the *last* n lines!)
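
The property described above (keep only the last instance of each command,
preserve the overall order, then keep the last n lines) can also be stated
directly in C without any reversal.  The following is a minimal sketch under
the assumption that the history is already in memory as an array of strings;
keep_last_unique and last_n_start are hypothetical names, and the quadratic
comparison loop is chosen for brevity, not speed.

```c
#include <stddef.h>
#include <string.h>

/* Compact lines[0..n-1] in place so that only the last occurrence of each
 * string survives, in the original relative order.  Returns the new count. */
size_t keep_last_unique(const char **lines, size_t n)
{
	size_t i, j, kept = 0;

	for (i = 0; i < n; i++) {
		int repeated_later = 0;

		for (j = i + 1; j < n; j++)
			if (strcmp(lines[i], lines[j]) == 0) {
				repeated_later = 1;
				break;
			}
		if (!repeated_later)
			lines[kept++] = lines[i];	/* kept <= i, so this is safe */
	}
	return kept;
}

/* Index of the first entry to print when only the last `limit` of `count`
 * entries should be kept. */
size_t last_n_start(size_t count, size_t limit)
{
	return count > limit ? count - limit : 0;
}
```

For the history "ls", "make", "ls", "vi x", "make" this leaves "ls", "vi x",
"make": each command appears once, at the position of its most recent use.
With a limit of 2, printing would start at index 1.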

Anyway, I didn't realize tail -r was broken until very recently (it's
actually documented under BUGS in the Sun man pages --- oops). My
first pass was an awk program, but the performance was terrible.
(I wonder if anyone maintaining gawk might care to see why it's
so expensive to append elements to an array?)

So to wrap it up, here's my C implementation of tac. It's not very
pretty, but it works:

/* tac: a tail -r that really works */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>	/* open */
#include <unistd.h>	/* read, close */

enum {CHUNKSIZE=8192};

static char *readfile(int fd, size_t *size) {
	size_t nread = 0;
	char *file;
	int r;

	if ((file = malloc(CHUNKSIZE)) == NULL) {
		perror("malloc");
		exit(1);
	}

	while (1) {
		switch (r = read(fd, file + nread, CHUNKSIZE)) {
		case -1:
			perror("read");
			exit(1);
		case 0:
			if (size != NULL)
				*size = nread;
			return file;
		default:
			nread += r;
			if ((file = realloc(file, nread + CHUNKSIZE)) == NULL) {
				perror("realloc");
				exit(1);
			}
			break;
		}
	}
}

static void tac(char *file, size_t size) {
	char *end, *last;
	int nlterm;

	if (size == 0)		/* empty input: file[size-1] would be out of bounds */
		return;
	nlterm = (file[size-1] == '\n');

	for (last = end = &file[size-1-nlterm]; end != &file[-1]; --end)
		if (*end == '\n') {
			fwrite(end + 1, 1, last - end, stdout);
			putchar('\n');
			last = end - 1;
		}

	fwrite(file, 1, last - end, stdout);
	if (nlterm)
		putchar('\n');
}

extern int main(int argc, char *argv[]) {
	size_t size;
	char *file;

	if (argc == 1) {
		file = readfile(0, &size);
		tac(file, size);
	} else {
		while (*++argv != NULL) {
			int fd = open(*argv, 0);
			if (fd < 0)
				perror(*argv);
			else {
				file = readfile(fd, &size);
				close(fd);
				tac(file, size);
				free(file);	/* release the buffer before the next file */
			}
		}
	}
	return 0;
}

