From mboxrd@z Thu Jan  1 00:00:00 1970
Message-Id: <200210312120.g9VLKGi06485@augusta.math.psu.edu>
To: 9fans@cse.psu.edu
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0"
Content-ID: <6479.1036099195.0@augusta.math.psu.edu>
From: Dan Cross <cross@math.psu.edu>
Subject: [9fans] find(1) revisited.
Date: Thu, 31 Oct 2002 16:20:16 -0500
Topicbox-Message-UUID: 12f71292-eacb-11e9-9e20-41e7f4b1d025

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <6479.1036099195.1@augusta.math.psu.edu>

So find sucks, but is inarguably useful, and I've found myself missing
it before.  What would be a better solution, I've wondered?  Here's a
proposal for one that I think fits in much more nicely.

Find basically does the following job:  Walk a set of specified
directory trees, apply a set of predicates to the contents, and perform
some action if the result is true.  However, find could be thought of
as a pipeline of filters that operate on a stream of filenames,
discarding those that don't meet some criteria, and printing the rest to
the next stage.  The final stage of the pipeline performs some action,
such as printing the file names or making them arguments to some
command.

So, I wrote two commands: walk, and sor.  Walk walks over a directory
tree, printing its contents, possibly with quoting in the style of the
new "%q" format (using ls's ``lsquote'' routine), and possibly with a
limitation on the depth to which it'll descend.  Sor (Stream OR, get
it?) is an rc script that reads a set of filenames from its input, and
applies a set of tests to them, echoing those names that pass a test,
discarding the rest.  The effect is that one can now create arbitrary
pipelines that mimic what find does under Unix, but are much more
flexible.  It isn't the fastest thing in the world, but it works, and
seems to work pretty well.
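
To make the test logic concrete, here's a rough sketch in portable C of
what sor does with each name.  It is not the rc script itself, and the
names Test, has_suffix, is_c_source, is_header and passes_any are made
up for illustration.  The idea is just a short-circuit OR over the
tests: stop at the first one that passes.

```c
#include <stddef.h>
#include <string.h>

/* A test takes a file name and returns nonzero if the name passes. */
typedef int (*Test)(const char *name);

/* Hypothetical example tests: match on the file name suffix. */
static int
has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);

	return n >= s && strcmp(name + n - s, suffix) == 0;
}

static int is_c_source(const char *name) { return has_suffix(name, ".c"); }
static int is_header(const char *name) { return has_suffix(name, ".h"); }

/*
 * Stream OR: try each test in order and stop at the first that passes.
 * Returns 1 if the name should be echoed, 0 if it should be discarded.
 * With no tests at all, nothing passes.
 */
static int
passes_any(const char *name, Test *tests, size_t ntests)
{
	size_t i;

	for (i = 0; i < ntests; i++)
		if (tests[i](name))
			return 1;
	return 0;
}
```

In the real script the tests are arbitrary commands handed to eval, so
anything that sets an exit status can serve as a test.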

Now, I think sor is genuinely useful, but why walk when we have du -a?
Two reasons, both because du is the program to collect disk usage
statistics, not the program to do general walks of a file tree:  (1)
it has no way to limit depth, and (2) it contains logic to make sure
it doesn't ``count'' a file twice.  That's useful for du, but not really for a
general file walker; what if I want to see all names of a file?  What
if I want to look for multiple names with the same qid?  It certainly
makes no sense to augment du with that functionality: du is the disk
usage summarizer, not the general file tree walker.  Also, not being
able to limit the depth of one's search can be annoying.  What if I
want to look for a file, but only in the first three levels of a
hierarchy?  It can be argued that it might be useful to summarize the
disk usage information of everything up to n levels in a hierarchy, but
I'd argue that that's not generally useful enough to warrant a change
to du.  Besides, what if I wanted to find all the names of a given file
within n levels in a hierarchy?  It could be argued that du could be
used along with some clever sed script to achieve the effect of depth
limiting, but we still lose the ability to see a file more than once.
Hence walk.
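
The two properties argued for above, a depth limit and no suppression
of repeated files, can be sketched in portable C on an in-memory tree.
This is only an illustration under assumed names (Node, next_depth and
visit are hypothetical), not the attached program.  It assumes the same
convention as walk: a depth of -1 means no limit, and 0 means go no
deeper.

```c
#include <stddef.h>

/* Hypothetical in-memory stand-in for a directory tree. */
typedef struct Node Node;
struct Node {
	const char *name;
	Node **kids;	/* NULL for plain files */
	size_t nkids;
};

/* -1 means "no limit" and stays -1; otherwise count down by one. */
static int
next_depth(int depth)
{
	return depth == -1 ? depth : depth - 1;
}

/*
 * Visit node, then its children while depth allows.
 * Returns the number of names visited.  Nothing is skipped for
 * having been seen before, so a node reachable twice counts twice.
 */
static size_t
visit(const Node *node, int depth)
{
	size_t i, count = 1;

	if (depth == 0)
		return count;
	for (i = 0; i < node->nkids; i++)
		count += visit(node->kids[i], next_depth(depth));
	return count;
}
```

A du-style walker would need extra bookkeeping to drop the second
visit of the same node; leaving that out is the whole point here.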

Anyway, I append both here in case others find them useful.  It'd be
nice to see them go into the distribution if there's enough general
appeal (and if they don't suck too badly).  In any event, some feedback
on the idea and the tools would be nice.

	- Dan C.


------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <6479.1036099195.2@augusta.math.psu.edu>

/*
 *  Walk a directory tree, in the style of du(1),
 *  but with some additional flourishes.
 *
 *  Dan Cross <cross@math.psu.edu>
 */

#include <u.h>
#include <libc.h>

static int	mkdepth(int);
static char	*mkname(char *, int *, char *, char *);
static void	walk(char *, int, int);
static void	walkname(char *, int, int);
static int	walkquote(int c);

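char	*fmt;

static void
usage(void)
{
	fprint(2, "usage: %s [-q] [-d depth] [name ...]\n", argv0);
	exits("usage");
}

void
main(int argc, char *argv[])
{
	char	*dir;
	int	depth;
	Dir	*d;

	dir = ".";
	fmt = "%s\n";
	depth = -1;		/* -1 means no depth limit */
	ARGBEGIN {
	case 'd':
		/* EARGF calls usage() if -d has no argument; ARGF() would return nil. */
		depth = atoi(EARGF(usage()));
		break;
	case 'q':
		quotefmtinstall();
		doquote = walkquote;
		fmt = "%q\n";
		break;
	default:
		usage();
	}ARGEND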
	if (argc == 0)
		walkname(".", depth, 1);
	else {
		for (dir = *argv; dir; dir = *++argv) {
			if ((d = dirstat(dir)) == nil) {
				fprint(2, "dirstat %s: %r\n", dir);
				continue;
			}
			walkname(dir, depth, d->mode & DMDIR);
			free(d);
		}
	}

	exits(0);
}

/*  Cribbed from ls(1) source.  */
static int
walkquote(int c)
{

	if (c <= ' ' || strchr("`^#*[]=|\?${}()'", c))
		return(1);
	return(0);
}

static void
walkname(char *dirname, int depth, int isdir)
{
	int	fd;

	if (strcmp(dirname, ".") != 0 && strcmp(dirname, "..") != 0)
		print(fmt, dirname);
	if (isdir) {
		fd = open(dirname, OREAD);
		if (fd < 0) {
			fprint(2, "open %s: %r\n", dirname);
			return;
		}
		walk(dirname, fd, depth);
		close(fd);
	}
}

static char *
mkname(char *name, int *l, char *basename, char *filename)
{
	char	*nname;
	int	t;

	t = strlen(basename) + 1 + strlen(filename) + 1;
	if (*l == 0 || name == nil) {
		*l = t;
		name = malloc(t);
		if (name == nil)
			sysfatal("malloc %d: %r", t);
	} else if (*l < t) {
		nname = realloc(name, t);
		if (nname == nil) {
			free(name);
			sysfatal("realloc %d: %r", t);
		}
		*l = t;
		name = nname;
	}
	snprint(name, t, "%s/%s", basename, filename);
	cleanname(name);

	return(name);
}

static int
mkdepth(int depth)
{

	return((depth == -1) ? depth : depth - 1);
}

static void
walk(char *dirname, int fd, int depth)
{
	Dir	*dir, *dp;
	char	*name;
	int	i, l, n;

	if (depth == 0)
		return;
	l = 0;
	name = nil;
	n = dirreadall(fd, &dir);
	if (n < 0) {
		fprint(2, "dirreadall %s: %r\n", dirname);
		return;
	}
	for (dp = dir, i = 0; i < n; dp++, i++) {
		if (strcmp(dp->name, ".") == 0 || strcmp(dp->name, "..") == 0)
			continue;
		name = mkname(name, &l, dirname, dp->name);
		walkname(name, mkdepth(depth), dp->mode & DMDIR);
	}
	free(dir);
	if (name != nil)
		free(name);
}

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <6479.1036099195.3@augusta.math.psu.edu>

#!/bin/rc
rfork e
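# Run each test in turn against the file; echo the file name as soon
# as one test succeeds (a short-circuit OR over the tests).
fn runtests {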
	file=$1; shift
	while (! ~ $#* 0 && ! eval $1 ''''^$file^'''')
		shift
	if (! ~ $#* 0)
		echo $file
}
while (file = `{read}) {
	runtests $file $*
}

------- =_aaaaaaaaaa0--