From: Dan Cross <cross@math.psu.edu>
To: 9fans@cse.psu.edu
Subject: [9fans] find(1) revisited.
Date: Thu, 31 Oct 2002 16:20:16 -0500
Message-ID: <200210312120.g9VLKGi06485@augusta.math.psu.edu>
[-- Attachment #1: Type: text/plain, Size: 2914 bytes --]
So find sucks, but is inarguably useful, and I've found myself missing
it before. What would be a better solution, I've wondered? Here's a
proposal for one I think fits in much nicer.

Find basically does the following job: Walk a set of specified
directory trees, apply a set of predicates to the contents, and perform
some action if the result is true. However, find could be thought of
as a pipeline of filters that operate on a stream of filenames,
discarding those that don't fit some criteria, and printing the rest to
the next stage. The final stage of the pipeline performs some action,
such as printing the file names or making them arguments to some
command.
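
To make the "pipeline of filters" idea concrete, here is a minimal sketch of a single filter stage in portable POSIX C. It is not part of the attachments below; the names `nametest`, `isdir`, and `filternames` are hypothetical and chosen only for illustration. The stage reads one file name per line and passes along only the names that satisfy a predicate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* A test takes a file name and returns nonzero if the name passes. */
typedef int (*nametest)(const char *name);

/* Hypothetical example test: true if name is an existing directory. */
static int
isdir(const char *name)
{
	struct stat st;

	return stat(name, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * One pipeline stage: copy names from in to out, one per line,
 * keeping only those that pass test.  Returns the number kept.
 * Sketch limitation: names longer than the buffer are split.
 */
static int
filternames(FILE *in, FILE *out, nametest test)
{
	char line[4096];
	int kept = 0;

	assert(in != NULL && out != NULL && test != NULL);
	while (fgets(line, sizeof line, in) != NULL) {
		line[strcspn(line, "\n")] = '\0';	/* strip the newline */
		if (line[0] != '\0' && test(line)) {
			fprintf(out, "%s\n", line);
			kept++;
		}
	}
	return kept;
}
```

Calling `filternames(stdin, stdout, isdir)` from a small main would give a stage that can sit anywhere in a pipeline: its output has the same one-name-per-line shape as its input, which is what lets stages be chained.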

So, I wrote two commands: walk, and sor. Walk walks over a directory
tree, printing its contents, possibly with quoting in the style of the
new "%q" format (using ls's ``lsquote'' routine), and possibly with a
limitation on the depth to which it'll descend. Sor (Stream OR, get
it?) is an rc script that reads a set of filenames from its input, and
applies a set of tests to them, echoing those names that pass a test,
discarding the rest. The effect is that one can now create arbitrary
pipelines that mimic what find does under Unix, only much more
flexible. It isn't the fastest thing in the world, but it works, and
seems to work pretty well.
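
The "OR" in sor can be sketched in portable POSIX C as follows. This is only an illustration of the semantics described above, not the attached rc script: a name is echoed as soon as any one test accepts it, and dropped if every test fails. The names `nametest`, `isdir`, `isfile`, and `passesany` are hypothetical; in the script the tests are arbitrary commands rather than C functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* A test takes a file name and returns nonzero if the name passes. */
typedef int (*nametest)(const char *name);

/* Hypothetical example tests, roughly `test -d` and `test -f`. */
static int
isdir(const char *name)
{
	struct stat st;

	return stat(name, &st) == 0 && S_ISDIR(st.st_mode);
}

static int
isfile(const char *name)
{
	struct stat st;

	return stat(name, &st) == 0 && S_ISREG(st.st_mode);
}

/*
 * Stream OR: a name passes if at least one test accepts it.
 * Tests are tried in order and the first success stops the search.
 * With no tests at all, nothing passes.
 */
static int
passesany(const char *name, nametest *tests, size_t ntests)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < ntests; i++)
		if (tests[i](name))
			return 1;
	return 0;
}
```

An AND of several tests needs no extra machinery under this design: chaining two such stages in a pipeline, each with a single test, keeps only the names that pass both.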

Now, I think sor is genuinely useful, but why walk when we have du -a?
Two reasons. (1) du is the program to collect disk usage statistics,
not the program to do general walks of a file tree. As such, it misses
a feature to limit depth, and contains logic to make sure it doesn't
``count'' a file twice. That's useful for du, but not really for a
general file walker; what if I want to see all names of a file? What
if I want to look for multiple names with the same qid? It certainly
makes no sense to augment du with that functionality: du is the disk
usage summarizer, not the general file tree walker. Also, not being
able to limit the depth of one's search can be annoying. What if I
want to look for a file, but only in the first three levels of a
hierarchy? It can be argued that it might be useful to summarize the
disk usage information of everything up to n levels in a hierarchy, but
I'd argue that that's not generally useful enough to warrant a change
to du. Besides, what if I wanted to find all the names of a given file
within n levels in a hierarchy? It could be argued that du could be
used along with some clever sed script to achieve the effect of depth
limiting, but we still lose the ability to see a file more than once.
Hence walk.
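
The two properties argued for above, a depth limit and no suppression of repeated files, can be sketched in portable POSIX C like this. It is a rough analogue for illustration, not the Plan 9 program attached below; `visitfn`, `countvisit`, and `walktree` are hypothetical names. As in the attachment, a depth of -1 means "no limit". Unlike the attachment, this sketch also visits the starting path itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Called once for every name visited. */
typedef void (*visitfn)(const char *path, void *arg);

/* Hypothetical example visitor: count names in the int that arg points to. */
static void
countvisit(const char *path, void *arg)
{
	(void)path;
	++*(int *)arg;
}

/*
 * Visit path, then its children for as long as depth allows.
 * depth == -1 means no limit; depth == 0 means do not descend.
 * No record of visited files is kept, so a file with several
 * names is reported once per name.
 */
static void
walktree(const char *path, int depth, visitfn visit, void *arg)
{
	char child[4096];
	struct dirent *e;
	struct stat st;
	DIR *d;

	assert(path != NULL && visit != NULL);
	visit(path, arg);
	/* lstat: do not follow symbolic links into other trees */
	if (depth == 0 || lstat(path, &st) != 0 || !S_ISDIR(st.st_mode))
		return;
	if ((d = opendir(path)) == NULL)
		return;
	while ((e = readdir(d)) != NULL) {
		if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
			continue;
		if (snprintf(child, sizeof child, "%s/%s", path, e->d_name)
		    >= (int)sizeof child)
			continue;	/* too long for this sketch; skip */
		walktree(child, depth == -1 ? depth : depth - 1, visit, arg);
	}
	closedir(d);
}
```

With this convention, `walktree(dir, 3, visit, arg)` covers the "only in the first three levels" case from the paragraph above, and a visitor that prints each path gives output suitable for feeding to a filter stage.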

Anyway, I append both here in case others find them useful. It'd be
nice to see them go into the distribution if there's enough general
appeal (and if they don't suck too badly). In any event, some feedback
on the idea and the tools would be nice.

	- Dan C.
[-- Attachment #2: Type: text/plain, Size: 2515 bytes --]
/*
 * Walk a directory tree, in the style of du(1),
 * but with some additional flourishes.
 *
 * Dan Cross <cross@math.psu.edu>
 */

#include <u.h>
#include <libc.h>

static int	mkdepth(int);
static char	*mkname(char *, int *, char *, char *);
static void	walk(char *, int, int);
static void	walkname(char *, int, int);
static int	walkquote(int c);

char *fmt;

void
main(int argc, char *argv[])
{
	char *dir;
	int depth;
	Dir *d;

	dir = ".";
	fmt = "%s\n";
	depth = -1;
	ARGBEGIN {
	case 'd':
		depth = atoi(ARGF());
		break;
	case 'q':
		quotefmtinstall();
		doquote = walkquote;
		fmt = "%q\n";
		break;
	}ARGEND

	if (argc == 0)
		walkname(".", depth, 1);
	else {
		for (dir = *argv; dir; dir = *++argv) {
			if ((d = dirstat(dir)) == nil) {
				fprint(2, "dirstat %s: %r\n", dir);
				continue;
			}
			walkname(dir, depth, d->mode & DMDIR);
			free(d);
		}
	}

	exits(0);
}

/* Cribbed from ls(1) source. */
static int
walkquote(int c)
{
	if (c <= ' ' || strchr("`^#*[]=|\?${}()'", c))
		return(1);
	return(0);
}

static void
walkname(char *dirname, int depth, int isdir)
{
	int fd;

	if (strcmp(dirname, ".") != 0 && strcmp(dirname, "..") != 0)
		print(fmt, dirname);
	if (isdir) {
		fd = open(dirname, OREAD);
		if (fd < 0) {
			fprint(2, "open %s: %r\n", dirname);
			return;
		}
		walk(dirname, fd, depth);
		close(fd);
	}
}

/*
 * Build "basename/filename" in name, growing the buffer
 * (whose size is tracked in *l) when it is too small.
 */
static char *
mkname(char *name, int *l, char *basename, char *filename)
{
	char *nname;
	int t;

	t = strlen(basename) + 1 + strlen(filename) + 1;
	if (*l == 0 || name == nil) {
		*l = t;
		name = malloc(t);
		if (name == nil)
			sysfatal("malloc %d: %r\n", t);	/* report the size, not the pointer l */
	} else if (*l < t) {
		nname = realloc(name, t);
		if (nname == nil) {
			free(name);
			sysfatal("realloc %d: %r\n", t);
		}
		*l = t;
		name = nname;
	}
	snprint(name, t, "%s/%s", basename, filename);
	cleanname(name);
	return(name);
}

/* Depth for the next level down; -1 (no limit) is passed through unchanged. */
static int
mkdepth(int depth)
{
	return((depth == -1) ? depth : depth - 1);
}

static void
walk(char *dirname, int fd, int depth)
{
	Dir *dir, *dp;
	char *name;
	int i, l, n;

	if (depth == 0)
		return;
	l = 0;
	name = nil;
	n = dirreadall(fd, &dir);
	for (dp = dir, i = 0; i < n; dp++, i++) {
		if (strcmp(dp->name, ".") == 0 || strcmp(dp->name, "..") == 0)
			continue;
		name = mkname(name, &l, dirname, dp->name);
		walkname(name, mkdepth(depth), dp->mode & DMDIR);
	}
	free(dir);
	if (name != nil)
		free(name);
}
[-- Attachment #3: Type: text/plain, Size: 193 bytes --]
#!/bin/rc

rfork e

fn runtests {
	file=$1; shift
	while (! ~ $#* 0 && ! eval $1 ''''^$file^'''')
		shift
	if (! ~ $#* 0)
		echo $file
}

while (file = `{read}) {
	runtests $file $*
}