From mboxrd@z Thu Jan 1 00:00:00 1970
Message-Id: <200210312120.g9VLKGi06485@augusta.math.psu.edu>
To: 9fans@cse.psu.edu
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0"
Content-ID: <6479.1036099195.0@augusta.math.psu.edu>
From: Dan Cross
Subject: [9fans] find(1) revisited.
Date: Thu, 31 Oct 2002 16:20:16 -0500
Topicbox-Message-UUID: 12f71292-eacb-11e9-9e20-41e7f4b1d025

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <6479.1036099195.1@augusta.math.psu.edu>

So find sucks, but is inarguably useful, and I've found myself missing
it before.  What would be a better solution, I've wondered?  Here's a
proposal for one I think fits in much nicer.

Find basically does the following job: Walk a set of specified
directory trees, apply a set of predicates to the contents, and perform
some action if the result is true.  However, find could be thought of
as a pipeline of filters that operate on a stream of filenames,
discarding those that don't fit some criteria and passing the rest to
the next stage.  The final stage of the pipeline performs some action,
such as printing the file names or making them arguments to some
command.

So, I wrote two commands: walk and sor.  Walk walks over a directory
tree, printing its contents, possibly with quoting in the style of the
new "%q" format (using ls's ``lsquote'' routine), and possibly with a
limitation on the depth to which it'll descend.  Sor (Stream OR, get
it?) is an rc script that reads a set of filenames from its input and
applies a set of tests to them, echoing those names that pass a test
and discarding the rest.

The effect is that one can now create arbitrary pipelines that mimic
what find does under Unix, only much more flexible.  It isn't the
fastest thing in the world, but it works, and seems to work pretty
well.

Now, I think sor is genuinely useful, but why walk when we have du -a?
Two reasons.

(1) du is the program to collect disk usage statistics, not the
program to do general walks of a file tree.  As such, it lacks a
feature to limit depth, and contains logic to make sure it doesn't
``count'' a file twice.  That's useful for du, but not really for a
general file walker; what if I want to see all names of a file?  What
if I want to look for multiple names with the same qid?  It certainly
makes no sense to augment du with that functionality: du is the disk
usage summarizer, not the general file tree walker.

(2) Not being able to limit the depth of one's search can be annoying.
What if I want to look for a file, but only in the first three levels
of a hierarchy?  It can be argued that it might be useful to summarize
the disk usage information of everything up to n levels in a hierarchy,
but I'd argue that that's not generally useful enough to warrant a
change to du.  Besides, what if I wanted to find all the names of a
given file within n levels in a hierarchy?

It could be argued that du could be used along with some clever sed
script to achieve the effect of depth limiting, but we still lose the
ability to see a file more than once.  Hence walk.

Anyway, I append both here in case others find them useful.  It'd be
nice to see them go into the distribution if there's enough general
appeal (and if they don't suck too badly).  In any event, some feedback
on the idea and the tools would be nice.

	- Dan C.

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <6479.1036099195.2@augusta.math.psu.edu>

/*
 * Walk a directory tree, in the style of du(1),
 * but with some additional flourishes.
 *
 * Dan Cross
 */

#include <u.h>
#include <libc.h>

static int mkdepth(int);
static char *mkname(char *, int *, char *, char *);
static void walk(char *, int, int);
static void walkname(char *, int, int);
static int walkquote(int c);

char *fmt;

void
main(int argc, char *argv[])
{
	char *dir;
	int depth;
	Dir *d;

	dir = ".";
	fmt = "%s\n";
	depth = -1;		/* -1 means no depth limit */
	ARGBEGIN {
	case 'd':
		depth = atoi(ARGF());
		break;
	case 'q':
		quotefmtinstall();
		doquote = walkquote;
		fmt = "%q\n";
		break;
	}ARGEND
	if (argc == 0)
		walkname(".", depth, 1);
	else {
		for (dir = *argv; dir; dir = *++argv) {
			if ((d = dirstat(dir)) == nil) {
				fprint(2, "dirstat %s: %r\n", dir);
				continue;
			}
			walkname(dir, depth, d->mode & DMDIR);
			free(d);
		}
	}
	exits(0);
}

/* Cribbed from ls(1) source. */
static int
walkquote(int c)
{
	if (c <= ' ' || strchr("`^#*[]=|\?${}()'", c))
		return(1);
	return(0);
}

/* Print one name and, if it is a directory, walk its contents. */
static void
walkname(char *dirname, int depth, int isdir)
{
	int fd;

	if (strcmp(dirname, ".") != 0 && strcmp(dirname, "..") != 0)
		print(fmt, dirname);
	if (isdir) {
		fd = open(dirname, OREAD);
		if (fd < 0) {
			fprint(2, "open %s: %r\n", dirname);
			return;
		}
		walk(dirname, fd, depth);
		close(fd);
	}
}

/*
 * Build "basename/filename" in name, growing the buffer
 * (whose current size is *l) only when it is too small.
 */
static char *
mkname(char *name, int *l, char *basename, char *filename)
{
	char *nname;
	int t;

	t = strlen(basename) + 1 + strlen(filename) + 1;
	if (*l == 0 || name == nil) {
		*l = t;
		name = malloc(t);
		if (name == nil)
			sysfatal("malloc %d: %r\n", t);
	} else if (*l < t) {
		nname = realloc(name, t);
		if (nname == nil) {
			free(name);
			sysfatal("malloc %d: %r\n", t);
		}
		*l = t;
		name = nname;
	}
	snprint(name, t, "%s/%s", basename, filename);
	cleanname(name);
	return(name);
}

/* One level deeper: an unlimited depth (-1) stays unlimited. */
static int
mkdepth(int depth)
{
	return((depth == -1) ?
		depth : depth - 1);
}

/* Print every entry of the directory open on fd, recursing into subdirectories. */
static void
walk(char *dirname, int fd, int depth)
{
	Dir *dir, *dp;
	char *name;
	int i, l, n;

	if (depth == 0)
		return;
	l = 0;
	name = nil;
	dir = nil;		/* so the free below is safe if dirreadall fails */
	n = dirreadall(fd, &dir);
	for (dp = dir, i = 0; i < n; dp++, i++) {
		if (strcmp(dp->name, ".") == 0 || strcmp(dp->name, "..") == 0)
			continue;
		name = mkname(name, &l, dirname, dp->name);
		walkname(name, mkdepth(depth), dp->mode & DMDIR);
	}
	free(dir);
	if (name != nil)
		free(name);
}

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <6479.1036099195.3@augusta.math.psu.edu>

#!/bin/rc

rfork e

fn runtests {
	file=$1; shift
	while (! ~ $#* 0 && ! eval $1 ''''^$file^'''')
		shift
	if (! ~ $#* 0)
		echo $file
}

while (file = `{read}) {
	runtests $file $*
}

------- =_aaaaaaaaaa0--
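
For readers without a Plan 9 system at hand, the depth convention that
walk uses (-1 means no limit, 0 means stop, otherwise one level is used
up per descent) can be sketched in portable C.  This is not the posted
program, only a rough illustration: it assumes a POSIX system and uses
opendir/readdir/lstat in place of dirreadall, and the names next_depth,
walk_tree, visit_fn and print_name are made up for the example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* assumption: POSIX system, needed for lstat() */
#endif
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback type: called once for every name visited. */
typedef void (*visit_fn)(const char *path, void *arg);

/* Same convention as mkdepth() in walk: -1 is "no limit" and never changes. */
int
next_depth(int depth)
{
	return depth == -1 ? depth : depth - 1;
}

/* Example callback: print the name, one per line. */
void
print_name(const char *path, void *arg)
{
	(void)arg;
	puts(path);
}

/* Visit everything below dir, descending at most depth levels (-1: unlimited). */
void
walk_tree(const char *dir, int depth, visit_fn visit, void *arg)
{
	DIR *d;
	struct dirent *e;
	struct stat st;
	char path[4096];

	if (depth == 0)
		return;
	d = opendir(dir);
	if (d == NULL)
		return;
	while ((e = readdir(d)) != NULL) {
		if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
			continue;
		if (snprintf(path, sizeof path, "%s/%s", dir, e->d_name) >= (int)sizeof path)
			continue;	/* name too long for this sketch; skip it */
		visit(path, arg);
		/* lstat, so that symbolic links are reported but not followed */
		if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode))
			walk_tree(path, next_depth(depth), visit, arg);
	}
	closedir(d);
}
```

A call such as walk_tree("/usr/include", 2, print_name, NULL) would
print names at most two levels below the starting directory; passing -1
instead of 2 removes the limit.  Unlike walk, this sketch does not print
the starting directory itself.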