From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <393a7f63e7c4973e661988f1009f5d7a@vitanuova.com> To: 9fans@cse.psu.edu Subject: Re: [9fans] ls, rc question -- proposed change to rc/glob.c From: rog@vitanuova.com In-Reply-To: <240892daccaec2131635fbd4917479de@hamnavoe.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="upas-ykttiwvdkmfsuzrinyxxkxscna" Date: Sun, 21 Mar 2004 20:47:46 +0000 Topicbox-Message-UUID: 3bffdadc-eacd-11e9-9e20-41e7f4b1d025 This is a multi-part message in MIME format. --upas-ykttiwvdkmfsuzrinyxxkxscna Content-Disposition: inline Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit for programs that require read-write access to a namespace, requirements are different, as charles points out. however, most tools do not write to the namespace. for me, one of plan 9's fundamental strengths is that while a tool is often written to access a predefined portion of the namespace, the actual namespace that it sees can be arranged at will for the tool in question. thus if a program accesses files in directory /d, i think it's reasonable to expect that if i bind another directory, say /b, before /d, i can expect the program to treat /d exactly as if i had really replaced the files therein with files from /b. the program shouldn't have to *care* whether it's dealing with a union directory or not - that should be solely the concern of whoever's arranging the namespace. i think the difference between: bind /dev/null /bin/ls and mkdir /tmp/x > /tmp/x/ls bind /dev/null /tmp/x/ls bind -b /tmp/x /bin should be invisible, particularly to shell scripts, which have no concept of sequentially reading a directory, and which are often short and trivial, but for which the above invariants are still important. russ: > if a file is covered up, it should be completely hidden. > it turns out that actually implementing this is quite hard, i.e. the visibility of mutiple names in a directory is an implementation issue. it's something that's easily solved at user level (no problems with arbitrary buffering there, and programs can eliminate duplicates in their own way if they like as, for example, du(1), mk(1), cron(1) listen(1) do). i think it should be made as easy as possible to write programs that behave in a namespace-agnostic way. i think that if a version of dirreadall was available that eliminated duplicates, most (all?) programs would wish to use it. i think it's particularly silly that wc */*.[ch] should give me erroneous results in a union directory, and that echo /bin/ape/* should be different from echo /bin/ape*/* when there's only one accessible directory starting "ape" in /bin, and the shell's sorting the globbed output anyway. i think eliminating duplicates in two core places would break nothing, but make parts of the system work more smoothly together. it's one less thing to think about. geoff points out many programs that use dirread. a random inspection turns up many programs that already do their own elimination of duplicates (in different ways), and others that behave badly in the face of duplicate directory entries (e.g. diff). this seems to me to point towards a systematic problem, which should be solved centrally. towards this end, i've attached a stable sort routine, along the same lines as qsort(2), which can optionally eliminate duplicates. i've tested it to a degree, but not greatly; i haven't done any performance analysis of it, but i doubt that's a limiting factor in this case. i envisage ls(1) being changed to use this routine (with an option to show/suppress duplicate entries), and perhaps a version of dirreadall that sorts the directory entries alphabetically and eliminates duplicates, to make it easy to replace it in existing programs. PS it's interesting to note that ls(1) does make some sort of an effort to do a stable sort: if(i == 0) i = (a #include int mergesort1(int uniq, char *a, int r, int width, int (*compare)(void*, void*), char *b) { int m, n0, n1, i, j, k, e, c; if(r <= width) return r; m = ((r/width-1)/2 + 1) * width; n0 = mergesort1(uniq, a, m, width, compare, b); n1 = mergesort1(uniq, a+m, r-m, width, compare, b); memcpy(b, a, n0); memcpy(b+n0, a+m, n1); e = n0+n1; i = 0; j = n0; k = 0; for(; i < n0 && j < e; k += width){ c = (*compare)(b+i, b+j); if(c > 0){ memcpy(a+k, b+j, width); j += width; }else{ memcpy(a+k, b+i, width); i += width; if(c == 0 && uniq) j += width; } } if(i < n0){ memcpy(a+k, b+i, n0-i); k += n0-i; }else if(j < e){ memcpy(a+k, b+j, e-j); k += e-j; } return k; } int mergesort(int uniq, void *a, long n, long width, int (*compare)(void*, void*)) { void *b; n *= width; b = malloc(n); if(b == nil) return -1; n = mergesort1(uniq, a, n, width, compare, b); free(b); return n / width; } --upas-ykttiwvdkmfsuzrinyxxkxscna--