From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 17506 invoked from network); 3 Dec 2003 15:35:55 -0000
Received: from sunsite.dk (130.225.247.90)
  by ns1.primenet.com.au with SMTP; 3 Dec 2003 15:35:55 -0000
Received: (qmail 360 invoked by alias); 3 Dec 2003 15:35:33 -0000
Mailing-List: contact zsh-users-help@sunsite.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 6831
Received: (qmail 346 invoked from network); 3 Dec 2003 15:35:32 -0000
Received: from localhost (HELO sunsite.dk) (127.0.0.1)
  by localhost with SMTP; 3 Dec 2003 15:35:32 -0000
X-MessageWall-Score: 0 (sunsite.dk)
Received: from [62.193.203.32] by sunsite.dk (MessageWall 1.0.8)
  with SMTP; 3 Dec 2003 15:35:32 -0000
Received: from DervishD.pleyades.net
  (212.Red-80-35-44.pooles.rima-tde.net [80.35.44.212])
  by madrid10.amenworld.com (8.10.2/8.10.2) with ESMTP id hB3FZM308162;
  Wed, 3 Dec 2003 16:35:22 +0100
Received: from raul@pleyades.net by DervishD.pleyades.net
  with local (Exim MTA 2.05) id <1ARWSb-0004n1-00>;
  Wed, 3 Dec 2003 13:50:29 +0100
Date: Wed, 3 Dec 2003 13:50:29 +0100
From: DervishD
To: Bart Schaefer
Cc: Zsh Users
Subject: Re: Advice for filesystem operations under Zsh
Message-ID: <20031203125029.GA18206@DervishD>
Mail-Followup-To: Bart Schaefer, Zsh Users
References: <20031202171109.GW1814@DervishD>
  <1031203051040.ZM11532@candle.brasslantern.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1031203051040.ZM11532@candle.brasslantern.com>
User-Agent: Mutt/1.4i
Organization: Pleyades

Hi Bart :)

* Bart Schaefer dixit:
> } The file list is stored in an array parameter, and in order to
> } avoid reading from disk, the check is performed reading every 'list
> } file' and comparing its contents (lines that are filenames) against
> } the entire array
> I don't see what disk read you're avoiding by this, but it probably
> isn't important.

It depends.
If I do the opposite, that is, walk the array and check each entry
against the 'list files', I must read every 'list file' once per
array element. That's the disk I/O I wanted to avoid. I know all the
'list files' will probably be in cache by the time I reach the second
element of the array, but just in case...

> } deleting the corresponding entry if found. That
> } way, at the end of iterations, the array contains all 'orphan' files.
> When to delete an entry will depend on what else you need to do with
> each entry, of course.

Obviously, if I need to do other checks, the check for orphans will
be the last one.

> } - Duplicate files in file descriptor 6
> } - Empty directories in file descriptor 7
> } - etc...
> I'm not precisely sure what you mean by "duplicate files". Names that
> appear in the array more than once? Or files with different names but
> identical contents? Or names that appear in more than one of the "list
> files"? Or ...?

My apologies. By 'duplicate files' I mean files with the same
filename and the same contents. In practice, since I'm interested in
files that were 'copied-instead-of-moved', a filename check will be
enough. Moreover, since dupes won't be deleted or moved, only listed,
checking the contents is left to the person who runs the function.
Not a problem, so to speak.

> } Under Zsh is pretty easy to find all dangling symlinks
> } (**/*(@-)), setuid files (**/*(s)), etc... and I can do all that in
> } just one travel through the filesystem, since glob qualifiers work
> } too with filenames without globbing characters.
> I'm not following the order of operations here, probably because I have
> no idea what "FSlint" is or how it works.

It is, more or less, to the filesystem what 'lint' is to C code. It
is supposed to report weird things happening on your filesystem.
By weird things I mean things that are 'syntactically correct' (that
is, fsck won't catch them because the fs is not damaged) but
'semantically incorrect' (empty dirs, symlinks pointing outside the
current filesystem, things like that).

> Do you want to create the array of file names by scanning the file
> system, or do you already have the names and now you want to learn
> things about those specific files?

I must scan the filesystem. Once scanned, all pathnames are stored in
an array. I don't really do a 'scan'; I just perform recursive
globbing, skipping some directories I don't want to look at. This may
be slower than using 'find' for the same job, but I prefer it because
it is not as slow as one might expect, and find doesn't let me run
shell code on every hit. I mean, you can do it by running zsh and a
script for every hit, but doing it all in a shell function is more
convenient for me.

> } My problems are:
> } - finding dupes. I've tried to use 'I' subscript flag, but this
> } only return all matching keys in an associative array, not in normal
> } ones. The only solution seems to be deleting the match and search
> } again...
> This sounds like you want to find duplicate entries in the array.
> Something like this:

Thanks a lot. I must adapt it to strip the 'basedir' from each name
prior to searching, and to output all the dupes of a given filename.
Thanks for such a good starting point :))

> } - finding empty directories.
> We went over this once before, did we not?

We did, and you helped me a lot, but I didn't find any solution. More
below.

> However, the best approach
> in this instance may depend on whether you're scanning the filesystem
> anyway, or whether you are testing names obtained some other way.

Of course. I have the list, probably unsorted, in an array I build
using 'recursive globbing'. Since the '.' and '..'
entries are not present in that array for any directory, an empty dir
appears only once in the array, whereas a non-empty one also appears
as the leading part of every file and subdir it contains. If the
array is sorted (I can do it while building it, with the 'O'
modifier, if I remember correctly), the only thing I need to do
(correct me here, please) is take each element of the array, test
whether it is a directory, and compare it with the next element. If
there is a match (that is, if the directory name is a prefix of the
next name, a kind of strncmp in zsh), then it is non-empty. Obviously
this must be refined so that empty subdirs are detected as well. Do
you have a better solution?

> } - doing all that in one run of the array. Since the 'orphans'
> } check destroys the contents of the array, I need to dupe it, or
> } convert it to a associative array
> Or just check for orphans last of anything, so that you don't need
> the elements for other tests any more?

That was my idea, but I was thinking about two kinds of orphans:
those I find now, and files present in the 'list files' but not in
the filesystem, that is, missing files that should be installed. For
that I need to destroy the array, too. Anyway, I can do it without an
array, so it is not an issue now. The best solution is to do the
'orphans' check in the last phase.

> } I cannot find a solution for empty files.
> Empty files wasn't on the list above, but isn't it **/*(L0)?

Sorry, a typo, sometimes my fingers act on their own... I meant
'empty directories'.

Thanks for the answer and for the suggestions :)

Raúl Núñez de Arenas Coronado

--
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/
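
The empty-directory check discussed above can be sketched as follows.
This is only a minimal sketch under stated assumptions, not the method
settled on in this thread. Instead of comparing each sorted entry with
the next one, it marks the parent of every pathname in an associative
array, because with a plain byte-order sort a sibling such as 'dir-old'
can land between 'dir' and 'dir/file'. The names empty_dirs, has_child
and parent are hypothetical. The sketch assumes the argument list holds
every entry of the scanned tree; a glob that skips dotfiles would make
a directory containing only dotfiles look empty.

```zsh
# Print the directories among the given pathnames that have no child
# in the same list. Hypothetical helper, not code from the thread.
empty_dirs() {
    typeset -A has_child    # keys: directories known to contain something
    typeset p parent

    # Pass 1: every pathname marks its parent directory as non-empty.
    for p in "$@"; do
        parent=${p%/*}
        # Skip names without a slash and names directly under "/".
        if [[ $p == */* && -n $parent ]]; then
            has_child[$parent]=1
        fi
    done

    # Pass 2: a real directory (not a symlink to one) that was never
    # marked has no entries in the list, so report it as empty.
    for p in "$@"; do
        if [[ -d $p && ! -L $p && -z ${has_child[$p]} ]]; then
            printf '%s\n' "$p"
        fi
    done
    return 0
}

# Possible use in zsh, feeding the recursive glob straight in
# (D includes dotfiles, N avoids an error when nothing matches):
#   empty_dirs **/*(DN)
```

Because it does not rely on sort order, the function can run on the
array as it was built, and the output could be redirected to file
descriptor 7 as planned. The zsh manual also documents the 'F' glob
qualifier for non-empty directories, so **/*(/^F) is another way to
list empty directories where that qualifier is available.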