* Advice for filesystem operations under Zsh
From: DervishD @ 2003-12-02 17:11 UTC
To: Zsh Users

Hello all :))

I have a little shell function that, for each file in certain places
of the filesystem, checks whether the file belongs to an installed
program or not, using some list files. I must explain how it is
implemented in order to ask for advice, so please be patient O:)

The file list is stored in an array parameter and, in order to avoid
reading from disk, the check is performed by reading every 'list file'
and comparing its contents (lines that are filenames) against the
entire array, deleting the corresponding entry if found. That way, at
the end of the iterations, the array contains all 'orphan' files.

This works OK for me, but I want to extend this shell function to
perform other tasks. One of them is, given a filename, to find the
'list file' it belongs to. That's pretty easy ;)) But I want to extend
the shell function so that, in one run, it outputs:

    - Orphan files on file descriptor 3
    - Dangling symlinks on file descriptor 4
    - Setuid binaries on file descriptor 5
    - Duplicate files on file descriptor 6
    - Empty directories on file descriptor 7
    - etc...

Under Zsh it is pretty easy to find all dangling symlinks
(**/*(-@)), setuid files (**/*(s)), etc... and I can do all that in
just one trip through the filesystem, since glob qualifiers also work
with filenames without globbing characters.

My problems are:

    - finding dupes. I've tried to use the 'I' subscript flag, but
      this only returns all matching keys in an associative array,
      not in normal ones. The only solution seems to be deleting the
      match and searching again...

    - finding empty directories. Looking at the number of links of
      the directory doesn't work (it only shows whether the directory
      has subdirectories, not whether it has files).

    - doing all that in one run of the array. Since the 'orphans'
      check destroys the contents of the array, I need to dupe it, or
      convert it to an associative array, but then it cannot have
      duplicate entries, since I would need to use the filename as a
      key and the value as one of dangling, orphan, empty, setuid,
      etc... or even a combination of those.

The objective is to build a lightweight version of FSlint without
using 'find'. The solutions to some of the problems are easy: use a
'master' array and duplicate it as needed, for example for finding
orphans, duplicated filenames, etc... And I cannot find a solution for
empty files.

Sorry for the long message, and thanks a lot in advance, truly.

Raúl Núñez de Arenas Coronado
-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/
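
The one-pass, one-descriptor-per-category idea described above could
be sketched roughly as follows. This is only an illustration, not the
function from the message: the name classify_paths is hypothetical,
the paths are assumed to be already collected (for example by
recursive globbing), and ordinary conditional tests stand in for the
glob qualifiers. Only the dangling-symlink (fd 4) and setuid (fd 5)
categories are shown, and the caller is assumed to open both
descriptors.

```zsh
# classify_paths: hypothetical helper, not taken from the thread.
# Walks the given paths once and writes each category to its own
# file descriptor. The caller must open fds 4 and 5 beforehand.
classify_paths() {
    local p
    for p in "$@"; do
        # Dangling symlink: the name is a symlink (-L) but its
        # target does not exist (-e follows the link and fails).
        if [[ -L $p && ! -e $p ]]; then
            printf '%s\n' "$p" >&4
        fi
        # Setuid bit set. Note that -u follows symlinks, so this is
        # close to, but not identical with, the (s) glob qualifier.
        if [[ -u $p ]]; then
            printf '%s\n' "$p" >&5
        fi
    done
}

# Possible use from zsh (file names are placeholders):
#   classify_paths **/*(DN) 4>dangling.list 5>setuid.list
```

Further categories would follow the same pattern, one test and one
descriptor each, so the array is still traversed only once.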
* Re: Advice for filesystem operations under Zsh
From: Bart Schaefer @ 2003-12-03 5:10 UTC
To: Zsh Users

On Dec 2, 6:11pm, DervishD wrote:
}
} Hello all :))

Hello, Raúl ...

} The file list is stored in an array parameter and, in order to avoid
} reading from disk, the check is performed by reading every 'list
} file' and comparing its contents (lines that are filenames) against
} the entire array

I don't see what disk read you're avoiding by this, but it probably
isn't important.

} deleting the corresponding entry if found. That way, at the end of
} the iterations, the array contains all 'orphan' files.

When to delete an entry will depend on what else you need to do with
each entry, of course.

} [...] I want to extend the
} shell function so that, in one run, it outputs:
}
}     - Orphan files on file descriptor 3
}     - Dangling symlinks on file descriptor 4
}     - Setuid binaries on file descriptor 5
}     - Duplicate files on file descriptor 6
}     - Empty directories on file descriptor 7
}     - etc...

I'm not precisely sure what you mean by "duplicate files". Names that
appear in the array more than once? Or files with different names but
identical contents? Or names that appear in more than one of the "list
files"? Or ...?

} Under Zsh it is pretty easy to find all dangling symlinks
} (**/*(-@)), setuid files (**/*(s)), etc... and I can do all that in
} just one trip through the filesystem, since glob qualifiers also
} work with filenames without globbing characters.

I'm not following the order of operations here, probably because I
have no idea what "FSlint" is or how it works. Do you want to create
the array of file names by scanning the file system, or do you already
have the names and now you want to learn things about those specific
files?

} My problems are:
}
}     - finding dupes. I've tried to use the 'I' subscript flag, but
}       this only returns all matching keys in an associative array,
}       not in normal ones. The only solution seems to be deleting the
}       match and searching again...

This sounds like you want to find duplicate entries in the array.
Something like this:

    for element in $array
    do
      # (M) keeps only the elements equal to $element; a single
      # match means the element occurs exactly once.
      if [[ "${${(@M)array:#$element}}" = $element ]]
      then print $element is unique
      else print $element is a duplicate
      fi
    done

}     - finding empty directories.

We went over this once before, did we not? However, the best approach
in this instance may depend on whether you're scanning the filesystem
anyway, or whether you are testing names obtained some other way.

}     - doing all that in one run of the array. Since the 'orphans'
}       check destroys the contents of the array, I need to dupe it, or
}       convert it to an associative array

Or just check for orphans last of anything, so that you don't need
the elements for other tests any more?

} I cannot find a solution for empty files.

Empty files wasn't on the list above, but isn't it **/*(L0)?
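
The loop above rescans the whole array for every element. For long
file lists, a single pass that counts occurrences in an associative
array may be cheaper. A minimal sketch, assuming plain string equality
is the right notion of "duplicate"; report_duplicates is a
hypothetical name, not something from the thread, and the code has not
been exercised with unusual file names:

```zsh
# report_duplicates: hypothetical helper, not taken from the thread.
# Prints every argument that occurs more than once, one per line,
# in the order in which the second occurrence is seen.
report_duplicates() {
    typeset -A count      # entry -> number of times seen so far
    local element
    for element in "$@"; do
        count[$element]=$(( ${count[$element]:-0} + 1 ))
        # Report each repeated entry only once, when its count hits 2.
        if [[ ${count[$element]} -eq 2 ]]; then
            printf '%s\n' "$element"
        fi
    done
}
```

Called as report_duplicates "${array[@]}", it prints each repeated
entry once. Keying on ${element##*/} instead of $element would group
entries by basename rather than by full path.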
* Re: Advice for filesystem operations under Zsh
From: DervishD @ 2003-12-03 12:50 UTC
To: Bart Schaefer; +Cc: Zsh Users

Hi Bart :)

 * Bart Schaefer <schaefer@brasslantern.com> dixit:
> } The file list is stored in an array parameter and, in order to avoid
> } reading from disk, the check is performed by reading every 'list
> } file' and comparing its contents (lines that are filenames) against
> } the entire array
> I don't see what disk read you're avoiding by this, but it probably
> isn't important.

It depends. If I do the opposite, that is, process the array and, for
each entry, look whether it is in any 'list file', I must read all the
'list files' for each element of the array. That's the disk I/O I
wanted to avoid. I know, probably all the 'list files' are in cache by
the time the second element of the array is processed, but just in
case...

> } deleting the corresponding entry if found. That way, at the end of
> } the iterations, the array contains all 'orphan' files.
> When to delete an entry will depend on what else you need to do with
> each entry, of course.

Obviously, if I need to do other checks, the check for orphans will be
the last one.

> }     - Duplicate files on file descriptor 6
> }     - Empty directories on file descriptor 7
> }     - etc...
> I'm not precisely sure what you mean by "duplicate files". Names that
> appear in the array more than once? Or files with different names but
> identical contents? Or names that appear in more than one of the "list
> files"? Or ...?

My apologies. By 'duplicate files' I mean files with the same filename
and the same contents. But being practical, and since I'm interested
in files 'copied-instead-of-moved', just the filename check will be
enough. Moreover, since dupes won't be deleted or moved, but just
listed, checking the contents will be left to the person who runs the
function. Not a problem, so to say.

> } Under Zsh it is pretty easy to find all dangling symlinks
> } (**/*(-@)), setuid files (**/*(s)), etc... and I can do all that in
> } just one trip through the filesystem, since glob qualifiers also
> } work with filenames without globbing characters.
> I'm not following the order of operations here, probably because I
> have no idea what "FSlint" is or how it works.

It is, more or less, to the filesystem what 'lint' is to C code. It is
supposed to report weird things happening on your filesystem. By weird
things I mean things that are 'syntactically correct' (that is, fsck
won't catch them because the fs is not damaged) but 'semantically
wrong' (empty dirs, symlinks pointing outside the current filesystem,
things like those).

> Do you want to create the array of file names by scanning the file
> system, or do you already have the names and now you want to learn
> things about those specific files?

I must scan the filesystem. Once scanned, all pathnames are stored in
an array. I don't really do a 'scan', I just perform recursive
globbing, ignoring some directories I don't want to look at. This may
be slower than using 'find' for the same job, but I really prefer it
because it is not as slow as one could expect, and find doesn't let me
use shell code on every hit. I mean, you can do it, running zsh and a
script for every hit, but doing it all in a shell function is more
convenient for me.

> } My problems are:
> }     - finding dupes. I've tried to use the 'I' subscript flag, but
> }       this only returns all matching keys in an associative array,
> }       not in normal ones. The only solution seems to be deleting the
> }       match and searching again...
> This sounds like you want to find duplicate entries in the array.
> Something like this:

Thanks a lot. I must adapt it to strip the 'basedir' of each name
prior to searching, and to output all the dupes of a given filename.
Thanks for such a good starting point :))

> }     - finding empty directories.
> We went over this once before, did we not?

We did, and you helped me a lot, but I didn't find any solution. More
below.

> However, the best approach
> in this instance may depend on whether you're scanning the filesystem
> anyway, or whether you are testing names obtained some other way.

Of course. I have the list, probably unsorted, in an array I build
using 'recursive globbing'. Since the '.' and '..' entries are not
present in that array for any directory, an empty dir will appear only
once in the array, but a non-empty one will appear as many times as
the files and subdirs it has. If the array is sorted (I can do it
while building it, with the 'O' modifier, if I remember well), the
only thing I need to do is (correct me here, please) take each element
of the array, test whether it is a directory, and compare it with the
next one. If there is a match (that is, if the directory name is
present in both names, a kind of strncmp but in zsh), then it is
non-empty. Obviously this must be corrected so that empty subdirs can
be detected as well. Do you have a better solution?

> }     - doing all that in one run of the array. Since the 'orphans'
> }       check destroys the contents of the array, I need to dupe it, or
> }       convert it to an associative array
> Or just check for orphans last of anything, so that you don't need
> the elements for other tests any more?

That was my idea, but I was thinking about two kinds of orphans: those
that I find now, and files present in the 'list files' but not in the
filesystem, that is, missing files that should be installed. For that
I need to destroy the array, too. Anyway, I can do it without an
array, so it is not an issue now. The best solution is doing the
'orphans' check in the last phase.

> } I cannot find a solution for empty files.
> Empty files wasn't on the list above, but isn't it **/*(L0)?

Sorry, a typo, sometimes my fingers act on their own... I meant 'empty
directories'.

Thanks for the answer and for the suggestions :)

Raúl Núñez de Arenas Coronado
-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/
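
The idea above (a directory is non-empty if some other listed path
lies inside it) does not strictly need a sorted array. A rough sketch,
assuming the array holds every path found by the recursive glob,
including dot files (otherwise a directory containing only hidden
files would be reported as empty); list_empty_dirs is a hypothetical
name, not something from the thread:

```zsh
# list_empty_dirs: hypothetical helper, not taken from the thread.
# Prints the directories among the arguments that are not the parent
# of any other argument, i.e. that look empty from the list alone.
list_empty_dirs() {
    typeset -A has_child   # parent directory -> 1 if something is in it
    local p parent
    # First pass: record the parent directory of every listed path.
    for p in "$@"; do
        if [[ $p == */* ]]; then
            parent=${p%/*}
            # Skip the empty parent produced by paths such as "/foo".
            if [[ -n $parent ]]; then
                has_child[$parent]=1
            fi
        fi
    done
    # Second pass: a real directory (not a symlink to one) that was
    # never recorded as a parent has no listed entries inside it.
    for p in "$@"; do
        if [[ -d $p && ! -L $p && -z ${has_child[$p]:-} ]]; then
            printf '%s\n' "$p"
        fi
    done
}
```

When the names come straight from globbing, the zsh manual also
documents the F qualifier for non-empty directories, so **/*(/^F)
should list empty directories directly without any array processing.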