Advice for filesystem operations under Zsh

zsh-users

* Advice for filesystem operations under Zsh
From: DervishD @ 2003-12-02 17:11 UTC
  To: Zsh Users

    Hello all :))

    I have a little shell function that, for each file in certain
places of the filesystem, checks whether the file belongs to an
installed program or not, using some list files. I need to explain
how it is implemented before asking for advice, so please be patient O:)

    The file list is stored in an array parameter, and in order to
avoid reading from disk, the check is performed by reading every 'list
file' and comparing its contents (lines that are filenames) against
the entire array, deleting the corresponding entry if found. That
way, at the end of iterations, the array contains all 'orphan' files.

    This works ok for me, but I want to extend this shell function to
perform other tasks. One of them is, given a filename, finding the 'list
file' it belongs to. That's pretty easy ;)) But I want to extend the
shell function so that, in one run, it outputs:

    - Orphan files in file descriptor 3
    - Dangling symlinks in file descriptor 4
    - Setuid binaries in file descriptor 5
    - Duplicate files in file descriptor 6
    - Empty directories in file descriptor 7
    - etc...

    Under Zsh it is pretty easy to find all dangling symlinks
(**/*(-@)), setuid files (**/*(s)), etc., and I can do all that in
just one pass through the filesystem, since glob qualifiers also work
with filenames without globbing characters. My problems are:

    - finding dupes. I've tried to use the 'I' subscript flag, but it
only returns all matching keys for associative arrays, not for normal
ones. The only solution seems to be deleting the match and searching
again...

    - finding empty directories. Looking at the link count of the
directory doesn't work: on most Unix filesystems a directory's link
count is 2 plus its number of subdirectories, so it shows whether the
directory has subdirs, but not whether it has files.

    - doing all that in one pass over the array. Since the 'orphans'
check destroys the contents of the array, I need to dupe it, or
convert it to an associative array, but then it cannot have duplicate
entries, since I would need to use the filename as the key and one of
dangling, orphan, empty, setuid, etc. (or even a combination of those)
as the value.

    The objective is to write a lightweight version of FSlint without
using 'find'. The solutions to some of the problems are easy: use a
'master' array and copy it as needed, for example for finding
orphans, duplicated filenames, etc. But I cannot find a solution
for empty files.

    Sorry for the long message, and thanks a lot in advance, truly.

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/



* Re: Advice for filesystem operations under Zsh
From: Bart Schaefer @ 2003-12-03  5:10 UTC
  To: Zsh Users

On Dec 2,  6:11pm, DervishD wrote:
}
}     Hello all :))

Hello, Raúl ...

}     The file list is stored in an array parameter, and in order to
} avoid reading from disk, the check is performed by reading every 'list
} file' and comparing its contents (lines that are filenames) against
} the entire array

I don't see what disk read you're avoiding by this, but it probably
isn't important.

} deleting the corresponding entry if found. That
} way, at the end of iterations, the array contains all 'orphan' files.

When to delete an entry will depend on what else you need to do with
each entry, of course.

} [...] I want to extend the
} shell function so that, in one run, it outputs:
} 
}     - Orphan files in file descriptor 3
}     - Dangling symlinks in file descriptor 4
}     - Setuid binaries in file descriptor 5
}     - Duplicate files in file descriptor 6
}     - Empty directories in file descriptor 7
}     - etc...

I'm not precisely sure what you mean by "duplicate files".  Names that
appear in the array more than once?  Or files with different names but
identical contents?  Or names that appear in more than one of the "list
files"?  Or ...?

}     Under Zsh it is pretty easy to find all dangling symlinks
} (**/*(-@)), setuid files (**/*(s)), etc., and I can do all that in
} just one pass through the filesystem, since glob qualifiers also work
} with filenames without globbing characters.

I'm not following the order of operations here, probably because I have
no idea what "FSlint" is or how it works.  Do you want to create the
array of file names by scanning the file system, or do you already have
the names and now you want to learn things about those specific files?

} My problems are:
} 
}     - finding dupes. I've tried to use the 'I' subscript flag, but it
} only returns all matching keys for associative arrays, not for normal
} ones. The only solution seems to be deleting the match and searching
} again...

This sounds like you want to find duplicate entries in the array.
Something like this:

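    for element in $array
    do
      # ${(@M)array:#$element} keeps only the elements equal to $element;
      # inside double quotes they are joined with spaces, so the result
      # equals $element exactly when $element occurs once.
      if [[ "${${(@M)array:#$element}}" = $element ]]
      then
        print -r -- $element is unique
      else
        print -r -- $element is a duplicate
      fi
    done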

}     - finding empty directories.

We went over this once before, did we not?  However, the best approach
in this instance may depend on whether you're scanning the filesystem
anyway, or whether you are testing names obtained some other way.

}     - doing all that in one pass over the array. Since the 'orphans'
} check destroys the contents of the array, I need to dupe it, or
} convert it to an associative array

Or just check for orphans last of anything, so that you don't need
the elements for other tests any more?

} I cannot find a solution for empty files.

Empty files wasn't on the list above, but isn't it **/*(L0)?


* Re: Advice for filesystem operations under Zsh
From: DervishD @ 2003-12-03 12:50 UTC
  To: Bart Schaefer; +Cc: Zsh Users

    Hi Bart :)

 * Bart Schaefer <schaefer@brasslantern.com> wrote:
> }     The file list is stored in an array parameter, and in order to
> } avoid reading from disk, the check is performed by reading every 'list
> } file' and comparing its contents (lines that are filenames) against
> } the entire array
> I don't see what disk read you're avoiding by this, but it probably
> isn't important.

    It depends. If I do the opposite, that is, process the array and,
for each entry, look it up in every 'list file', I must read all the
'list files' once per element of the array. That's the disk I/O I
wanted to avoid. I know the 'list files' will probably all be in the
cache by the time the second element is checked, but just in case...

> } deleting the corresponding entry if found. That
> } way, at the end of iterations, the array contains all 'orphan' files.
> When to delete an entry will depend on what else you need to do with
> each entry, of course.

    Obviously, if I need to do other checks, the check for orphans
will be the last one.

> }     - Duplicate files in file descriptor 6
> }     - Empty directories in file descriptor 7
> }     - etc...
> I'm not precisely sure what you mean by "duplicate files".  Names that
> appear in the array more than once?  Or files with different names but
> identical contents?  Or names that appear in more than one of the "list
> files"?  Or ...?

    My apologies. By 'duplicate files' I mean files with the same
filename and the same contents. But being practical, and since I'm
interested in files that were copied instead of moved, just the
filename check will be enough. Moreover, since dupes won't be deleted
or moved, but just listed, checking the contents is left to the
person who runs the function. Not a problem, so to speak.

> }     Under Zsh it is pretty easy to find all dangling symlinks
> } (**/*(-@)), setuid files (**/*(s)), etc., and I can do all that in
> } just one pass through the filesystem, since glob qualifiers also work
> } with filenames without globbing characters.
> I'm not following the order of operations here, probably because I have
> no idea what "FSlint" is or how it works.

    It is, more or less, to the filesystem what 'lint' is to C code.
It is supposed to report weird things happening on your filesystem.
By weird things I mean things that are 'syntactically correct'
(that is, fsck won't catch them because the fs is not damaged) but
'semantically wrong' (empty dirs, symlinks pointing outside the
current filesystem, things like those).

> Do you want to create the array of file names by scanning the file
> system, or do you already have the names and now you want to learn
> things about those specific files?

    I must scan the filesystem. Once scanned, all pathnames are
stored in an array. I don't really do a 'scan'; I just perform
recursive globbing, ignoring some directories I don't want to look
at. This may be slower than using 'find' for the same job, but I
prefer it because it is not as slow as one might expect, and find
doesn't let me run shell code on every hit. I mean, you can do it by
running zsh and a script for every hit, but doing it all in a shell
function is more convenient for me.
 
> } My problems are:
> }     - finding dupes. I've tried to use the 'I' subscript flag, but it
> } only returns all matching keys for associative arrays, not for normal
> } ones. The only solution seems to be deleting the match and searching
> } again...
> This sounds like you want to find duplicate entries in the array.
> Something like this:

    Thanks a lot. I must adapt it to strip the 'basedir' of each name
prior to searching, and to output all the dupes of a given filename.
Thanks for such a good starting point :))
 
> }     - finding empty directories.
> We went over this once before, did we not?

    We did, and you helped me a lot, but I didn't find any solution.
More below.

> However, the best approach
> in this instance may depend on whether you're scanning the filesystem
> anyway, or whether you are testing names obtained some other way.

    Of course. I have the list, probably unsorted, in an array I
build using recursive globbing. Since the '.' and '..' entries are
not present in that array for any directory, an empty dir will
appear only once in the array (as its own entry), while a non-empty
one will also appear as the prefix of each of its files and subdirs.
If the array is sorted (I can sort it while building it, with a glob
sort qualifier, if I remember well), the only thing I need to do is
(correct me here, please) take each element of the array, test
whether it is a directory, and compare it with the next element. If
there is a match (that is, if the directory name is a prefix of the
next name, a kind of strncmp but in zsh), then it is non-empty.
Obviously this must be refined so that empty subdirs are detected as
well. Do you have a better solution?
    
> }     - doing all that in one pass over the array. Since the 'orphans'
> } check destroys the contents of the array, I need to dupe it, or
> } convert it to an associative array
> Or just check for orphans last of anything, so that you don't need
> the elements for other tests any more?

    That was my idea, but I was thinking about two kinds of orphans:
those that I find now, and files present in the 'list files' but not
in the filesystem, that is, missing files that should be installed.
And for that I need to destroy the array, too. Anyway, I can do it
without an array, so it is not an issue now. The best solution is
doing the 'orphans' check in the last phase.
 
> } I cannot find a solution for empty files.
> Empty files wasn't on the list above, but isn't it **/*(L0)?

    Sorry, a typo; sometimes my fingers act on their own... I meant
'empty directories'.

    Thanks for the answer and for the suggestions :)

    Raúl Núñez de Arenas Coronado


