From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-users-return-6831-mason-zsh=primenet.com.au@sunsite.dk>
Received: (qmail 17506 invoked from network); 3 Dec 2003 15:35:55 -0000
Received: from sunsite.dk (130.225.247.90)
  by ns1.primenet.com.au with SMTP; 3 Dec 2003 15:35:55 -0000
Received: (qmail 360 invoked by alias); 3 Dec 2003 15:35:33 -0000
Mailing-List: contact zsh-users-help@sunsite.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 6831
Received: (qmail 346 invoked from network); 3 Dec 2003 15:35:32 -0000
Received: from localhost (HELO sunsite.dk) (127.0.0.1)
  by localhost with SMTP; 3 Dec 2003 15:35:32 -0000
X-MessageWall-Score: 0 (sunsite.dk)
Received: from [62.193.203.32] by sunsite.dk (MessageWall 1.0.8) with SMTP; 3 Dec 2003 15:35:32 -0000
Received: from DervishD.pleyades.net (212.Red-80-35-44.pooles.rima-tde.net [80.35.44.212])
	by madrid10.amenworld.com (8.10.2/8.10.2) with ESMTP id hB3FZM308162;
	Wed, 3 Dec 2003 16:35:22 +0100
Received: from raul@pleyades.net by DervishD.pleyades.net with local (Exim MTA 2.05)
	  id <1ARWSb-0004n1-00>; Wed, 3 Dec 2003 13:50:29 +0100
Date: Wed, 3 Dec 2003 13:50:29 +0100
From: DervishD <raul@pleyades.net>
To: Bart Schaefer <schaefer@brasslantern.com>
Cc: Zsh Users <zsh-users@sunsite.dk>
Subject: Re: Advice for filesystem operations under Zsh
Message-ID: <20031203125029.GA18206@DervishD>
Mail-Followup-To: Bart Schaefer <schaefer@brasslantern.com>,
	Zsh Users <zsh-users@sunsite.dk>
References: <20031202171109.GW1814@DervishD> <1031203051040.ZM11532@candle.brasslantern.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1031203051040.ZM11532@candle.brasslantern.com>
Organization: Pleyades
User-Agent: Mutt/1.4i <http://www.mutt.org>

    Hi Bart :)

 * Bart Schaefer <schaefer@brasslantern.com> dixit:
> }     The file list is stored in an array parameter, and in order to
> } avoid reading from disk, the check is performed reading every 'list
> } file' and comparing its contents (lines that are filenames) against
> } the entire array
> I don't see what disk read you're avoiding by this, but it probably
> isn't important.

    It depends. If I do the opposite, that is, process the array and,
for each entry, check whether it is in any 'list file', I must read all
the 'list files' once per array element. That's the disk I/O I
wanted to avoid. I know, all the 'list files' will probably be in the
cache by the time the second element is processed, but just in case...
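
    A minimal sketch of that ordering, restricted to syntax that zsh
and bash 4+ share. The array name 'files', the function name
'print_orphans' and the one-path-per-line format of the list files
are assumptions made for illustration only. Each list file is read
exactly once, and the array is walked once in memory:

```shell
# Assumed input: a global array 'files' holding every scanned path,
# and list files containing one path per line.
print_orphans() {
    typeset -A listed        # set of every path named in any list file
    typeset list line f
    for list in "$@"; do     # each list file is read exactly once
        while IFS= read -r line || [ -n "$line" ]; do
            [ -n "$line" ] && listed[$line]=1
        done < "$list"
    done
    for f in "${files[@]}"; do   # single pass over the in-memory array
        # a path that no list file mentions is an orphan
        [ -n "${listed[$f]}" ] || printf '%s\n' "$f"
    done
    return 0
}
```

    Instead of deleting entries from the array, the sketch prints the
unlisted ones, so the array stays intact for any later check.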

> } deleting the corresponding entry if found. That
> } way, at the end of iterations, the array contains all 'orphan' files.
> When to delete an entry will depend on what else you need to do with
> each entry, of course.

    Obviously, if I need to do other checks, the check for orphans
will be the last one.

> }     - Duplicate files in file descriptor 6
> }     - Empty directories in file descriptor 7
> }     - etc...
> I'm not precisely sure what you mean by "duplicate files".  Names that
> appear in the array more than once?  Or files with different names but
> identical contents?  Or names that appear in more than one of the "list
> files"?  Or ...?

    My apologies. By 'duplicate files' I mean files with the same
filename and the same contents. But being practical, and since I'm
interested in files 'copied-instead-of-moved', just the filename check
will be enough. Moreover, since dupes won't be deleted or moved, but
just reported in a list, checking the contents will be left to
the person who runs the function. Not a problem, so to speak.

> }     Under Zsh is pretty easy to find all dangling symlinks
> } (**/*(@-)), setuid files (**/*(s)), etc... and I can do all that in
> } just one travel through the filesystem, since glob qualifiers work
> } too with filenames withouth globbing characters.
> I'm not following the order of operations here, probably because I have
> no idea what "FSlint" is or how it works.

    It is, more or less, to the filesystem what 'lint' is to C code.
It is supposed to report weird things happening on your filesystem.
By weird things I mean things that are 'syntactically correct'
(that is, fsck won't catch them because the fs is not damaged), but
'semantically incorrect' (empty dirs, symlinks pointing outside the
current filesystem, things like those).

> Do you want to create the array of file names by scanning the file
> system, or do you already have the names and now you want to learn
> things about those specific files?

    I must scan the filesystem. Once scanned, all pathnames are
stored in an array. I don't really do a 'scan', I just perform
recursive globbing, ignoring some directories I don't want to look
at. This may be slower than using 'find' for the same job, but I
prefer it because it is not as slow as one could expect, and find
doesn't let me use shell code on every hit. I mean, you can do it,
running zsh and a script for every hit, but doing it all in a shell
function is more convenient for me.
 
> } My problems are:
> }     - finding dupes. I've tried to use 'I' subscript flag, but this
> } only return all matching keys in an associative array, not in normal
> } ones. The only solution seems to be deleting the match and search
> } again...
> This sounds like you want to find duplicate entries in the array.
> Something like this:

    Thanks a lot. I must adapt it to strip the 'basedir' of each name
prior to searching, and to output all the dupes of a given filename.
Thanks for such a good starting point :))
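
    A minimal sketch of that adaptation, again limited to syntax
shared by zsh and bash 4+. It assumes that 'stripping the basedir'
means comparing only the last path component; the function name
'print_dupes' is hypothetical. It counts every basename first, then
prints every path whose basename occurs more than once:

```shell
# Arguments: the scanned paths. Output: "basename<TAB>path" for each
# path whose basename is shared with at least one other path.
print_dupes() {
    typeset -A count         # basename -> number of occurrences
    typeset f base
    for f in "$@"; do
        base=${f##*/}        # strip the directory part
        count[$base]=$(( ${count[$base]:-0} + 1 ))
    done
    for f in "$@"; do
        base=${f##*/}
        if [ "${count[$base]}" -gt 1 ]; then
            printf '%s\t%s\n' "$base" "$f"
        fi
    done
    return 0
}
```

    Only names are compared here; checking the contents is left to
whoever reads the report.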
 
> }     - finding empty directories.
> We went over this once before, did we not?

    We did, and you helped me a lot, but I didn't find any solution.
More below.

> However, the best approach
> in this instance may depend on whether you're scanning the filesystem
> anyway, or whether you are testing names obtained some other way.

    Of course. I have the list, probably unsorted, in an array I
build using 'recursive globbing'. Since the '.' and '..' entries are
not present in that array for any directory, an empty dir will
appear only once in the array, but a non-empty one will also appear
as the leading part of one entry per file or subdir it contains. If
the array is sorted (I can do it while building it, with the 'O'
modifier, if I remember well), the only thing I need to do is (correct
me here, please) take each element of the array, test whether it is a
directory, and compare it with the next one. If there is a match (that
is, if the directory name is the prefix of the next name, kind of
strncmp but in zsh), then it is non-empty. Obviously this must be
refined so that empty subdirs can be detected as well. Do you have a
better solution?
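
    A minimal sketch of the idea, in syntax shared by zsh and bash
4+. It is a variation and not the neighbour comparison itself: each
entry marks its parent as non-empty, so no sorting is needed. The
reason for the swap is that in plain byte (C locale) order a sibling
such as 'dir.bak' sorts between 'dir' and 'dir/file', so the next
element is not always a child. The names 'print_empty_dirs' and
'has_child' are hypothetical, and the sketch assumes the list also
contains dotfiles (otherwise a directory holding only hidden files
would be reported as empty):

```shell
# Arguments: every scanned path, in any order.
print_empty_dirs() {
    typeset -A has_child     # directory -> 1 if some entry lives in it
    typeset p parent
    for p in "$@"; do
        parent=${p%/*}
        # skip names without any '/' and entries directly under '/'
        if [ "$parent" != "$p" ] && [ -n "$parent" ]; then
            has_child[$parent]=1
        fi
    done
    for p in "$@"; do
        # a real directory (not a symlink to one) that nothing marked
        if [ -d "$p" ] && [ ! -L "$p" ] && [ -z "${has_child[$p]}" ]; then
            printf '%s\n' "$p"
        fi
    done
    return 0
}
```

    A directory that contains only an empty subdir counts as
non-empty, while the subdir itself is reported.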
    
> }     - doing all that in one run of the array. Since the 'orphans'
> } check destroys the contents of the array, I need to dupe it, or
> } convert it to a associative array
> Or just check for orphans last of anything, so that you don't need
> the elements for other tests any more?

    That was my idea, but I was thinking about two kinds of orphans:
those that I find now, and files present in the 'list files' but not
in the filesystem, that is, missing files that should be installed.
For that I would need to destroy the array, too. Anyway, I can do it
without an array, so it is not an issue now. The best solution is
doing the 'orphans' check in the last phase.
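
    A minimal sketch of the array-free check for missing files, in
syntax shared by zsh and bash. The function name 'print_missing' and
the one-path-per-line format of the list files are assumptions:

```shell
# Arguments: the list files. Output: every listed path that does not
# exist in the filesystem.
print_missing() {
    typeset list line
    for list in "$@"; do
        while IFS= read -r line || [ -n "$line" ]; do
            [ -z "$line" ] && continue
            # -e follows symlinks, so -L keeps a dangling symlink
            # from being reported as missing
            if [ ! -e "$line" ] && [ ! -L "$line" ]; then
                printf '%s\n' "$line"
            fi
        done < "$list"
    done
    return 0
}
```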
 
> } I cannot find a solution for empty files.
> Empty files wasn't on the list above, but isn't it **/*(L0)?

    Sorry, a typo, sometimes my fingers act on their own... I meant
'empty directories'.

    Thanks for the answer and for the suggestions :)

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net & http://raul.pleyades.net/