* Fastest way to count # of files?
From: Sebastian Gniazdowski @ 2016-09-08 15:36 UTC
To: Zsh Users
Hello,
I'm trying to protect against slow NFS hierarchies and large directory
trees. This is fast:
% typeset -F SECONDS; myst=$SECONDS; arr=( *rarestring(NY1) ); echo $(( (SECONDS - myst) * 1000 ))
80.286000000342028
This is slow:
% typeset -F SECONDS; myst=$SECONDS; arr=( * ); echo ${#arr}; echo $(( (SECONDS - myst) * 1000 ))
1154.1240000005928
The first snippet obviously has to read every entry in the directory, yet
it does so quickly. The second snippet reads the whole directory too, but
it's slow. The first snippet provides no way to determine the number of
files read. Is there anything between these two? Something that doesn't
store the file names, but counts them?
Best regards,
Sebastian Gniazdowski
* Re: Fastest way to count # of files?
From: Bart Schaefer @ 2016-09-08 16:08 UTC
To: Zsh Users
On Sep 8, 5:36pm, Sebastian Gniazdowski wrote:
}
} [...] The second code reads the whole directory too, but
} it's slow. First code doesn't provide way to determine # of files
} read. Is there anything between these two? Something that doesn't
} store files, but counts them?
Try this:
integer nfiles=0
: **/*(Ne?'((++nfiles)) && reply=()'?)
print $nfiles
* Re: Fastest way to count # of files?
From: Sebastian Gniazdowski @ 2016-09-08 16:37 UTC
To: Bart Schaefer; +Cc: Zsh Users
The first run was slower than this one and the following ones:
% typeset -F SECONDS; myst=$SECONDS; integer nfiles=0; : **/*(Ne?'((++nfiles)) && reply=()'?); print $nfiles; echo $(( (SECONDS - myst) * 1000 ))
80001
1427.3609999982
I.e. 1.4 seconds.
Best regards,
Sebastian Gniazdowski
On 8 September 2016 at 18:08, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sep 8, 5:36pm, Sebastian Gniazdowski wrote:
> }
> } [...] The second code reads the whole directory too, but
> } it's slow. First code doesn't provide way to determine # of files
> } read. Is there anything between these two? Something that doesn't
> } store files, but counts them?
>
> Try this:
>
> integer nfiles=0
> : **/*(Ne?'((++nfiles)) && reply=()'?)
> print $nfiles
* Re: Fastest way to count # of files?
From: Bart Schaefer @ 2016-09-08 20:00 UTC
To: Zsh Users
On Thu, Sep 8, 2016 at 9:37 AM, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> The first run was slower than this one and the following ones:
>
> % typeset -F SECONDS; myst=$SECONDS; integer nfiles=0; :
> **/*(Ne?'((++nfiles)) && reply=()'?); print $nfiles; echo $(( (SECONDS
> - myst) * 1000 ))
> 80001
> 1427.3609999982
>
> I.e. 1.4 second
It may be using more time than your original example because of the
**/ recursion.
The only other option I can think of is this:
integer nfiles=0
typeset -a nlink
zmodload zsh/stat
: **/*(/Ne?'zstat -A nlink +nlink $REPLY && ((nfiles += $nlink - 2));
reply=()'?)
print $nfiles
This assumes that nlink for a directory is the number of files it
contains plus "." and "..". However, the result I get is off by a
little from the glob without directory filtering.
* Re: Fastest way to count # of files?
From: Bart Schaefer @ 2016-09-09 1:09 UTC
To: Zsh Users
On Sep 8, 1:00pm, Bart Schaefer wrote:
}
} This assumes that nlink for a directory is the number of files it
} contains plus "." and "..".
Accuracy of this seems to vary by filesystem type, so probably not
a viable solution.
* Re: Fastest way to count # of files?
From: Daniel Shahaf @ 2016-09-09 5:36 UTC
To: Zsh Users
Bart Schaefer wrote on Thu, Sep 08, 2016 at 18:09:58 -0700:
> On Sep 8, 1:00pm, Bart Schaefer wrote:
> }
> } This assumes that nlink for a directory is the number of files it
> } contains plus "." and "..".
>
> Accuracy of this seems to vary by filesystem type, so probably not
> a viable solution.
nlink for a directory $d is usually two plus the number of subdirectories
$d has; it's unrelated to the number of files in $d. (A directory is
hard linked from its parent, from itself, and from each subdirectory of
itself, but not from the files in it.)
* Re: Fastest way to count # of files?
From: Bart Schaefer @ 2016-09-09 16:31 UTC
To: Zsh Users
On Sep 9, 5:36am, Daniel Shahaf wrote:
}
} nlink for a directory $d is usually two plus the number of subdirectories
Yes, this is definitely true on traditional filesystems (which I admit
I'd forgotten) and Linux ext*, but some testing on MacOS for example
(which is the only thing I tried before posting) shows counts in nlink
that include all the files, not just the subdirectories.