zsh-users
* Fastest way to count # of files?
From: Sebastian Gniazdowski @ 2016-09-08 15:36 UTC
  To: Zsh Users

Hello,
I'm trying to protect against slow NFS hierarchies and large directory
trees. This is fast:

% typeset -F SECONDS; myst=$SECONDS; arr=( *rarestring(NY1) ); echo $(( (SECONDS - myst) * 1000 ))
80.286000000342028


This is slow:

% typeset -F SECONDS; myst=$SECONDS; arr=( * ); echo ${#arr}; echo $(( (SECONDS - myst) * 1000 ))
1154.1240000005928


The first command obviously has to read every file in the directory,
yet it does so quickly. The second command also reads the whole
directory, but it is slow. The first command, however, gives no way to
determine the number of files read. Is there anything in between these
two? Something that counts the files without storing them?

Best regards,
Sebastian Gniazdowski



* Re: Fastest way to count # of files?
From: Bart Schaefer @ 2016-09-08 16:08 UTC
  To: Zsh Users

On Sep 8,  5:36pm, Sebastian Gniazdowski wrote:
}
} [...]  The second code reads the whole directory too, but
} it's slow. First code doesn't provide way to determine # of files
} read. Is there anything between these two? Something that doesn't
} store files, but counts them?

Try this:

  integer nfiles=0
  : **/*(Ne?'((++nfiles)) && reply=()'?)
  print $nfiles



* Re: Fastest way to count # of files?
From: Sebastian Gniazdowski @ 2016-09-08 16:37 UTC
  To: Bart Schaefer; +Cc: Zsh Users

The first run was slower than this one and the ones that followed:

% typeset -F SECONDS; myst=$SECONDS; integer nfiles=0; : **/*(Ne?'((++nfiles)) && reply=()'?); print $nfiles; echo $(( (SECONDS - myst) * 1000 ))
80001
1427.3609999982

I.e. 1.4 seconds.

Best regards,
Sebastian Gniazdowski





* Re: Fastest way to count # of files?
From: Bart Schaefer @ 2016-09-08 20:00 UTC
  To: Zsh Users

On Thu, Sep 8, 2016 at 9:37 AM, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> First ran was slower than this and following ones:
>
> % typeset -F SECONDS; myst=$SECONDS; integer nfiles=0; : **/*(Ne?'((++nfiles)) && reply=()'?); print $nfiles; echo $(( (SECONDS - myst) * 1000 ))
> 80001
> 1427.3609999982
>
> I.e. 1.4 second

It may be using more time than your original example because of the
**/ recursion.

The only other option I can think of is this:

integer nfiles=0
typeset -a nlink
zmodload zsh/stat
: **/*(/Ne?'zstat -A nlink +nlink $REPLY && ((nfiles += $nlink - 2)); reply=()'?)
print $nfiles

This assumes that nlink for a directory is the number of files it
contains plus "." and "..".  However, the result I get is off by a
little from the glob without directory filtering.



* Re: Fastest way to count # of files?
From: Bart Schaefer @ 2016-09-09  1:09 UTC
  To: Zsh Users

On Sep 8,  1:00pm, Bart Schaefer wrote:
}
} This assumes that nlink for a directory is the number of files it
} contains plus "." and "..".

Accuracy of this seems to vary by filesystem type, so probably not
a viable solution.



* Re: Fastest way to count # of files?
From: Daniel Shahaf @ 2016-09-09  5:36 UTC
  To: Zsh Users

Bart Schaefer wrote on Thu, Sep 08, 2016 at 18:09:58 -0700:
> On Sep 8,  1:00pm, Bart Schaefer wrote:
> }
> } This assumes that nlink for a directory is the number of files it
> } contains plus "." and "..".
> 
> Accuracy of this seems to vary by filesystem type, so probably not
> a viable solution.

nlink for a directory $d is usually two plus the number of subdirectories
$d has; it's unrelated to the number of files in $d.  (A directory is
hard linked from its entry in its parent, from its own "." entry, and
from the ".." entry of each of its subdirectories, but not from the
files in it.)



* Re: Fastest way to count # of files?
From: Bart Schaefer @ 2016-09-09 16:31 UTC
  To: Zsh Users

On Sep 9,  5:36am, Daniel Shahaf wrote:
}
} nlink for a directory $d is usually the two plus number of subdirectories

Yes, this is definitely true on traditional filesystems (which I admit
I'd forgotten) and on Linux ext*, but some testing on MacOS, for example
(the only system I tried before posting), shows nlink counts that
include all the files, not just the subdirectories.


