zsh-users
* listing sub-directories with most files in
From: zzapper @ 2011-09-02 15:24 UTC
  To: zsh-users

Hi,
I'm grepping a tree (grep string **/*) and want to list the subdirectories 
which have the most files. This is because the grep is taking ages and I'm   
hoping I can exclude some of these.

I guess an easy job for zsh?

-- 
zzapper
http://zzapper.co.uk/ Technical Tips



* Re: listing sub-directories with most files in
From: Bart Schaefer @ 2011-09-02 17:03 UTC
  To: zsh-users

On Sep 2,  3:24pm, zzapper wrote:
}
} I'm grepping a tree (grep string **/*) and want to list the
} subdirectories which have the most files. This is because the grep is
} taking ages and I'm hoping I can exclude some of these.

This will give you the directories and the count of files in each,
in ascending order by number of files:

print **/*(/ne{'reply=($REPLY/*(N.)); reply=($#reply\:$REPLY)'})

I'll leave it up to you to decide how you want to make use of that
information.



* Re: listing sub-directories with most files in
From: zzapper @ 2011-09-02 18:30 UTC
  To: zsh-users

Bart Schaefer wrote in news:110902100313.ZM8455@torch.brasslantern.com:



> 
> This will give you the directories and the count of files in each,
> in ascending order by number of files:
> 
> print **/*(/ne{'reply=($REPLY/*(N.)); reply=($#reply\:$REPLY)'})


Bart: This will be super useful thanks.

Minor quibble: on my system (Cygwin) the output was all jumbled onto one line,
so I had to pipe it into Vim. It was also in reverse order, but that's not a
problem.



-- 
zzapper
http://zzapper.co.uk/ Technical Tips



* Re: listing sub-directories with most files in
From: Bart Schaefer @ 2011-09-02 18:59 UTC
  To: zsh-users

On Sep 2,  6:30pm, zzapper wrote:
} Subject: Re: listing sub-directories with most files in
}
} Bart Schaefer wrote in news:110902100313.ZM8455@torch.brasslantern.com:
} > 
} > This will give you the directories and the count of files in each,
} > in ascending order by number of files:
} > 
} > print **/*(/ne{'reply=($REPLY/*(N.)); reply=($#reply\:$REPLY)'})

I should probably note that the above triggers some pathologically-bad
globbing behavior in older versions of zsh e.g. 4.2.x for some values
of x.  It may hang your shell for a VERY long time.

} Minor quibble:  on my system (Cygwin) it all jumbled onto one line

Well, yes.  To put one directory on each line:

print -l **/*(/ne{'reply=($REPLY/*(N.)); reply=($#reply\:$REPLY)'})

Or you can assign to a variable

dircounts=( **/*(/ne{'reply=($REPLY/*(N.)); reply=($#reply\:$REPLY)'}) )

and then do as you will.
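
For instance, once the count:directory pairs are available one per line (as
the `print -l` form produces), a small filter can pick out the heavy
directories. This is only a sketch in portable shell and not part of the
original suggestion: the function name `dirs_over` and its threshold argument
are made up for illustration, and it assumes directory names contain no
newlines.

```shell
# Hypothetical helper: read "count:dir" lines on stdin and print every
# directory whose count is at least the threshold given as $1.
dirs_over() {
  threshold=$1
  # IFS=: splits at the first colon; the rest of the line lands in dir.
  while IFS=: read -r count dir; do
    if [ "$count" -ge "$threshold" ]; then
      printf '%s\n' "$dir"
    fi
  done
}
```

With the array assigned as above, something like `print -l $dircounts |
dirs_over 100` should then list only the directories holding 100 or more
files.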

} Also in reverse order but that's not a problem.

You mean you got the directories with the largest number of files
first in the list?  I'm not sure why that would happen.  In my zsh
build tree (separate from the source tree) for example I get:

1:Completion
1:Config
1:Etc
1:Functions
2:Src/Aliases
8:Test
9:Doc
18:Src/Builtins
81:Src/Modules
109:Src/Zle
147:Src

If you want the big ones first, change "/n" to "/nOn" in the flags.



* Re: listing sub-directories with most files in
From: Thor Andreassen @ 2011-09-03 12:02 UTC
  To: zsh-users

On Fri, Sep 02, 2011 at 03:24:14PM +0000, zzapper wrote:
> Hi,
> I'm grepping a tree (grep string **/*) and want to list the subdirectories 
> which have the most files. This is because the grep is taking ages and I'm   
> hoping I can exclude some of these.
> 
> I guess an easy job for zsh?

Alternative way:

find *(/) | cut -d/ -f1 | uniq -c | sort -n

-- 
best regards
Thor Andreassen



* Re: listing sub-directories with most files in
From: Bart Schaefer @ 2011-09-03 15:13 UTC
  To: zsh-users

On Sep 3,  2:02pm, Thor Andreassen wrote:
}
} find *(/) | cut -d/ -f1 | uniq -c | sort -n

That'll tell you how many files are in the entire tree below each local
directory, but not how many files are in each subdirectory in the tree.
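
A throwaway tree makes the difference concrete. The sketch below is plain
shell with invented names (`a`, `a/b`, files `1` to `3`); it hands `find` an
explicit directory instead of the zsh glob `*(/)`, and assumes `mktemp -d` is
available.

```shell
# Build a scratch tree: one file directly in a/, two more in a/b/.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/a/1" "$tmp/a/b/2" "$tmp/a/b/3"

# Same pipeline as above, pointed at the single top-level directory "a".
# Every path find prints starts with "a", so uniq collapses them all.
total=$(cd "$tmp" && find a | cut -d/ -f1 | uniq -c | awk '{print $1}')

# total covers a itself, a/b, and all three files, not just a/1.
echo "entries under a: $total"
rm -rf "$tmp"
```

This reports 5 for `a` (the directory itself, the subdirectory `a/b`, and
three files), even though `a` holds one file directly and `a/b` holds two.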



* Re: listing sub-directories with most files in
From: Thor Andreassen @ 2011-09-03 19:23 UTC
  To: zsh-users

On Sat, Sep 03, 2011 at 08:13:20AM -0700, Bart Schaefer wrote:
> On Sep 3,  2:02pm, Thor Andreassen wrote:
> }
> } find *(/) | cut -d/ -f1 | uniq -c | sort -n
> 
> That'll tell you how many files are in the entire tree below each local
> directory, but not how many files are in each subdirectory in the tree.
 
Right, I didn't read the question well enough. Adding -maxdepth 1 and
-type f to find should limit the result correctly:

find *(/) -maxdepth 1 -type f | cut -d/ -f1 | uniq -c | sort -n

-- 
best regards
Thor Andreassen



* Re: listing sub-directories with most files in
From: Bart Schaefer @ 2011-09-03 21:59 UTC
  To: zsh-users

On Sep 3,  9:23pm, Thor Andreassen wrote:
}
} Adding -maxdepth 1 and -type f to find should limit the result
} correctly:
} 
} find *(/) -maxdepth 1 -type f | cut -d/ -f1 | uniq -c | sort -n

Unfortunately that's still not quite right.  Because you've lost the
path leading up to the subdirectory name, if two subtrees each contain
a directory with an identical name, you'll either get two counts with
no way to distinguish them, or a single count that is the sum of the
number of files in both of those subdirectories.

Also because find prints in directory scan order, you have to be careful
or you'll get a few files and then a subdirectory and then a few more
files and you'll still end up with multiple counts for the same directory.

You can do it this way:

find *(/) -type f -exec dirname {} \; | sort | uniq -c | sort -n

but that seems like an awful lot of work.
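
Some of that work can be trimmed by not forking `dirname` once per file. The
following is a sketch rather than a drop-in replacement: it strips the last
path component with `sed` instead, takes the starting directory as an argument
rather than using the `*(/)` glob, and assumes no file name contains a
newline. The function name `count_by_parent` is invented here.

```shell
count_by_parent() {
  # List regular files, cut off "/filename" to leave the parent directory,
  # then group identical parents and sort by how often each occurred.
  find "${1:-.}" -type f | sed 's|/[^/]*$||' | sort | uniq -c | sort -n
}
```

Called as `count_by_parent .`, it also counts files sitting directly in the
starting directory, reported under `.`.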



* Re: listing sub-directories with most files in
From: Thor Andreassen @ 2011-09-04  0:34 UTC
  To: zsh-users

On Sat, Sep 03, 2011 at 02:59:15PM -0700, Bart Schaefer wrote:
> On Sep 3,  9:23pm, Thor Andreassen wrote:
> }
> } Adding -maxdepth 1 and -type f to find should limit the result
> } correctly:
> } 
> } find *(/) -maxdepth 1 -type f | cut -d/ -f1 | uniq -c | sort -n
> 
> Unfortunately that's still not quite right.  Because you've lost the
> path leading up to the subdirectory name, if two subtrees each contain
> a directory with an identical name, you'll either get two counts with
> no way to distinguish them, or a single count that is the sum of the
> number of files in both of those subdirectories.

My brain is working slower than usual, sorry about the confusion.

After re-reading the OP's question and properly testing your solution, I
now get it.
 
> Also because find prints in directory scan order, you have to be careful
> or you'll get a few files and then a subdirectory and then a few more
> files and you'll still end up with multiple counts for the same directory.
> 
> You can do it this way:
> 
> find *(/) -type f -exec dirname {} \; | sort | uniq -c | sort -n
> 
> but that seems like an awful lot of work.

Agreed, a slightly improved version, but still a lot of work:

find . -type d | while read dir; do 
  find $dir -maxdepth 1 -type f | wc -l | tr -d '\n'; print ":$dir"
done | sort -n

But nowhere near as efficient or elegant as the suggested zsh solution,
sorry about the noise and thank you for your patience :).
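
For anyone adapting the loop above outside zsh, here is a hedged variant that
survives spaces in directory names: it quotes `$dir`, uses `read -r`, and
replaces `print` with `printf`. (zsh itself does not word-split an unquoted
`$dir`, but other shells do.) The name `files_per_dir` is invented, `-maxdepth`
is a common GNU/BSD `find` extension rather than POSIX, and directory names
containing newlines are still not handled.

```shell
files_per_dir() {
  find "${1:-.}" -type d | while IFS= read -r dir; do
    # Count regular files directly inside $dir; $(( )) trims any padding
    # that some wc implementations put around the number.
    n=$(( $(find "$dir" -maxdepth 1 -type f | wc -l) ))
    printf '%s:%s\n' "$n" "$dir"
  done | sort -n
}
```

It still runs one extra `find` per directory, so it remains slower than the
globbing approach.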

-- 
best regards
Thor Andreassen



* Re: listing sub-directories with most files in
From: Aaron Davies @ 2011-09-10 17:53 UTC
  To: Thor Andreassen; +Cc: zsh-users

On Sep 3, 2011, at 8:02 AM, Thor Andreassen wrote:

> On Fri, Sep 02, 2011 at 03:24:14PM +0000, zzapper wrote:
>> 
> 
>> I'm grepping a tree (grep string **/*) and want to list the subdirectories 
>> which have the most files. This is because the grep is taking ages and I'm   
>> hoping I can exclude some of these.
> 
> find *(/) | cut -d/ -f1 | uniq -c | sort -n

or

(for d (**/*(/)) echo `ls $d|wc -l` $d)|sort -n
-- 
Aaron Davies
aaron.davies@gmail.com



* Re: listing sub-directories with most files in
From: Bart Schaefer @ 2011-09-10 19:37 UTC
  To: zsh-users

On Sep 10,  1:53pm, Aaron Davies wrote:
}
} (for d (**/*(/)) echo `ls $d|wc -l` $d)|sort -n

These are all good tries, but `ls $d` will include subdirectory names
as well as file names; the original request was for a count of files.

Further you have to worry whether there's an alias for "ls" that may
change it from listing one file per line.

If you use `print -l $d/*(N.)` you fix both of these problems, but
you're still forking `wc -l`.  Try ${#$(print -l $d/*(.))} ?

Replace ( ... ) with { ... } to avoid unnecessary subshells:

{ for d (**/*(/)) print ${#$(print -l $d/*(.))} $d } | sort -n

On my desktop that's 10x faster than using `ls|wc` but still 2x slower
than my all-globbing solution.  OTOH the all-globbing solution might
take 5x as long to type. :-)
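
Coming back to the original goal of keeping a slow grep out of the biggest
directories: one option, not raised in the thread, is to let `find` prune the
directory before grep ever sees it. This is a sketch under assumptions:
`grep_pruned` and its two arguments are invented names, the directory is given
relative to the current directory, and `-path` treats its argument as a glob
pattern, so names containing `*`, `?` or `[` would need escaping.

```shell
grep_pruned() {
  pattern=$1   # string to search for
  skip=$2      # directory to leave out, e.g. Src/Zle
  # -prune stops find from descending into ./$skip; every other regular
  # file is handed to grep in batches, and -l prints the matching file names.
  find . -path "./$skip" -prune -o -type f -exec grep -l -- "$pattern" {} +
}
```

For example, `grep_pruned string Src/Zle` would search everything under the
current directory except the `Src/Zle` subtree.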

