* How do I find shortest match?
From: TJ Luoma @ 2012-05-16 19:18 UTC (permalink / raw)
To: Zsh-Users List
I have a folder which has a lot of txt files, and in that folder
are a lot of duplicate files. Most of the duplicates are
numbered like this:
10-6- Make a universal 10-6-7 Snow Leopard installer-1.txt
10-6- Make a universal 10-6-7 Snow Leopard installer-2.txt
10-6- Make a universal 10-6-7 Snow Leopard installer-3.txt
10-6- Make a universal 10-6-7 Snow Leopard installer-4.txt
10-6- Make a universal 10-6-7 Snow Leopard installer.txt
But not all of them. For example, I might have another identical
file named
todo-make-snowleopardinstaller.txt
What I want to do is go through the entire folder and find all
duplicate files (files with identical md5sum).
Then I want to keep ONLY the one with the shortest filename.
Here's what I have so far
#!/bin/zsh
DIR=/Users/luomat/Dropbox/txt/
# to avoid 'arg list too long'
# note that 'gmd5sum' prints the sum
# and then two spaces, and then the filename
ALL=$(find $DIR -type f -print0 | xargs -0 gmd5sum)
# these are all the MD5 sums which occur MORE than one time
# (which we get by removing any sums that occur only once)
SUMS=($(echo $ALL | awk '{print $1}' | sort | uniq -c |\
awk '$1 > 1 {print $2}'))
for SUM in $SUMS
do
# for each unique MD5 sum, do this:
# get a list of all of the matching filenames MINUS the
# sum itself
# (f) splits on newlines only, so names containing spaces stay intact
MATCHES=(${(f)"$(echo "$ALL" | egrep "^$SUM" | sed "s#^${SUM}  ##")"})
# ???
done
I don't know what to do in the ??? to compare the filenames and
choose the shortest one.
Any ideas?
Or is there a better way to do this?
Thanks
TjL
* Re: How do I find shortest match?
From: Peter Stephenson @ 2012-05-17 19:05 UTC (permalink / raw)
To: Zsh-Users List
On Wed, 16 May 2012 15:18:31 -0400
TJ Luoma <luomat@gmail.com> wrote:
> I don't know what to do in the ??? to compare the filenames and
> choose the shortest one.
sz() { REPLY=${(l.4..0.)${#REPLY}} }
print -l ${^MATCHES}(o+sz[1])
Note you'll need the MATCHES to be the actual path to the file since
glob qualifiers only work on real files.
There's a really nasty trick that doesn't require them to correspond to files, which you'll hate. I think you can combine it into a single expression, but it's bad enough in two.
local min
min=(${${(o)${MATCHES//?/\?}}[1]})
print -l ${MATCHES:#^${~min}}
I'm sure it's completely obvious what this is doing. It's been posted before.
--
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/
* Re: How do I find shortest match?
From: TJ Luoma @ 2012-05-17 19:11 UTC (permalink / raw)
To: Peter Stephenson; +Cc: Zsh-Users List
On Thursday, May 17, 2012 at 3:05 PM, Peter Stephenson wrote:
> There's a really nasty trick that doesn't require them to correspond to files, which you'll hate. I think you can combine it into a single expression, but it's bad enough in two.
>
> local min
> min=(${${(o)${MATCHES//?/\?}}[1]})
> print -l ${MATCHES:#^${~min}}
>
> I'm sure it's completely obvious what this is doing. It's been posted before.
I have the memory and attention span of a tsetse fly, so I'll take your word for it, although I myself have no idea what this is doing. I've long accepted the fact that zsh is smarter than I am, and only hope to make the best use of it whenever possible :-)
TjL
ps - Thanks for the reply and solutions.
* Re: How do I find shortest match?
From: Peter Stephenson @ 2012-05-17 20:04 UTC (permalink / raw)
To: Zsh-Users List
On Thu, 17 May 2012 15:11:09 -0400
TJ Luoma <luomat@gmail.com> wrote:
> > local min
> > min=(${${(o)${MATCHES//?/\?}}[1]})
> > print -l ${MATCHES:#^${~min}}
> >
> > I'm sure it's completely obvious what this is doing. It's been
> > posted before.
> I have the memory and attention span of a tsetse fly, so I'll take
> your word for it,
I think it was some years ago now.
> although I myself have no idea what this is doing.
(I should have pointed out it needs EXTENDED_GLOB for the "^".)
First it replaces all the characters in all the elements of $MATCHES
with a question mark, then orders them in the standard collating order
with (o). As all the characters are the same, the shortest string comes
first and [1] picks it. So $min is a single element array containing as
many ?'s as there are characters in the shortest string.
The next match "simply" eliminates from MATCHES (':#') anything that
doesn't match ('^') the pattern ${~min}, where the ~ turns the ?'s into
active pattern characters. So this matches any element of MATCHES which
has as many characters as the shortest string. In fact, that could be
more than one, so you really need
${${MATCHES:#^${~min}}[1]}
to select the first.
--
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/