zsh-users
* How do I find shortest match?
@ 2012-05-16 19:18 TJ Luoma
  2012-05-17 19:05 ` Peter Stephenson
  0 siblings, 1 reply; 4+ messages in thread
From: TJ Luoma @ 2012-05-16 19:18 UTC (permalink / raw)
  To: Zsh-Users List

I have a folder which has a lot of txt files, and in that folder 
are a lot of duplicate files. Most of the duplicates are 
numbered like this:

10-6- Make a universal 10-6-7 Snow Leopard installer-1.txt
10-6- Make a universal 10-6-7 Snow Leopard installer-2.txt
10-6- Make a universal 10-6-7 Snow Leopard installer-3.txt
10-6- Make a universal 10-6-7 Snow Leopard installer-4.txt
10-6- Make a universal 10-6-7 Snow Leopard installer.txt

But not all of them. For example, I might have another identical
file named

     todo-make-snowleopardinstaller.txt

What I want to do is go through the entire folder and find all 
duplicate files (files with identical md5sum).

Then I want to keep ONLY the one with the shortest filename.

Here's what I have so far

#!/bin/zsh

DIR=/Users/luomat/Dropbox/txt/

     # to avoid 'arg list too long'
     # note that 'gmd5sum' prints the sum
     # and then two spaces, and then the filename
ALL=$(find $DIR -type f -print0 | xargs -0 gmd5sum)

     # these are all the MD5 sums which occur MORE than one time
     # (which we get by removing any sum that occurs only once)
SUMS=($(echo $ALL | awk '{print $1}' | sort  | uniq -c |\
         egrep -v '^   1 ' | awk '{print $2}'))



for SUM in $SUMS
do

     # for each unique MD5 sum, do this:


     # get a list of all of the matching filenames MINUS the
     # sum itself
     MATCHES=($(echo "$ALL" | egrep "^$SUM" | sed "s#${SUM}  ##g"))

     # ???

done


I don't know what to do in the ??? to compare the filenames and 
choose the shortest one.

Any ideas?

Or is there a better way to do this?

Thanks

TjL





* Re: How do I find shortest match?
  2012-05-16 19:18 How do I find shortest match? TJ Luoma
@ 2012-05-17 19:05 ` Peter Stephenson
  2012-05-17 19:11   ` TJ Luoma
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Stephenson @ 2012-05-17 19:05 UTC (permalink / raw)
  To: Zsh-Users List

On Wed, 16 May 2012 15:18:31 -0400
TJ Luoma <luomat@gmail.com> wrote:
> I don't know what to do in the ??? to compare the filenames and 
> choose the shortest one.

  sz() { REPLY=${(l.4..0.)${#REPLY}} }
  print -l ${^MATCHES}(o+sz[1])

Note you'll need the MATCHES to be the actual path to the file since
glob qualifiers only work on real files.

There's a really nasty trick that doesn't require them to correspond to files, which you'll hate.  I think you can combine it into a single expression, but it's bad enough in two.

  local min
  min=(${${(o)${MATCHES//?/\?}}[1]})
  print -l ${MATCHES:#^${~min}}

I'm sure it's completely obvious what this is doing.  It's been posted before.
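
For comparison, here is a plainer sketch of the same selection that also
works on strings that are not file names: a loop that remembers the
shortest name seen so far.  It is not from this thread; the function name
shortest_of is made up for illustration, and the sample names are taken
from the original question.

```zsh
# Hypothetical helper (not part of the thread): print the shortest argument.
# On a tie, the argument that appears first wins.
shortest_of() {
  local best=$1 name
  for name in "$@"; do
    # ${#name} is the length of the string in characters
    (( ${#name} < ${#best} )) && best=$name
  done
  printf '%s\n' "$best"
}

# Sample data: file names quoted from the original question.
MATCHES=(
  "10-6- Make a universal 10-6-7 Snow Leopard installer-1.txt"
  "10-6- Make a universal 10-6-7 Snow Leopard installer.txt"
  "todo-make-snowleopardinstaller.txt"
)

shortest_of "${MATCHES[@]}"   # prints todo-make-snowleopardinstaller.txt
```

Passing the names as separate arguments keeps embedded spaces intact, which
matters for names like the ones above.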

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



* Re: How do I find shortest match?
  2012-05-17 19:05 ` Peter Stephenson
@ 2012-05-17 19:11   ` TJ Luoma
  2012-05-17 20:04     ` Peter Stephenson
  0 siblings, 1 reply; 4+ messages in thread
From: TJ Luoma @ 2012-05-17 19:11 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh-Users List



On Thursday, May 17, 2012 at 3:05 PM, Peter Stephenson wrote:

> There's a really nasty trick that doesn't require them to correspond to files, which you'll hate. I think you can combine it into a single expression, but it's bad enough in two.
> 
> local min
> min=(${${(o)${MATCHES//?/\?}}[1]})
> print -l ${MATCHES:#^${~min}}
> 
> I'm sure it's completely obvious what this is doing. It's been posted before.

I have the memory and attention span of a tsetse fly, so I'll take your word for it, although I myself have no idea what this is doing. I've long accepted the fact that zsh is smarter than I am, and only hope to make the best use of it whenever possible :-)

TjL

ps - Thanks for the reply and solutions. 







* Re: How do I find shortest match?
  2012-05-17 19:11   ` TJ Luoma
@ 2012-05-17 20:04     ` Peter Stephenson
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Stephenson @ 2012-05-17 20:04 UTC (permalink / raw)
  To: Zsh-Users List

On Thu, 17 May 2012 15:11:09 -0400
TJ Luoma <luomat@gmail.com> wrote:
> > local min
> > min=(${${(o)${MATCHES//?/\?}}[1]})
> > print -l ${MATCHES:#^${~min}}
> > 
> > I'm sure it's completely obvious what this is doing. It's been
> > posted before.
> I have the memory and attention span of a tsetse fly, so I'll take
> your word for it,

I think it was some years ago now.

> although I myself have no idea what this is doing.

(I should have pointed out it needs EXTENDED_GLOB for the "^".)

First it replaces all the characters in all the elements of $MATCHES
with a question mark, then orders them in the standard collating order
with (o).  As all the characters are the same, the shortest string comes
first and [1] picks it.  So $min is a single element array containing as
many ?'s as there are characters in the shortest string.

The next match "simply" eliminates from MATCHES (':#') anything that
doesn't match ('^') the pattern ${~min}, where the ~ turns the ?'s into
active pattern characters.  So this matches any element of MATCHES which
has as many characters as the shortest string.  In fact, that could be
more than one, so you really need

${${MATCHES:#^${~min}}[1]}

to select the first.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/

