From: Bart Schaefer
Date: Mon, 8 Apr 2019 10:14:20 -0700
Subject: Re: find duplicate files
To: Charles Blake
Cc: Zsh Users

On Mon, Apr 8, 2019 at 4:18 AM Charles Blake wrote:
>
> >I find that a LOT more understandable than the python code.
>
> Understandability is, of course, somewhat subjective (e.g. some might say
> every 15th field is unclear relative to a named label)

Yes, lack of multi-dimensional data structures is a limitation on the
shell implementation.

I could have done it this way:

    names=( **/*(.l+0) )
    zstat -tA stats $names
    sizes=( ${(M)stats:#size *} )

I chose the other way so the name and size would be directly connected
in the stats array rather than rely on implicit ordering (to one of
your later points, bad things happen with the above if a file is
removed between generating the list of names and collecting the file
stats).

> >unless you're NOT going to consider linked files as duplicates you
> >might as well just compare sizes. (It would be faster to get inodes
>
> It may have been underappreciated is that handling hard-link identity also
> lets you skip recomputing hashes over a hard link cluster

Yes, this could be used to reduce the number of names passed to
"cksum" or the equivalent.

> Almost everything you say needs a "probably/maybe"
> qualifier. I don't think you disagree. I'm just elaborating a little
> for passers by.

Absolutely. The flip side of this is that shells and utilities are
generally optimized for the average case, not for the extremes.
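
For passers by, the point about implicit ordering can be illustrated with a small portable sketch. It is not the code from this thread: the function name size_name_pairs is hypothetical, and it reads sizes with "wc -c" (which reads the whole file) instead of zstat, purely so that it runs in any POSIX-style shell as well as zsh. What it shows is the idea of emitting the size and the name together as one record, so that a file which vanishes part-way through is simply skipped instead of shifting two parallel lists out of step.

```shell
# Hypothetical helper: print "<size><TAB><name>" for each argument.
# The body runs in a subshell, so its variables do not leak out.
size_name_pairs() (
    for f in "$@"; do
        # Obtain the size and pair it with the name in a single step.
        # If the file has disappeared, the redirection fails and the
        # entry is skipped; no later entry changes position.
        size=$( { wc -c < "$f"; } 2>/dev/null ) || continue
        # $((size)) strips the padding some wc implementations add.
        printf '%s\t%s\n' "$((size))" "$f"
    done
)
```

Names containing tabs or newlines would need a different record format; this sketch assumes they do not occur.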
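
The remark about hard-link clusters can be sketched in the same hedged way. The function below is an illustration only, with a hypothetical name (one_per_inode), and it assumes all the files live on one filesystem because it compares inode numbers and ignores device numbers. It prints the first name seen for each distinct inode, so only one name per cluster of hard links would need to be handed to "cksum" or the equivalent.

```shell
# Hypothetical helper: print one representative name per distinct inode.
# Assumes a single filesystem (device numbers are not compared).
# The body runs in a subshell, so its variables do not leak out.
one_per_inode() (
    seen=' '
    for f in "$@"; do
        # "ls -di" prints "<inode> <name>"; keep only the inode field.
        ino=$(ls -di -- "$f" 2>/dev/null | awk '{ print $1; exit }')
        [ -n "$ino" ] || continue      # file vanished or unreadable: skip
        case $seen in
            *" $ino "*) ;;             # this inode is already represented
            *) seen="$seen$ino "
               printf '%s\n' "$f" ;;
        esac
    done
)
```

Because the output is one name per line, it is only safe for names without embedded newlines. For large file sets the linear search through the "seen" string would be slow, and an associative array keyed by inode would probably be the better choice in zsh.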