From: Jim <linux.tech.guy@gmail.com>
To: Roman Perepelitsa <roman.perepelitsa@gmail.com>
Cc: zsh <zsh-users@zsh.org>
Subject: Re: special characters in file names issue
Date: Sat, 11 Nov 2023 12:26:21 -0600
Message-ID: <CA+rB6GJ1bWD_H3JPfiHN0P5+rC3tYCfOgBub8Y-8zzYs20eTng@mail.gmail.com>
In-Reply-To: <CAN=4vMq6bKP5fh3yu-o1ROmoV=QFsFMnwNPP9wbgX4NgZbfdAQ@mail.gmail.com>

On Fri, Nov 10, 2023 at 3:51 AM Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:

> On Fri, Nov 10, 2023 at 12:17 AM Jim <linux.tech.guy@gmail.com> wrote:
>
-clip-

> > ...
> > for E (${(@)AllFileNames}) {
> > [[ -v FileNameCkSum[$E] ]] || FileNameCkSum[$E]=${$(shasum -a 1 $E)[1]}
> > }  # line that fails
> > ...
>
> -clip-

>
> Associative arrays in zsh are finicky when it comes to the content of
> their keys. The problem you are experiencing can be distilled to this:
>
>     % typeset -A dict
>     % key='('
>     % [[ -v dict[$key] ]]
>     zsh: invalid subscript
>
> -clip-

> Roman.
>
> P.S.
>
> From the description of your problem I would think that you want file
> hashes as keys. Something like this:
>
>     # usage: detect-dup-files [file]..
>     function detect-dup-files() {
>       emulate -L zsh
>       (( ARGC )) || return 0
>       local -A seen
>       local i files fname hash orig
>       files=( $(shasum -ba 256 -- "$@") ) || return
>       (( 2 * ARGC == $#files )) || return
>       for i in {1..$ARGC}; do
>         fname=$argv[i]
>         hash=${files[2*i-1]#\\}
>         if [[ -n ${orig::=$seen[$hash]} ]]; then
>           print -r -- "${(q+)fname} is a dup of ${(q+)orig}"
>         else
>           seen[$hash]=$fname
>         fi
>       done
>     }
>
> This code has an added advantage of forking only once. It also handles
> file names with backslashes and linefeeds in them.
>

I was expecting, at best, an answer about the one line, so thanks for the
function. After 58 years of working on computers and in IT, this old dog is
open to new code. I usually learn something from every example I get my hands
on, and this function was no exception. So a big thanks. Parts of it will be
used in the future.

Since everyone was working with limited information about what I was doing,
there are some issues. There are in excess of 96K files to process, and most
utilities, including shasum, report that the input line is too long. So a few
changes are needed. Even with "groups" of files, shasum takes over two and a
half hours to do 96K. So I implemented gdbm to store the results. That way,
even when I hit the "key" problem, I could skip all files that were already
hashed.

And you were right, I was working on hashes as keys, but in a different way.
I used the following to create a second associative array (also ztied).
It took well over two hours to run. I thought it would be a lot faster, oh well.

for V ("${(@u)FileNameCkSum}") \
  CkSumFileNames[$V]=${(pj:\0:)${(o)${(@k)FileNameCkSum[(Re)$V]}}}

Each key (a hash) has as its value all the files with the same hash, separated
by NULs. The rest of the code uses this second associative array.

Again thanks, and best regards,

Jim