On Fri, Nov 10, 2023 at 3:51 AM Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:

> On Fri, Nov 10, 2023 at 12:17 AM Jim wrote:
> -clip-
> > ...
> > for E (${(@)AllFileNames}) {
> >   [[ -v FileNameCkSum[$E] ]] || FileNameCkSum[$E]=${$(shasum -a 1 $E)[1]}
> > } # line that fails
> > ...
> -clip-
>
> Associative arrays in zsh are finicky when it comes to the content of
> their keys. The problem you are experiencing can be distilled to this:
>
>     % typeset -A dict
>     % key='('
>     % [[ -v dict[$key] ]]
>     zsh: invalid subscript
>
> -clip-
> Roman.
>
> P.S.
>
> From the description of your problem I would think that you want file
> hashes as keys. Something like this:
>
>     # usage: detect-dup-files [file]..
>     function detect-dup-files() {
>       emulate -L zsh
>       (( ARGC )) || return 0
>       local -A seen
>       local i files fname hash orig
>       files=( $(shasum -ba 256 -- "$@") ) || return
>       (( 2 * ARGC == $#files )) || return
>       for i in {1..$ARGC}; do
>         fname=$argv[i]
>         hash=${files[2*i-1]#\\}
>         if [[ -n ${orig::=$seen[$hash]} ]]; then
>           print -r -- "${(q+)fname} is a dup of ${(q+)orig}"
>         else
>           seen[$hash]=$fname
>         fi
>       done
>     }
>
> This code has an added advantage of forking only once. It also handles
> file names with backslashes and linefeeds in them.

I was only expecting, at best, an answer about the one failing line, so thanks for the whole function. After 58 years of working on computers and IT, this old dog is still open to new code. I usually learn something from every example I get my hands on, and this function was no exception, so a big thanks; parts of it will be used in the future.

Since everyone was working with limited information about what I was doing, there are some issues. I'm working with more than 96K files, and passing that many names at once makes most utilities, including shasum, report that the argument list is too long, so a few changes were needed. Even when feeding shasum the files in groups, hashing all 96K of them takes over two and a half hours, so I implemented gdbm to store the results.
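As an aside on the distilled "invalid subscript" case quoted above: `[[ -v ... ]]` re-parses the expanded subscript, so a key like `(` trips the parser. One commonly used alternative (a sketch of my own, not necessarily what the thread settled on) is the `${+...}` expansion, which appears to take the expanded key literally:

```zsh
#!/usr/bin/env zsh
# Sketch: testing for a key that breaks [[ -v ]].
typeset -A dict
key='('                 # the problematic key from the distilled example
dict[$key]=1
# [[ -v dict[$key] ]]   # fails with: zsh: invalid subscript
(( ${+dict[$key]} )) && print "key present"
```

With this, the original loop could test `(( ${+FileNameCkSum[$E]} ))` instead of `[[ -v FileNameCkSum[$E] ]]`.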
So even when I hit the "key" problem, I could skip all the files that were already hashed. And you were right, I was working with hashes as keys, but in a different way. I used the following to create a second associative array (also ztied):

    for V ("${(@u)FileNameCkSum}") \
      CkSumFileNames[$V]=${(pj:\0:)${(o)${(@k)FileNameCkSum[(Re)$V]}}}

It took well over two hours to run; I thought it would be a lot faster, oh well. Each key (a hash) has as its value all the file names with that hash, separated by NULs. The rest of the code uses this second associative array.

Again thanks, and best regards,

Jim