zsh-users
* special characters in file names issue
@ 2023-11-09 23:16 Jim
  2023-11-10  5:04 ` Mikael Magnusson
  2023-11-10  9:50 ` Roman Perepelitsa
  0 siblings, 2 replies; 15+ messages in thread
From: Jim @ 2023-11-09 23:16 UTC (permalink / raw)
  To: zsh

Hi everyone,

I'm using scripts to clean up duplicate files, even when they are named
differently. The issue I ran into occurs when a file path contains
parentheses: '(' or ')'.

Example File Name:  Wallpapers/Web_downloads/05 (1).jpg

The following is part of an anonymous function:

local E
local -a AllFileNames
local -A FileNameCkSum
...
for E (${(@)AllFileNames}) {
  [[ -v FileNameCkSum[$E] ]] || FileNameCkSum[$E]=${$(shasum -a 1 $E)[1]}  # line that fails
}
...

AllFileNames contains the result of a glob.

Error Message:  (anon):<line no>: invalid subscript

I'm sure this is a quoting issue, but everything I've tried so far has
failed.

If someone could point me to documentation or examples it would be
appreciated.

Regards,

Jim Murphy


* Re: special characters in file names issue
  2023-11-09 23:16 special characters in file names issue Jim
@ 2023-11-10  5:04 ` Mikael Magnusson
  2023-11-10  9:50 ` Roman Perepelitsa
  1 sibling, 0 replies; 15+ messages in thread
From: Mikael Magnusson @ 2023-11-10  5:04 UTC (permalink / raw)
  To: linuxtechguy; +Cc: zsh

The code you posted works fine (although it would be appreciated if
you posted actually runnable minimal test cases, rather than
excerpts).

PS gmail flagged your mail as spam, possibly because your From: and
Reply-to: headers contain different addresses.

-- 
Mikael Magnusson



* Re: special characters in file names issue
  2023-11-09 23:16 special characters in file names issue Jim
  2023-11-10  5:04 ` Mikael Magnusson
@ 2023-11-10  9:50 ` Roman Perepelitsa
  2023-11-10 14:17   ` Mikael Magnusson
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Roman Perepelitsa @ 2023-11-10  9:50 UTC (permalink / raw)
  To: linuxtechguy; +Cc: zsh

Associative arrays in zsh are finicky when it comes to the content of
their keys. The problem you are experiencing can be distilled to this:

    % typeset -A dict
    % key='('
    % [[ -v dict[$key] ]]
    zsh: invalid subscript

There is no simple quoting that you can apply to $key here: (q), (b),
etc. are all wrong. You could perhaps escape a specific list of
characters ('(', '[', '{' but not '$' or '*') although my memory tells
me that some keys cannot be made to work under `[[ -v ...]]` or
`unset` no matter how you try to escape them. I could be wrong though.

I usually apply one of two workarounds: use hash($x) instead of $x as
a key, or replace the associative array with two plain arrays, one for
keys and another for values. The latter results in O(N) lookup though.

Roman.

P.S.

From the description of your problem I would think that you want file
hashes as keys. Something like this:

    # usage: detect-dup-files [file]..
    function detect-dup-files() {
      emulate -L zsh
      (( ARGC )) || return 0
      local -A seen
      local i files fname hash orig
      files=( $(shasum -ba 256 -- "$@") ) || return
      (( 2 * ARGC == $#files )) || return
      for i in {1..$ARGC}; do
        fname=$argv[i]
        hash=${files[2*i-1]#\\}
        if [[ -n ${orig::=$seen[$hash]} ]]; then
          print -r -- "${(q+)fname} is a dup of ${(q+)orig}"
        else
          seen[$hash]=$fname
        fi
      done
    }

This code has an added advantage of forking only once. It also handles
file names with backslashes and linefeeds in them.
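
One caveat that seems worth checking: as far as I can tell, the
unquoted $(shasum ...) above is split on spaces and tabs as well as
newlines, so a name such as "05 (1).jpg" yields more than two words and
trips the 2 * ARGC == $#files check. Below is a hedged sketch of a
variant that splits on newlines only. The name detect-dup-files-lines
is made up for illustration, and the sketch assumes shasum prints
exactly one line per file (it escapes linefeeds in names and marks such
lines with a leading backslash).

```shell
# usage: detect-dup-files-lines [file]..
# Hypothetical variant of detect-dup-files: one output line per file.
function detect-dup-files-lines() {
  emulate -L zsh
  (( ARGC )) || return 0
  local -A seen
  local -a lines
  local -i i
  local out fname hash orig
  out=$(shasum -ba 256 -- "$@") || return
  lines=( "${(@f)out}" )            # split on newlines only
  (( ARGC == $#lines )) || return
  for i in {1..$ARGC}; do
    fname=$argv[i]
    # Keep the text before the first space, then drop the leading
    # backslash that marks an escaped file name.
    hash=${${lines[i]%% *}#\\}
    if [[ -n ${orig::=$seen[$hash]} ]]; then
      print -r -- "${(q+)fname} is a dup of ${(q+)orig}"
    else
      seen[$hash]=$fname
    fi
  done
}
```

It still forks only once, so the command-line length limit applies to
very large file lists just as it does to the original.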



* Re: special characters in file names issue
  2023-11-10  9:50 ` Roman Perepelitsa
@ 2023-11-10 14:17   ` Mikael Magnusson
  2023-11-10 14:28     ` Roman Perepelitsa
  2023-11-11 18:26     ` Jim
  2023-11-10 16:33   ` Lawrence Velázquez
  2023-11-11 18:26   ` Jim
  2 siblings, 2 replies; 15+ messages in thread
From: Mikael Magnusson @ 2023-11-10 14:17 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: linuxtechguy, zsh

On 11/10/23, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
> Associative arrays in zsh are finicky when it comes to the content of
> their keys. The problem you are experiencing can be distilled to this:
>
>     % typeset -A dict
>     % key='('
>     % [[ -v dict[$key] ]]
>     zsh: invalid subscript
>
> There is no simple quoting that you can apply to $key here: (q), (b),
> etc. are all wrong. You could perhaps escape a specific list of
> characters ('(', '[', '{' but not '$' or '*') although my memory tells
> me that some keys cannot be made to work under `[[ -v ...]]` or
> `unset` no matter how you try to escape them. I could be wrong though.

Not sure why I didn't get this error when testing yesterday, but in
this case you can also avoid it by using the more typical
(( $+dict[$key] )) test instead of [[ -v ... ]].

-- 
Mikael Magnusson



* Re: special characters in file names issue
  2023-11-10 14:17   ` Mikael Magnusson
@ 2023-11-10 14:28     ` Roman Perepelitsa
  2023-11-11 18:26     ` Jim
  1 sibling, 0 replies; 15+ messages in thread
From: Roman Perepelitsa @ 2023-11-10 14:28 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: linuxtechguy, zsh

On Fri, 10 Nov 2023 at 15:17, Mikael Magnusson <mikachu@gmail.com> wrote:

> Not sure why I didn't get this error when testing yesterday, but in
> this case you can also avoid it by using the more typical ((
> $+dict[$key] )) test instead of [[ -v ...]].


Indeed. The two capricious constructs I know of that dislike weird keys
are `[[ -v ... ]]` and `unset ...`.

Roman.


* Re: special characters in file names issue
  2023-11-10  9:50 ` Roman Perepelitsa
  2023-11-10 14:17   ` Mikael Magnusson
@ 2023-11-10 16:33   ` Lawrence Velázquez
  2023-11-10 17:02     ` Bart Schaefer
  2023-11-10 20:37     ` Roman Perepelitsa
  2023-11-11 18:26   ` Jim
  2 siblings, 2 replies; 15+ messages in thread
From: Lawrence Velázquez @ 2023-11-10 16:33 UTC (permalink / raw)
  To: Roman Perepelitsa, linuxtechguy; +Cc: zsh-users

On Fri, Nov 10, 2023, at 4:50 AM, Roman Perepelitsa wrote:
> Associative arrays in zsh are finicky when it comes to the content of
> their keys. The problem you are experiencing can be distilled to this:
>
>     % typeset -A dict
>     % key='('
>     % [[ -v dict[$key] ]]
>     zsh: invalid subscript
>
> There is no simple quoting that you can apply to $key here: (q), (b),
> etc. are all wrong. You could perhaps escape a specific list of
> characters ('(', '[', '{' but not '$' or '*') although my memory tells
> me that some keys cannot be made to work under `[[ -v ...]]` or
> `unset` no matter how you try to escape them. I could be wrong though.

Subscripted arguments to [[ -v ... ]] appear to undergo a second
round of expansions, so quoting the $key reference itself, so that it
reaches -v unexpanded, should be sufficient.

	% typeset -A dict=('(' foo)
	% key='('
	% [[ -v dict[\$key] ]]; echo $?
	0
	% [[ -v dict['$key'] ]]; echo $?
	0
	% [[ -v 'dict[$key]' ]]; echo $?
	0

-- 
vq



* Re: special characters in file names issue
  2023-11-10 16:33   ` Lawrence Velázquez
@ 2023-11-10 17:02     ` Bart Schaefer
  2023-11-10 20:37     ` Roman Perepelitsa
  1 sibling, 0 replies; 15+ messages in thread
From: Bart Schaefer @ 2023-11-10 17:02 UTC (permalink / raw)
  To: zsh-users; +Cc: linuxtechguy

On Fri, Nov 10, 2023 at 8:34 AM Lawrence Velázquez <larryv@zsh.org> wrote:
>
> On Fri, Nov 10, 2023, at 4:50 AM, Roman Perepelitsa wrote:
> >
> > [...] my memory tells
> > me that some keys cannot be made to work under `[[ -v ...]]` or
> > `unset` no matter how you try to escape them. I could be wrong though.

% typeset -A dict
% key='('
% dict=( [\(]=paren )
% typeset -p dict
typeset -A dict=( ['(']=paren )
% unset "dict[$key]"
% typeset -p dict
typeset -A dict=( )
%

Do I misunderstand something about the example?

Roman is however correct that there's no single quoting strategy that
works everywhere you might use an associative array subscript.  You
have to match the quoting to the context.

> Subscripted arguments to [[ -v ... ]] appear to undergo a second
> round of expansions, so quoting "$key" itself should be sufficient.

This makes sense from the implementation standpoint, if perhaps not
from the user's perspective: -v has to evaluate the subscript to find
the array element, and it cannot "know" that the subscript has already
been expanded once by the normal order of evaluation in [[ ]].



* Re: special characters in file names issue
  2023-11-10 16:33   ` Lawrence Velázquez
  2023-11-10 17:02     ` Bart Schaefer
@ 2023-11-10 20:37     ` Roman Perepelitsa
  2023-11-11  0:13       ` Bart Schaefer
  1 sibling, 1 reply; 15+ messages in thread
From: Roman Perepelitsa @ 2023-11-10 20:37 UTC (permalink / raw)
  To: Lawrence Velázquez; +Cc: linuxtechguy, zsh-users

On Fri, Nov 10, 2023 at 5:34 PM Lawrence Velázquez <larryv@zsh.org> wrote:
>
> Subscripted arguments to [[ -v ... ]] appear to undergo a second
> round of expansions [...]

Oh wow! This is very surprising to me.

    % foo=x
    % bar='$foo'
    % typeset -A dict=(x 1)
    % [[ -v dict[$bar] ]] && echo 'very surprising'
    very surprising

There is also a scarier version of this, which causes execution of an
external command when I don't expect it.

    % typeset -A dict
    % var='dict[$(print -u2 pwnd)]'
    % [[ -v $var ]]
    pwnd

Is this intended?

On Fri, Nov 10, 2023 at 6:03 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> Do I misunderstand something about the example?

No, it's my bad; I misremembered. There was something with
"unset[$key]" where I was unable to make it work for any "$key" but I
cannot recall the specifics. It could have had something to do with
binary data in keys.

> -v has to evaluate the subscript to find the array element [...]

Why does it have to evaluate the subscript? `unset` does not do it,
why would [[ -v .. ]] be different?

Roman.



* Re: special characters in file names issue
  2023-11-10 20:37     ` Roman Perepelitsa
@ 2023-11-11  0:13       ` Bart Schaefer
  2023-11-11 17:18         ` Ray Andrews
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Schaefer @ 2023-11-11  0:13 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Lawrence Velázquez, linuxtechguy, zsh-users

In reversed order ...

On Fri, Nov 10, 2023 at 12:38 PM Roman Perepelitsa
<roman.perepelitsa@gmail.com> wrote:
>
> On Fri, Nov 10, 2023 at 6:03 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > -v has to evaluate the subscript to find the array element [...]
>
> Why does it have to evaluate the subscript? `unset` does not do it,
> why would [[ -v .. ]] be different?

The shortest possible answer is that [[ -v ... ]] will autoload the
parameter before reporting whether it is set, whereas unset just acts
on the current state.

% typeset -p aliases
% unset 'aliases[run-help]'
zsh: aliases: assignment to invalid subscript range
% [[ -v aliases[run-help] ]] && echo yes
yes
%

However, I doubt Oliver was specifically thinking about that seven
years ago when he implemented [[ -v ... ]].

> On Fri, Nov 10, 2023 at 5:34 PM Lawrence Velázquez <larryv@zsh.org> wrote:
> >
> > Subscripted arguments to [[ -v ... ]] appear to undergo a second
> > round of expansions [...]
>
> There is also a scarier version of this, which causes execution of an
> external command when I don't expect it.

This is why the new named-references code evaluates subscripts with
NO_EXEC temporarily in effect.

> Is this intended?

That answer is probably lost to history.  If we presently believe that
"no" is the best answer, we can use the NO_EXEC trick for [[ -v ... ]]
as well, although that would mean that a bit of factored-out code
that's currently called only from [[ ]] could never be called from
anywhere else.



* Re: special characters in file names issue
  2023-11-11  0:13       ` Bart Schaefer
@ 2023-11-11 17:18         ` Ray Andrews
  2023-11-11 18:19           ` Bart Schaefer
  0 siblings, 1 reply; 15+ messages in thread
From: Ray Andrews @ 2023-11-11 17:18 UTC (permalink / raw)
  To: zsh-users


On 2023-11-10 16:13, Bart Schaefer wrote:
> However, I doubt Oliver was specifically thinking about that seven
> years ago when he implemented [[ -v ... ]].

Bart:

Another hypothetical:  Since all that was added not too long ago, and 
since Oliver is still kicking and could comment on the issue, what would 
be the practicalities of Thomas just hacking the code to his own 
satisfaction?  That kind of solution never seems to be suggested but it 
would seem to be a possibility.  Too complicated?  Too dangerous?



* Re: special characters in file names issue
  2023-11-11 17:18         ` Ray Andrews
@ 2023-11-11 18:19           ` Bart Schaefer
  2023-11-11 18:52             ` Ray Andrews
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Schaefer @ 2023-11-11 18:19 UTC (permalink / raw)
  To: Ray Andrews; +Cc: Zsh Users

On Sat, Nov 11, 2023, 9:18 AM Ray Andrews <rayandrews@eastlink.ca> wrote:

> what would be the practicalities of Thomas just hacking the code to his
> own satisfaction?  That kind of solution never seems to be suggested but it
> would seem to be a possibility.
>
It's always implicit that anyone can submit a patch, though usually that
should happen on zsh-workers rather than -users. (Everyone on -workers gets
to see -users by default, so the distinction is mostly so the consumers
don't have to see the sausage being made, so to speak.)  But at least for
me the presumption on -users is that an answer that works now is preferable
to one that has to wait for the next version or use a local build.


* Re: special characters in file names issue
  2023-11-10 14:17   ` Mikael Magnusson
  2023-11-10 14:28     ` Roman Perepelitsa
@ 2023-11-11 18:26     ` Jim
  1 sibling, 0 replies; 15+ messages in thread
From: Jim @ 2023-11-11 18:26 UTC (permalink / raw)
  To: Mikael Magnusson; +Cc: zsh

On Fri, Nov 10, 2023 at 8:17 AM Mikael Magnusson <mikachu@gmail.com> wrote:
 -clip-

> Not sure why I didn't get this error when testing yesterday, but in
> this case you can also avoid it by using the more typical ((
> $+dict[$key] )) test instead of [[ -v ...]].
>
> Mikael Magnusson
>

Using ((  $+FileNameCkSum[$E]  )) completed without any errors.

Thanks.

Jim


* Re: special characters in file names issue
  2023-11-10  9:50 ` Roman Perepelitsa
  2023-11-10 14:17   ` Mikael Magnusson
  2023-11-10 16:33   ` Lawrence Velázquez
@ 2023-11-11 18:26   ` Jim
  2023-11-12  0:08     ` Bart Schaefer
  2 siblings, 1 reply; 15+ messages in thread
From: Jim @ 2023-11-11 18:26 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: zsh

On Fri, Nov 10, 2023 at 3:51 AM Roman Perepelitsa <
roman.perepelitsa@gmail.com> wrote:

> On Fri, Nov 10, 2023 at 12:17 AM Jim <linux.tech.guy@gmail.com> wrote:
>
-clip-

> > ...
> > for E (${(@)AllFileNames}) {
> > [[ -v FileNameCkSum[$E] ]] || FileNameCkSum[$E]=${$(shasum -a 1 $E)[1]}
> }  # line that fails
> > ...
>
> -clip-

>
> Associative arrays in zsh are finicky when it comes to the content of
> their keys. The problem you are experiencing can be distilled to this:
>
>     % typeset -A dict
>     % key='('
>     % [[ -v dict[$key] ]]
>     zsh: invalid subscript
>
> -clip-

> Roman.
>
> P.S.
>
> From the description of your problem I would think that you want file
> hashes as keys. Something like this:
>
>     # usage: detect-dup-files [file]..
>     function detect-dup-files() {
>       emulate -L zsh
>       (( ARGC )) || return 0
>       local -A seen
>       local i files fname hash orig
>       files=( $(shasum -ba 256 -- "$@") ) || return
>       (( 2 * ARGC == $#files )) || return
>       for i in {1..$ARGC}; do
>         fname=$argv[i]
>         hash=${files[2*i-1]#\\}
>         if [[ -n ${orig::=$seen[$hash]} ]]; then
>           print -r -- "${(q+)fname} is a dup of ${(q+)orig}"
>         else
>           seen[$hash]=$fname
>         fi
>       done
>     }
>
> This code has an added advantage of forking only once. It also handles
> file names with backslashes and linefeeds in them.
>

I was only expecting, at best, an answer to the one line, so thanks for
the function. After 58 years of working on computers and IT, this old
dog is open to new code. I usually learn something from every example I
get my hands on, and this function was no exception. So a big thanks.
Parts of it will be used in the future.

Since everyone was working with limited information about what I was
doing, there are some issues. I'm working on more than 96K files, and
most utilities, including shasum, report that the input line is too
long. So a few changes are needed. Even with "groups" of files, shasum
takes over two and a half hours to do 96K. So I implemented gdbm to
store the results. That way, even when I hit the "key" problem, I could
skip all files that were already hashed.

And you were right, I was working on hashes as keys, but in a different
way. I used the following to create a second associative array (also
ztied). It took well over two hours to run. I thought it would be a lot
faster, oh well.

for V ("${(@u)FileNameCkSum}") \
  CkSumFileNames[$V]=${(pj:\0:)${(o)${(@k)FileNameCkSum[(Re)$V]}}}

Each key's (hash) value has all the files with the same hash, separated
by NULs. The rest of the code uses this second associative array.

Again thanks, and best regards,

Jim


* Re: special characters in file names issue
  2023-11-11 18:19           ` Bart Schaefer
@ 2023-11-11 18:52             ` Ray Andrews
  0 siblings, 0 replies; 15+ messages in thread
From: Ray Andrews @ 2023-11-11 18:52 UTC (permalink / raw)
  To: zsh-users


On 2023-11-11 10:19, Bart Schaefer wrote:
> But at least for me the presumption on -users is that an answer that 
> works now is preferable to one that has to wait for the next version 
> or use a local build.
Of course.  If existing functionality does the trick, no need to 
re-invent the wheel.  Dunno,  it's only the most general of questions 
but still, it could be that local ad hoc hacks would be something we 
might see now and again -- or, as you say, maybe that is happening, but 
just not on -users.  Back in my DOS days, given the limitations on 
memory, one would make specific modifications to the code to solve some 
local issue rather than try to build more functionality into the base 
code -- too much unaffordable bloat. Different culture for different OS.



* Re: special characters in file names issue
  2023-11-11 18:26   ` Jim
@ 2023-11-12  0:08     ` Bart Schaefer
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Schaefer @ 2023-11-12  0:08 UTC (permalink / raw)
  To: linuxtechguy; +Cc: Roman Perepelitsa, zsh

On Sat, Nov 11, 2023 at 10:28 AM Jim <linux.tech.guy@gmail.com> wrote:
>
>>       local i files fname hash orig
>>       files=( $(shasum -ba 256 -- "$@") ) || return
>>
>> This code has an added advantage of forking only once. It also handles
>> file names with backslashes and linefeeds in them.
>
> there are some issues. The files I'm working on are in excess of 96K, and most
> utilities, including shasum, report the input line is too long.

If you're already putting the hashes in a gdbm, it should be possible
to write a zargs command to automatically batch them up and populate
the database.  Once that's working on a few files as a test case, you
can use zargs -P N to run N copies of the hashing job at once.
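
A rough sketch of the batching half of that idea, under some
assumptions: the helper name hash-files-batched is made up, the sketch
only prints the shasum lines and does not populate the gdbm database,
and it relies on the zargs function shipped with zsh being available
for autoloading.

```shell
autoload -Uz zargs

# usage: hash-files-batched [file]..   (hypothetical helper)
# zargs splits the file list into batches that fit on a command line
# and runs shasum once per batch. -r skips the run if the list is empty.
function hash-files-batched() {
  emulate -L zsh
  zargs -r -- "$@" -- shasum -ba 256 --
}
```

A call might look like `hash-files-batched Wallpapers/**/*(.)`. The
first two `--` markers belong to zargs and the last one is passed to
shasum, so a file literally named `--` would need the zargs -e option
to change the end-of-inputs marker. Adding -P N to the zargs call is
the parallel form mentioned above.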

> So a few changes
> are needed. Even with "groups" of files, shasum takes over two and half hours
> to do 96K.

For your purposes, do you need to generate a hash of the file contents
(which shasum is doing) or just hash the file name to hide special
characters?  Roman's example needs the former because it is searching
for duplicated content.


