From: Bart Schaefer <schaefer@brasslantern.com>
To: linuxtechguy@gmail.com
Cc: Roman Perepelitsa <roman.perepelitsa@gmail.com>, zsh <zsh-users@zsh.org>
Subject: Re: special characters in file names issue
Date: Sat, 11 Nov 2023 16:08:13 -0800
Message-ID: <CAH+w=7YLvXG8L2Kre878eUwVY_K97b1+LK+-xck1MzAWuVk2_g@mail.gmail.com>
In-Reply-To: <CA+rB6GJ1bWD_H3JPfiHN0P5+rC3tYCfOgBub8Y-8zzYs20eTng@mail.gmail.com>

On Sat, Nov 11, 2023 at 10:28 AM Jim <linux.tech.guy@gmail.com> wrote:
>
>>       local i files fname hash orig
>>       files=( $(shasum -ba 256 -- "$@") ) || return
>>
>> This code has an added advantage of forking only once. It also handles
>> file names with backslashes and linefeeds in them.
>
> there are some issues. The files I'm working on are in excess of 96K, and most
> utilities, including shasum, report the input line is too long.

If you're already putting the hashes in a gdbm, it should be possible
to write a zargs command to automatically batch them up and populate
the database.  Once that's working on a few files as a test case, you
can use zargs -P N to run N copies of the hashing job at once.

> So a few changes
> are needed. Even with "groups" of files, shasum takes over two and a half hours
> to do 96K.

For your purposes, do you need to generate a hash of the file contents
(which shasum is doing) or just hash the file name to hide special
characters?  Roman's example needs the former because it is searching
for duplicated content.

