From: Bart Schaefer
Date: Sat, 11 Nov 2023 16:08:13 -0800
Subject: Re: special characters in file names issue
To: linuxtechguy@gmail.com
Cc: Roman Perepelitsa, zsh

On Sat, Nov 11, 2023 at 10:28 AM Jim wrote:
>
>>   local i files fname hash orig
>>   files=( $(shasum -ba 256 -- "$@") ) || return
>>
>> This code has an added advantage of forking only once. It also handles
>> file names with backslashes and linefeeds in them.
>
> there are some issues. The files I'm working on are in excess of 96K, and most
> utilities, including shasum, report the input line is too long.

If you're already putting the hashes in a gdbm, it should be possible
to write a zargs command to automatically batch them up and populate
the database. Once that's working on a few files as a test case, you
can use zargs -P N to run N copies of the hashing job at once.

> So a few changes
> are needed. Even with "groups" of files, shasum takes over two and half hours
> to do 96K.

For your purposes, do you need to generate a hash of the file contents
(which shasum is doing) or just hash the file name to hide special
characters? Roman's example needs the former because it is searching
for duplicated content.