From: Mikael Magnusson <mikachu@gmail.com>
To: dominik.vogt@gmx.de, Zsh Users <zsh-users@zsh.org>
Subject: Re: Using file lines as "input files"
Date: Sat, 9 Jul 2022 04:21:37 +0200
Message-ID: <CAHYJk3T9G+VZO9prf_qm2P4yAYVJ-qi_urSBzGTWDhnn9dPTdw@mail.gmail.com>
In-Reply-To: <Ysi7JTBpUXK/F8v/@gmx.de>
On 7/9/22, Dominik Vogt <dominik.vogt@gmx.de> wrote:
> On Fri, Jul 08, 2022 at 03:04:31PM -0700, Bart Schaefer wrote:
>> On Fri, Jul 8, 2022 at 1:58 PM Dominik Vogt <dominik.vogt@gmx.de> wrote:
>> >
>> > Disclaimer: I _know_ this can be done in seconds with perl /
>> > python, but I like to not rely on scripting languages when the
>> > shell can do the job.
>>
>> This is sort of like saying "I like to not rely on hiking boots when
>> shoes can do the job."
>
> Actually, for me, scripting languages are the "shoes" because they
> don't interact very well with the command pipeline, unless you
> spend an absurd amount of work to make them do so. Calling
> commands for everything can be slower, but most of the time it's
> just a symptom of bad scripting. GNU coreutils are faster than
> anything I'll ever be willing to code (or any perl or python
> script or C or C++ library for that matter). The trick is keeping
> the process spawning overhead low.
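To make the process-spawning point concrete, here is a minimal sketch of the naive per-line approach. This is an assumption: the thread never shows the code behind the four-minute run. The function name per_line_cksum is hypothetical. It starts a cksum (and a pipeline) for every line, so 265000 lines means 265000 cksum processes.

```shell
# Hypothetical naive version: one cksum process per input line.
# read strips the newline and printf adds it back, so each checksum
# covers one line plus its trailing newline. A final line that lacks
# a trailing newline is skipped by this loop.
per_line_cksum() {
  while IFS= read -r line; do
    printf '%s\n' "$line" | cksum
  done < "$1"
}
```

Usage would be `per_line_cksum "$INPUTF"`. Because each cksum reads stdin, the output lines carry only the CRC and byte count, with no file name.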
>
>> > $ cksum Fline1 Fline2 Fline3 ... Fline265000
>> >
>> > (Of course without actually splitting the input file
>>
>> If "not actually splitting" means what it seems to mean, and you
>> literally want to run cksum, the answer is no.
>
> Right.
>
> This does the job pretty well, relying entirely on existing Unix
> tools:
>
> ulimit -s 100000
> split -l 1 "$INPUTF" ff
> cksum ff*
> rm ff*
>
> That cuts runtime down to seven seconds instead of four minutes,
> at the cost of a few hundred MB on the RAM disk. Splitting the
> source file and removing the fragments takes about three to four
> seconds.
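Here is a slightly more defensive sketch of the same split-then-cksum idea. It assumes a split that supports -a and a mktemp that supports -d, which is true of GNU coreutils. The function name cksum_lines is hypothetical. Because the pieces live in a private temporary directory, cleanup is a single rm -rf rather than a second huge ff* argument list, and it cannot touch unrelated files whose names start with ff.

```shell
# Hypothetical helper: checksum each line of a file as if it were its
# own file, using split in a private temporary directory.
cksum_lines() {
  local tmp rc
  tmp=$(mktemp -d) || return 1
  # One line per piece. A fixed 6-letter suffix allows up to 26^6 pieces
  # and keeps the glob's lexical order equal to the line order.
  split -l 1 -a 6 "$1" "$tmp/ff" || { rm -rf "$tmp"; return 1; }
  # The glob expands after the cd, so output shows short names like ffaaaaaa.
  # An empty input produces no pieces, so the glob would not match.
  (cd "$tmp" && cksum ff*)
  rc=$?
  # A single argument here, so no argument-list limit is involved.
  rm -rf "$tmp"
  return "$rc"
}
```

Usage would be `cksum_lines "$INPUTF"`. The ff* glob handed to cksum still expands to one name per line, so the raised stack limit is still needed for 265000 lines.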
>
> Thanks for the comments which put me on the right track.
>
> --
>
> (I prefer to have a huge stack size anyway to be able to do things
> like "grep foobar **/*(.)".)
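The stack size matters because Linux derives the limit on execve() argument space from the soft stack limit, roughly a quarter of it. glibc's getconf ARG_MAX reports that quarter. The small sketch below shows the effect. It assumes Linux with glibc. Other systems may report a fixed ARG_MAX, and recent kernels also apply a fixed cap of their own, so getconf's figure is best read as an upper bound. Raising the limit also fails if the hard limit is below 100000 KiB.

```shell
# Current argument-size limit as reported by the C library.
getconf ARG_MAX
# Raise the soft stack limit (in KiB) in a subshell only, then re-check;
# the parent shell's limits are left unchanged.
( ulimit -s 100000 && getconf ARG_MAX ) || echo "could not raise the stack limit" >&2
```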
I realized I misinterpreted the question originally, and the following
doesn't seem to work 100%, but it was a fun idea:
% mkfifo apipe
% foo[265000]='' # number of lines in the file
% cksum apipe$^foo # pass "apipe" to cksum 265000 times
(in another terminal, or via job control etc.)
% while IFS= read -r; do print -r -- "$REPLY" > apipe; done < infile
When I tried the above on some test data, I got about 10 broken pipes.
Also, several lines sometimes get passed through the pipe without an
intervening EOF; I'll admit I don't know the finer points of pipe/FIFO
behavior when you open and close them rapidly.
That said, this also seems to take around 4-5 seconds to run.
--
Mikael Magnusson
Thread overview: 9+ messages
2022-07-08 20:58 Dominik Vogt
2022-07-08 21:58 Mikael Magnusson
2022-07-08 22:04 Bart Schaefer
2022-07-08 23:17 Dominik Vogt
2022-07-09 2:21 Mikael Magnusson [this message]
2022-07-10 0:42 Dominik Vogt
2022-07-10 0:45 Dominik Vogt
2022-07-10 3:27 Bart Schaefer
2022-07-10 17:49 Bart Schaefer