zsh-users
From: Mikael Magnusson <mikachu@gmail.com>
To: dominik.vogt@gmx.de, Zsh Users <zsh-users@zsh.org>
Subject: Re: Using file lines as "input files"
Date: Sat, 9 Jul 2022 04:21:37 +0200
Message-ID: <CAHYJk3T9G+VZO9prf_qm2P4yAYVJ-qi_urSBzGTWDhnn9dPTdw@mail.gmail.com>
In-Reply-To: <Ysi7JTBpUXK/F8v/@gmx.de>

On 7/9/22, Dominik Vogt <dominik.vogt@gmx.de> wrote:
> On Fri, Jul 08, 2022 at 03:04:31PM -0700, Bart Schaefer wrote:
>> On Fri, Jul 8, 2022 at 1:58 PM Dominik Vogt <dominik.vogt@gmx.de> wrote:
>> >
>> > Disclaimer: I _know_ this can be done in seconds with perl /
>> > python, but I like to not rely on scripting languages when the
>> > shell can do the job.
>>
>> This is sort of like saying "I like to not rely on hiking boots when
>> shoes can do the job."
>
> Actually, for me, scripting languages are the "shoes" because they
> don't interact very well with the command pipeline, unless you
> spend an absurd amount of work to make them do so.  Calling
> commands for everything can be slower, but most of the time it's
> just a symptom of bad scripting.  GNU coreutils are faster than
> anything I'll ever be willing to code (or any perl or python
> script or C or C++ library for that matter).  The trick is keeping
> the process spawning overhead low.
>
>> >   $ chksum Fline1 Fline2 Fline3 ... Fline265000
>> >
>> > (Of course without actually splitting the input file
>>
>> If "not actually splitting" means what it seems to mean, and you
>> literally want to run cksum, the answer is no.
>
> Right.
>
> This does the job pretty well, relying entirely on existing Unix
> tools:
>
>   ulimit -s 100000
>   split -l 1 "$INPUTF" ff
>   cksum ff*
>   rm ff*
>
> That cuts runtime down to seven seconds instead of four minutes,
> at the cost of a few hundred MB on the RAM disk.  Splitting the
> source file and removing the fragments takes about three to four
> seconds.
>
> Thanks for the comments which put me on the right track.
>
> --
>
> (I prefer to have a huge stack size anyway to be able to do things
> like "grep foobar **/*(.)".)

I realized I misinterpreted the question originally, and the following
doesn't seem to work 100% but it was a fun idea:
% mkfifo apipe
% foo[265000]='' # number of lines in the file; makes foo an array of that many empty elements
% cksum apipe$^foo # pass "apipe" to cksum 265000 times
(in another terminal, or with job control, etc.)
% while IFS= read -r; do print -r -- $REPLY > apipe; done < infile # -r keeps backslashes intact

When I tried the above on some test data, I got about 10 broken pipes.
Also, several lines sometimes get passed through the pipe without an
intervening EOF. I'll admit I don't know the finer points of pipe/fifo
behavior when you open and close them rapidly.

That said, this also seems to take around 4-5 seconds to run.
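
A likely explanation for both symptoms is that a FIFO reader only sees EOF while no writer has the FIFO open, and a writer gets a broken pipe when no reader has it open. With a single FIFO, the writing loop can reopen it before cksum has noticed the EOF (lines run together) or while cksum is between a close and the next open (broken pipe). The sketch below is a variation that is not from the thread: it alternates between two FIFOs, so the writer's next open should block until cksum has finished the previous one. The names fifodir, a, b and the three-line input are made up for illustration, and it has only been reasoned through for Linux-style FIFO semantics.

```shell
# Hypothetical demo data and paths; none of these names come from the thread.
fifodir=$(mktemp -d)
mkfifo "$fifodir/a" "$fifodir/b"
printf '%s\n' alpha beta gamma > "$fifodir/input.txt"

# Reader: one cksum process that is handed the two FIFOs alternately,
# once per input line (three lines here, so a b a).
cksum "$fifodir/a" "$fifodir/b" "$fifodir/a" > "$fifodir/sums.txt" &

# Writer: send line N to FIFO a or b in turn. Opening the next FIFO for
# writing blocks until cksum opens it for reading, which only happens
# after cksum has seen EOF on the previous FIFO and closed it.
n=0
while IFS= read -r line; do
  if [ $((n % 2)) -eq 0 ]; then fifo=$fifodir/a; else fifo=$fifodir/b; fi
  printf '%s\n' "$line" > "$fifo"
  n=$((n + 1))
done < "$fifodir/input.txt"

wait    # let the background cksum finish
cat "$fifodir/sums.txt"
```

For the real file, the argument list would be the pair repeated once per line, which could be built with the same array trick as above. Whether this keeps the 4-5 second runtime is untested.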

-- 
Mikael Magnusson


