Message-ID: <642E7A9684DC9999ABD5C35763872D03@9lab.org>
To: 9front@9front.org
Date: Thu, 06 Jan 2022 11:47:23 +0100
From: igor@9lab.org
Subject: Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
Reply-To: 9front@9front.org

Quoth ori@eigenstate.org:
> Quoth Michael Forney:
> > On 2022-01-04, igor@9lab.org wrote:
> > > xargs also has a parallel mode that comes in handy to speed up search
> > > in this case.
> >
> > Is there a possibility that this might result in intermixed grep
> > results (i.e. one process printing a line in the middle of another
> > line)? That'd be my main concern with adding parallelism to xargs.

Isn't this only an issue with shared-memory concurrency (i.e. multiple
threads trying to call printf concurrently)? At least that is the
context in which I have encountered this issue before, where the
solution is to synchronise around printf.

The implementation in /sys/src/cmd/xargs.c uses fork to spawn multiple
processes.

A quick test searching across multiple large project bases with
sufficient parallelism for xargs did not reveal any garbled output.
However, that might just have been luck…

Anyway, if we decide to use the `-p` flag with xargs, the amount of
parallelism can be derived from $NPROC, as in mk(1).

> > >  	case *
> > > -		pattern=$1
> > > -		shift
> > >  		for(f in $*){
> > >  			if(test -d $f)
> > > -				files=($files `$nl{walk -f $recurse -- $* \
> > > -					| grep -e $fullnames -e $suffixes >[2]/dev/null})
> > > +				walk -f $recurse -- $f \
> > > +					| grep -e $fullnames -e $suffixes >[2]/dev/null
> > >  			if not
> > > -				files=($files $f)
> > > +				echo -n $f$nl
> >
> > If we don't care about ordering of results, we could also skip the
> > for-loop completely and replace this entire case with
> >
> > 	walk -f -n0 -- $*
> > 	walk -f $recurse -- $* | grep -e $fullnames -e $suffixes >[2]/dev/null
> >
> > and change the default to recurse='-n1,'. This would walk all named
> > file arguments first, followed by the files in the directories.
> >

I don't think current `g` makes any promises about the order; at least
it doesn't document them.

> And if we do, we can |sort at the end, which means the results won't
> trickle in, but it will be sorted. I'd prefer that.

#metoo
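
To make the fork question above a bit more concrete, here is a minimal sketch in Python (not rc or Plan 9 C), assuming a POSIX system where a write of at most PIPE_BUF bytes to a pipe is atomic. The names run_writers and intact are made up for illustration. Several forked children share one pipe and emit each line with a single write call, and the parent checks that no line was split. It says nothing about how grep itself buffers output: a program that flushes a large buffer in the middle of a line could still produce intermixed lines without any shared memory.

```python
import os
import re

def run_writers(nproc=4, lines_per_proc=200):
    """Fork nproc children that share one pipe; return the lines read back."""
    r, w = os.pipe()
    pids = []
    for i in range(nproc):
        pid = os.fork()
        if pid == 0:
            # Child: emit each line with one write(2), well below PIPE_BUF.
            os.close(r)
            for j in range(lines_per_proc):
                os.write(w, f"proc{i} line{j} {'x' * 40}\n".encode())
            os._exit(0)
        pids.append(pid)
    # Parent keeps only the read end, so EOF arrives once all children exit.
    os.close(w)
    chunks = []
    while True:
        buf = os.read(r, 65536)
        if not buf:
            break
        chunks.append(buf)
    os.close(r)
    for pid in pids:
        os.waitpid(pid, 0)
    return b"".join(chunks).decode().splitlines()

def intact(lines):
    """True if every line has the exact shape one child wrote."""
    return all(re.fullmatch(r"proc\d+ line\d+ x{40}", l) for l in lines)

if __name__ == "__main__":
    out = run_writers()
    print(len(out), "lines, intact:", intact(out))
```

Lines from different children may arrive in any order, which matches the ordering discussion above; only the integrity of each line is checked.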