9front - general discussion about 9front
* [9front] [PATCH] g: use xargs instead of finding complete file list before greping
@ 2022-01-01 10:26 Michael Forney
  2022-01-02  1:13 ` 有澤 健治
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Michael Forney @ 2022-01-01 10:26 UTC (permalink / raw)
  To: 9front

---
This patch requires the other g patch I sent earlier to apply cleanly.

I noticed that with the recent rc changes, I see the message 'Write
error' on stderr when using g in a large directory (such as
/sys/src/cmd).  This is because rc now uses io.c code to write env
files before executing a child, and this fails when the size of a list
exceeds Maxenvsize.  Previously, these write errors were just ignored.

It can be reproduced with

	% a=`{seq 1 4000}
	% echo
	Write error
	
	%

I'm not sure if this error message is desired or not for cases like
this, but regardless, I think g could be a bit more efficient by using
xargs rather than enumerating the complete list of files up front,
avoiding the issue completely.

One slight issue with this patch is that files named '-n' are skipped.
I don't think it's too big a deal, but I wasn't quite sure how to
handle this.  One idea is to use `walk -f -- $f` instead of `echo $f`.
Another idea is `echo -n $f$nl`.  In general, is there a good way to
print a variable, regardless of its value (similar to POSIX `printf
'%s\n' "$f"`)?
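For illustration, a self-contained sketch of the corner case and of the
`echo -n $f$nl` workaround ($nl here stands for the newline variable the
script's `$nl{...}` splits already expect):

nl='
'
f='-n'
echo $f		# echo takes the value as its own -n flag and prints nothing
echo -n $f$nl	# -n suppresses echo's newline; $nl supplies it, so '-n' comes out intact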

diff 525c7bc4922fc86ba0bbe2281fbe92e697a4f6d8 dd997a744e5ec3704d48dedc259d06c062d56aec
--- a/rc/bin/g	Sat Jan  1 14:51:39 2022
+++ b/rc/bin/g	Sat Jan  1 02:26:14 2022
@@ -14,25 +14,24 @@
 }
 if(~ $1 --)
 	shift
+if(~ $#* 0) {
+	echo 'usage: g [flags] pattern [files]' >[1=2]
+	exit usage
+}
+pattern=$1
+shift
 
 suffixes='\.([bcChlmsy]|asm|awk|cc|cgi|cpp|cs|go|goc|hs|java|lua|lx|mk|ml|mli|ms|myr|pl|py|rc|sh|tex|xy)$'
 fullnames='(^|/)mkfile$'
 switch($#*){
 case 0
-	echo 'usage: g [flags] pattern [files]' >[1=2]
-	exit usage
-case 1
-	pattern=$1
-	files=`$nl{walk -f $recurse | grep -e $fullnames -e $suffixes >[2]/dev/null}
+	walk -f $recurse | grep -e $fullnames -e $suffixes >[2]/dev/null
 case *
-	pattern=$1
-	shift
 	for(f in $*){
 		if(test -d $f)
-			files=($files `$nl{walk -f $recurse -- $f \
-				| grep -e $fullnames -e $suffixes >[2]/dev/null})
+			walk -f $recurse -- $f \
+				| grep -e $fullnames -e $suffixes >[2]/dev/null
 		if not
-			files=($files $f)
+			echo $f
 	}
-}
-grep -n $flags -- $pattern $files /dev/null
+} | xargs grep -n $flags -- $pattern

* Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
  2022-01-01 10:26 [9front] [PATCH] g: use xargs instead of finding complete file list before greping Michael Forney
@ 2022-01-02  1:13 ` 有澤 健治
  2022-01-02  1:20   ` ori
  2022-01-02  2:34   ` Michael Forney
  2022-01-02  1:28 ` 有澤 健治
  2022-01-05  0:05 ` igor
  2 siblings, 2 replies; 10+ messages in thread
From: 有澤 健治 @ 2022-01-02  1:13 UTC (permalink / raw)
  To: 9front

Hello,

Here is an experiment on my server:

hebe% a=`{seq 1 4000}
hebe% echo $a
1 2 3 4 [snip] 3997 3998 3999 4000
hebe%

hebe% ls -l /sys/src/cmd/rc/io.c
--rw-rw-r-- M 98029 sys sys 3757 Oct  5  2019 /sys/src/cmd/rc/io.c
hebe%

Kenji Arisawa

On 2022/01/01 19:26, Michael Forney wrote:
> ---
> This patch requires the other g patch I sent earlier to apply cleanly.
>
> I noticed that with the recent rc changes, I see the message 'Write
> error' on stderr when using g in a large directory (such as
> /sys/src/cmd).  This is because rc now uses io.c code to write env
> files before executing a child, and this fails when the size of a list
> exceeds Maxenvsize.  Previously, these write errors were just ignored.
>
> It can be reproduced with
>
> 	% a=`{seq 1 4000}
> 	% echo
> 	Write error
> 	
> 	%
>
> I'm not sure if this error message is desired or not for cases like
> this, but regardless, I think g could be a bit more efficient by using
> xargs rather than enumerating the complete list of files up front,
> avoiding the issue completely.
>
> One slight issue with this patch is that files named '-n' are skipped.
> I don't think it's too big a deal, but I wasn't quite sure how to
> handle this.  One idea is to use `walk -f -- $f` instead of `echo $f`.
> Another idea is `echo -n $f$nl`.  In general, is there a good way to
> print a variable, regardless of its value (similar to POSIX `printf
> '%s\n' "$f"`)?
>
> diff 525c7bc4922fc86ba0bbe2281fbe92e697a4f6d8 dd997a744e5ec3704d48dedc259d06c062d56aec
> --- a/rc/bin/g	Sat Jan  1 14:51:39 2022
> +++ b/rc/bin/g	Sat Jan  1 02:26:14 2022
> @@ -14,25 +14,24 @@
>   }
>   if(~ $1 --)
>   	shift
> +if(~ $#* 0) {
> +	echo 'usage: g [flags] pattern [files]' >[1=2]
> +	exit usage
> +}
> +pattern=$1
> +shift
>   
>   suffixes='\.([bcChlmsy]|asm|awk|cc|cgi|cpp|cs|go|goc|hs|java|lua|lx|mk|ml|mli|ms|myr|pl|py|rc|sh|tex|xy)$'
>   fullnames='(^|/)mkfile$'
>   switch($#*){
>   case 0
> -	echo 'usage: g [flags] pattern [files]' >[1=2]
> -	exit usage
> -case 1
> -	pattern=$1
> -	files=`$nl{walk -f $recurse | grep -e $fullnames -e $suffixes >[2]/dev/null}
> +	walk -f $recurse | grep -e $fullnames -e $suffixes >[2]/dev/null
>   case *
> -	pattern=$1
> -	shift
>   	for(f in $*){
>   		if(test -d $f)
> -			files=($files `$nl{walk -f $recurse -- $f \
> -				| grep -e $fullnames -e $suffixes >[2]/dev/null})
> +			walk -f $recurse -- $f \
> +				| grep -e $fullnames -e $suffixes >[2]/dev/null
>   		if not
> -			files=($files $f)
> +			echo $f
>   	}
> -}
> -grep -n $flags -- $pattern $files /dev/null
> +} | xargs grep -n $flags -- $pattern


* Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
  2022-01-02  1:13 ` 有澤 健治
@ 2022-01-02  1:20   ` ori
  2022-01-02  4:56     ` 有澤 健治
  2022-01-02  2:34   ` Michael Forney
  1 sibling, 1 reply; 10+ messages in thread
From: ori @ 2022-01-02  1:20 UTC (permalink / raw)
  To: 9front

Quoth 有澤 健治 <karisawa@gmail.com>:
> Hello,
> 
> Here is an experiment on my server:
> 
> hebe% a=`{seq 1 4000}
> hebe% echo $a
> 1 2 3 4 [snip] 3997 3998 3999 4000
> hebe%
> 
> hebe% ls -l /sys/src/cmd/rc/io.c
> --rw-rw-r-- M 98029 sys sys 3757 Oct  5  2019 /sys/src/cmd/rc/io.c
> hebe%
> 

have you sysupdated recently? there were
a lot of changes related to rc.


* Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
  2022-01-01 10:26 [9front] [PATCH] g: use xargs instead of finding complete file list before greping Michael Forney
  2022-01-02  1:13 ` 有澤 健治
@ 2022-01-02  1:28 ` 有澤 健治
  2022-01-05  0:05 ` igor
  2 siblings, 0 replies; 10+ messages in thread
From: 有澤 健治 @ 2022-01-02  1:28 UTC (permalink / raw)
  To: 9front

Hello,

 >I think g could be a bit more efficient by using
 >xargs rather than enumerating the complete list of files up front,
 >avoiding the issue completely.
I agree with this point.

I have been working on a recursive g (named gr) that is efficient
and easy to use, and also works on Unix.
See http://p9.nyx.link/netlib/

Kenji Arisawa

On 2022/01/01 19:26, Michael Forney wrote:
> ---
> This patch requires the other g patch I sent earlier to apply cleanly.
>
> I noticed that with the recent rc changes, I see the message 'Write
> error' on stderr when using g in a large directory (such as
> /sys/src/cmd).  This is because rc now uses io.c code to write env
> files before executing a child, and this fails when the size of a list
> exceeds Maxenvsize.  Previously, these write errors were just ignored.
>
> It can be reproduced with
>
> 	% a=`{seq 1 4000}
> 	% echo
> 	Write error
> 	
> 	%
>
> I'm not sure if this error message is desired or not for cases like
> this, but regardless, I think g could be a bit more efficient by using
> xargs rather than enumerating the complete list of files up front,
> avoiding the issue completely.
>
> One slight issue with this patch is that files named '-n' are skipped.
> I don't think it's too big a deal, but I wasn't quite sure how to
> handle this.  One idea is to use `walk -f -- $f` instead of `echo $f`.
> Another idea is `echo -n $f$nl`.  In general, is there a good way to
> print a variable, regardless of its value (similar to POSIX `printf
> '%s\n' "$f"`)?
>
> diff 525c7bc4922fc86ba0bbe2281fbe92e697a4f6d8 dd997a744e5ec3704d48dedc259d06c062d56aec
> --- a/rc/bin/g	Sat Jan  1 14:51:39 2022
> +++ b/rc/bin/g	Sat Jan  1 02:26:14 2022
> @@ -14,25 +14,24 @@
>   }
>   if(~ $1 --)
>   	shift
> +if(~ $#* 0) {
> +	echo 'usage: g [flags] pattern [files]' >[1=2]
> +	exit usage
> +}
> +pattern=$1
> +shift
>   
>   suffixes='\.([bcChlmsy]|asm|awk|cc|cgi|cpp|cs|go|goc|hs|java|lua|lx|mk|ml|mli|ms|myr|pl|py|rc|sh|tex|xy)$'
>   fullnames='(^|/)mkfile$'
>   switch($#*){
>   case 0
> -	echo 'usage: g [flags] pattern [files]' >[1=2]
> -	exit usage
> -case 1
> -	pattern=$1
> -	files=`$nl{walk -f $recurse | grep -e $fullnames -e $suffixes >[2]/dev/null}
> +	walk -f $recurse | grep -e $fullnames -e $suffixes >[2]/dev/null
>   case *
> -	pattern=$1
> -	shift
>   	for(f in $*){
>   		if(test -d $f)
> -			files=($files `$nl{walk -f $recurse -- $f \
> -				| grep -e $fullnames -e $suffixes >[2]/dev/null})
> +			walk -f $recurse -- $f \
> +				| grep -e $fullnames -e $suffixes >[2]/dev/null
>   		if not
> -			files=($files $f)
> +			echo $f
>   	}
> -}
> -grep -n $flags -- $pattern $files /dev/null
> +} | xargs grep -n $flags -- $pattern


* Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
  2022-01-02  1:13 ` 有澤 健治
  2022-01-02  1:20   ` ori
@ 2022-01-02  2:34   ` Michael Forney
  1 sibling, 0 replies; 10+ messages in thread
From: Michael Forney @ 2022-01-02  2:34 UTC (permalink / raw)
  To: 9front

On 2022-01-01, 有澤 健治 <karisawa@gmail.com> wrote:
> Here is an experiment on my server:
>
> hebe% a=`{seq 1 4000}
> hebe% echo $a
> 1 2 3 4 [snip] 3997 3998 3999 4000
> hebe%
>
> hebe% ls -l /sys/src/cmd/rc/io.c
> --rw-rw-r-- M 98029 sys sys 3757 Oct  5  2019 /sys/src/cmd/rc/io.c
> hebe%

Looks like you're using an rc from before the recent refactoring on
2021-12-31 (commit b90036a0, rc: fix everything).

Before the refactor, the write errors to /env/a were silently ignored.
The rc process that initially set the `a` variable *does* keep track
of the entire list; it is only when it goes to update /env/a that it
gets truncated.

You can try:

a=`{seq 1 4000}
rc -c 'echo $#a'

This should show that the list is truncated in both versions of rc.
It's just that the new version writes out the message 'Write error'
when this happens.

* Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
  2022-01-02  1:20   ` ori
@ 2022-01-02  4:56     ` 有澤 健治
  0 siblings, 0 replies; 10+ messages in thread
From: 有澤 健治 @ 2022-01-02  4:56 UTC (permalink / raw)
  To: 9front

Hello,

On 2022/01/02 10:20, ori@eigenstate.org wrote:
> have you sysupdated recently? there were
> a lot of changes related to rc.
Sorry, I am hesitant to apply sysupdate to a working server.
I need another 9front server for experiments.

Kenji Arisawa


* Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
  2022-01-01 10:26 [9front] [PATCH] g: use xargs instead of finding complete file list before greping Michael Forney
  2022-01-02  1:13 ` 有澤 健治
  2022-01-02  1:28 ` 有澤 健治
@ 2022-01-05  0:05 ` igor
  2022-01-05  3:05   ` Michael Forney
  2 siblings, 1 reply; 10+ messages in thread
From: igor @ 2022-01-05  0:05 UTC (permalink / raw)
  To: 9front

Hi Michael,

Thanks! I have tested your patch and can confirm it works.

Quoth Michael Forney <mforney@mforney.org>:
[…]
> I'm not sure if this error message is desired or not for cases like
> this, but regardless, I think g could be a bit more efficient by using
> xargs rather than enumerating the complete list of files up front,
> avoiding the issue completely.
[…]

xargs also has a parallel mode that comes in handy to speed up search
in this case.

[…]
> One slight issue with this patch is that files named '-n' are skipped.
> I don't think it's too big a deal, but I wasn't quite sure how to
> handle this.  One idea is to use `walk -f -- $f` instead of `echo $f`.
> Another idea is `echo -n $f$nl`.  In general, is there a good way to
> print a variable, regardless of its value (similar to POSIX `printf
> '%s\n' "$f"`)?
[…]

Not sure if there is something similar to POSIX `printf`. I am using
`echo -n $f$nl`.

Below is your patch with minor modifications, in case anyone else wants
to give this a go and provide further feedback.

Cheers,
Igor

---
diff 31957c1aa36808a2ba4a496fd5a1559ef46b9517 02d9badb3c55e01a8e9d274b89d99f67433f15ec
--- a/rc/bin/g	Tue Jan  4 20:11:07 2022
+++ b/rc/bin/g	Wed Jan  5 00:59:10 2022
@@ -14,25 +14,25 @@
 }
 if(~ $1 --)
 	shift
+if(~ $#* 0) {
+	echo 'usage: g [flags] pattern [files]' >[1=2]
+	exit usage
+}
+pattern=$1
+shift
 
 suffixes='\.([bcChlmsy]|asm|awk|cc|cgi|cpp|cs|go|goc|hs|java|lua|lx|mk|ml|mli|ms|myr|pl|py|rc|sh|tex|xy)$'
 fullnames='(^|/)mkfile$'
 switch($#*){
 case 0
-	echo 'usage: g [flags] pattern [files]' >[1=2]
+	walk -f $recurse | grep -e $fullnames -e $suffixes >[2]/dev/null
 	exit usage
-case 1
-	pattern=$1
-	files=`$nl{walk -f $recurse | grep -e $fullnames -e $suffixes >[2]/dev/null}
 case *
-	pattern=$1
-	shift
 	for(f in $*){
 		if(test -d $f)
-			files=($files `$nl{walk -f $recurse -- $* \
-				| grep -e $fullnames -e $suffixes >[2]/dev/null})
+			walk -f $recurse -- $f \
+				| grep -e $fullnames -e $suffixes >[2]/dev/null
 		if not
-			files=($files $f)
+			echo -n $f$nl
 	}
-}
-grep -n $flags -- $pattern $files /dev/null
+} | xargs -p4 grep -n $flags -- $pattern


* Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
  2022-01-05  0:05 ` igor
@ 2022-01-05  3:05   ` Michael Forney
  2022-01-06  1:35     ` ori
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Forney @ 2022-01-05  3:05 UTC (permalink / raw)
  To: 9front

On 2022-01-04, igor@9lab.org <igor@9lab.org> wrote:
> xargs also has a parallel mode that comes in handy to speed up search
> in this case.

Is there a possibility that this might result in intermixed grep
results (i.e. one process printing a line in the middle of another
line)? That'd be my main concern with adding parallelism to xargs.

>  case *
> -	pattern=$1
> -	shift
>  	for(f in $*){
>  		if(test -d $f)
> -			files=($files `$nl{walk -f $recurse -- $* \
> -				| grep -e $fullnames -e $suffixes >[2]/dev/null})
> +			walk -f $recurse -- $f \
> +				| grep -e $fullnames -e $suffixes >[2]/dev/null
>  		if not
> -			files=($files $f)
> +			echo -n $f$nl

If we don't care about ordering of results, we could also skip the
for-loop completely and replace this entire case with

	walk -f -n0 -- $*
	walk -f $recurse -- $* | grep -e $fullnames -e $suffixes >[2]/dev/null

and change the default to recurse='-n1,'. This would walk all named
file arguments first, followed by the files in the directories.


* Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
  2022-01-05  3:05   ` Michael Forney
@ 2022-01-06  1:35     ` ori
  2022-01-06 10:47       ` igor
  0 siblings, 1 reply; 10+ messages in thread
From: ori @ 2022-01-06  1:35 UTC (permalink / raw)
  To: 9front

Quoth Michael Forney <mforney@mforney.org>:
> On 2022-01-04, igor@9lab.org <igor@9lab.org> wrote:
> > xargs also has a parallel mode that comes in handy to speed up search
> > in this case.
> 
> Is there a possibility that this might result in intermixed grep
> results (i.e. one process printing a line in the middle of another
> line)? That'd be my main concern with adding parallelism to xargs.
> 
> >  case *
> > -	pattern=$1
> > -	shift
> >  	for(f in $*){
> >  		if(test -d $f)
> > -			files=($files `$nl{walk -f $recurse -- $* \
> > -				| grep -e $fullnames -e $suffixes >[2]/dev/null})
> > +			walk -f $recurse -- $f \
> > +				| grep -e $fullnames -e $suffixes >[2]/dev/null
> >  		if not
> > -			files=($files $f)
> > +			echo -n $f$nl
> 
> If we don't care about ordering of results, we could also skip the
> for-loop completely and replace this entire case with
> 
> 	walk -f -n0 -- $*
> 	walk -f $recurse -- $* | grep -e $fullnames -e $suffixes >[2]/dev/null
> 
> and change the default to recurse='-n1,'. This would walk all named
> file arguments first, followed by the files in the directories.
> 

And if we do, we can |sort at the end, which means the results won't
trickle in, but they will be sorted. I'd prefer that.
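
Roughly, the tail of g would then look something like this (an untested
sketch only: the case 0 branch as in the earlier patch, the case *
branch replaced by the two walks quoted above with recurse defaulting
to '-n1,', and a trailing sort):

	switch($#*){
	case 0
		walk -f $recurse | grep -e $fullnames -e $suffixes >[2]/dev/null
	case *
		walk -f -n0 -- $*	# named files first
		walk -f $recurse -- $* | grep -e $fullnames -e $suffixes >[2]/dev/null
	} | xargs grep -n $flags -- $pattern | sort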



* Re: [9front] [PATCH] g: use xargs instead of finding complete file list before greping
  2022-01-06  1:35     ` ori
@ 2022-01-06 10:47       ` igor
  0 siblings, 0 replies; 10+ messages in thread
From: igor @ 2022-01-06 10:47 UTC (permalink / raw)
  To: 9front

Quoth ori@eigenstate.org:
> Quoth Michael Forney <mforney@mforney.org>:
> > On 2022-01-04, igor@9lab.org <igor@9lab.org> wrote:
> > > xargs also has a parallel mode that comes in handy to speed up search
> > > in this case.
> > 
> > Is there a possibility that this might result in intermixed grep
> > results (i.e. one process printing a line in the middle of another
> > line)? That'd be my main concern with adding parallelism to xargs.

Isn't this only an issue with shared memory concurrency (i.e.
multiple threads trying to call printf concurrently)?  At least that
is the context in which I have encountered this issue before, where the
solution is to synchronise around printf.

The implementation in /sys/src/cmd/xargs.c uses fork to spawn multiple
processes.

A quick test searching across multiple large code bases with
sufficient parallelism for xargs did not reveal any garbled output.
However, that might just have been luck…

Anyway, if we decide to use the `-p` flag with xargs, the amount of
parallelism can be derived from $NPROC, as in mk(1).
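
For example, the last line of the modified script could become
something like this (a sketch only; it assumes $NPROC is set in the
environment, as described in mk(1), and relies on rc joining -p and
$NPROC into e.g. -p4):

	} | xargs -p$NPROC grep -n $flags -- $pattern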

> > >  case *
> > > -	pattern=$1
> > > -	shift
> > >  	for(f in $*){
> > >  		if(test -d $f)
> > > -			files=($files `$nl{walk -f $recurse -- $* \
> > > -				| grep -e $fullnames -e $suffixes >[2]/dev/null})
> > > +			walk -f $recurse -- $f \
> > > +				| grep -e $fullnames -e $suffixes >[2]/dev/null
> > >  		if not
> > > -			files=($files $f)
> > > +			echo -n $f$nl
> > 
> > If we don't care about ordering of results, we could also skip the
> > for-loop completely and replace this entire case with
> > 
> > 	walk -f -n0 -- $*
> > 	walk -f $recurse -- $* | grep -e $fullnames -e $suffixes >[2]/dev/null
> > 
> > and change the default to recurse='-n1,'. This would walk all named
> > file arguments first, followed by the files in the directories.
> > 

I don't think the current `g` makes any promises about ordering; at
least, it doesn't document any.

> And if we do, we can |sort at the end, which means the results won't
> trickle in, but they will be sorted. I'd prefer that.

#metoo

