9front - general discussion about 9front
* [9front] this history(1) is scratched
@ 2022-11-12  8:15 qwx
  2022-11-12 11:03 ` Eckard Brauer
  2022-11-12 19:13 ` ori
  0 siblings, 2 replies; 14+ messages in thread
From: qwx @ 2022-11-12  8:15 UTC (permalink / raw)
  To: 9front

Hello,

I'm having an issue with our new history(1) implementation: printing
paths in the dump always fails for me.  Example from the manpage:

	; history /adm/users
	Feb 28 09:52:37 CET 2022 /adm/users 159 [adm]
	walk: path:  /n/dump/2022/1111/adm/users: ' ' directory entry not found
	Nov 12 08:39:54 CET 2022walk: path:  /n/dump/2022/1111/adm/users: ' ' directory entry not found
	walk: path:  /n/dump/2022/0228/adm/users: ' ' directory entry not found
	Nov 12 08:39:54 CET 2022walk: path:  /n/dump/2022/0228/adm/users: ' ' directory entry not found

Same for any other file -- a space character is prepended to the
paths, so walk(1) fails.  The issue seems to be in the awk command at
/rc/bin/history:39:

	[...]
 	awk '"/n/'$dump/$since'" <= $2 {next}
 	     $1 != qid {
 		qid=$1
		gsub($1, "")
 		print}' |
	[...]

This does a substitution on $0, which retains the space between $1 and
$2, even though the old $1 is gone and the old $2 is now $1:

	; echo 'a b' | awk '{gsub($1, ""); print}'
	 b
	; echo 'a b' | awk '{gsub($1, ""); print $1}'
	b

If I understand the intent correctly, then the simplest fix is to add
the space character to the regex:

	; echo 'a b' | awk '{gsub($1" ", ""); print}'
	b

It's also not necessary to call `gsub' instead of just `sub'.
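
For anyone wanting to try this outside 9front, here is a minimal sketch in POSIX sh (not rc) that should behave the same with any awk; the variable names are only illustrative:

```shell
# gsub/sub with a single target argument operate on $0, so deleting the
# text of $1 leaves the separator that followed it.
before=$(printf 'a b\n' | awk '{gsub($1, ""); print}')

# Deleting $1 together with the single space after it removes the prefix.
after=$(printf 'a b\n' | awk '{sub($1 " ", ""); print}')

# Brackets make the leading space visible.
printf '[%s]\n[%s]\n' "$before" "$after"
```

This should print [ b] followed by [b].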

The thing is, this change is already a few months old, and I cannot
imagine that the script has not worked for anyone in all of this time,
including when it was first pushed, or that absolutely everyone has
been using the old C version without noticing it.  This behavior in
awk is surprising to me, but is not 9front-specific.  What the hell is
going on?  Can anyone else reproduce this?  Any idea what I could be
doing wrong?

Including a fix below, if any is actually needed.

Thanks,
qwx


diff 7a5a9b592af208aac719a6cfc0dacf44a5eebcef uncommitted
--- a//rc/bin/history
+++ b//rc/bin/history
@@ -36,7 +36,7 @@
 	awk '"/n/'$dump/$since'" <= $2 {next}
 	     $1 != qid {
 		qid=$1
-		gsub($1, "")
+		sub($1" ", "")
 		print}' |
 	while(new=`$nl{read}){
 		prfile $new


* Re: [9front] this history(1) is scratched
  2022-11-12  8:15 [9front] this history(1) is scratched qwx
@ 2022-11-12 11:03 ` Eckard Brauer
  2022-11-12 12:16   ` Eckard Brauer
  2022-11-12 19:13 ` ori
  1 sibling, 1 reply; 14+ messages in thread
From: Eckard Brauer @ 2022-11-12 11:03 UTC (permalink / raw)
  To: 9front


> If I understand the intent correctly, then the simplest fix is to add
> the space character to the regex:
>
> 	; echo 'a b' | awk '{gsub($1" ", ""); print}'
> 	b

currently, I don't have a 9front/plan9 available here to check context,
but knowing awk a bit, i'd remark:

* while sub() replaces a single occurrence of 'a' in each input record,
  gsub replaces every one. If this is intended in that context, it's
  fine.

* you maybe should not add " " to $1, but "[ \t]+", as awk usually
  treats each consecutive sequence of whitespace as a field delimiter,
  except of course you can be sure that only one single space char will
  be there in any case.
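
Both remarks can be illustrated with a small sketch in POSIX sh (not rc). The record is made up: "ab" stands in for the first field, and the path deliberately contains the same text again.

```shell
# Hypothetical record: first field, a space-tab-space run, then a path
# that happens to contain the first field's text again.
line=$(printf 'ab \t /n/dump/ab/file')

# gsub removes every occurrence of "ab", including the one inside the path.
every=$(printf '%s\n' "$line" | awk '{gsub($1, ""); print}')

# sub with a literal single space stops after the first blank.
one=$(printf '%s\n' "$line" | awk '{sub($1 " ", ""); print}')

# sub with a bracket expression consumes the whole run of blanks and tabs.
run=$(printf '%s\n' "$line" | awk '{sub($1 "[ \t]+", ""); print}')

printf '[%s]\n[%s]\n[%s]\n' "$every" "$one" "$run"
```

Only the last variant should yield the bare path /n/dump/ab/file.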


* Re: [9front] this history(1) is scratched
  2022-11-12 11:03 ` Eckard Brauer
@ 2022-11-12 12:16   ` Eckard Brauer
  2022-11-12 18:07     ` Eckard Brauer
  0 siblings, 1 reply; 14+ messages in thread
From: Eckard Brauer @ 2022-11-12 12:16 UTC (permalink / raw)
  To: 9front

> * you maybe should not add " " to $1, but "[ \t]+"

even gsub($1 FS "+", "") should work (field separator).


* Re: [9front] this history(1) is scratched
  2022-11-12 12:16   ` Eckard Brauer
@ 2022-11-12 18:07     ` Eckard Brauer
  0 siblings, 0 replies; 14+ messages in thread
From: Eckard Brauer @ 2022-11-12 18:07 UTC (permalink / raw)
  To: 9front

> > * you maybe should not add " " to $1, but "[ \t]+"
>
> even gsub($1 FS "+", "") should work (field separator).

just checked the source: the former is more correct. one can see the
(default) input FS treatment in /sys/src/cmd/awk/lib.c:272-290.

if FS is " ", consecutive occurrences of one or more space, tab or
newline characters are taken as separators; as newline is already the
(default) record separator, it can easily be left out, so
gsub($1 "[ \t]+", "") should be chosen here.
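
A short sketch of why the bracket expression is the safer choice, in POSIX sh (not rc) and assuming the default FS of a single space; the tab-separated record is made up:

```shell
# Hypothetical record with a tab between the two fields.
line=$(printf 'a\tb')

# Default splitting treats the tab as a separator: two fields.
nf=$(printf '%s\n' "$line" | awk '{print NF}')

# $1 FS "+" builds the regex "a +", which needs a literal space,
# so nothing is removed here.
fs=$(printf '%s\n' "$line" | awk '{sub($1 FS "+", ""); print}')

# $1 "[ \t]+" also accepts the tab.
cls=$(printf '%s\n' "$line" | awk '{sub($1 "[ \t]+", ""); print}')

printf '%s [%s] [%s]\n' "$nf" "$fs" "$cls"
```

The FS-based pattern should leave the record untouched, while the bracket expression should reduce it to b.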


* Re: [9front] this history(1) is scratched
  2022-11-12  8:15 [9front] this history(1) is scratched qwx
  2022-11-12 11:03 ` Eckard Brauer
@ 2022-11-12 19:13 ` ori
  2022-11-12 20:32   ` Eckard Brauer
  2022-11-13  3:35   ` Anthony Martin
  1 sibling, 2 replies; 14+ messages in thread
From: ori @ 2022-11-12 19:13 UTC (permalink / raw)
  To: 9front

Quoth qwx@sciops.net:
> The thing is, this change is already a few months old, and I cannot
> imagine that the script has not worked for anyone in all of this time,
> including when it was first pushed, or that absolutely everyone has
> been using the old C version without noticing it.  This behavior in
> awk is surprising to me, but is not 9front-specific.  What the hell is
> going on?  Can anyone else reproduce this?  Any idea what I could be
> doing wrong?

...I'm slightly embarrassed to admit that I've got the old history
in my binds before the new one, so I didn't notice I was using the
old one.

I can reproduce this.

Relatedly -- one thing I've repeatedly found myself wanting in
awk is a way to say "give me all text between fields $A..$B,
without modification" so, for example:

	label <and some file path with tabs, spaces, and other junk>

would be easy to handle; something like:

	map[$1] = $2..$NF

maybe someone better at awk than me knows a clean way to do this already?
should we add this to our awk?



* Re: [9front] this history(1) is scratched
  2022-11-12 19:13 ` ori
@ 2022-11-12 20:32   ` Eckard Brauer
  2022-11-12 20:44     ` Eckard Brauer
  2022-11-12 21:26     ` ori
  2022-11-13  3:35   ` Anthony Martin
  1 sibling, 2 replies; 14+ messages in thread
From: Eckard Brauer @ 2022-11-12 20:32 UTC (permalink / raw)
  To: 9front

> Relatedly -- one thing I've repeatedly found myself wanting in
> awk is a way to say "give me all text between fields $A..$B,
> without modification" so, for example:
>
> 	label <and some file path with tabs, spaces, and other junk>
>
> would be easy to handle; something like:
>
> 	map[$1] = $2..$NF

hmm, not exactly what you want, but usually i do such things with
setting different RS, e.g.

awk -v RS='[<>]' '!(NR%2)'

but just noticed that p9 awk doesn't take RS as a regex, as gnu-awk
does. So i'd maybe translate one of them using tr before (also noticed
that escape sequences don't work with tr, however, this works somehow),
e.g.

echo 'outside<and inside>angle brackets' | tr '<>' '

' | awk '!(NR%2)'


* Re: [9front] this history(1) is scratched
  2022-11-12 20:32   ` Eckard Brauer
@ 2022-11-12 20:44     ` Eckard Brauer
  2022-11-12 20:53       ` Eckard Brauer
  2022-11-12 21:26     ` ori
  1 sibling, 1 reply; 14+ messages in thread
From: Eckard Brauer @ 2022-11-12 20:44 UTC (permalink / raw)
  To: 9front

>
> echo 'outside<and inside>angle brackets' | tr '<>' '
>
> ' | awk '!(NR%2)'

Bullshit, as after newline it gets out of sync. We need some char not
contained in the input, e.g.

echo 'outside<and inside>angle brackets' | tr '<>' '||' | awk -v 'RS=|' '!(NR%2)'
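
The drift can be seen with a two-line input. This is a sketch in POSIX sh (not rc), relying on POSIX tr accepting the \n escape; the sample input is made up:

```shell
# Two input lines, each with one bracketed part.
input=$(printf 'a<b>c\nd<e>f')

# Newline as the stand-in separator: the input's own newlines also
# split records, so the even/odd alternation drifts on the second line.
drift=$(printf '%s\n' "$input" | tr '<>' '\n\n' | awk '!(NR%2)')

# A character absent from the input keeps the alternation intact.
sync=$(printf '%s\n' "$input" | tr '<>' '||' | awk -v 'RS=|' '!(NR%2)')

# Unquoted on purpose, to show each result on one line.
echo newline: $drift
echo pipe: $sync
```

The newline variant should pick b, d and f, while the | variant should pick only the bracketed b and e.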


* Re: [9front] this history(1) is scratched
  2022-11-12 20:44     ` Eckard Brauer
@ 2022-11-12 20:53       ` Eckard Brauer
  0 siblings, 0 replies; 14+ messages in thread
From: Eckard Brauer @ 2022-11-12 20:53 UTC (permalink / raw)
  To: 9front

but however, having that gnu extension:

If RS is any single character, that character separates records. Otherwise, 
RS  is  a regular expression.  Text in the input that matches this regular
expression separates the record.

could maybe avoid that trickery.

Just as a remark: awk seems to have some difficulties with unusual chars
as RS (i tried with "§", which didn't work; maybe other multibyte chars
fail too).


* Re: [9front] this history(1) is scratched
  2022-11-12 20:32   ` Eckard Brauer
  2022-11-12 20:44     ` Eckard Brauer
@ 2022-11-12 21:26     ` ori
  2022-11-12 22:43       ` umbraticus
  1 sibling, 1 reply; 14+ messages in thread
From: ori @ 2022-11-12 21:26 UTC (permalink / raw)
  To: 9front

Quoth Eckard Brauer <eckard.brauer@gmx.de>:
> 
> 
> but just noticed that p9 awk doesn't take RS as a regex, as gnu-awk
> does. So i'd maybe translate one of them using tr before (also noticed
> that escape sequences don't work with tr, however, this works somehow),
> e.g.

I think I explained poorly:

The problem is that I want to separate only *some* fields
or records based on the separator.

	foo bar     baz quux

should have 'bar     baz quux' extractable as a single
record.



* Re: [9front] this history(1) is scratched
  2022-11-12 21:26     ` ori
@ 2022-11-12 22:43       ` umbraticus
  0 siblings, 0 replies; 14+ messages in thread
From: umbraticus @ 2022-11-12 22:43 UTC (permalink / raw)
  To: 9front

not sure how “clean” but I guess this works:

{key = $1; sub("[^ \t]+[ \t]+", ""); map[key] = $0}
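
Run against the sample line from earlier in the thread, as a sketch in POSIX sh (not rc); the END block is only there to print the result:

```shell
out=$(printf 'foo bar     baz quux\n' |
awk '{
	key = $1                   # save the first field before touching $0
	sub("[^ \t]+[ \t]+", "")   # drop the first field and the blanks after it
	map[key] = $0              # the rest keeps its original spacing
}
END { for (k in map) printf "%s -> [%s]\n", k, map[k] }')

printf '%s\n' "$out"
```

The internal run of spaces in the value should survive, because $0 is edited as a string and no field is ever assigned.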

umbraticus


* Re: [9front] this history(1) is scratched
  2022-11-12 19:13 ` ori
  2022-11-12 20:32   ` Eckard Brauer
@ 2022-11-13  3:35   ` Anthony Martin
  2022-11-13 23:20     ` qwx
  1 sibling, 1 reply; 14+ messages in thread
From: Anthony Martin @ 2022-11-13  3:35 UTC (permalink / raw)
  To: 9front

ori@eigenstate.org once said:
> Quoth qwx@sciops.net:
> > The thing is, this change is already a few months old, and I cannot
> > imagine that the script has not worked for anyone in all of this time,
> > including when it was first pushed, or that absolutely everyone has
> > been using the old C version without noticing it.  This behavior in
> > awk is surprising to me, but is not 9front-specific.  What the hell is
> > going on?  Can anyone else reproduce this?  Any idea what I could be
> > doing wrong?
>
> ...I'm slightly embarrassed to admit that I've got the old history
> in my binds before the new one, so I didn't notice I was using the
> old one.

I think we all are. The default namespace set up by init(8) binds
/rc/bin after /$cputype/bin. A sysupdate(1) followed by a clean
build wouldn't blow away the old binary. Maybe we should detect
duplicate files in /bin and warn about them when updating.

% cat /bin/bindups
#!/bin/rc

rfork e
fs = `{cd /bin; ls | sort | uniq -d}
ds = `{ns | awk '$NF == "/bin" { print $(NF-1) }'}
>[2]/dev/null for(f in $fs) ls -ld $ds^/$f

Cheers,
  Anthony


* Re: [9front] this history(1) is scratched
  2022-11-13  3:35   ` Anthony Martin
@ 2022-11-13 23:20     ` qwx
  2022-11-15 11:49       ` qwx
  0 siblings, 1 reply; 14+ messages in thread
From: qwx @ 2022-11-13 23:20 UTC (permalink / raw)
  To: 9front

Hello,

Sorry for the delay.  After a bunch of testing, I agree with Eckard
wrt. how the substitution should be done.  I had used
`sub($1 "["FS"]+", "")' with success, but `gsub($1 "[ \t]+", "")' might
make more sense.

	; history /lib/vgadb
	Aug 21 10:50:43 CES 2022 /lib/vgadb 44403 [qwx]
	Aug 21 10:50:43 CES 2022 /n/dump/2022/1113/lib/vgadb 44403 [qwx]
	Aug 20 20:42:16 CES 2022 /n/dump/2022/0821/lib/vgadb 44403 [qwx]
	Aug 12 06:47:55 CES 2022 /n/dump/2022/0820/lib/vgadb 44398 [qwx]
	[...]


Unfortunately, there's another issue, handling files with spaces:

	; history '/lib/v/The Power Glove (NES) - Angry Video Game Nerd (AVGN)-MYDuy7wM8Gk.mkv'
	Aug 30 14:05:32 CES 2021 /lib/v/The 0 [Glove]
	walk: path: '/n/dump/2022/1113/lib/v/The Power Glove (NES) - Angry Video Game Nerd (AVGN)-MYDuy7wM8Gk.mkv': '''' directory entry not found
	Nov 13 23:20:18 CET 2022walk: path: '/n/dump/2022/1113/lib/v/The Power Glove (NES) - Angry Video Game Nerd (AVGN)-MYDuy7wM8Gk.mkv': '''' directory entry not found


That seems easy to fix: awk is fed the output of ls -qr, and it's
enough to add -Q there.  Later, pretty printing the paths must also be
adjusted.  That seems to be enough:

	; history '/lib/v/The Power Glove (NES) - Angry Video Game Nerd (AVGN)-MYDuy7wM8Gk.mkv'
	Aug 30 14:05:32 CES 2021 /lib/v/The Power Glove (NES) - Angry Video Game Nerd (AVGN)-MYDuy7wM8Gk.mkv 33349226 [qwx]
	Aug 30 14:05:32 CES 2021 /n/dump/2022/1113/lib/v/The Power Glove (NES) - Angry Video Game Nerd (AVGN)-MYDuy7wM8Gk.mkv 33349226 [qwx]
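
The pretty-printing change can be tried in isolation. This is a sketch in POSIX sh (not rc); the input line is a made-up stand-in for what walk -e psM prints (path, size, modifier), and %d replaces the script's %lld for portability across awks:

```shell
# Hypothetical walk -e psM style line: path, size, last modifier.
line='/lib/v/The Power Glove 33349226 qwx'

# Positional fields break as soon as the path contains a space.
old=$(printf '%s\n' "$line" | awk '{printf " %s %d [%s]\n", $1, $2, $3}')

# Bracketing only the last field leaves the other fields in place.
new=$(printf '%s\n' "$line" | awk '{$NF = "[" $NF "]"; print " " $0}')

printf '%s\n%s\n' "$old" "$new"
```

One caveat: assigning to a field makes awk rebuild $0 with OFS, so a run of several blanks inside a file name would come out as a single space.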


I don't know if the quotes should be preserved in that case.  I also
agree with Anthony that care should be taken when something like this
occurs; then again it's rare enough that maybe forcing an announcement
on the ml is enough?  This has usually been the way thus far for any
breaking or important changes.

Patch below, any comments welcome.  Thanks!

Cheers,
qwx


diff 7fcf96b44dc4765605b827ba49d389b5711d7e72 uncommitted
--- a//rc/bin/history
+++ b//rc/bin/history
@@ -6,7 +6,7 @@
 
 fn prfile {
 	echo -n `{date $flagu -f 'MMM DD hh:mm:ss ZZZ YYYY' `{walk -e m $1}}
-	walk -e psM $1 | awk '{printf " %s %lld [%s]\n", $1,$2,$3,$4}'
+	walk -e psM $1 | awk '{$NF="["$NF"]"; print " "$0}'
 }
 
 fn diffflags {
@@ -31,12 +31,12 @@
 		echo history: warning: $file does not exist >[1=2]
 
 	old=()
-	ls -qr /n/$dump/*/*/$file >[2] /dev/null |
+	ls -Qqr /n/$dump/*/*/$file >[2] /dev/null |
 	sed  's/\(([^ ]*) *([^ ]*) *([^ ]*)\)/\1\2\3/p' |
 	awk '"/n/'$dump/$since'" <= $2 {next}
 	     $1 != qid {
 		qid=$1
-		gsub($1, "")
+		gsub($1"[ \t]+", "")
 		print}' |
 	while(new=`$nl{read}){
 		prfile $new


* Re: [9front] this history(1) is scratched
  2022-11-13 23:20     ` qwx
@ 2022-11-15 11:49       ` qwx
  2022-11-15 21:59         ` qwx
  0 siblings, 1 reply; 14+ messages in thread
From: qwx @ 2022-11-15 11:49 UTC (permalink / raw)
  To: 9front

On Mon Nov 14 00:20:45 +0100 2022, qwx@sciops.net wrote:
[...]
> diff 7fcf96b44dc4765605b827ba49d389b5711d7e72 uncommitted
> --- a//rc/bin/history
> +++ b//rc/bin/history
> @@ -6,7 +6,7 @@
>  
>  fn prfile {
>  	echo -n `{date $flagu -f 'MMM DD hh:mm:ss ZZZ YYYY' `{walk -e m $1}}
> -	walk -e psM $1 | awk '{printf " %s %lld [%s]\n", $1,$2,$3,$4}'
> +	walk -e psM $1 | awk '{$NF="["$NF"]"; print " "$0}'
>  }
>  
>  fn diffflags {
> @@ -31,12 +31,12 @@
>  		echo history: warning: $file does not exist >[1=2]
>  
>  	old=()
> -	ls -qr /n/$dump/*/*/$file >[2] /dev/null |
> +	ls -Qqr /n/$dump/*/*/$file >[2] /dev/null |
>  	sed  's/\(([^ ]*) *([^ ]*) *([^ ]*)\)/\1\2\3/p' |
>  	awk '"/n/'$dump/$since'" <= $2 {next}
>  	     $1 != qid {
>  		qid=$1
> -		gsub($1, "")
> +		gsub($1"[ \t]+", "")
>  		print}' |
>  	while(new=`$nl{read}){
>  		prfile $new

Unless there are any objections, I'll push this later today.
Thanks all!

Cheers,
qwx


* Re: [9front] this history(1) is scratched
  2022-11-15 11:49       ` qwx
@ 2022-11-15 21:59         ` qwx
  0 siblings, 0 replies; 14+ messages in thread
From: qwx @ 2022-11-15 21:59 UTC (permalink / raw)
  To: 9front

Pushed.

Cheers,
qwx

