9front - general discussion about 9front
 help / color / mirror / Atom feed
* [9front] seq: fix infinite loop
@ 2021-08-15 22:02 Sean Hinchee
  2021-08-16 21:45 ` Amavect
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Sean Hinchee @ 2021-08-15 22:02 UTC (permalink / raw)
  To: 9front; +Cc: nabijaczleweli

наб on twitter, then over e-mail (CC'd) pointed out that seq(1) in
9front with some very large inputs

For example:

    ; seq 2e16 20000000000000002

would loop forever.

After patch application, seq with the above example would print
'2e+16' once and then exit.

Patch also brings the usage message in line with the manual usage.

Patch link: http://okturing.com/src/11872/body

Inline:

--- //.git/fs/object/2af46e406bbd443ae10025777247798a685afc3c/tree/sys/src/cmd/seq.c
+++ sys/src/cmd/seq.c
@@ -11,7 +11,7 @@
 void
 usage(void)
 {
- fprint(2, "usage: seq [-fformat] [-w] [first [incr]] last\n");
+ fprint(2, "usage: seq [-w] [-fformat] [first [incr]] last\n");
  exits("usage");
 }

@@ -67,6 +67,7 @@
  default:
  goto out;
  }ARGEND
+
     out:
  if(argc<1 || argc>3)
  usage();
@@ -81,6 +82,7 @@
  }
  if(!format)
  buildfmt();
+
  if(incr > 0){
  for(val = min; val <= max; val += incr){
  n = sprint(buf, format, val);
@@ -88,6 +90,8 @@
  for(j=0; buf[j]==' '; j++)
  buf[j] ='0';
  write(1, buf, n);
+ if(val == max)
+ break;
  }
  }else{
  for(val = min; val >= max; val += incr){
@@ -96,7 +100,10 @@
  for(j=0; buf[j]==' '; j++)
  buf[j] ='0';
  write(1, buf, n);
+ if(val == max)
+ break;
  }
  }
+
  exits(0);
 }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9front] seq: fix infinite loop
  2021-08-15 22:02 [9front] seq: fix infinite loop Sean Hinchee
@ 2021-08-16 21:45 ` Amavect
  2021-08-18 22:53   ` Eckard Brauer
  2021-08-16 22:24 ` [9front] " Anthony Martin
  2021-08-18  2:49 ` [9front] " ori
  2 siblings, 1 reply; 8+ messages in thread
From: Amavect @ 2021-08-16 21:45 UTC (permalink / raw)
  To: 9front

On Sun, 15 Aug 2021 15:02:38 -0700
Sean Hinchee <henesy.dev@gmail.com> wrote:
>     ; seq 2e16 20000000000000002
> would loop forever.
At this range, 2e16 and 20000000000000002 are represented by the same
double floating point value.

> + if(val == max)
> + break;
We can cover more cases.
For example, your patch does not fix the following:
seq 9007199254740992 9007199254740996
The issue is with floating point precision.

A double has a 52 bit mantissa, meaning that it allows 2^52 different
values before the exponent is incremented (thus, floating point.)
A double has a 11 bit exponent, and 1 sign bit.
Exponents range from −1022 to +1023, with exponents −1023 and
+1024 reserved for special numbers.

At an exponent e, the mantissa represents a value between [2^e,2^(e+1)).
The precision is range/buckets, so we can calculate the precision.
At an exponent e and mantissa m, the precision is 2^(e-m)
(2^(e+1)-2^e)/2^m = 2^e/2^m = 2^(e-m)

Between [1, 2), the exponent is 0.
The precision is 2^(0-52) ≈ 0.0000000000000002220446
Between [2^52, 2^53), the exponent is 52.
The precision is 2^(52-52) = 1.

2e16 is between [2^54, 2^55), so the exponent is 54.
The precision is 2^(54-52) = 4.
Halfway ties round to the even bucket.

This is demonstrated by the following.

# infinite loop
seq 2e16 19999999999999998

# 2^53-1, halts
seq 9007199254740991 9007199254740991

# 2^53, infinite loop due to halfway tie rounding down to even bucket.
seq 9007199254740992 9007199254740992
# halts, halfway tie rounds up to even bucket, 9007199254740996
seq 9007199254740994 9007199254740994

# different due to rounding error in loss of precision
seq 5000000000000000 1.1 5000000000000002
seq 0 1.1 2


I don't believe that there is an ideal resolution to rounding error in
that last example (libmp would be too complex), but we can cover more
cases if we check that the previous val value is different from the
current val.
The correct thing to do is to throw an error warning about precision.

Thanks,
Amavect

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [9front] Re: seq: fix infinite loop
  2021-08-15 22:02 [9front] seq: fix infinite loop Sean Hinchee
  2021-08-16 21:45 ` Amavect
@ 2021-08-16 22:24 ` Anthony Martin
  2021-08-16 23:38   ` ori
  2021-08-21  1:30   ` Amavect
  2021-08-18  2:49 ` [9front] " ori
  2 siblings, 2 replies; 8+ messages in thread
From: Anthony Martin @ 2021-08-16 22:24 UTC (permalink / raw)
  To: 9front; +Cc: nabijaczleweli

Sean Hinchee <henesy.dev@gmail.com> once said:
> наб on twitter, then over e-mail (CC'd) pointed out that seq(1) in
> 9front with some very large inputs
>
> For example:
>
>     ; seq 2e16 20000000000000002
>
> would loop forever.
>
> After patch application, seq with the above example would print
> '2e+16' once and then exit.
>
> [...]
>
> @@ -88,6 +90,8 @@
>   for(j=0; buf[j]==' '; j++)
>   buf[j] ='0';
>   write(1, buf, n);
> + if(val == max)
> + break;

This isn't quite right. There are many cases where the
precision of any one of the values in {min, incr, max}
will not be enough to make progress in the loop body.
You need something like:

	/* ensure adequate precision of loop invariants */
	if(incr > 0 && (max+incr <= max || min+incr <= min))
		sysfatal("increment is too small");
	if(incr < 0 && (max+incr >= max || min+incr >= min))
		sysfatal("increment is too small");

before going into the loop.

And that's just off the top of my head. I'd need to do
more testing to be confident. Floating point can be
nasty.

But really, if you're playing games at the boundaries
of floating point precision, you're bound to lose. I
suggest we either a) do nothing and leave the loop
exactly as it is described in the manual but add a note
in the BUGS section, or b) go back to the v8 method
that calculates the number of steps in the loop before
executing it. Even with the latter method you can see
odd results due to precision limitations but at least
there will be no infinite loops.

I vote for "a".

Cheers,
  Anthony

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9front] Re: seq: fix infinite loop
  2021-08-16 22:24 ` [9front] " Anthony Martin
@ 2021-08-16 23:38   ` ori
  2021-08-17  2:15     ` hiro
  2021-08-21  1:30   ` Amavect
  1 sibling, 1 reply; 8+ messages in thread
From: ori @ 2021-08-16 23:38 UTC (permalink / raw)
  To: 9front; +Cc: nabijaczleweli

Quoth Anthony Martin <ality@pbrane.org>:
> But really, if you're playing games at the boundaries
> of floating point precision, you're bound to lose. I
> suggest we either a) do nothing and leave the loop
> exactly as it is described in the manual but add a note
> in the BUGS section, or b) go back to the v8 method
> that calculates the number of steps in the loop before
> executing it. Even with the latter method you can see
> odd results due to precision limitations but at least
> there will be no infinite loops.
> 
> I vote for "a".
> 

I agree -- *BUT* I like the consolidation of the
two loops. I'd take that.

I also wonder if:

	for(val = min; val <= max; val += incr){
		if(val == prev)
			sysfatal("insufficient precision")
		...
		prev = val
	}

covers the infinite loop case sufficiently.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9front] Re: seq: fix infinite loop
  2021-08-16 23:38   ` ori
@ 2021-08-17  2:15     ` hiro
  0 siblings, 0 replies; 8+ messages in thread
From: hiro @ 2021-08-17  2:15 UTC (permalink / raw)
  To: 9front

sounds like a good addition to the BUGS section of the man page

On 8/17/21, ori@eigenstate.org <ori@eigenstate.org> wrote:
> Quoth Anthony Martin <ality@pbrane.org>:
>> But really, if you're playing games at the boundaries
>> of floating point precision, you're bound to lose. I
>> suggest we either a) do nothing and leave the loop
>> exactly as it is described in the manual but add a note
>> in the BUGS section, or b) go back to the v8 method
>> that calculates the number of steps in the loop before
>> executing it. Even with the latter method you can see
>> odd results due to precision limitations but at least
>> there will be no infinite loops.
>>
>> I vote for "a".
>>
>
> I agree -- *BUT* I like the consolidation of the
> two loops. I'd take that.
>
> I also wonder if:
>
> 	for(val = min; val <= max; val += incr){
> 		if(val == prev)
> 			sysfatal("insufficient precision")
> 		...
> 		prev = val
> 	}
>
> covers the infinite loop case sufficiently.
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9front] seq: fix infinite loop
  2021-08-15 22:02 [9front] seq: fix infinite loop Sean Hinchee
  2021-08-16 21:45 ` Amavect
  2021-08-16 22:24 ` [9front] " Anthony Martin
@ 2021-08-18  2:49 ` ori
  2 siblings, 0 replies; 8+ messages in thread
From: ori @ 2021-08-18  2:49 UTC (permalink / raw)
  To: 9front; +Cc: nabijaczleweli

Quoth Sean Hinchee <henesy.dev@gmail.com>:
> наб on twitter, then over e-mail (CC'd) pointed out that seq(1) in
> 9front with some very large inputs
> 
> For example:
> 
>     ; seq 2e16 20000000000000002
> 
> would loop forever.
> 
> After patch application, seq with the above example would print
> '2e+16' once and then exit.
> 

Another question: does seq even need to do floating
point? Can we change it to simply do integers?

Surprisingly, it seems like we don't even use seq much
in our base system.  If I had to guess at how often it
was used, I'd have been off by an order of magnitude at
least.

	% cd /rc/bin && g seq `{walk -f .} 
	./ape/libpng-config:46:   --static        revise subsequent outputs for static linking
	./homespool:9: echo creating seqno file
	./homespool:10: touch $home/spool/ctrl/seqno
	./homespool:12: chmod 222 $home/spool/ctrl/seqno
	./nietzsche:3: 	n=`{seq 1 638 | fortune /fd/0}
	./newt:90: 	case ,*;	seq 1 `{echo $* | sed 's/,//g'}
	./newt:91: 	case *,;	seq `{echo $* | sed 's/,//g'} $posts($#posts)
	./newt:92: 	case *,*;	seq `{echo $* | sed 's/,/ /g'}
	./newt:167: 		posts=`{seq 1 $#rposts}
	./newt:231: 		r=`{seq $r(1) `{echo $r(1)^+10|bc}}
	./newt:233: 			r=`{seq $r(1) $posts($#posts)}

Out of the places that I see 'seq', four are
accidentally picked up, and the remainder would not be
affected by this.

While we're here:

	% seq -f '%s' 10
	seq 4704701: suicide: sys: trap: general protection violation pc=0x20331f

I wonder if it'd be worth scanning for unknown format
verbs and barfing on them, or whether that's too much
complexity.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9front] seq: fix infinite loop
  2021-08-16 21:45 ` Amavect
@ 2021-08-18 22:53   ` Eckard Brauer
  0 siblings, 0 replies; 8+ messages in thread
From: Eckard Brauer @ 2021-08-18 22:53 UTC (permalink / raw)
  To: 9front

Am Mon, 16 Aug 2021 16:45:04 -0500
schrieb Amavect <amavect@gmail.com>:

> On Sun, 15 Aug 2021 15:02:38 -0700
> Sean Hinchee <henesy.dev@gmail.com> wrote:
>  [...]  
> At this range, 2e16 and 20000000000000002 are represented by the same
> double floating point value.
> 
>  [...]  
> We can cover more cases.
> For example, your patch does not fix the following:
> seq 9007199254740992 9007199254740996
> The issue is with floating point precision.
> 
> A double has a 52 bit mantissa, meaning that it allows 2^52 different
> values before the exponent is incremented (thus, floating point.)
> A double has a 11 bit exponent, and 1 sign bit.
> Exponents range from −1022 to +1023, with exponents −1023 and
> +1024 reserved for special numbers.
> 
> At an exponent e, the mantissa represents a value between
> [2^e,2^(e+1)). The precision is range/buckets, so we can calculate
> the precision. At an exponent e and mantissa m, the precision is
> 2^(e-m) (2^(e+1)-2^e)/2^m = 2^e/2^m = 2^(e-m)
> 
> Between [1, 2), the exponent is 0.
> The precision is 2^(0-52) ≈ 0.0000000000000002220446
> Between [2^52, 2^53), the exponent is 52.
> The precision is 2^(52-52) = 1.
> 
> 2e16 is between [2^54, 2^55), so the exponent is 54.
> The precision is 2^(54-52) = 4.
> Halfway ties round to the even bucket.
> 
> This is demonstrated by the following.
> 
> # infinite loop
> seq 2e16 19999999999999998
> 
> # 2^53-1, halts
> seq 9007199254740991 9007199254740991
> 
> # 2^53, infinite loop due to halfway tie rounding down to even bucket.
> seq 9007199254740992 9007199254740992
> # halts, halfway tie rounds up to even bucket, 9007199254740996
> seq 9007199254740994 9007199254740994
> 
> # different due to rounding error in loss of precision
> seq 5000000000000000 1.1 5000000000000002
> seq 0 1.1 2
> 
> 
> I don't believe that there is an ideal resolution to rounding error in
> that last example (libmp would be too complex), but we can cover more
> cases if we check that the previous val value is different from the
> current val.
> The correct thing to do is to throw an error warning about precision.
> 
> Thanks,
> Amavect

IMHO the args could easily be double, the internal counter can never.
seq could be kind of that:

if (argc == 3)
	delta = argv[2];
else
	delta = 1;
if (!(argv[1] + delta > argv[1]))
	error(...);
for (ulong i=0; i<MAXULONG && val <= argv[argc], i++)
	fprintf("%whatever\n", val = (argv[1]+i*delta));

or do i have too much beer tonight...?

-- 
Corona? Ich desinfiziere regelmäßig mit Bier. 
What Corona? I regularly disinfect with beer.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9front] Re: seq: fix infinite loop
  2021-08-16 22:24 ` [9front] " Anthony Martin
  2021-08-16 23:38   ` ori
@ 2021-08-21  1:30   ` Amavect
  1 sibling, 0 replies; 8+ messages in thread
From: Amavect @ 2021-08-21  1:30 UTC (permalink / raw)
  To: 9front

[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]

On Mon, 16 Aug 2021 15:24:29 -0700
Anthony Martin <ality@pbrane.org> wrote:

> But really, if you're playing games at the boundaries
> of floating point precision, you're bound to lose. I
> suggest we either a) do nothing and leave the loop
> exactly as it is described in the manual but add a note
> in the BUGS section, or b) go back to the v8 method
> that calculates the number of steps in the loop before
> executing it. Even with the latter method you can see
> odd results due to precision limitations but at least
> there will be no infinite loops.
> 
> I vote for "a".

I vote for 'b', at the toss of a coin.

I've spent some time bikeshedding this sickly program, and attached is
the sickly result.
It does neat things like `seq -w 003` printing out 001, 002, 003.
It probably shouldn't do that, though.
It also doesn't use exponential notation to print out numbers.
This allows seq to not be limited from -999999 to 999999.
Even still, the result doesn't feel quite right.
Implementing a floating point number parser in seq is weird.
Invalid number combinations cause it to crash.

The following does not print out 0.3 in either implementation:
seq 0.1 0.1 0.3
The only way to correctly print out decimals is by nesting seq:
for(i in `{seq 0 9}) for(j in `{seq -w 0 9}) echo $i.$j

Therefore, I propose replacing floating point with integers.
If we're lucky, seq can count decimals using integers.
We can also implement hexadecimal counting.

Thanks,
Amavect

[-- Attachment #2: seq.c --]
[-- Type: text/x-c++src, Size: 2691 bytes --]

#include <u.h>
#include <libc.h>
#include <ctype.h>

double	first = 1.0;
double	last = 0.0;
double	step = 1.0;
double	nsteps = 0.0;
double	final = 0.0;
int	constant;
int	consigned;
char	*format;
char	*zarg;
char	*sstep;

void
usage(void)
{
	fprint(2, "usage: %s [-w] [-fformat] [first [step]] last\n", argv0);
	exits("usage");
}

/*
 * [+-]?(([0-9]+([.][0-9]*)?)|([.][0-9]+))([Ee][+-]?[0-9]+)?
 */
void
getwidths(char *s, int *mh, int *mt)
{
	int h, t, e, m;

	h = 0;
	t = 0;
	m = 0;
	if(*s == '+' || *s == '-'){
		s++;
		m++;
	}
	if(isdigit(*s)){
		s++;
		h++;
		while(isdigit(*s)){
			s++;
			h++;
		}
		if(*s == '.'){
			s++;
			while(isdigit(*s)){
				s++;
				t++;
			}
		}
	}else if(*s == '.'){
		s++;
		if(!isdigit(*s++))
			goto out;
		while(isdigit(*s)){
			s++;
			t++;
		}
	}else
		goto out;
	if(*s|32 == 'e'){
		s++;
		e = atoi(s);
		if(e > 307)
			e = 307;
		if(e < -30)
			e = -30;
		h += e;
		t -= e;
	}
out:
	if(h <= 0)
		h = 1 + m;
	if(t < 0)
		t = 0;
	if(h > *mh)
		*mh = h;
	if(t > *mt)
		*mt = t;
	return;
}

int
headwidth(double d)
{
	int i;

	i = 1;
	if(d < 0.0){
		d = -d;
		i++;
	}
	if(d > 1.0)
		i += log10(d);
	return i;
}

void
buildfmt(void)
{
	int h, t, z;
	static char fmt[16];

	h = 0;
	t = 0;

	/* determine head and tail widths from bounds and increment */
	getwidths(zarg, &h, &t);
	if(sstep)
		getwidths(sstep, &z, &t);
	z = headwidth(final);
	if(z > h)
		h = z;

	if(constant){
		if(t > 0)
			h += t + 1;
		if(first < 0.0 || last < 0.0){
			consigned = 1;
			sprint(fmt, "%%+%d.%df\n", h, t);
		}else
			sprint(fmt, "%%%d.%df\n", h, t);
	}else
		sprint(fmt, "%%.%df\n", t);
	format = fmt;
}

void
main(int argc, char *argv[]){
	int j, n;
	char buf[512], ffmt[4096];
	double val, i, preval;

	ARGBEGIN{
	case 'w':
		if(format)
			sysfatal("-w incompatible with -f");
		constant++;
		break;
	case 'f':
		if(constant)
			sysfatal("-f incompatible with -w");
		format = EARGF(usage());
		if(format[strlen(format)-1] != '\n'){
			sprint(ffmt, "%s\n", format);
			format = ffmt;
		}
		break;
	default:
		goto out;
	}ARGEND
out:
	if(argc<1 || argc>3)
		usage();
	last = atof(argv[argc-1]);
	if(argc > 1)
		first = atof(argv[0]);
	if(argc > 2){
		sstep = argv[1];
		step = atof(argv[1]);
	}
	if(step == 0.0)
		sysfatal("zero increment");
	nsteps = (last - first) / step;
	final = floor(nsteps) * step + first;
	zarg = argv[0];
	if(!format)
		buildfmt();
	preval = step + first;
	for(i = 0; i <= nsteps; i++){
		val = i * step + first;
		n = sprint(buf, format, val);
		if(constant){
			for(j = 0; buf[j] == ' '; j++)
				buf[j] = '0';
			if(consigned && j > 0){
				buf[0] = buf[j];
				buf[j] = '0';
			}
		}
		write(1, buf, n);
		preval = val;
	}
	exits(0);
}


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-08-21  1:55 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-15 22:02 [9front] seq: fix infinite loop Sean Hinchee
2021-08-16 21:45 ` Amavect
2021-08-18 22:53   ` Eckard Brauer
2021-08-16 22:24 ` [9front] " Anthony Martin
2021-08-16 23:38   ` ori
2021-08-17  2:15     ` hiro
2021-08-21  1:30   ` Amavect
2021-08-18  2:49 ` [9front] " ori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).