zsh converts a floating-point number to string with too much precision

zsh-workers

* zsh converts a floating-point number to string with too much precision
@ 2019-12-20  1:37 Vincent Lefevre
  2019-12-20  3:38 ` Mikael Magnusson
  2019-12-20 16:58 ` Stephane Chazelas
  0 siblings, 2 replies; 12+ messages in thread
From: Vincent Lefevre @ 2019-12-20  1:37 UTC (permalink / raw)
  To: zsh-workers

With zsh 5.7.1, I get:

zira% echo $((1.1))
1.1000000000000001

because zsh seems to first select the precision independently
from the value, i.e. 17 to be able to convert the string back
to floating point, preserving the original value, then it
outputs the closest number in this precision.

Instead, zsh should select the minimum precision so that the
inverse conversion can give the original value, i.e. it should
output 1.1 here.

FYI, GNU MPFR has the same issue with mpfr_printf and %Re (with
an empty precision field), and I regard this as a bug:
  https://sympa.inria.fr/sympa/arc/mpfr/2019-12/msg00000.html
  https://sympa.inria.fr/sympa/arc/mpfr/2019-12/msg00001.html

Note that Java does it right:

zira:~> cat tst.java
public class tst
{
  public static void main(String[] args)
  {
    double x;
    x = 0x1.1999999999999p+0;
    System.out.println(x);
    x = 0x1.199999999999ap+0;
    System.out.println(x);
    x = 0x1.199999999999bp+0;
    System.out.println(x);
  }
}
zira:~> javac tst.java
zira:~> java tst
1.0999999999999999
1.1
1.1000000000000003

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


* Re: zsh converts a floating-point number to string with too much precision
  2019-12-20  1:37 zsh converts a floating-point number to string with too much precision Vincent Lefevre
@ 2019-12-20  3:38 ` Mikael Magnusson
  2019-12-20 16:58 ` Stephane Chazelas
  1 sibling, 0 replies; 12+ messages in thread
From: Mikael Magnusson @ 2019-12-20  3:38 UTC (permalink / raw)
  To: zsh-workers

On 12/20/19, Vincent Lefevre <vincent@vinc17.net> wrote:
> With zsh 5.7.1, I get:
>
> zira% echo $((1.1))
> 1.1000000000000001
>
> because zsh seems to first select the precision independently
> from the value, i.e. 17 to be able to convert the string back
> to floating point, preserving the original value, then it
> outputs the closest number in this precision.
>
> Instead, zsh should select the minimum precision so that the
> inverse conversion can give the original value, i.e. it should
> output 1.1 here.

You can use typeset -F1 one=1.1 to specify the output precision of a
parameter (note that this doesn't affect the float value stored, you
can change to -F20 later to display more decimals without
reassignment). So in your case you could count the number of digits in
the string after the . and then pass that to -F if you wanted to.

-- 
Mikael Magnusson


* Re: zsh converts a floating-point number to string with too much precision
  2019-12-20  1:37 zsh converts a floating-point number to string with too much precision Vincent Lefevre
  2019-12-20  3:38 ` Mikael Magnusson
@ 2019-12-20 16:58 ` Stephane Chazelas
  2019-12-20 17:12   ` Roman Perepelitsa
  2019-12-21  1:00   ` Vincent Lefevre
  1 sibling, 2 replies; 12+ messages in thread
From: Stephane Chazelas @ 2019-12-20 16:58 UTC (permalink / raw)
  To: zsh-workers

2019-12-20 02:37:11 +0100, Vincent Lefevre:
> With zsh 5.7.1, I get:
> 
> zira% echo $((1.1))
> 1.1000000000000001
> 
> because zsh seems to first select the precision independently
> from the value, i.e. 17 to be able to convert the string back
> to floating point, preserving the original value, then it
> outputs the closest number in this precision.
> 
> Instead, zsh should select the minimum precision so that the
> inverse conversion can give the original value, i.e. it should
> output 1.1 here.

And what should it give for

$((1.1000000000000001)) ?

(hint, 1.1000000000000001 and 1.1 have the same "double"
representation).

See also:

https://unix.stackexchange.com/questions/422122/why-does-0-1-expand-to-0-10000000000000001-in-zsh

Reproduced below for convenience:

════════════════════════════════════════════════════════════════

TL;DR

zsh chooses a decimal representation for the double binary
numbers that it uses for evaluating floating point arithmetics
that preserves their information fully, that is safe for reinput
into its arithmetic expressions. And that is done at the expense
of cosmetics. For that, it needs 17 significant digits, and makes
sure the expansion always includes a . or e so it's treated as
float on reinput.

That "full-precision" decimal representation could be seen as an
intermediary format between the binary double precision
machine-only numbers and a human-readable one. An intermediary
format that is understood by all tools that understand decimal
representations of floating point numbers.

In the case of 0.1 as used in an arithmetic expression, it so
happens that the closest 17 digit decimal representation of the
double precision binary number closest to 0.1 is
0.10000000000000001, an artefact caused by the limit of the
precision of double precision numbers and rounding.

Other shells privilege the cosmetic aspect and do lose some
information upon conversion to decimal (though still try to
preserve as much precision as possible within that additional
constraint). Both approaches have their merits and drawbacks,
see below for details.

awk doesn't have this kind of problem as it's not a shell
and doesn't have to translate back and forth constantly between
binary and decimal representation when manipulating floating
points.

zsh's approach

zsh, like many other programming languages (including yash,
ksh93) and many tools used from the shell (like awk, printf...)
that deal with floating point numbers, performs arithmetic
operations on a binary representation of those numbers.

That's convenient and efficient because those operations are
supported by the C compiler and on most architectures are done
by the processor itself.

zsh uses the double C type for its internal representation of
real numbers.

On most architectures (and with most compilers), those are
implemented using IEEE 754 double-precision binary floating
points.

Those are implemented a bit like our 1.12e4 scientific notation
decimal numbers but in binary (base 2) instead of decimal (base
10), with the mantissa on 53 bits (1 of which is implied), the
exponent on 11 bits, and a sign bit. Those generally give you
more precision than you'd ever need.

When evaluating an arithmetic expression like 1. / 10 (which
here has a literal float constant as one of the operands), zsh
converts them from their text decimal representation to doubles
internally (using the standard strtod() function) and does the
operation which results in a new double.

1/10 can be represented with a decimal notation as 0.1 or 1e-1,
but just like we can't represent 1/3 in decimal (it would be
fine in base 3, 6 or 9), 1/10 cannot be represented in binary
(as 10 is not a power of 2). Like 1/3 is 0.333333[adlib] in
decimal, 1/10 is .0001100110011001100110011001[adlib] or
1.10011001100110011001[adlib]p-4 in binary (where p-4 stands for
2^-4, (the 4 here in decimal)).

As we can only store 52 bits worth of those 1001..., 1/10 as a
double becomes
1.1001100110011001100110011001100110011001100110011010p-4 (note
the rounding in the last 2 digits).

That's the closest representation of 1/10 that we can get with
doubles. If we convert that back to decimal, we get:

#         1         2
#12345678901234567890
.1000000000000000055511151231257827021181583404541015625

The double before that
(1.1001100110011001100110011001100110011001100110011001p-4) is:

.09999999999999999167332731531132594682276248931884765625

and the one after
(1.1001100110011001100110011001100110011001100110011011p-4):

.10000000000000001942890293094023945741355419158935546875

are not as close.
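
The three expansions above can be reproduced with awk. This is
only a rough sketch: it assumes an awk that computes in IEEE 754
doubles and a C library whose printf prints the exact binary
value when asked for enough digits (glibc does). The function
name exact_neighbours is made up for illustration; 2^-56 is the
gap between adjacent doubles around 0.1.

```shell
# exact_neighbours: hypothetical helper. Prints the exact decimal value of
# the double just below 0.1, the double nearest 0.1, and the one just above.
exact_neighbours() {
    LC_ALL=C awk 'BEGIN {
        ulp = 2 ^ (-56)                 # gap between doubles in [1/16, 1/8)
        for (k = -1; k <= 1; k++)
            printf "%.60g\n", 0.1 + k * ulp
    }'
}
exact_neighbours
```

60 significant digits is more than any of the three values
needs, so %g prints them in full and drops the trailing zeros.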

Now, zsh is above all a shell, that is, a command line
interpreter. Sooner or later it will need to pass the floating
point number that results from the arithmetic expression to a
command. In a non-shell programming-language, you'd pass your
double to the function you want to call. But in a shell, you can
only pass strings to commands. You can't pass the raw byte
values of your double as it may very well contain NUL bytes and
anyway the commands would not know what to do with them.

So you need to convert it back to a string notation that the
command understands. There are some notations like the C99
0xc.ccccccccccccccdp-7 floating point hexadecimal notation that
can easily represent an IEEE 754 binary floating point number,
but it's not widely supported yet and more generally meaningless
for most mortal humans (few people would recognise 0.1 at first
sight above). So the result of $((...)) arithmetic expansion is
actually a floating point number in decimal notation¹.

Now .1000000000000000055511151231257827021181583404541015625 is
a bit lengthy and it's pointless to give that much precision
given that doubles (and so the result of arithmetic expressions)
don't have that much precision. In effect,
.1000000000000000055511151231257827021181583404541015625,
.100000000000000005551115123125782, or even 0.1 in this case
would convert back to the same double.

If we truncate (and round) to 15 digits, like yash (which also
uses doubles internally for its floating point arithmetics)
does, we do get our 0.1, but then again we get 0.1 as well for
the two other doubles, so we're losing information as we can't
distinguish those 3 different numbers. If we're truncating to 16
digits, we still get 2 of those different doubles that yield 0.1.

We'd need to keep 17 significant decimal digits to not lose
information stored in an IEEE 754 double-precision number. As
[1]the double-precision Wikipedia article puts it (quoting a
paper by William Kahan, the main architect behind IEEE 754):

  If an IEEE 754 double-precision number is converted to a
  decimal string with at least 17 significant digits, and then
  converted back to double-precision representation, the final
  result must match the original number

Conversely, if we use fewer digits, there are binary double values
for which we won't get back the same double once we convert them
back as seen in the example above.
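
To see the 15/16/17 digit behaviour for the three doubles around
0.1, here is a minimal sketch, again assuming an awk that uses
IEEE 754 doubles on top of a correctly rounding printf.
digits_demo is a hypothetical name, not something zsh or yash
provide.

```shell
# digits_demo: hypothetical helper. For the doubles just below, at, and just
# above 0.1, print the value rounded to 15, 16 and 17 significant digits.
digits_demo() {
    LC_ALL=C awk 'BEGIN {
        ulp = 2 ^ (-56)                 # gap between adjacent doubles near 0.1
        for (k = -1; k <= 1; k++) {
            x = 0.1 + k * ulp
            printf "%.15g %.16g %.17g\n", x, x, x
        }
    }'
}
digits_demo
```

Under those assumptions the first column should read 0.1 on all
three lines, the second column should still read 0.1 on two of
them, and only the third column should show three distinct
strings.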

That's what zsh does, it chooses to preserve the whole precision
of the double binary format into the decimal representation
given by the result of the arithmetic expansion, so that when
used again into something (like awk or printf "%17f" or zsh's
own arithmetic expressions...) that converts it back to a double
it comes back as the same double.

As seen in the zsh code (already there in 2000 when floating
point support was added to zsh):

    /*
     * Conversion from a floating point expression without using
     * a variable.  The best bet in this case just seems to be
     * to use the general %g format with something like the maximum
     * double precision.
     */

You'll also notice that it expands floats that turn out to have
no decimal part when truncated with a . appended to make sure
they're considered as float when used again in an arithmetic
expression:

$ zsh -c 'echo $((0.5 * 4))'
2.

If it didn't and it was reused in an arithmetic expression, it
would be treated as an integer instead of a float which would
affect the behaviour of the operations being used (for instance
2/4 is an integer division which yields 0 and 2./4 is a floating
point division which yields 0.5).

Now, that choice on the number of significant digits means that
for the case of that 0.1 as input, the
1.1001100110011001100110011001100110011001100110011010p-4 binary
double (the closest one to 0.1) becomes 0.10000000000000001,
which looks bad when shown to a human. It's even worse when the
error is in the other direction like 0.3 that becomes
0.29999999999999999.

There's also the converse problem that when we pass that number
to an application that supports more precision than doubles do,
we're actually passing that 0.00000000000000001 error (from the
value input by the user like 0.1) along which then becomes
significant:

$ v=$((0.1)) awk 'BEGIN{print ENVIRON["v"] == 0.1}'
1
$ v=$((0.1)) yash -c 'echo "$((v == 0.1))"'
1

OK because awk and yash use doubles just like zsh, but:

$ echo "$((0.1)) == 0.1" | bc
0
$ v=$((0.1)) ksh93 -c 'echo "$((v == 0.1))"'
0

not OK because bc uses arbitrary precision and ksh93 extended
precision on my system.

Now, if instead of 0.1 (1/10), the original decimal input had
been 0.11111111111111111 (or any other arbitrary approximation
of 1/9), the tables would turn, showing it's quite hopeless to
do equality comparisons on floats.

The human display artefact problem can be solved by specifying
the precision at the time of display (after you've done all your
calculations using the full precision), for instance by using
printf:

$ x=$((1./10)); printf '%s %g\n' $x $x
0.10000000000000001 0.1

(%g, short for %.6g like the default output format for floats in
awk). That also removes the extra trailing . on integer floats.

yash's (and ksh93's) approach

yash chose to remove the artefacts at the expense of precision:
15 decimal digits is the highest number of significant decimal
digits that guarantees that there won't be this kind of artefact
when converting a number from decimal to binary and back again
to decimal like in our $((0.1)) case.

The fact that information in the binary number is lost upon
converting to decimal can cause other forms of artefacts:

$ yash -c 'x=$((1./3)); echo "$((x == 1./3)) $((1./3 == 1./3))"'
0 1

Though (in)equality comparisons are generally unsafe with
floating points. Here, we could expect x and 1./3 to be
identical as they are the result of the exact same operation.

Also:

$ yash -c  'x=$((0.5 * 3)); y=$((1.25 * 4)); echo "$((x / y))"'
0.3
$ yash -c  'x=$((0.5 * 6)); y=$((1.25 * 4)); echo "$((x / y))"'
0

(as yash doesn't always include a . or e in the decimal
representation of a floating point result, the next arithmetic
operation could end-up being either an integer operation or
floating point operation).

Or:

$ yash -c 'a=$((1e15)); echo $((a*100000))'
1e+20
$ yash -c 'a=$((1e14)); echo $((a*100000))'
-8446744073709551616

($((1e15)) expands to 1e+15 which is taken as a float, while
$((1e14)) expands to 100000000000000 which is taken as an
integer and causes the overflow because we're actually doing an
integer multiplication instead of a float multiplication).

While there are ways to address the artefact problems by
reducing the precision upon display in zsh as seen above, the
loss of precision cannot be recovered in other shells.

$ yash -c 'printf "%.17g\n" $((5./9))'
0.555555555555556

(still only 15 digits)

In any case, however short you truncate, you can always end
up getting artefacts in the results of arithmetic expansions as
errors are inherent to floating point representations.

$ yash -c 'echo $((10.1 - 10))'
0.0999999999999996

Which is yet another illustration of why you can't really use
the equality operator with floating points:

$ zsh -c 'echo $((10.1 - 10 == 0.1))'
0
$ yash -c 'echo "$((10.1 - 10 == 0.1))"'
0
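
A common way around this is to compare within a tolerance rather
than for equality. The following is only a sketch under stated
assumptions: float_close is a made-up helper, the default
tolerance of 1e-9 is an arbitrary choice, and the comparison is
delegated to awk so that it behaves the same whichever shell
calls it.

```shell
# float_close A B [EPS]: hypothetical helper. Succeeds when A and B differ by
# at most EPS (default 1e-9) relative to the larger magnitude of the two.
float_close() {
    LC_ALL=C awk -v a="$1" -v b="$2" -v eps="${3:-1e-9}" 'BEGIN {
        a += 0; b += 0; eps += 0        # force numeric interpretation
        d = a - b;  if (d < 0) d = -d   # absolute difference
        m = (a < 0) ? -a : a
        n = (b < 0) ? -b : b
        if (n > m) m = n                # larger of the two magnitudes
        exit !(d <= eps * m)            # exit status 0 means "close enough"
    }'
}
delta=$(LC_ALL=C awk 'BEGIN { printf "%.17g", 10.1 - 10 }')
float_close "$delta" 0.1 && echo "close enough"
```

What tolerance is appropriate depends on the calculation, so the
third argument is left to the caller.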

ksh93

The case of ksh93 is more complex.

ksh93 uses long doubles instead of double where available. long
doubles are only guaranteed by C to be at least as big as
doubles. In practice, depending on the compiler and
architecture, they're most often either IEEE 754
double-precision (64 bit) like doubles, IEEE 754 quadruple
precision (128 bit) or extended precision (80 bit precision, but
often stored on 128 bits) like when ksh93 is built for GNU/Linux
systems running on x86.

To represent them fully and unambiguously in decimal, you need
respectively 17, 36 or 21 significant digits.

ksh93 truncates at 18 significant digits.

I can only test on x86 architecture at the moment, but my
understanding is that on systems where long doubles are like
doubles, you'd get the same kind of artefact as with zsh (worse
as it uses 18 digits instead of 17).

Where long doubles have 80 bits or 128 bits precision, you get the
same kind of problems as with yash except that the situation is
better when interacting with tools that work with doubles as
ksh93 gives them more precision than they need and would
preserve as much precision as they give it.

$ ksh93 -c 'x=$((1./3)); echo "$((x == 1. / 3))"'
0

is still a "problem", but:

$ ksh93 -c 'x=$((1./3)) awk "BEGIN{print ENVIRON[\"x\"] == 1/3}"'
1

is OK.

Where the behaviour is suboptimal though is when typeset
-F<n>/-E<n> are used. In that case, ksh93 truncates to 15
significant digits when assigning a value to a variable even if
you request a value of <n> greater than 15:

$ ksh93 -c 'typeset -F21 x; ((x = y = 1./3)); echo "$((x == y))"'
0
$ ksh93 -c 'typeset -F21 x; ((y = 1./3)); x=$y; echo "$((x == y))"'
0

There are differences in behaviour in between ksh93, zsh and
yash when it comes to the handling of the locale's decimal radix
character (whether to use/recognise 3.14 or 3,14) which affects
the ability to reinput the result of arithmetic expansions
inside arithmetic expressions. zsh is consistent again in that
the result of expansions can always be used inside arithmetic
expressions regardless of the user's locale there.

awk

awk is one of those programming languages that is not a shell
and handles floating point numbers. The same would apply to
perl...

Its variables are not limited to strings and nowadays generally
store numbers internally as binary doubles (gawk also supports
arbitrary precision numbers as an extension). The conversion to
the string decimal notation only happens when printing a number
like in:

$ awk 'BEGIN {print 0.1}'
0.1

In which case it uses the format specified in the OFMT special
variable (%.6g by default), but can be made arbitrarily big:

$ awk -v OFMT=%.80g 'BEGIN{print 0.1}'
0.1000000000000000055511151231257827021181583404541015625

Or when there is an implicit conversion of a number to string,
like when a string operator (like concatenation, substr(),
index()...) is used, in which case the CONVFMT variable is used
instead (except for integer numbers).

$ awk -v OFMT=%.0e -v CONVFMT=%.17g 'BEGIN{x=0.1; print x, ""x}'
1e-01 0.10000000000000001

Or when using printf explicitly.

There is usually no problem of precision lost internally as we
don't convert back and forth between decimal and binary
representation. And on output, one can decide how much or how
little precision to give out.

Conclusion

In conclusion, I'll just offer my personal opinion.

Shell floating point arithmetics is not something I use often.
Most of the time, it's through zsh's zcalc autoloadable
calculator function which prints floats with 6 digit precision
anyway. Most of the time anything past the first 3 digits after
the decimal point is just noise for this kind of usage.

Having arithmetic expansions have a high precision is a
necessity. Whether it's the full precision or as much precision
as possible while avoiding some of the artefacts probably
doesn't matter so much especially considering that nobody is
ever going to use a shell to do extensive floating point
calculations.

While it does give me comfort to know that in zsh, the
roundtripping to decimal is not going to introduce an extra
level of errors, I find more important the fact that the result
of expansions can safely be used inside arithmetic expressions,
that floats stay floats and that a script will keep working when
used in a locale where the decimal radix is , for instance.

════════════════════════════════════════════════════════════════

¹ zsh is the only Korn-like shell that I know that can have
arithmetic expansions be in bases other than 10, but that's only
for integer ones.

References

   Visible links
   1. https://en.wikipedia.org/wiki/Double-precision_floating-point_format#cite_ref-whyieee_1-0


* Re: zsh converts a floating-point number to string with too much precision
  2019-12-20 16:58 ` Stephane Chazelas
@ 2019-12-20 17:12   ` Roman Perepelitsa
  2019-12-21  0:50     ` Vincent Lefevre
  2019-12-21  1:00   ` Vincent Lefevre
  1 sibling, 1 reply; 12+ messages in thread
From: Roman Perepelitsa @ 2019-12-20 17:12 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, Dec 20, 2019 at 5:59 PM Stephane Chazelas
<stephane.chazelas@gmail.com> wrote:
>
> 2019-12-20 02:37:11 +0100, Vincent Lefevre:
> > With zsh 5.7.1, I get:
> >
> > zira% echo $((1.1))
> > 1.1000000000000001
> >
> > because zsh seems to first select the precision independently
> > from the value, i.e. 17 to be able to convert the string back
> > to floating point, preserving the original value, then it
> > outputs the closest number in this precision.
> >
> > Instead, zsh should select the minimum precision so that the
> > inverse conversion can give the original value, i.e. it should
> > output 1.1 here.
>
> And what should it give for
>
> $((1.1000000000000001)) ?
>
> (hint, 1.1000000000000001 and 1.1 have the same "double"
> representation).

I think what Vincent meant is that zsh should produce the shortest
string that, when parsed, results in a value equal to the original.

For your example, "1.1" is the shortest string that parses into
floating point value equal to the original, hence this (according to
Vincent) is what zsh should produce.

Many languages and libraries do this sort of thing. The roundtrip
guarantee is sometimes limited to the same machine. That is, some
implementations don't guarantee that you can serialize a floating point
value on one machine, parse it on another and get the same value.

Roman.


* Re: zsh converts a floating-point number to string with too much precision
  2019-12-20 17:12   ` Roman Perepelitsa
@ 2019-12-21  0:50     ` Vincent Lefevre
  2019-12-21  8:47       ` Stephane Chazelas
  0 siblings, 1 reply; 12+ messages in thread
From: Vincent Lefevre @ 2019-12-21  0:50 UTC (permalink / raw)
  To: zsh-workers

On 2019-12-20 18:12:18 +0100, Roman Perepelitsa wrote:
> I think what Vincent meant is that zsh should produce the shortest
> string that, when parsed, results in a value equal to the original.
> 
> For your example, "1.1" is the shortest string that parses into
> floating point value equal to the original, hence this (according to
> Vincent) is what zsh should produce.

Yes, this is exactly what I meant, and what Java's System.out.println
seems to do. This is also specified like that in XPath.

I think that's the best compromise in practice.

> Many languages and libraries do this sort of thing. The roundtrip
> guarantee is sometimes limited to the same machine. That is, some
> implementations don't guarantee that you can serialize a floating point
> value on one machine, parse it on another and get the same value.

The roundtrip guarantee is associated with the floating-point format.
If you don't know what format will be used when parsing the string,
then you need to store the exact value (this is always possible for
binary numbers written in decimal, but can take many digits, up to
something around the absolute value of the minimum exponent).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


* Re: zsh converts a floating-point number to string with too much precision
  2019-12-20 16:58 ` Stephane Chazelas
  2019-12-20 17:12   ` Roman Perepelitsa
@ 2019-12-21  1:00   ` Vincent Lefevre
  1 sibling, 0 replies; 12+ messages in thread
From: Vincent Lefevre @ 2019-12-21  1:00 UTC (permalink / raw)
  To: zsh-workers

On 2019-12-20 16:58:24 +0000, Stephane Chazelas wrote:
> https://unix.stackexchange.com/questions/422122/why-does-0-1-expand-to-0-10000000000000001-in-zsh
> 
> Reproduced below for convenience:
> 
> ════════════════════════════════════════════════════════════════
> 
> TL;DR
> 
> zsh chooses a decimal representation for the double binary
> numbers that it uses for evaluating floating point arithmetics
> that preserves their information fully, that is safe for reinput
> into its arithmetic expressions.

But there are several possible decimal representations with this
requirement.

> And that is done at the expense
> of cosmetics. For that, it needs 17 significant digits, and makes
> sure the expansion always includes a . or e so it's treated as
> float on reinput.

This is only one way to fulfill the requirement. Here, it was assumed
that the output precision is chosen independently from the argument
(it seems that zsh chooses printf "%.17g"). But it does not need to
be like that.

BTW, what zsh does is not documented.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


* Re: zsh converts a floating-point number to string with too much precision
  2019-12-21  0:50     ` Vincent Lefevre
@ 2019-12-21  8:47       ` Stephane Chazelas
  2019-12-21  9:43         ` Roman Perepelitsa
  2019-12-21 21:28         ` Vincent Lefevre
  0 siblings, 2 replies; 12+ messages in thread
From: Stephane Chazelas @ 2019-12-21  8:47 UTC (permalink / raw)
  To: zsh-workers

2019-12-21 01:50:05 +0100, Vincent Lefevre:
> On 2019-12-20 18:12:18 +0100, Roman Perepelitsa wrote:
> > I think what Vincent meant is that zsh should produce the shortest
> > string that, when parsed, results in a value equal to the original.
> > 
> > For your example, "1.1" is the shortest string that parses into
> > floating point value equal to the original, hence this (according to
> > Vincent) is what zsh should produce.
> 
> Yes, this is exactly what I meant, and what Java's System.out.println
> seems to do. This is also specified like that in XPath.
> 
> I think that's the best compromise in practice.
[...]

OK, I think I see what you mean.

So on a system (with a compiler) where C doubles are implemented
as IEEE 754 double precision, both 1.1 and 1.1000000000000001
are represented as the same binary double (whose exact value is
1.100000000000000088817841970012523233890533447265625).

So you're saying echo $((1.1000000000000001)) and echo $((1.1))
should output 1.1, because even though 1.1000000000000001 is
closer to that value than 1.1000000000000000, zsh should pick
the latter because people prefer to see shorter number
representations and in that case it doesn't matter which one we
pick as both lead to the same double.

How would we do that?

Is there a standard C API for that?

Or would we get the output of sprintf("%.17g"), look at the last
two significant digits, if the second last is 9 or 0, then see
if rounding it and doing a strtod again yields the same double?

That seems a bit overkill (and I suspect that's not even a valid
approach).

Or should we implement the conversion to decimal string
representation from scratch without using sprintf() and adapt to
every system's double representation? or assume doubles are IEEE
754 ones as is more or less already done?

How are those other languages doing it?

-- 
Stephane


* Re: zsh converts a floating-point number to string with too much precision
  2019-12-21  8:47       ` Stephane Chazelas
@ 2019-12-21  9:43         ` Roman Perepelitsa
  2019-12-21 17:56           ` Stephane Chazelas
  2019-12-21 21:28         ` Vincent Lefevre
  1 sibling, 1 reply; 12+ messages in thread
From: Roman Perepelitsa @ 2019-12-21  9:43 UTC (permalink / raw)
  To: Zsh hackers list

On Sat, Dec 21, 2019 at 9:48 AM Stephane Chazelas
<stephane.chazelas@gmail.com> wrote:
> So on a system (with a compiler) where C doubles are implemented
> as IEEE 754 double precision, both 1.1 and 1.1000000000000001
> are represented as the same binary double (whose exact value is
> 1.100000000000000088817841970012523233890533447265625).
>
> So you're saying echo $((1.1000000000000001)) and echo $((1.1))
> should output 1.1, because even though 1.1000000000000001 is
> closer to that value than 1.1000000000000000, zsh should pick
> the latter because people prefer to see shorter number
> representations and in that case it doesn't matter which one we
> pick as both lead to the same double.

Correct. The best formal description of this process that I know of is
the specification of to_chars() functions in the C++ standard:

    The functions [...] ensure that the string representation consists
    of the smallest number of characters such that there is at least
    one digit before the radix point (if present) and parsing the
    representation using the corresponding from_chars function
    recovers value exactly. [Note: This guarantee applies only if
    to_chars and from_chars are executed on the same implementation.
    — end note] If there are several such representations, the
    representation with the smallest difference from the
    floating-point argument value is chosen, resolving any remaining
    ties using rounding according to round_to_nearest.

There is no simple algorithm that achieves this. I recall reading long
papers a few years back that were describing various inventions in
this field. They were pretty scary.

> Is there a standard C API for that?

Not that I know of. FWIW, neither gcc (libstdc++) nor clang (libc++)
have implemented to_chars() for floating points in the two years since
it's been added to the standard (C++17).

>
> Or would we get the output of sprintf("%.17g"), look at the last
> two significant digits, if the second last is 9 or 0, then see
> if rounding it and doing a strtod again yields the same double?

I believe printing the value with full precision and then truncating
the string is the most popular approach. All implementations that
are faster are only marginally so, and they are much more complex. I
think this approach should work well enough for zsh. Lifting this sort
of implementation from another project would be ideal (someone would
have to find it; my apologies for not doing it myself).
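
As a rough illustration of that approach (not zsh's code, and not
lifted from any of the projects mentioned), the sketch below
tries %.1g up to %.17g and keeps the first result that reads back
as the same double. It assumes an awk that computes in IEEE 754
doubles with a correctly rounding printf and string-to-number
conversion underneath; shortest_float is a made-up name.

```shell
# shortest_float VALUE: hypothetical helper. Prints the first of
# %.1g .. %.17g that converts back to the same double as VALUE.
shortest_float() {
    LC_ALL=C awk -v x="$1" 'BEGIN {
        x += 0                          # force numeric (double) interpretation
        for (p = 1; p <= 17; p++) {
            s = sprintf("%." p "g", x)  # closest decimal with p digits
            if (s + 0 == x) break       # reads back as the same double
        }
        print s
    }'
}
shortest_float 1.1000000000000001       # same double as 1.1
```

This is not guaranteed to pick the closest of the shortest
candidates in every edge case, and integral values come out
without a . so a shell would still have to append one to keep
them floats on reinput.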

Roman.


* Re: zsh converts a floating-point number to string with too much precision
  2019-12-21  9:43         ` Roman Perepelitsa
@ 2019-12-21 17:56           ` Stephane Chazelas
  2019-12-21 18:11             ` Stephane Chazelas
  0 siblings, 1 reply; 12+ messages in thread
From: Stephane Chazelas @ 2019-12-21 17:56 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

2019-12-21 10:43:43 +0100, Roman Perepelitsa:
[...]
> There is no simple algorithm that achieves this. I recall reading long
> papers a few years back that were describing various inventions in
> this field. They were pretty scary.

https://github.com/openjdk/jdk/blob/f4af0eadb6eaf9d9614431110ab7fc9c1588966d/src/java.base/share/classes/jdk/internal/math/FloatingDecimal.java#L424
is not exactly trivial. I enjoyed reading the comments even if I
have no clue what the code is doing.

-- 
Stephane


* Re: zsh converts a floating-point number to string with too much precision
  2019-12-21 17:56           ` Stephane Chazelas
@ 2019-12-21 18:11             ` Stephane Chazelas
  2019-12-21 18:20               ` Roman Perepelitsa
  0 siblings, 1 reply; 12+ messages in thread
From: Stephane Chazelas @ 2019-12-21 18:11 UTC (permalink / raw)
  To: Roman Perepelitsa, Zsh hackers list

2019-12-21 17:56:18 +0000, Stephane Chazelas:
> 2019-12-21 10:43:43 +0100, Roman Perepelitsa:
> [...]
> > There is no simple algorithm that achieves this. I recall reading long
> > papers a few years back that were describing various inventions in
> > this field. They were pretty scary.
> 
> https://github.com/openjdk/jdk/blob/f4af0eadb6eaf9d9614431110ab7fc9c1588966d/src/java.base/share/classes/jdk/internal/math/FloatingDecimal.java#L424
> is not exactly trivial. I enjoyed reading the comments even if I
> have no clue what the code is doing.
[...]

Microsoft's STL implementation (to_chars) looks about as hairy:
https://github.com/microsoft/STL/blob/264b0d4a167daa9e5499af6d783c9ff22f7af03f/stl/inc/charconv

-- 
Stephane

* Re: zsh converts a floating-point number to string with too much precision
  2019-12-21 18:11             ` Stephane Chazelas
@ 2019-12-21 18:20               ` Roman Perepelitsa
  0 siblings, 0 replies; 12+ messages in thread
From: Roman Perepelitsa @ 2019-12-21 18:20 UTC (permalink / raw)
  To: Roman Perepelitsa, Zsh hackers list

On Sat, Dec 21, 2019 at 7:11 PM Stephane Chazelas
<stephane.chazelas@gmail.com> wrote:
>
> 2019-12-21 17:56:18 +0000, Stephane Chazelas:
> > 2019-12-21 10:43:43 +0100, Roman Perepelitsa:
> > [...]
> > > There is no simple algorithm that achieves this. I recall reading long
> > > papers a few years back that were describing various inventions in
> > > this field. They were pretty scary.
> >
> > https://github.com/openjdk/jdk/blob/f4af0eadb6eaf9d9614431110ab7fc9c1588966d/src/java.base/share/classes/jdk/internal/math/FloatingDecimal.java#L424
> > is not exactly trivial. I enjoyed reading the comments even if I
> > have no clue what the code is doing.
> [...]
>
> Microsoft's STL implementation (to_chars) looks about as hairy:
> https://github.com/microsoft/STL/blob/264b0d4a167daa9e5499af6d783c9ff22f7af03f/stl/inc/charconv

Note that different implementations can have different behavior in
some cases. I believe the C++ spec unambiguously defines the string
representation for every number, but Java may not be following that
spec (it has no reason to). I recall lengthy discussions when to_chars
was being standardized. I don't remember the details, but I think there
was more than one reasonable specification and a judgement call had to
be made. I pointed to to_chars from the C++ standard not because it's
the best but because it's precise and recent enough to take into
account experience from other languages. It's always nice when you can
piggyback on a standard.

Roman.

* Re: zsh converts a floating-point number to string with too much precision
  2019-12-21  8:47       ` Stephane Chazelas
  2019-12-21  9:43         ` Roman Perepelitsa
@ 2019-12-21 21:28         ` Vincent Lefevre
  1 sibling, 0 replies; 12+ messages in thread
From: Vincent Lefevre @ 2019-12-21 21:28 UTC (permalink / raw)
  To: zsh-workers

On 2019-12-21 08:47:36 +0000, Stephane Chazelas wrote:
> OK, I think I see what you mean.
> 
> So on a system (with a compiler) where C doubles are implemented
> as IEEE 754 double precision, both 1.1 and 1.1000000000000001
> are represented as the same binary double (whose exact value is
> 1.100000000000000088817841970012523233890533447265625).
> 
> So you're saying echo $((1.1000000000000001)) and echo $((1.1))
> should output 1.1, because even though 1.1000000000000001 is
> closer to that value than 1.1000000000000000, zsh should pick
> the latter because people prefer to see shorter number
> representations and in that case it doesn't matter which one we
> pick as both lead to the same double.

I now remember that our Handbook of Floating-Point Arithmetic covers
this issue (Section 4.9.2.1 "Output conversion: from radix 2 to
radix 10" in the 2nd edition). But it just gives references

--------------------------------------------------------------------
"[...] Steele and White designed an algorithm for that. Their
algorithm was later improved by Burger and Dybvig [86], and by
Gay [211]. Gay's code is available for anyone to use, and is
very robust.(17) Faster yet more complex algorithms have been
introduced by Loitsch [394] and Andrysco et al. [16]. [...]"

(17) At the time of writing this book, it can be obtained at
http://www.netlib.org/fp/ (file dtoa.c).
--------------------------------------------------------------------

and Burger and Dybvig's high-level algorithm.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

