* zsh converts a floating-point number to string with too much precision
From: Vincent Lefevre @ 2019-12-20 1:37 UTC
To: zsh-workers

With zsh 5.7.1, I get:

  zira% echo $((1.1))
  1.1000000000000001

because zsh seems to first select the precision independently from the value, i.e. 17 to be able to convert the string back to floating point, preserving the original value, then it outputs the closest number in this precision.

Instead, zsh should select the minimum precision so that the inverse conversion can give the original value, i.e. it should output 1.1 here.

FYI, GNU MPFR has the same issue with mpfr_printf and %Re (with an empty precision field), and I regard this as a bug:

  https://sympa.inria.fr/sympa/arc/mpfr/2019-12/msg00000.html
  https://sympa.inria.fr/sympa/arc/mpfr/2019-12/msg00001.html

Note that Java does it right:

  zira:~> cat tst.java
  public class tst
  {
    public static void main(String[] args)
    {
      double x;
      x = 0x1.1999999999999p+0;
      System.out.println(x);
      x = 0x1.199999999999ap+0;
      System.out.println(x);
      x = 0x1.199999999999bp+0;
      System.out.println(x);
    }
  }
  zira:~> javac tst.java
  zira:~> java tst
  1.0999999999999999
  1.1
  1.1000000000000003

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply [flat|nested] 12+ messages in thread
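For readers who want to reproduce the difference outside zsh and Java, here is a minimal Python sketch (Python is not used anywhere in this thread, and the helper names `fixed_17` and `shortest` are made up for illustration). It assumes IEEE 754 doubles. `"%.17g"` mirrors the fixed 17-digit formatting reported above, while Python's `repr()` returns the shortest string that converts back to the same double, which is the behaviour the report asks for.

```python
# The three adjacent doubles from the Java example, written as hex literals.
HEX_VALUES = (
    "0x1.1999999999999p+0",
    "0x1.199999999999ap+0",
    "0x1.199999999999bp+0",
)


def fixed_17(x: float) -> str:
    """Format with a fixed 17 significant digits (what zsh 5.7.1 prints)."""
    return "%.17g" % x


def shortest(x: float) -> str:
    """Shortest decimal string that parses back to exactly the same double."""
    return repr(x)


for h in HEX_VALUES:
    x = float.fromhex(h)
    # Print the hex form, the fixed-precision form and the shortest form.
    print(h, fixed_17(x), shortest(x))
```

The middle value is the double nearest to 1.1: the fixed-precision column shows the trailing `...0001`, the shortest column shows `1.1`.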
* Re: zsh converts a floating-point number to string with too much precision 2019-12-20 1:37 zsh converts a floating-point number to string with too much precision Vincent Lefevre @ 2019-12-20 3:38 ` Mikael Magnusson 2019-12-20 16:58 ` Stephane Chazelas 1 sibling, 0 replies; 12+ messages in thread From: Mikael Magnusson @ 2019-12-20 3:38 UTC (permalink / raw) To: zsh-workers On 12/20/19, Vincent Lefevre <vincent@vinc17.net> wrote: > With zsh 5.7.1, I get: > > zira% echo $((1.1)) > 1.1000000000000001 > > because zsh seems to first select the precision independently > from the value, i.e. 17 to be able to convert the string back > to floating point, preserving the original value, then it > outputs the closest number in this precision. > > Instead, zsh should select the minimum precision so that the > inverse conversion can give the original value, i.e. it should > output 1.1 here. You can use typeset -F1 one=1.1 to specify the output precision of a parameter (note that this doesn't affect the float value stored, you can change to -F20 later to display more decimals without reassignment). So in your case you could count the number of digits in the string after the . and then pass that to -F if you wanted to. -- Mikael Magnusson ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: zsh converts a floating-point number to string with too much precision 2019-12-20 1:37 zsh converts a floating-point number to string with too much precision Vincent Lefevre 2019-12-20 3:38 ` Mikael Magnusson @ 2019-12-20 16:58 ` Stephane Chazelas 2019-12-20 17:12 ` Roman Perepelitsa 2019-12-21 1:00 ` Vincent Lefevre 1 sibling, 2 replies; 12+ messages in thread From: Stephane Chazelas @ 2019-12-20 16:58 UTC (permalink / raw) To: zsh-workers 2019-12-20 02:37:11 +0100, Vincent Lefevre: > With zsh 5.7.1, I get: > > zira% echo $((1.1)) > 1.1000000000000001 > > because zsh seems to first select the precision independently > from the value, i.e. 17 to be able to convert the string back > to floating point, preserving the original value, then it > outputs the closest number in this precision. > > Instead, zsh should select the minimum precision so that the > inverse conversion can give the original value, i.e. it should > output 1.1 here. And what should it give for $((1.1000000000000001)) ? (hint, 1.1000000000000001 and 1.1 have the same "double" representation). See also: https://unix.stackexchange.com/questions/422122/why-does-0-1-expand-to-0-10000000000000001-in-zsh Reproduced below for convenience: ════════════════════════════════════════════════════════════════ TL;DR zsh chooses a decimal representation for the double binary numbers that it uses for evaluating floating point arithmetics that preserves their information fully, that is safe for reinput into its arithmetic expressions. And that is done at the expense of cosmetic. For that, it needs 17 significant digits, and make sure the expansion always includes a . or e so it's treated as float on reinput. That "full-precision" decimal representation could be seen as an intermediary format between the binary double precision machine-only numbers and a human-readable one. An intermediary format that is understood by all tools that understand decimal representations of floating point numbers. 
In the case of 0.1 as used in an arithmetic expression, it so happens that the closest 17-digit decimal representation of the double-precision binary number closest to 0.1 is 0.10000000000000001, an artefact caused by the limited precision of double-precision numbers and by rounding.

Other shells favour the cosmetic aspect and do lose some information upon conversion to decimal (though they still try to preserve as much precision as possible within that additional constraint).

Both approaches have their merits and drawbacks, see below for details.

awk doesn't have this kind of problem as it's not a shell and doesn't have to translate back and forth constantly between binary and decimal representations when manipulating floating point numbers.

zsh's approach

zsh, like many other programming languages (including yash, ksh93) and many tools used from the shell (like awk, printf...) that deal with floating point numbers, performs arithmetic operations on a binary representation of those numbers.

That's convenient and efficient because those operations are supported by the C compiler and on most architectures are done by the processor itself.

zsh uses the double C type for its internal representation of real numbers.

On most architectures (and with most compilers), those are implemented using IEEE 754 double-precision binary floating point numbers.

Those are implemented a bit like our 1.12e4 engineering notation decimal numbers but in binary (base 2) instead of decimal (base 10), with the mantissa on 53 bits (1 of which is implied), the exponent on 11 bits, and a sign bit. Those generally give you more precision than you'd ever need.

When evaluating an arithmetic expression like 1. / 10 (which here has a literal float constant as one of the operands), zsh converts the operands from their text decimal representation to doubles internally (using the standard strtod() function) and does the operation, which results in a new double.

1/10 can be represented in decimal notation as 0.1 or 1e-1, but just like we can't represent 1/3 in decimal (it would be fine in base 3, 6 or 9), 1/10 cannot be represented exactly in binary (as 10 is not a power of 2). Like 1/3 is 0.333333[ad lib] in decimal, 1/10 is .0001100110011001100110011001[ad lib] or 1.10011001100110011001[ad lib]p-4 in binary (where p-4 stands for 2^-4, the 4 here being in decimal).

As we can only store 52 bits worth of those 1001..., 1/10 as a double becomes 1.1001100110011001100110011001100110011001100110011010p-4 (note the rounding in the last 2 digits).

That's the closest representation of 1/10 that we can get with doubles. If we convert that back to decimal, we get:

  #         1         2
  #12345678901234567890
  .1000000000000000055511151231257827021181583404541015625

The double before that (1.1001100110011001100110011001100110011001100110011001p-4) is:

  .09999999999999999167332731531132594682276248931884765625

and the one after (1.1001100110011001100110011001100110011001100110011011p-4) is:

  .10000000000000001942890293094023945741355419158935546875

Neither is as close to 1/10.

Now, zsh is before all a shell, that is, a command line interpreter. Sooner or later it will need to pass the floating point number that results from the arithmetic expression to a command. In a non-shell programming language, you'd pass your double to the function you want to call. But in a shell, you can only pass strings to commands. You can't pass the raw byte values of your double as it may very well contain NUL bytes and anyway the commands would not know what to do with them.

So you need to convert it back to a string notation that the command understands.

There are some notations like the C99 0xc.ccccccccccccccdp-7 floating point hexadecimal notation that can easily represent an IEEE 754 binary floating point number, but it's not widely supported yet and more generally meaningless for most mortal humans (few people would recognise 0.1 at first sight above).

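The exact expansions quoted above can be reproduced with a short Python sketch (an illustration only, assuming IEEE 754 doubles; the function name `exact_decimal` is hypothetical). `decimal.Decimal(float)` converts a double to its exact decimal value without rounding, and `float.hex()` shows the stored binary significand.

```python
from decimal import Decimal


def exact_decimal(x: float) -> str:
    """Exact decimal expansion of a double, in plain (non-scientific) notation."""
    return format(Decimal(x), "f")


nearest = 0.1  # the double closest to 1/10
before = float.fromhex("0x1.9999999999999p-4")  # the double just below it
after = float.fromhex("0x1.999999999999bp-4")   # the double just above it

print(nearest.hex())          # hex form of the stored value
print(exact_decimal(before))
print(exact_decimal(nearest))
print(exact_decimal(after))
```

The hex form uses 13 hexadecimal digits (52 bits) after the point, which is the same information as the 52-bit binary strings shown above.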
So the result of $((...)) arithmetic expansion is actually a floating point number in decimal notation¹.

Now .1000000000000000055511151231257827021181583404541015625 is a bit lengthy and it's pointless to give that much precision given that doubles (and so the result of arithmetic expressions) don't have that much precision. In effect, .1000000000000000055511151231257827021181583404541015625, .100000000000000005551115123125782, or even 0.1 in this case would convert back to the same double.

If we truncate (and round) to 15 digits, like yash (which also uses doubles internally for its floating point arithmetic) does, we do get our 0.1, but then again we get 0.1 as well for the two other doubles, so we're losing information as we can't distinguish those 3 different numbers. If we truncate to 16 digits, we still get 2 of those different doubles that yield 0.1.

We'd need to keep 17 significant decimal digits to not lose information stored in an IEEE 754 double-precision number. As [1] the double-precision Wikipedia article puts it (quoting a paper by William Kahan, the main architect behind IEEE 754):

  If an IEEE 754 double-precision number is converted to a decimal
  string with at least 17 significant digits, and then converted back
  to double-precision representation, the final result must match the
  original number

Conversely, if we use fewer digits, there are binary double values for which we won't get back the same double once we convert them back, as seen in the example above.

That's what zsh does: it chooses to preserve the whole precision of the double binary format in the decimal representation given by the result of the arithmetic expansion, so that when used again in something (like awk or printf "%17f" or zsh's own arithmetic expressions...) that converts it back to a double, it comes back as the same double.

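The 15, 16 and 17 digit cases described above can be checked with a small Python sketch (illustrative only; it assumes IEEE 754 doubles, and `NEIGHBOURS` and `distinct_strings` are names made up here). It formats the double nearest to 0.1 and its two neighbours with `%g` at each precision and counts how many different strings come out.

```python
# The double nearest to 0.1 and the doubles immediately before and after it.
NEIGHBOURS = [
    float.fromhex(h)
    for h in (
        "0x1.9999999999999p-4",
        "0x1.999999999999ap-4",
        "0x1.999999999999bp-4",
    )
]


def distinct_strings(digits: int) -> set:
    """Set of %g renderings of the three neighbours at the given precision."""
    return {"%.*g" % (digits, x) for x in NEIGHBOURS}


for digits in (15, 16, 17):
    # Fewer than three distinct strings means information was lost.
    print(digits, sorted(distinct_strings(digits)))
```

Only at 17 digits do the three doubles map to three different strings, which is the property zsh's formatting relies on.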
As seen in the zsh code (already there in 2000 when floating point support was added to zsh):

  /*
   * Conversion from a floating point expression without using
   * a variable.  The best bet in this case just seems to be
   * to use the general %g format with something like the maximum
   * double precision.
   */

You'll also notice that floats that turn out to have no fractional part once truncated are expanded with a . appended, to make sure they're considered as floats when used again in an arithmetic expression:

  $ zsh -c 'echo $((0.5 * 4))'
  2.

If it didn't and the result was reused in an arithmetic expression, it would be treated as an integer instead of a float, which would affect the behaviour of the operations being used (for instance 2/4 is an integer division which yields 0 and 2./4 is a floating point division which yields 0.5).

Now, that choice on the number of significant digits means that for the case of that 0.1 as input, the 1.1001100110011001100110011001100110011001100110011010p-4 binary double (the closest one to 0.1) becomes 0.10000000000000001, which looks bad when shown to a human. It's even worse when the error is in the other direction, like 0.3 that becomes 0.29999999999999999.

There's also the converse problem that when we pass that number to an application that supports more precision than doubles do, we're actually passing that 0.00000000000000001 error (relative to the value input by the user, like 0.1) along, which then becomes significant:

  $ v=$((0.1)) awk 'BEGIN{print ENVIRON["v"] == 0.1}'
  1
  $ v=$((0.1)) yash -c 'echo "$((v == 0.1))"'
  1

OK because awk and yash use doubles just like zsh, but:

  $ echo "$((0.1)) == 0.1" | bc
  0
  $ v=$((0.1)) ksh93 -c 'echo "$((v == 0.1))"'
  0

not OK because bc uses arbitrary precision and ksh93 extended precision on my system.

Now, if instead of 0.1 (1/10), the original decimal input had been 0.11111111111111111 (or any other arbitrary approximation of 1/9), the tables would turn, showing it's quite hopeless to do equality comparisons on floats.

The human display artefact problem can be solved by specifying the precision at the time of display (after you've done all your calculations using the full precision), for instance by using printf: $ x=$((1./10)); printf '%s %g\n' $x $x 0.10000000000000001 0.1 (%g, short for %.6g like the default output format for floats in awk). That also removes the extra trailing . on integer floats. yash (and ksh93's) approach yash chose to remove the artefacts at the expense of precision, 15 decimal digits is the highest number of significant decimal digits that guarantees that there won't be this kind of artefact when converting a number from decimal to binary and back again to decimal like in our $((0.1)) case. The fact that information in the binary number is lost upon converting to decimal can cause other forms of artefacts: $ yash -c 'x=$((1./3)); echo "$((x == 1./3)) $((1./3 == 1./3))"' 0 1 Though (in)equality comparisons are generally unsafe with floating points. Here, we could expect x and 1./3 to be identical as they are the result of the exact same operation. Also: $ yash -c 'x=$((0.5 * 3)); y=$((1.25 * 4)); echo "$((x / y))"' 0.3 $ yash -c 'x=$((0.5 * 6)); y=$((1.25 * 4)); echo "$((x / y))"' 0 (as yash doesn't always include a . or e in the decimal representation of a floating point result, the next arithmetic operation could end-up being either an integer operation or floating point operation). Or: $ yash -c 'a=$((1e15)); echo $((a*100000))' 1e+20 $ yash -c 'a=$((1e14)); echo $((a*100000))' -8446744073709551616 ($((1e15)) expands to 1e+15 which is taken as a float, while $((1e14)) expands to 100000000000000 which is taken as an integer and causes the overflow because we're actually doing an integer multiplication instead of a float multiplication). While there are ways to address the artefact problems by reducing the precision upon display in zsh as seen above, the loss of precision cannot be recovered in other shells. 
  $ yash -c 'printf "%.17g\n" $((5./9))'
  0.555555555555556

(still only 15 digits)

In any case, however short you truncate, you can always end up getting artefacts in the results of arithmetic expansions as errors are inherent to floating point representations.

  $ yash -c 'echo $((10.1 - 10))'
  0.0999999999999996

Which is yet another illustration of why you can't really use the equality operator with floating points:

  $ zsh -c 'echo $((10.1 - 10 == 0.1))'
  0
  $ yash -c 'echo "$((10.1 - 10 == 0.1))"'
  0

ksh93

The case of ksh93 is more complex.

ksh93 uses long doubles instead of doubles where available. long doubles are only guaranteed by C to be at least as big as doubles. In practice, depending on the compiler and architecture, they're most often either IEEE 754 double-precision (64 bit) like doubles, IEEE 754 quadruple precision (128 bit) or extended precision (80 bit precision, but often stored on 128 bits) like when ksh93 is built for GNU/Linux systems running on x86.

To represent them fully and unambiguously in decimal, you need respectively 17, 36 or 21 significant digits.

ksh93 truncates at 18 significant digits.

I can only test on x86 architecture at the moment, but my understanding is that on systems where long doubles are like doubles, you'd get the same kind of artefact as with zsh (worse as it uses 18 digits instead of 17).

Where long doubles have 80-bit or 128-bit precision, you get the same kind of problems as with yash, except that the situation is better when interacting with tools that work with doubles, as ksh93 gives them more precision than they need and would preserve as much precision as they give it. So

  $ ksh93 -c 'x=$((1./3)); echo "$((x == 1. / 3))"'
  0

is still a "problem", but

  $ ksh93 -c 'x=$((1./3)) awk "BEGIN{print ENVIRON[\"x\"] == 1/3}"'
  1

is OK.

Where the behaviour is suboptimal though is when typeset -F<n>/-E<n> are used.

In that case, ksh93 truncates to 15 significant digits when assigning a value to a variable even if you request a value of <n> greater than 15:

  $ ksh93 -c 'typeset -F21 x; ((x = y = 1./3)); echo "$((x == y))"'
  0
  $ ksh93 -c 'typeset -F21 x; ((y = 1./3)); x=$y; echo "$((x == y))"'
  0

There are differences in behaviour between ksh93, zsh and yash when it comes to the handling of the locale's decimal radix character (whether to use/recognise 3.14 or 3,14), which affects the ability to reinput the result of arithmetic expansions inside arithmetic expressions. zsh is consistent again in that the result of expansions can always be used inside arithmetic expressions regardless of the user's locale.

awk

awk is one of those programming languages that is not a shell and handles floating point numbers. The same would apply to perl...

Its variables are not limited to strings and nowadays generally store numbers internally as binary doubles (gawk also supports arbitrary precision numbers as an extension). The conversion to the string decimal notation only happens when printing a number like in:

  $ awk 'BEGIN {print 0.1}'
  0.1

In which case it uses the format specified in the OFMT special variable (%.6g by default), but the precision can be made arbitrarily big:

  $ awk -v OFMT=%.80g 'BEGIN{print 0.1}'
  0.1000000000000000055511151231257827021181583404541015625

Or when there is an implicit conversion of a number to string, like when a string operator (like concatenation, substr(), index()...) is used, in which case the CONVFMT variable is used instead (except for integer numbers).

  $ awk -v OFMT=%.0e -v CONVFMT=%.17g 'BEGIN{x=0.1; print x, ""x}'
  1e-01 0.10000000000000001

Or when using printf explicitly.

There is usually no problem of precision lost internally as we don't convert back and forth between decimal and binary representation. And on output, one can decide how much or how little precision to give out.

Conclusion

In conclusion, I'll just offer my personal opinion.

Shell floating point arithmetic is not something I use often. Most of the time, it's through zsh's zcalc autoloadable calculator function, which prints floats with 6 digit precision anyway. Most of the time anything past the first 3 digits after the decimal point is just noise for this kind of usage.

Having arithmetic expansions have a high precision is a necessity. Whether it's the full precision or as much precision as possible while avoiding some of the artefacts probably doesn't matter so much, especially considering that nobody is ever going to use a shell to do extensive floating point calculations.

While it does give me comfort to know that in zsh, the roundtripping to decimal is not going to introduce an extra level of errors, I find more important the fact that the result of expansions can safely be used inside arithmetic expressions, that floats stay floats and that a script will keep working when used in a locale where the decimal radix is , for instance.

════════════════════════════════════════════════════════════════

¹ zsh is the only Korn-like shell that I know that can have arithmetic expansions be in bases other than 10, but that's only for integer ones.

References

1. https://en.wikipedia.org/wiki/Double-precision_floating-point_format#cite_ref-whyieee_1-0
* Re: zsh converts a floating-point number to string with too much precision
From: Roman Perepelitsa @ 2019-12-20 17:12 UTC
To: Zsh hackers list

On Fri, Dec 20, 2019 at 5:59 PM Stephane Chazelas <stephane.chazelas@gmail.com> wrote:
>
> 2019-12-20 02:37:11 +0100, Vincent Lefevre:
> > With zsh 5.7.1, I get:
> >
> > zira% echo $((1.1))
> > 1.1000000000000001
> >
> > because zsh seems to first select the precision independently
> > from the value, i.e. 17 to be able to convert the string back
> > to floating point, preserving the original value, then it
> > outputs the closest number in this precision.
> >
> > Instead, zsh should select the minimum precision so that the
> > inverse conversion can give the original value, i.e. it should
> > output 1.1 here.
>
> And what should it give for
>
> $((1.1000000000000001)) ?
>
> (hint, 1.1000000000000001 and 1.1 have the same "double"
> representation).

I think what Vincent meant is that zsh should produce the shortest string that, when parsed, results in a value equal to the original.

For your example, "1.1" is the shortest string that parses into a floating point value equal to the original, hence this (according to Vincent) is what zsh should produce.

Many languages and libraries do this sort of thing. The roundtrip guarantee is sometimes limited to the same machine. That is, some implementations don't guarantee that you can serialize a floating point value on one machine, parse it on another and get the same value.

Roman.
* Re: zsh converts a floating-point number to string with too much precision 2019-12-20 17:12 ` Roman Perepelitsa @ 2019-12-21 0:50 ` Vincent Lefevre 2019-12-21 8:47 ` Stephane Chazelas 0 siblings, 1 reply; 12+ messages in thread From: Vincent Lefevre @ 2019-12-21 0:50 UTC (permalink / raw) To: zsh-workers On 2019-12-20 18:12:18 +0100, Roman Perepelitsa wrote: > I think what Vincent meant is that zsh should produce the shortest > string that, when parsed, results in a value equal to the original. > > For your example, "1.1" is the shortest string that parses into > floating point value equal to the original, hence this (according to > Vincent) is what zsh should produce. Yes, this is exactly what I meant, and what Java's System.out.println seems to do. This is also specified like that in XPath. I think that's the best compromise in practice. > Many languages and libraries do this sort of thing. The roundtrip > guarantee is sometimes limited to the same machine. That is, some > implementation don't guarantee that you can serialize a floating point > value on one machine, parse it on another and get the same value. The roundtrip guarantee is associated with the floating-point format. If you don't know what format will be used when parsing the string, then you need to store the exact value (this is always possible for binary numbers written in decimal, but can take many digits, up to something around the absolute value of the minimum exponent). -- Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) ^ permalink raw reply [flat|nested] 12+ messages in thread
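To give a sense of how many digits "the exact value" can take, here is a small Python sketch (an illustration only, assuming IEEE 754 doubles; `exact_digit_counts` is a hypothetical helper). It uses `decimal.Decimal` to obtain the exact decimal expansion of a double and counts its significant digits and its digits after the decimal point.

```python
from decimal import Decimal


def exact_digit_counts(x: float) -> tuple:
    """Return (significant digits, digits after the point) of x's exact value."""
    parts = Decimal(x).as_tuple()  # exact conversion, no rounding
    return len(parts.digits), -parts.exponent


smallest = 5e-324  # smallest positive subnormal double, i.e. 2**-1074

print(exact_digit_counts(0.1))
print(exact_digit_counts(smallest))

# The exact string is format-independent: any correct parser recovers x from it.
assert float(format(Decimal(smallest), "f")) == smallest
```

For the smallest subnormal, the number of digits after the decimal point equals the absolute value of the minimum binary exponent, 1074, in line with the estimate given in the message above.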
* Re: zsh converts a floating-point number to string with too much precision 2019-12-21 0:50 ` Vincent Lefevre @ 2019-12-21 8:47 ` Stephane Chazelas 2019-12-21 9:43 ` Roman Perepelitsa 2019-12-21 21:28 ` Vincent Lefevre 0 siblings, 2 replies; 12+ messages in thread From: Stephane Chazelas @ 2019-12-21 8:47 UTC (permalink / raw) To: zsh-workers 2019-12-21 01:50:05 +0100, Vincent Lefevre: > On 2019-12-20 18:12:18 +0100, Roman Perepelitsa wrote: > > I think what Vincent meant is that zsh should produce the shortest > > string that, when parsed, results in a value equal to the original. > > > > For your example, "1.1" is the shortest string that parses into > > floating point value equal to the original, hence this (according to > > Vincent) is what zsh should produce. > > Yes, this is exactly what I meant, and what Java's System.out.println > seems to do. This is also specified like that in XPath. > > I think that's the best compromise in practice. [...] OK, I think I see what you mean. So on a system (with a compiler) where C doubles are implemented as IEEE 754 double precision, both 1.1 and 1.1000000000000001 are represented as the same binary double (whose exact value is 1.100000000000000088817841970012523233890533447265625). So you're saying echo $((1.1000000000000001)) and echo $((1.1)) should output 1.1, because even though 1.1000000000000001 is closer to that value than 1.1000000000000000, zsh should pick the latter because people prefer to see shorter number representations and in that case it doesn't matter which one we pick as both lead to the same double. How would we do that? Is there a standard C API for that? Or would we get the output of sprintf("%.17g"), look at the last two significant digits, if the second last is 9 or 0, then see if rounding it and doing a strtod again yields the same double? That seems a bit overkill (and I suspect that's not even a valid approach). 
Or should we implement the conversion to decimal string representation from scratch without using sprintf() and adapt to every system's double representation? or assume doubles are IEEE 754 ones as is more or less already done? How are those other languages doing it? -- Stephane ^ permalink raw reply [flat|nested] 12+ messages in thread
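The trial-and-error idea raised in this message (format, parse back, compare) can be sketched in a few lines of Python. This is only an illustration of the approach, not what any shell or library actually implements: the function name `shortest_by_trial` is made up, IEEE 754 doubles are assumed, and Python's `%g` formatting and `float()` stand in for C's sprintf() and strtod(). Because `%g` always rounds to the nearest decimal at each precision, the loop may in rare cases (around powers of two, where the spacing between doubles is asymmetric) return one digit more than the true minimum.

```python
def shortest_by_trial(x: float) -> str:
    """Try 1 to 17 significant digits; return the first string that round-trips."""
    for precision in range(1, 18):
        candidate = "%.*g" % (precision, x)
        if float(candidate) == x:
            return candidate
    # Only reached for NaN, which never compares equal to itself.
    return "%.17g" % x


for value in (1.1, 0.1, 0.3, 1.0 / 3, 0.5 * 4):
    print(shortest_by_trial(value))
```

Note that a whole-number result such as `0.5 * 4` comes out without a decimal point, so a shell adopting this would still need to append the `.` discussed earlier in the thread to keep floats recognisable as floats.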
* Re: zsh converts a floating-point number to string with too much precision 2019-12-21 8:47 ` Stephane Chazelas @ 2019-12-21 9:43 ` Roman Perepelitsa 2019-12-21 17:56 ` Stephane Chazelas 2019-12-21 21:28 ` Vincent Lefevre 1 sibling, 1 reply; 12+ messages in thread From: Roman Perepelitsa @ 2019-12-21 9:43 UTC (permalink / raw) To: Zsh hackers list On Sat, Dec 21, 2019 at 9:48 AM Stephane Chazelas <stephane.chazelas@gmail.com> wrote: > So on a system (with a compiler) where C doubles are implemented > as IEEE 754 double precision, both 1.1 and 1.1000000000000001 > are represented as the same binary double (whose exact value is > 1.100000000000000088817841970012523233890533447265625). > > So you're saying echo $((1.1000000000000001)) and echo $((1.1)) > should output 1.1, because even though 1.1000000000000001 is > closer to that value than 1.1000000000000000, zsh should pick > the latter because people prefer to see shorter number > representations and in that case it doesn't matter which one we > pick as both lead to the same double. Correct. The best formal description of this process that I know of is the specification of to_chars() functions in the C++ standard: The functions [...] ensure that the string representation consists of the smallest number of characters such that there is at least one digit before the radix point (if present) and parsing the representation using the corresponding from_chars function recovers value exactly. [Note: This guarantee applies only if to_chars and from_chars are executed on the same implementation. — end note] If there are several such representations, the representation with the smallest difference from the floating-point argument value is chosen, resolving any remaining ties using rounding according to round_to_nearest. There is no simple algorithm that achieves this. I recall reading long papers a few years back that were describing various inventions in this field. They were pretty scary. 
> Is there a standard C API for that?

Not that I know of. FWIW, neither gcc (libstdc++) nor clang (libc++) have implemented to_chars() for floating points in the two years since it's been added to the standard (C++17).

>
> Or would we get the output of sprintf("%.17g"), look at the last
> two significant digits, if the second last is 9 or 0, then see
> if rounding it and doing a strtod again yields the same double?

I believe printing the value with full precision and then truncating the string is the most popular approach. All implementations that are faster are only marginally so, and they are much more complex. I think this approach should work well enough for zsh. Lifting this sort of implementation from another project would be ideal (someone would have to find it; my apologies for not doing it myself).

Roman.
* Re: zsh converts a floating-point number to string with too much precision
From: Stephane Chazelas @ 2019-12-21 17:56 UTC
To: Roman Perepelitsa; +Cc: Zsh hackers list

2019-12-21 10:43:43 +0100, Roman Perepelitsa:
[...]
> There is no simple algorithm that achieves this. I recall reading long
> papers a few years back that were describing various inventions in
> this field. They were pretty scary.

https://github.com/openjdk/jdk/blob/f4af0eadb6eaf9d9614431110ab7fc9c1588966d/src/java.base/share/classes/jdk/internal/math/FloatingDecimal.java#L424

is not exactly trivial. I enjoyed reading the comments even if I have no clue what the code is doing.

--
Stephane
* Re: zsh converts a floating-point number to string with too much precision 2019-12-21 17:56 ` Stephane Chazelas @ 2019-12-21 18:11 ` Stephane Chazelas 2019-12-21 18:20 ` Roman Perepelitsa 0 siblings, 1 reply; 12+ messages in thread From: Stephane Chazelas @ 2019-12-21 18:11 UTC (permalink / raw) To: Roman Perepelitsa, Zsh hackers list 2019-12-21 17:56:18 +0000, Stephane Chazelas: > 2019-12-21 10:43:43 +0100, Roman Perepelitsa: > [...] > > There is no simple algorithm that achieves this. I recall reading long > > papers a few years back that were describing various inventions in > > this field. They were pretty scary. > > https://github.com/openjdk/jdk/blob/f4af0eadb6eaf9d9614431110ab7fc9c1588966d/src/java.base/share/classes/jdk/internal/math/FloatingDecimal.java#L424 > is not exactly trivial. I enjoyed reading the comments even if I > have not clue what the the code is doing. [...] Microsoft's STL implementation (to_chars) looks about as hairy: https://github.com/microsoft/STL/blob/264b0d4a167daa9e5499af6d783c9ff22f7af03f/stl/inc/charconv -- Stephane ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: zsh converts a floating-point number to string with too much precision 2019-12-21 18:11 ` Stephane Chazelas @ 2019-12-21 18:20 ` Roman Perepelitsa 0 siblings, 0 replies; 12+ messages in thread From: Roman Perepelitsa @ 2019-12-21 18:20 UTC (permalink / raw) To: Roman Perepelitsa, Zsh hackers list On Sat, Dec 21, 2019 at 7:11 PM Stephane Chazelas <stephane.chazelas@gmail.com> wrote: > > 2019-12-21 17:56:18 +0000, Stephane Chazelas: > > 2019-12-21 10:43:43 +0100, Roman Perepelitsa: > > [...] > > > There is no simple algorithm that achieves this. I recall reading long > > > papers a few years back that were describing various inventions in > > > this field. They were pretty scary. > > > > https://github.com/openjdk/jdk/blob/f4af0eadb6eaf9d9614431110ab7fc9c1588966d/src/java.base/share/classes/jdk/internal/math/FloatingDecimal.java#L424 > > is not exactly trivial. I enjoyed reading the comments even if I > > have not clue what the the code is doing. > [...] > > Microsoft's STL implementation (to_chars) looks about as hairy: > https://github.com/microsoft/STL/blob/264b0d4a167daa9e5499af6d783c9ff22f7af03f/stl/inc/charconv Note that different implementations can have different behavior in some cases. I believe the C++ spec unambiguously defines string representation for every number but Java may not be following that spec (it has no reason to). I recall lengthy discussions when to_chars was being standardized. I don't remember the details but I think there was more than one reasonable specification and judgement call had to be made. I pointed to to_chars from the C++ standard not because it's the best but because it's precise and recent enough to take into account experience from other languages. It's always nice when you can piggyback on a standard. Roman. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: zsh converts a floating-point number to string with too much precision 2019-12-21 8:47 ` Stephane Chazelas 2019-12-21 9:43 ` Roman Perepelitsa @ 2019-12-21 21:28 ` Vincent Lefevre 1 sibling, 0 replies; 12+ messages in thread
From: Vincent Lefevre @ 2019-12-21 21:28 UTC
To: zsh-workers

On 2019-12-21 08:47:36 +0000, Stephane Chazelas wrote:
> OK, I think I see what you mean.
>
> So on a system (with a compiler) where C doubles are implemented
> as IEEE 754 double precision, both 1.1 and 1.1000000000000001
> are represented as the same binary double (whose exact value is
> 1.100000000000000088817841970012523233890533447265625).
>
> So you're saying echo $((1.1000000000000001)) and echo $((1.1))
> should output 1.1, because even though 1.1000000000000001 is
> closer to that value than 1.1000000000000000, zsh should pick
> the latter because people prefer to see shorter number
> representations and in that case it doesn't matter which one we
> pick as both lead to the same double.

I now remember that our Handbook of Floating-Point Arithmetic covers
this issue (Section 4.9.2.1 "Output conversion: from radix 2 to
radix 10" in the 2nd edition). But it just gives references

--------------------------------------------------------------------
"[...] Steele and White designed an algorithm for that. Their
algorithm was later improved by Burger and Dybvig [86], and by
Gay [211]. Gay's code is available for anyone to use, and is very
robust.(17) Faster yet more complex algorithms have been introduced
by Loitsch [394] and Andrysco et al. [16]. [...]"

(17) At the time of writing this book, it can be obtained at
http://www.netlib.org/fp/ (file dtoa.c).
--------------------------------------------------------------------

and Burger and Dybvig's high-level algorithm.
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
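
Editorial illustration (not code posted in the thread): a minimal Python sketch, assuming IEEE 754 doubles, of the most naive way to get a short representation that still converts back to the same double. It tries "%.Ng" for N = 1 to 17 and stops at the first string that round-trips. The name shortest_roundtrip is hypothetical, and this brute-force search is not the Steele and White, Burger and Dybvig, or Gay algorithm cited from the Handbook; those avoid the repeated conversions, and this sketch makes no claim about corner cases.

```python
from decimal import Decimal


def shortest_roundtrip(x: float) -> str:
    """Return the first "%.Ng" rendering (N = 1..17) that parses back to x.

    Hypothetical helper for illustration only: a brute-force search,
    not one of the published shortest-output algorithms.
    """
    for precision in range(1, 18):
        text = "%.*g" % (precision, x)
        if float(text) == x:
            return text
    # Reached only for values that never compare equal (NaN).
    return "%.17g" % x


# Both decimal strings from the message above denote the same double.
print(float("1.1") == float("1.1000000000000001"))  # True

# Exact decimal value of that double.
print(Decimal(1.1))

# The three neighbouring doubles from the Java example at the top of the thread.
for hex_literal in ("0x1.1999999999999p+0",
                    "0x1.199999999999ap+0",
                    "0x1.199999999999bp+0"):
    x = float.fromhex(hex_literal)
    print(hex_literal, "->", shortest_roundtrip(x))
```

For these three inputs the sketch prints 1.0999999999999999, 1.1 and 1.1000000000000003, the same strings as the Java program quoted earlier in the thread.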
* Re: zsh converts a floating-point number to string with too much precision 2019-12-20 16:58 ` Stephane Chazelas 2019-12-20 17:12 ` Roman Perepelitsa @ 2019-12-21 1:00 ` Vincent Lefevre 1 sibling, 0 replies; 12+ messages in thread
From: Vincent Lefevre @ 2019-12-21 1:00 UTC
To: zsh-workers

On 2019-12-20 16:58:24 +0000, Stephane Chazelas wrote:
> https://unix.stackexchange.com/questions/422122/why-does-0-1-expand-to-0-10000000000000001-in-zsh
>
> Reproduced below for convenience:
>
> ════════════════════════════════════════════════════════════════
>
> TL;DR
>
> zsh chooses a decimal representation for the double binary
> numbers that it uses for evaluating floating-point arithmetic
> that preserves their information fully, that is safe for reinput
> into its arithmetic expressions.

But there are several possible decimal representations that satisfy
this requirement.

> And that is done at the expense
> of cosmetics. For that, it needs 17 significant digits, and makes
> sure the expansion always includes a . or e so it's treated as
> float on reinput.

This is only one way to fulfill the requirement. Here, it was assumed
that the output precision is chosen independently from the argument
(it seems that zsh chooses printf "%.17g"). But it does not need to
be like that. BTW, what zsh does is not documented.

--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
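
Editorial illustration (not code posted in the thread): a small Python sketch, assuming IEEE 754 doubles, of the behavior described in this message. The helper fixed17 is hypothetical; it mimics the description "17 significant digits, plus a . or e so the result is read back as a float" using the "%.17g" format that Vincent guesses zsh uses. It is not taken from the zsh source and only considers finite values.

```python
def fixed17(x: float) -> str:
    """Format x with a fixed 17 significant digits, as described in the thread.

    Hypothetical helper: appends "." when the "%.17g" result would
    otherwise look like an integer. Finite values only.
    """
    text = "%.17g" % x
    if "." not in text and "e" not in text:
        text += "."
    return text


# Fixed precision versus Python's repr(), which picks a shortest
# round-tripping string.
for literal in ("0.1", "1.1", "1.0"):
    x = float(literal)
    print(literal, "->", fixed17(x), "| repr:", repr(x))

# Several decimal strings satisfy the round-trip requirement for the
# same double; the 17-digit one is just one of them.
candidates = ["1.1", "1.10000000000000001", "1.1000000000000001",
              "1.10000000000000009"]
print(all(float(c) == 1.1 for c in candidates))  # True
```

The first loop shows 0.10000000000000001 and 1.1000000000000001 for the fixed-precision form, against 0.1 and 1.1 from repr(); both forms read back to the same double, which is the point being argued here.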
end of thread, other threads:[~2019-12-21 21:29 UTC | newest]

Thread overview: 12+ messages
2019-12-20  1:37 zsh converts a floating-point number to string with too much precision Vincent Lefevre
2019-12-20  3:38 ` Mikael Magnusson
2019-12-20 16:58 ` Stephane Chazelas
2019-12-20 17:12 ` Roman Perepelitsa
2019-12-21  0:50 ` Vincent Lefevre
2019-12-21  8:47 ` Stephane Chazelas
2019-12-21  9:43 ` Roman Perepelitsa
2019-12-21 17:56 ` Stephane Chazelas
2019-12-21 18:11 ` Stephane Chazelas
2019-12-21 18:20 ` Roman Perepelitsa
2019-12-21 21:28 ` Vincent Lefevre
2019-12-21  1:00 ` Vincent Lefevre