Dear Tung,

converting strings to floats and keeping exact bounds is not that complicated. The steps are:

1.) Convert the digits to an integer. You can stop when you have 1 or 2 digits more than the mantissa can hold. E.g. if you create a 32 or 64 bit float, a 32 bit or 64 bit integer is enough, because it is quite a bit longer than the float mantissa. The lower and upper bounds are the truncated value and the truncated value + 1. If the decimal number fits into the mantissa, the bounds are the same. For the truncated digits you just have to count them to correct the decimal exponent. Also you need to take care of the decimal point to get the proper decimal exponent.

2.) Convert the integer to a float. If the integer is smaller than the mantissa length, this is trivial. If it is longer, you divide by 2 until it fits. The lower limit from step 1 is rounded down, the upper limit is rounded up while dividing. The number of divide steps is limited by the difference of the integer size you used and the floats mantissa size (that is the exponent size of the float +1). To make it faster, you can also do a binary intersection search of the right power of 2, but it might not be worth it, because dividing integers by 2 is cheap and the max loop count is small.

3.) In order to handle the decimal point and decimal exponents, you need tables of 10^(2^n). Use the binary representation of the decimal exponent to find out which powers to multiply. For each power you need a lower and upper bound value (as a floating point number). You can compute these tables with libraries supporting higher precision numbers and store the numbers (best in hex float notation – which is standard C). In an IEEE floating point implementation, you can choose the rounding direction (up down middle). When you multiply the factors of the lower and upper bounds, you set the rounding direction accordingly. The C function for this is fesetround. The exponent tables are fairly small, because 10^(2^n) is larger than the maximal floating point number for rather modest n (for double precision, n=8 = 10^256 should be the largest value you need).

That’s it. Since you multiply several numbers (at most 10), each of which is rounded up /down, your range might be a bit larger than necessary. For getting the minimum bound you need multi precision arithmetic, but the method above works with just target precision arithmetic and gives reliable bounds.

Note that for integers and some other exactly representable numbers, this method gives identical lower and upper bounds.

Best regards,

Michael

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928