caml-list - the Caml user's mailing list
From: David Chase <chase@world.std.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] string_of_float less accurate than sprintf "%f" ?
Date: Mon, 06 May 2002 10:19:34 -0400
Message-ID: <5.1.0.14.0.20020504185035.027211b0@pop.theWorld.com>
In-Reply-To: <20020504105324.A15588@pauillac.inria.fr>

[-- Attachment #1: Type: text/plain, Size: 3540 bytes --]

At 10:53 AM 5/4/2002 +0200, Xavier Leroy wrote:
>I'm not taking sides here, just noticing that Java takes the computer
>engineering viewpoint and C (and Caml, by inheritance of
>implementation :-) takes the physicist's viewpoint...

I think it is more likely that C "just happened".  Among
other things, Steele and White had not yet written their
paper on converting machine numbers to strings at the time
C was created.

The computer engineering viewpoint (in this case) is the correct
one for GENERAL use, because it guarantees that no matter
how many times you parse and unparse a floating point number,
you keep the same value.  This seems like a good thing to me.
It may be a value that is an approximation to an inaccurately
measured quantity, but after the initial input-to-FP translation,
the wiggling stops.  This does have one problem, in that the
number of digits printed may vary quite a bit (e.g., between
1.25 and 1.333333333333333) depending upon whether a number
has an exact binary representation.
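
The round-trip property and the varying digit count can be
illustrated with a minimal sketch in C.  The helper name
shortest_roundtrip is hypothetical, and the brute-force search
over printf precisions is only a stand-in for a real
shortest-digits algorithm; it assumes that the C library's
printf and strtod are correctly rounded and that d is finite.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: find the smallest precision p in 1..17 such
   that printing d with "%.{p}g" and parsing the text back yields
   exactly d.  The text is left in buf; the precision is returned.
   17 significant digits always suffice for an IEEE double. */
int shortest_roundtrip(double d, char *buf, size_t len) {
  int p;
  assert(buf != NULL && len >= 32);
  for (p = 1; p <= 17; p++) {
    snprintf(buf, len, "%.*g", p, d);
    if (strtod(buf, NULL) == d)
      return p;               /* parse(unparse(d)) == d: stop here */
  }
  return 17;                  /* only reached for NaN */
}
```

Under these assumptions, shortest_roundtrip(1.25, buf, sizeof buf)
leaves "1.25" in buf and returns 3, while 4.0/3.0 needs many more
digits before the parsed value matches.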

Because of this problem, and because the number of actually
meaningful digits may vary, it makes plenty of sense to have
ways to print that do allow finer control.  I am not sure that
either C or Java is worth blindly following here.  Java's
DecimalFormat is a bit baroque in its corner cases: depending
upon how you encode the # and 0 and . and , in its pattern,
you can describe normal notation, scientific notation, and
engineering-scientific notation.  Unfortunately, the pattern
notation allows you to say many things that are nonsensical,
and it is not immediately evident from looking at a format
string what is desired.

It's also not entirely clear what rounding you are supposed
to get when you ask for less-precise printing.  The possibilities
include:

1.   round-toward-zero
2.   round-toward-positive-infinity
3.   round-toward-negative-infinity
4.   round-away-from-zero (toward the nearest infinity)
5.   round-to-nearest, ties to even (1.5 --> 2, 2.5 --> 2)
6.   round-to-nearest, ties away from zero (1.5 --> 2, 2.5 --> 3, -1.5 --> -2)

1-3 and 5 correspond to modes commonly supported by hardware.
5     is the default answer from numerical people; I think it
      loses the smallest amount of information without introducing
      a bias.
6     is what I think most people expect to see.
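
The six rules above can be sketched on exact rationals, which
keeps binary floating-point error out of the picture.  The enum
and the function round_ratio are hypothetical names chosen for
this illustration; the sketch assumes d > 0 and that 2*|n % d|
does not overflow a long.

```c
#include <assert.h>

enum round_mode {
  TOWARD_ZERO,      /* 1 */
  TOWARD_POS_INF,   /* 2 */
  TOWARD_NEG_INF,   /* 3 */
  AWAY_FROM_ZERO,   /* 4 */
  HALF_EVEN,        /* 5: nearest, ties to even */
  HALF_AWAY         /* 6: nearest, ties away from zero */
};

/* Round the exact rational n/d (d > 0) to an integer. */
long round_ratio(long n, long d, enum round_mode mode) {
  long q, r, twice;
  int neg, away;
  assert(d > 0);
  q = n / d;                    /* C99 division truncates toward zero */
  r = n % d;                    /* remainder has the sign of n */
  if (r == 0) return q;         /* exact: nothing to round */
  neg = n < 0;
  twice = 2 * (neg ? -r : r);   /* 2*|r| versus d: below, at, or above half */
  switch (mode) {
  case TOWARD_POS_INF: away = !neg; break;
  case TOWARD_NEG_INF: away = neg; break;
  case AWAY_FROM_ZERO: away = 1; break;
  case HALF_EVEN:      away = twice > d || (twice == d && q % 2 != 0); break;
  case HALF_AWAY:      away = twice >= d; break;
  default:             away = 0; break;   /* TOWARD_ZERO */
  }
  if (away) q += neg ? -1 : 1;  /* step one unit away from zero */
  return q;
}
```

With 1.5, 2.5 and -1.5 written as 3/2, 5/2 and -3/2, HALF_EVEN
gives 2, 2 and -2, while HALF_AWAY gives 2, 3 and -2, matching
the examples in the list.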

What often gets implemented (e.g., by Sun in their implementation
of java.text.DecimalFormat) is actually a combination of two of
these -- first the number is formatted out to "full" (adequately
precise) form (this assumes some sort of rounding), and then that
is reduced in size using some more rounding.  Double-rounding is
bad -- in the worst case, 1.45 rounds (once) to 1.5, which then
rounds (a second time) to 2.0, even though rounding 1.45 directly
to an integer gives 1.  The
magnitude of the error is generally much smaller in actual
formatting (1.4999999999995 rounding to 1.5, e.g.) but it is still
an error, and it is avoidable at low or no cost.  Gdtoa from netlib
will do this for you -- I have used it myself, for an implementation
of java.text.DecimalFormat -- and it provides control over the
rounding of the output as well.  (I've attached a test program that
illustrates this, crudely.  It requires gdtoa, obviously.)
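
The 1.45 example can also be reproduced without gdtoa.  The
sketch below is not the attached test program: it uses a
hypothetical fixed-point representation (a nonnegative count of
hundredths) and round-half-up at each step, purely to show how
two roundings can disagree with one.

```c
#include <assert.h>

/* Drop the last decimal digit of a nonnegative scaled integer,
   rounding half up (145 hundredths -> 15 tenths). */
long drop_digit_half_up(long v) {
  assert(v >= 0);
  return (v + 5) / 10;
}

/* Hundredths -> tenths -> units: two separate roundings. */
long round_twice(long hundredths) {
  return drop_digit_half_up(drop_digit_half_up(hundredths));
}

/* Hundredths -> units in one step: a single rounding. */
long round_once(long hundredths) {
  assert(hundredths >= 0);
  return (hundredths + 50) / 100;
}
```

For 1.45 (145 hundredths), round_twice gives 2 and round_once
gives 1, which is the discrepancy described above.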

But (on the other hand) I really haven't a good idea how
someone might go about elegantly specifying desired rounding
in formatting.  It does matter -- people working with money
(at least in the US) have very definite opinions about how
half of anything is supposed to round (it rounds away from zero,
towards the nearest infinity).

David Chase

[-- Attachment #2: tfp.c --]
[-- Type: text/plain, Size: 1657 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "gdtoa.h"

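/* FPI fields, in order: nbits, emin, emax, rounding, sudden_underflow.
   The rounding values used below are 0 = toward zero, 1 = nearest,
   2 = up, 3 = down. */
FPI fpi_near = {
  53, 1-1023-53+1, 2046-1023-53+1, 1, 0
};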

FPI fpi_zero = {
  53, 1-1023-53+1, 2046-1023-53+1, 0, 0
};

FPI fpi_up = {
  53, 1-1023-53+1, 2046-1023-53+1, 2, 0
};

FPI fpi_down = {
  53, 1-1023-53+1, 2046-1023-53+1, 3, 0
};

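#include <stdint.h>

typedef int64_t s64;   /* portable replacement for the MSVC-only __int64 */

/* Unpack an IEEE double into the exponent/mantissa form gdtoa expects
   and return the digit string it produces. */
char * mygdtoa( double d, int mode, int ndigits, FPI * amper_fpi) {
  s64 l;
  int i0, i1, m0, e0, isNeg;
  ULong mantissa[2];               /* gdtoa takes its bits as ULong */
  int decpt;
  int kind = STRTOG_Normal;
  char * result;
  char * rve;
  int length;

  memcpy(&l, &d, sizeof l);        /* avoids the aliasing cast *(s64*)&d */
  i0 = (int) (l >> 32);            /* sign, exponent, top 20 mantissa bits */
  i1 = (int) l;                    /* low 32 mantissa bits */
  m0 = i0 & 0xfffff;
  e0 = (i0 >> 20) & 0x7ff;
  isNeg = i0 < 0;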

  if (e0 == 0x7ff) {
    if ((m0 | i1) != 0) {
      return "NaN";
    } else {
      return isNeg ? "-Infinity" : "Infinity";
    }
  }
  if ((m0 | i1 | e0) == 0) {
    return isNeg ? "-0.0" : "0.0";
  }

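  if (e0 != 0) m0 |= 0x100000;  /* normal: restore the implicit leading 1 */
  else e0 = 1;                  /* denormal: no implicit bit, minimum exponent */

  e0 -= 0x3ff + 52;             /* unbias, so that value = mantissa * 2^e0 */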

  mantissa[0] = i1;
  mantissa[1] = m0;

  
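  /* gdtoa returns only the significant digits: decpt receives the
     position of the decimal point and rve points at the terminating
     NUL.  The sign is not included, and this crude test never releases
     the buffer (gdtoa provides freedtoa for that). */
  result = gdtoa(amper_fpi, e0, mantissa, &kind, mode, ndigits, &decpt, &rve);

  length = (int) (rve - result);
  return result;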
}

void test1(double x, int n) {
  printf("near %d %s\n", n, mygdtoa(x,2,n, &fpi_near)); 
  printf("zero %d %s\n", n, mygdtoa(x,2,n, &fpi_zero)); 
  printf("up   %d %s\n", n, mygdtoa(x,2,n, &fpi_up)); 
  printf("down %d %s\n", n, mygdtoa(x,2,n, &fpi_down)); 
  
}

void test (double x) {
  test1(x,1); 
  test1(x,5); 
  test1(x,6); 
  test1(x,7); 
  test1(x,8); 

  test1(-x,5); 
  test1(-x,6); 
  test1(-x,7); 
  test1(-x,8); 

  printf("\n");
}

int main(int argc, char ** argv) {
  test (0.40999995);
  test (0.40444449);
  test (1.5);
  test (2.5);
  return 0;
}
