At 2019-10-18T19:20:35-0400, Arthur Krewat wrote:
> I didn't have an 8087 floating point accelerator, so I wrote my
> assembler example to use two 16-bit words of integers, combining them
> for a 31-bit integer value with sign.
>
> Now mind you, the C version used real floating point, and a software
> floating point library with no hardware accelerator. At that point, I
> realized C was the way to go. It had passed my experiment with flying
> colors. The C compiler, I believe, was from Computer Innovations,
> Copyright (c) 1981,82,83,84,85.
>
> The reason this is similar to Ken's statement above: In the assembler
> version, the cube would deform quite a bit before the run would
> finish. A 31-bit integer didn't accurately reflect the result of the
> math. Over time, that slight inaccuracy really added up. The accuracy
> of the C version using floats was spot on. So while I basically
> cheated for the assembler version, causing the deformation of the cube
> over time, the C version was 100% accurate even though it was slower.
>
> I wonder, is there something inherently different between PDP-11/7
> floats and Intel's leading to the inaccuracy Ken mentions? Was the
> PDP-11 (or the -7) floating point that much different than IEEE-754?

It sounds like it could be a simple matter of precision to me. It takes
32 bits to store a single-precision floating-point value; double
precision requires 64. In IEEE 754, the double-precision significand is
53 bits (52 stored bits plus the implicit leading 1), against only 24
bits for single precision.

I can never remember the C type promotion rules without looking them
up, but IIRC, at least in some circumstances, C promotes floats to
doubles, at least for intermediate results. And the software
floating-point library you used could well have done the same, or
perhaps it used doubles all the way internally. Either of these could
have prevented accumulated roundoff.

I've heard, with a level of conviction somewhere between folklore and
formal demonstration[1], that for many practical numerical problems,
single precision is just not quite good enough, but double precision is
ample. Somewhere between 24 and 53 bits of significand, perhaps, there
is a sweet spot. The wisdom I've absorbed is: if you have to do
floating point, use doubles, unless you can clearly and convincingly
articulate why you absolutely need more precision, or can get away with
less. (For some 3D game-rendering applications, half precision is
adequate.) A non-quantified "single-precision will be faster"
declaration should be understood to include a lot of "!!1!11"
punctuation after it, and such people should be handled as delicately
as any other Gentoo user.

Regards,
Branden

[1] Example: Ben Klemens, _21st-Century C_, O'Reilly.
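
P.S. A minimal illustration of the accumulated-roundoff point (my own
toy, not a reconstruction of Arthur's cube demo): repeatedly adding
0.1, which has no exact binary representation, drifts badly in single
precision but stays close in double. On an IEEE 754 machine:

    #include <stdio.h>

    int main(void)
    {
        float  fsum = 0.0f;   /* single precision: 24-bit significand */
        double dsum = 0.0;    /* double precision: 53-bit significand */
        int i;

        /*
         * 0.1 is not exactly representable in binary, so every
         * addition rounds; a million additions let the error pile up.
         */
        for (i = 0; i < 1000000; i++) {
            fsum += 0.1f;
            dsum += 0.1;
        }

        /* The exact answer would be 100000. */
        printf("float  sum: %f\n", fsum);  /* drifts visibly high */
        printf("double sum: %f\n", dsum);  /* off only far past the decimal point */
        return 0;
    }

The float total ends up noticeably wrong in the integer part, while the
double total agrees with 100000 to several decimal places, which is the
same flavour of drift Arthur saw with his 31-bit integer cube.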