
"floating point conversion algorithm will perform several runs until the error between the original number and the floating point representation is smaller than some very small value"

Yes yes yes yes yes, but my question is not "what happens in the PHP code" but why it happens. Shouldn't there be an algorithm that simply runs down the number and produces the correct float, without any approximation step? Floats and doubles may be inaccurate, but the inaccuracy in this case is purely deterministic.

Here's the strtod.c in Tcl, with various exciting things in the copyright: http://www.opensource.apple.com/source/tcl/tcl-14/tcl/compat... Unless I've missed it, there's no convergence testing there; it just gets on with the conversion. All the ifs, whiles, and gotos seem to be concerned with parsing, not with checking error sizes. Is the PHP way faster? Is the linked code somehow inaccurate? Is the PHP way just insane?




Almost every floating-point value given as a string in decimal notation will not be exactly representable in a binary floating-point format of fixed precision. If the number to convert lies within the range of your binary FP precision (e.g. IEEE 754 double: 52 mantissa bits, 11 exponent bits), it will in general be sandwiched between two numbers that are exactly representable in binary. The task is to implement an algorithm which chooses between these two numbers in the best possible way (e.g. without introducing systematic bias).
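
To make the sandwich concrete, here is a small stand-alone C sketch of my own (not from the Tcl or PHP sources, assuming a C99 libm): it reads "0.1" with the system strtod and prints the chosen double together with its two exactly representable neighbours, one ULP below and one ULP above; the decimal value 0.1 itself lies strictly between two of them.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* the double actually chosen by the conversion */
        double d = strtod("0.1", NULL);
        /* its two exactly representable neighbours, one ULP away */
        double below = nextafter(d, 0.0);
        double above = nextafter(d, 2.0);
        printf("below : %.20g (%a)\n", below, below);
        printf("chosen: %.20g (%a)\n", d, d);
        printf("above : %.20g (%a)\n", above, above);
        return 0;
    }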

The Apple engineers simply rely on the compiler/library: they collect the mantissa digits into up to two integers, thus generating an exact binary representation for mantissae of up to 18 decimal digits. Thereafter they coalesce these integers into a double variable via an implicit cast from int to double; that's where the library- or compiler-generated sequence of machine instructions takes over.
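
Roughly, that digit-collection step looks like the following toy code (my own names and simplifications, not the actual Apple/Tcl source): two long integers hold up to 9 decimal digits each, so 18 digits are captured exactly, and rounding only enters in the final int-to-double conversions and the combining arithmetic.

    #include <stdio.h>

    /* Collect up to 18 mantissa digits exactly into two integers, then
       coalesce them into a double.  The two accumulators are exact; the
       rounding happens only in the final conversions and the combining
       arithmetic below. */
    static double collect18(const char *digits, int ndigits) {
        long hi = 0, lo = 0;          /* each holds at most 9 decimal digits */
        int i;
        if (ndigits > 18)
            ndigits = 18;             /* the cut-off discussed further down */
        for (i = 0; i < ndigits && i < 9; i++)
            hi = hi * 10 + (digits[i] - '0');
        for (; i < ndigits; i++)
            lo = lo * 10 + (digits[i] - '0');
        /* scale hi by 10^(number of digits held in lo) and add lo */
        double scale = 1.0;
        for (i = 9; i < ndigits; i++)
            scale *= 10.0;
        return (double)hi * scale + (double)lo;
    }

    int main(void) {
        /* 18 significant digits; the decimal exponent is handled elsewhere */
        printf("%.20g\n", collect18("123456789123456789", 18));
        return 0;
    }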

The PHP guys actually do the same thing. However, after this step they continue to adjust the result in order to obtain the best approximation in the sense above. It can be proven mathematically ([1]) that this task _cannot_ be performed for all inputs using arithmetic of any fixed precision. [1] also provides a near-optimal non-iterative solution, which the author of the PHP conversion routine has improved upon. However, the computations involve floating-point (processor) arithmetic, which suffers from a specific error in Intel FPUs. Due to this error, intermediate results are altered in an unfortunate way, so that the adjustment algorithm no longer terminated in spite of the theoretical guarantee.
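
The adjustment idea can be sketched roughly like this (my own illustration, not the PHP/Zend routine): nudge the coalesced double one ULP at a time towards the decimal value until it is hit exactly or bracketed, then keep the closer of the two bracketing doubles. The strtold re-parse used as the "exact" reference here is only a placeholder -- it is circular in a real strtod and only as precise as the FPU's extended format, which is exactly the kind of dependency mentioned above; a real routine needs big-number comparisons.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch only: the real comparison against the decimal string cannot
       rely on a fixed-precision re-parse like strtold. */
    static double adjust(double d, const char *dec) {
        long double exact = strtold(dec, NULL);       /* placeholder reference */
        for (;;) {
            if ((long double)d == exact)
                return d;                             /* hit it exactly */
            /* step one ULP towards the decimal value */
            double next = nextafter(d, (long double)d < exact ? HUGE_VAL : -HUGE_VAL);
            /* once d and next bracket the value, keep whichever is closer */
            if (((long double)next < exact) != ((long double)d < exact)) {
                long double dd = fabsl((long double)d - exact);
                long double dn = fabsl((long double)next - exact);
                return dn < dd ? next : d;
            }
            d = next;
        }
    }

    int main(void) {
        const char *s = "0.1";
        printf("%.20g\n", adjust(strtod(s, NULL), s));
        return 0;
    }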

Some more words about the 18-digit mantissa mentioned at the beginning. The Apple code cuts off the mantissa after 18 decimal digits, and its authors claim this operation won't affect the result. That would not be a problem if the input strings were given with normalized mantissae (first digit not a zero), since the error would then be introduced at the 18th decimal / approximately 60th binary digit, while the IEEE 754 double format only caters for 52 mantissa bits. As the strtod.c docs do not detail the 'internal fp representation', this might still be problematic (it would be, e.g., for the 80-bit extended precision of x87 coprocessors, which features 64 mantissa bits). However, as they allow non-normalized mantissa values, their conversion runs afoul of strings with 'many' leading zeros, i.e. they'd convert 0.0000000000000000009999E+0 to 0.
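
A toy demonstration of that failure mode (my own code, not the Tcl source): if the 18-digit cut-off counts leading zeros among the fractional digits, every significant digit of the example above is discarded, while the system strtod keeps them.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        const char *input = "0.0000000000000000009999E+0";

        /* naive 18-digit cut-off counted from the decimal point: the kept
           digits are all leading zeros, so the value collapses to 0 */
        char kept[19] = { 0 };
        strncpy(kept, strchr(input, '.') + 1, 18);
        double naive = strtod(kept, NULL) / 1e18;

        /* the system strtod keeps the significant digits */
        double correct = strtod(input, NULL);

        printf("naive  : %g\n", naive);    /* collapses to 0 */
        printf("correct: %g\n", correct);  /* small but non-zero */
        return 0;
    }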

hope all this makes kind of sense.

greets, carsten

[1] William D. Clinger, "How to Read Floating Point Numbers Accurately", ACM SIGPLAN Notices, v.25 n.6, p.92-101, June 1990. http://dx.doi.org/10.1145/989393.989430

[2] http://en.wikipedia.org/wiki/Double_precision_floating-point... (fp formats, for starters)



