
IEEE 754 Floating Point Arithmetic Meets Its Challenger


Dr. John L. Gustafson is an applied physicist and mathematician who is a Visiting Scientist at A*STAR and Professor at NUS. He is a former Director at Intel Labs and former Chief Product Architect at AMD.
A pioneer in high-performance computing, he introduced cluster computing in 1985 and first demonstrated scalable massively parallel performance on real applications in 1988.
This became known as Gustafson’s Law, for which he won the inaugural ACM Gordon Bell Prize. He is also a recipient of the IEEE Computer Society’s Golden Core Award.

POSIT is knocking on the doors of the IEEE 754 standard for floating point arithmetic, promising “honest”, accurate answers as compared to the compromised, “precisely inaccurate” answers provided by the latter. A battle is brewing among numerical analysts and semiconductor engineers on whether to embrace this bold new approach or not. This is a quick look at POSIT based on an interview with its inventor, Prof. John Gustafson, A*STAR Computational Resources Centre and National University of Singapore (joint appointment), Singapore.


If Prof. John Gustafson has his way, you may soon be talking about Tera POPS instead of Tera FLOPS when describing your “High Performance Computing” prowess at crunching very large floating point numbers. The buzz is now out that Prof. Gustafson’s POSIT data type is a serious contender to usurp the perch enjoyed by the IEEE 754 specification for floating point arithmetic, which is the industry standard.


“POSIT stands for, as per the Oxford American Dictionary, a statement that is made on the assumption that it will prove to be true,” says Prof. Gustafson, who is travelling the world meeting academics and researchers and co-opting them into his new scheme for floating point arithmetic. ACCS hosted Prof. Gustafson’s distinguished lectures in India at the Indian Institute of Science, PES University and IIT Madras in November 2017.
“This idea has been knocking around inside my head for the past 30 years. I have put my complete energy into taking this forward over the last seven years,” says Prof. Gustafson. Ever since he saw William Kahan’s scheme for handling floating point arithmetic on computers, which later became the IEEE 754 Standard, Prof. Gustafson, then a researcher at IBM, has thought that there should be a better way to do float arithmetic. “I always thought it just did not sound natural and logical. Why should we go through such a roundabout way to achieve half precision and end up with a best-guess result?”
“Can we start calling it ‘posits’?” quips Prof. Gustafson when I call a number a float, highlighting the amount of re-education he has to do to get his idea to fly. “Until I wrote a book called ‘End of Error’, people didn’t take POSITs seriously. When the book with that title appeared, people were curious to know who could be so preposterous as to predict the end of error; it was like predicting the end of the world,” recalls Prof. Gustafson. End of Error became a bestseller, a rarity for a mathematical book, and it also drove Prof. Gustafson’s biggest critic, William Kahan, to finally react. Kahan’s lengthy rebuttal, which is a Google search away, finally brought focus to the claims of POSIT.
“Did you ever get to talk to Kahan in person about this?” I ask Prof. Gustafson, and he points me to a debate that was recorded and is available on YouTube. “He is a tough person to talk to,” says Prof. Gustafson.
So what exactly is Prof. Gustafson talking about, and why is Kahan angry? To get behind the story we have to go into some of the idiosyncrasies of floating point (IEEE 754) and compare it with what Prof. Gustafson proposes with his repertoire of Unums I, II and III and Posit. So fasten your seatbelts for a quick roundup of these concepts.

The shortcomings of IEEE 754


Call it a compromise to get things moving or not, but IEEE 754, which has become the industry default for floating point arithmetic with all the chip makers adhering to its scheme, has some annoying behaviours. Many of these annoyances aren’t appreciated unless we are working with higher precision numbers. The most cited is the rounding error. Rounding error has played havoc since floating point is dependent on the precision of the user’s machine. The machine fits every given number into the allowed number of bits by rounding it using one of four rounding modes: rounding to the nearest representable number, rounding up to the next representable number, rounding down to the previous representable number, or rounding toward zero. In every case the bits that do not fit the machine’s precision are dropped without warning.
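To make the four rounding modes concrete, here is a minimal sketch in Python. It does not use IEEE 754 binary hardware at all: it assumes a hypothetical machine that keeps only 4 significant decimal digits, simulated with the standard `decimal` module. The helper name `round_to_4_digits` and the sample value 44.856 are illustrative choices, not something from the interview.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

# The four rounding directions, mapped onto the decimal module's constants.
MODES = {
    "to nearest": ROUND_HALF_EVEN,
    "up (toward +infinity)": ROUND_CEILING,
    "down (toward -infinity)": ROUND_FLOOR,
    "toward zero": ROUND_DOWN,
}


def round_to_4_digits(text):
    """Round a decimal literal to 4 significant digits under each mode.

    Hypothetical helper: the 4-digit precision is an assumption made
    purely for illustration.
    """
    value = Decimal(text)
    return {
        name: str(Context(prec=4, rounding=mode).plus(value))
        for name, mode in MODES.items()
    }


# A positive and a negative input show where the modes disagree.
for text in ("44.856", "-44.856"):
    print(text, round_to_4_digits(text))
```

For the positive input, rounding down and rounding toward zero agree; for the negative input, rounding up and rounding toward zero agree. In none of the cases is the caller told that digits were discarded.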

Examples of numeric errors:

Consider a number, 44.85 x 10^6, which is represented in normalized floating point form as .4485 x 10^8. Pictorially, one can imagine the number to be represented as below.

Example 1 showing rounding errors


Let us assume a given computer can handle a mantissa of up to 4 digits and an exponent of up to 2 digits. Any arithmetic operation that increases the mantissa to more than 4 digits ends in rounding up or down, and if the rounded value is far from the expected value, the result is an error, called a rounding error.
Similarly, any operation that needs more than 2 digits for the exponent results in an overflow error (if the exponent is positive) or an underflow error (if the exponent is negative): the exponent value does not fit in the digits allocated for it.
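The toy machine described above can be approximated with Python's `decimal` module. This is only a sketch under stated assumptions: the context `toy`, the helper `run` and the operands are hypothetical, and `decimal` counts the exponent from the form d.ddd x 10^e rather than .dddd x 10^e, so its exponent for the same number is one less than in the article's notation. Traps are switched off so that the flags can be inspected instead of raising exceptions.

```python
from decimal import Context, Decimal, Inexact, Overflow, Underflow

# Hypothetical machine: 4-digit mantissa, exponents limited to two digits.
toy = Context(prec=4, Emax=99, Emin=-99, traps=[])


def run(op, x, y):
    """Apply one toy-machine operation and report which conditions it flagged."""
    toy.clear_flags()
    result = op(Decimal(x), Decimal(y))
    flagged = [s.__name__ for s in (Inexact, Overflow, Underflow) if toy.flags[s]]
    return result, flagged


# 44.85 x 10^6 + 1234 needs 8 digits, so the sum is rounded to 4 digits.
rounded = run(toy.add, "4485E4", "1234")

# The product needs a three-digit positive exponent: overflow.
overflowed = run(toy.multiply, "9.999E99", "10")

# The quotient is too small to keep all its digits: underflow.
underflowed = run(toy.divide, "1.234E-99", "1000")

for label, (result, flagged) in [("rounding", rounded),
                                 ("overflow", overflowed),
                                 ("underflow", underflowed)]:
    print(label, result, flagged)
```

In the first case the added 1234 vanishes entirely from the result, which is the rounding error the text describes; in the other two the exponent field, not the mantissa, is what runs out of room.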
The IEEE 754 standard’s biggest drawback is that it sometimes does not obey the arithmetic laws of associativity and distributivity. So, a proven algorithm may show different results when its additions and multiplications are grouped or ordered differently, for instance by different compilers or on different machines.
Non-conformance with the Laws of Associativity and Distributivity

Since rounding up or rounding down is an essential part of floating point operations, the associative and distributive laws do not always hold. Hence, in general,
(a + b) – c ≠ (a – c) + b
a(b – c) ≠ (ab – ac)

Example 2 showing non associativity
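Both failures are easy to reproduce. The sketch below is illustrative only and the operands are chosen by hand: the associativity case uses ordinary Python floats, which are IEEE 754 double precision on common platforms, while the distributivity case assumes a hypothetical 4-digit decimal machine simulated with the `decimal` module, because the arithmetic can then be checked on paper.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Associativity, with double precision floats.
a, b, c = 1e16, 1.0, 1e16
left = (a + b) - c    # b is rounded away when it is added to the huge a
right = (a - c) + b   # a and c cancel exactly first, so b survives

# Distributivity, on a hypothetical 4-digit decimal machine.
toy = Context(prec=4, rounding=ROUND_HALF_EVEN)
x, y, z = Decimal("1.234"), Decimal("5.678"), Decimal("5.677")
factored = toy.multiply(x, toy.subtract(y, z))                    # x(y - z)
expanded = toy.subtract(toy.multiply(x, y), toy.multiply(x, z))   # xy - xz

print("(a + b) - c =", left)
print("(a - c) + b =", right)
print("x(y - z)    =", factored)
print("xy - xz     =", expanded)
```

In the distributive case each product is rounded to 4 digits before the subtraction, so the small difference between y and z is swamped by the rounding of the two large products.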


One can safely say that floating point arithmetic has brought about an interesting subfield in which one studies the errors in numbers. Due to the way it is defined, perhaps good for its time, the results we get from floating point arithmetic are not exact but approximations. Of course, depending upon the precision of the user’s machine, we can bring this approximation close to the real value.

That is not good enough for researchers and high-precision engineers, for whom every digit far from the decimal point matters. An often-cited case is how double precision IEEE 754 arithmetic treats the operation 0.1 + 0.2: it gives 0.30000000000000004 where one would expect a simple 0.3. See below a discussion in contrast to how Type I Unum handles this.
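The 0.1 + 0.2 case can be checked in any language whose floats are IEEE 754 doubles. A minimal Python sketch follows; the variable names are illustrative, and the tolerance-based comparison at the end is a common workaround rather than anything the article or Prof. Gustafson prescribes.

```python
from fractions import Fraction
import math

total = 0.1 + 0.2

print(repr(total))     # 0.30000000000000004
print(total == 0.3)    # False

# Fraction converts a float to the exact rational value it stores,
# so the size of the discrepancy can be computed without further rounding.
exact_error = Fraction(total) - Fraction(3, 10)
print(exact_error > 0)

# The usual workaround: compare within a tolerance instead of exactly.
print(math.isclose(total, 0.3))   # True
```

Neither 0.1 nor 0.2 has a finite binary expansion, so both operands are already rounded before the addition takes place, and the sum is rounded once more.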
