Floating Point Addition: X + C == X ?

08 May 2023

For any real number X and any positive real number C, X + C is always greater than X.

In floating point arithmetic however, “X + C == X” can hold true in some cases.

// Doesn't panic! Try it in Rust Playground
assert!(std::f64::consts::PI + 1E-26 == std::f64::consts::PI);

When doing IEEE 754 Floating point addition of floats X & C, C (assuming exponent of C is less than exponent of X) is adjusted to have the same exponent as X, and after that the addition is finally performed.

// HEX: 0x400921FB54442D18
const X: f64 = std::f64::consts::PI;
println!("0x{:X}", X.to_bits());
// BIN: 0b100000000001001001000011111101101010100010001000010110100011000
println!("0b{:b}", X.to_bits());

const C: f64 = 1E-26_f64;
// => HEX: 0x3A88C240C4AECB14
println!("0x{:X}", C.to_bits());
// => BIN: 0b11101010001000110000100100000011000100101011101100101100010100
println!("0b{:b}", C.to_bits());

float.exposed is a cool web-app that helps visualize and fiddle with bits of floating point numbers. Using the app, it’s easy to identify fields of a floating point number (sign, exponent, significand).

X => 2^(1) * 0b1.1001001000011111101101010100010001000010110100011000
C => 2^(-87) * 0b1.1000110000100100000011000100101011101100101100010100 => 2^(1) * (0b1.1000110000100100000011000100101011101100101100010100 » 88) => 2^(1) * (0b0.0000000000000000000000000000000000000000000000000000) (mantissa lost due to Truncation Error)

To add X & C, C’s representation needs to be adjusted to have the identical exponent as X. During adjustment, bits that consist mantissa of C is lost due to Truncation Error. So that’s why floating point addition (X + C) can equal X.

TODO: What are the exact steps that FPU hardware follow to implement floating point addition?

References

Floating Point Numbers (Part2: Fp Addition) - Computerphile

JOE1994

Floating Point Addition: X + C == X ?

References