Floating Point Addition: X + C == X ?
For any real number X
and any positive real number C
, X + C
is always greater than X
.
In floating point arithmetic however, “X + C == X
” can hold true in some cases.
// Doesn't panic! Try it in Rust Playground
assert!(std::f64::consts::PI + 1E-26 == std::f64::consts::PI);
When doing IEEE 754 Floating point addition of floats X & C, C (assuming exponent of C is less than exponent of X) is adjusted to have the same exponent as X, and after that the addition is finally performed.
// HEX: 0x400921FB54442D18
const X: f64 = std::f64::consts::PI;
println!("0x{:X}", X.to_bits());
// BIN: 0b100000000001001001000011111101101010100010001000010110100011000
println!("0b{:b}", X.to_bits());
const C: f64 = 1E-26_f64;
// => HEX: 0x3A88C240C4AECB14
println!("0x{:X}", C.to_bits());
// => BIN: 0b11101010001000110000100100000011000100101011101100101100010100
println!("0b{:b}", C.to_bits());
float.exposed is a cool web-app that helps visualize and fiddle with bits of floating point numbers. Using the app, it’s easy to identify fields of a floating point number (sign, exponent, significand).
- X => 2^(1) * 0b1.1001001000011111101101010100010001000010110100011000
- C => 2^(-87) * 0b1.1000110000100100000011000100101011101100101100010100 => 2^(1) * (0b1.1000110000100100000011000100101011101100101100010100 » 88) => 2^(1) * (0b0.0000000000000000000000000000000000000000000000000000) (mantissa lost due to Truncation Error)
To add X & C, C’s representation needs to be adjusted to have the identical exponent as X.
During adjustment, bits that consist mantissa of C is lost due to Truncation Error.
So that’s why floating point addition (X + C)
can equal X
.
TODO: What are the exact steps that FPU hardware follow to implement floating point addition?