Floating point representation

Numerical form

  • Value = (-1)^s × M × 2^E
  • s: Sign bit (0 - non-negative, 1 - negative)
  • M: Significand, normally a fractional value in range [1.0, 2.0)
  • E: Exponent, weights value by power of two
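To make the three fields concrete, the sketch below pulls the raw sign, exponent, and fraction bits out of a C float. It assumes float is a 32-bit IEEE 754 single-precision value (1 sign bit, 8 exp bits, 23 frac bits); the names float_fields and split_float are hypothetical helpers introduced only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three raw fields of a 32-bit float. */
typedef struct {
    uint32_t sign;  /* 1 bit: 0 = non-negative, 1 = negative */
    uint32_t exp;   /* 8 bits, still in biased form */
    uint32_t frac;  /* 23 bits of the significand after the binary point */
} float_fields;

/* Split a float into its raw fields.
   memcpy copies the bit pattern without violating aliasing rules.
   Assumes float is 32-bit IEEE 754 single precision. */
float_fields split_float(float f) {
    uint32_t bits;
    float_fields out;
    memcpy(&bits, &f, sizeof bits);
    out.sign = bits >> 31;
    out.exp  = (bits >> 23) & 0xFFu;
    out.frac = bits & 0x7FFFFFu;
    return out;
}
```

For example, split_float(-2.0f) yields sign = 1, exp = 128, and frac = 0 under the assumption above.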

Precisions

Single precision - 32 bits

  • 8 exp bits, 23 frac bits

Double precision - 64 bits

  • 11 exp bits, 52 frac bits

Extended precision - 80 bits

  • 15 exp bits, 63 frac bits
  • 1 bit wasted
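The single and double precision field widths can be cross-checked against the constants in <float.h>. This is a rough sketch that assumes float and double are the IEEE 754 single and double formats; the function names are hypothetical. Note that FLT_MANT_DIG and DBL_MANT_DIG count the implied leading bit, so the stored fraction is one bit shorter.

```c
#include <float.h>
#include <limits.h>

/* Total storage bits of each type. */
int float_total_bits(void)  { return (int)(sizeof(float)  * CHAR_BIT); }
int double_total_bits(void) { return (int)(sizeof(double) * CHAR_BIT); }

/* Stored fraction bits: the *_MANT_DIG macros include the implied leading 1. */
int float_frac_bits(void)  { return FLT_MANT_DIG - 1; }
int double_frac_bits(void) { return DBL_MANT_DIG - 1; }

/* Exponent bits: whatever remains after the sign bit and the fraction. */
int float_exp_bits(void)  { return float_total_bits()  - 1 - float_frac_bits(); }
int double_exp_bits(void) { return double_total_bits() - 1 - double_frac_bits(); }
```

The extended format is left out on purpose: the layout of long double varies between platforms, so the same arithmetic does not carry over reliably.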

Normalized Values

Condition

  • exp != 000…0
  • exp != 111…1

Exponent coded as a biased value

  • E = Exp - Bias
  • Exp : unsigned value of the exp field
  • Bias : Bias value (= 2^(k-1) - 1, where k is the number of exp bits)
    • Single precision: 127
    • Double precision: 1023

Significand coded with implied leading 1

  • M = 1.xxx…x, where xxx…x are the bits of frac
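The rules for normalized values can be sketched in C as follows. The code assumes the single-precision layout (8 exp bits, 23 frac bits); exp_bias, scale_pow2, and decode_normalized are hypothetical names, and the decoder is only meaningful when exp is neither all zeros nor all ones.

```c
#include <stdint.h>

/* Bias for a k-bit exponent field: 2^(k-1) - 1. */
int exp_bias(int k) { return (1 << (k - 1)) - 1; }

/* Multiply m by 2^e using repeated doubling or halving (exact in binary). */
double scale_pow2(double m, int e) {
    while (e > 0) { m *= 2.0; e--; }
    while (e < 0) { m /= 2.0; e++; }
    return m;
}

/* Rebuild the value (-1)^s * M * 2^E from a single-precision bit pattern.
   Only valid for normalized patterns: exp != 000...0 and exp != 111...1. */
double decode_normalized(uint32_t bits) {
    uint32_t s    = bits >> 31;
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;
    int    E = (int)exp - exp_bias(8);            /* E = Exp - Bias */
    double M = 1.0 + (double)frac / 8388608.0;    /* implied leading 1, frac / 2^23 */
    double v = scale_pow2(M, E);
    return s ? -v : v;
}
```

With these definitions, exp_bias(8) gives 127 and exp_bias(11) gives 1023, matching the values listed above.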

Example

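float f = 2003.0;

  • Value: 2003 = 11111010011 (binary) = 1.1111010011 × 2^10
  • Significand: M = 1.1111010011, frac = 11110100110000000000000
  • Exponent: E = 10, Bias = 127, Exp = 137 = 10001001 (binary)
  • Encoding: 0 10001001 11110100110000000000000 = 0x44FA6000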
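The encoding of 2003.0 can also be built by hand in code and compared with what the hardware stores. The sketch below handles only positive integers smaller than 2^24, which fit exactly in the 23 frac bits plus the implied 1, and it assumes float is IEEE 754 single precision. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Encode a positive integer n (0 < n < 2^24) as single-precision bits by hand. */
uint32_t encode_positive_int(uint32_t n) {
    int E = 0;
    while ((n >> (E + 1)) != 0) E++;              /* position of the leading 1 */
    uint32_t frac = (n - (1u << E)) << (23 - E);  /* drop the leading 1, left-align in 23 bits */
    uint32_t exp  = (uint32_t)(E + 127);          /* Exp = E + Bias */
    return (exp << 23) | frac;                    /* sign bit stays 0 */
}

/* Raw bit pattern the compiler and hardware actually use for f. */
uint32_t hardware_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under these assumptions, encode_positive_int(2003) and hardware_bits(2003.0f) should both produce 0x44FA6000.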

Denormalized Values

Condition

  • exp = 000…0

Value

  • Exponent value E = 1 - Bias
  • Significand value M = 0.xxx…x
  • No implied leading 1

Case 1 - exp = 000…0, frac = 000…0

  • Represents value 0.0
  • Two zeros (+0, -0)

Case 2 - exp = 000…0, frac != 000…0

  • Numbers very close to 0.0
  • Needed for gradual underflow; otherwise, values below the smallest normalized number would drop abruptly to 0.0
  • Gradual underflow: possible numeric values are spaced evenly near 0.0
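The even spacing near 0.0 can be checked with a small sketch. It assumes single precision (Bias = 127, so E = 1 - Bias = -126) and a platform that does not flush denormalized values to zero; bits_to_float and decode_denormalized are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Value of a denormalized single-precision pattern with the given frac field:
   M * 2^E, where M = frac / 2^23 (no implied leading 1) and E = 1 - 127 = -126. */
double decode_denormalized(uint32_t frac) {
    double v = (double)frac / 8388608.0;       /* M = 0.xxx...x */
    for (int i = 0; i < 126; i++) v /= 2.0;    /* multiply by 2^-126 */
    return v;
}
```

Consecutive frac values differ by the same step, 2^-149, and that step is also the gap between the largest denormalized value (bits 0x007FFFFF) and the smallest normalized value (bits 0x00800000), so there is no jump at the boundary.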

Special Values

Condition

  • exp = 111…1

Case 1 - exp = 111…1, frac = 000…0

  • Infinity

Case 2 - exp = 111…1, frac != 000…0

  • Not-a-number
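Putting the conditions from all three categories together, a bit pattern can be classified by looking only at exp and frac. This is a minimal sketch for single precision; the enum fp_kind and the function classify_bits are hypothetical names, not part of any standard library.

```c
#include <stdint.h>

/* Hypothetical labels for the categories described in these notes. */
typedef enum {
    KIND_ZERO,
    KIND_DENORMALIZED,
    KIND_NORMALIZED,
    KIND_INFINITY,
    KIND_NAN
} fp_kind;

/* Classify a single-precision bit pattern from its exp and frac fields. */
fp_kind classify_bits(uint32_t bits) {
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;

    if (exp == 0xFFu)                       /* exp = 111...1: special values */
        return frac == 0 ? KIND_INFINITY : KIND_NAN;
    if (exp == 0)                           /* exp = 000...0: zero or denormalized */
        return frac == 0 ? KIND_ZERO : KIND_DENORMALIZED;
    return KIND_NORMALIZED;                 /* everything else */
}
```

The sign bit is ignored here, so +infinity (0x7F800000) and -infinity (0xFF800000) both map to KIND_INFINITY.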