Floating point representation
Numerical form
- Value = (-1)^s × M × 2^E
- s: sign bit; 0 for non-negative, 1 for negative
- M: significand; normally a fractional value in the range [1.0, 2.0)
- E: exponent; weights the value by a power of two
Precisions
Single precision - 32 bits
- 8 exp bits, 23 frac bits
Double precision - 64 bits
- 11 exp bits, 52 frac bits
Extended precision - 80 bits
- 15 exp bits, 63 frac bits
- 1 bit wasted (the leading bit of the significand is stored explicitly instead of being implied)
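The single-precision layout above (1 sign bit, 8 exp bits, 23 frac bits) can be pulled apart with shifts and masks. The following is a minimal sketch, assuming `float` is an IEEE 754 single-precision value on the target machine; the helper names `float_bits`, `sign_field`, `exp_field`, and `frac_field` are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the raw 32 bits of a float into an integer.
   Assumes float is a 32-bit IEEE 754 single-precision value. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* reinterpret the bytes without a pointer cast */
    return u;
}

/* Bit 31: the sign bit s. */
static unsigned sign_field(float f) { return float_bits(f) >> 31; }

/* Bits 30..23: the 8 exp bits (still biased). */
static unsigned exp_field(float f) { return (float_bits(f) >> 23) & 0xFFu; }

/* Bits 22..0: the 23 frac bits. */
static uint32_t frac_field(float f) { return float_bits(f) & 0x7FFFFFu; }
```

For instance, `exp_field(1.0f)` yields 127 and `frac_field(1.0f)` yields 0. The same approach should carry over to double precision with a 64-bit integer, an 11-bit exp mask, and a 52-bit frac mask.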
Normalized Values
Condition
- exp != 000…0
- exp != 111…1
Exponent coded as a biased value
- E = Exp - Bias
- Bias: 2^(k-1) - 1, where k is the number of exp bits
- Single precision: 127
- Double precision: 1023
Significand coded with implied leading 1
- M = 1.xxx…x, where xxx…x are the bits of frac
Example
float f = 2003.0;
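Worked by hand, 2003 = 11111010011 in binary = 1.1111010011 × 2^10, so E = 10, Exp = E + Bias = 137 = 10001001 in binary, and frac is 1111010011 padded with 13 zeros. The full bit pattern is 0 10001001 11110100110000000000000, or 0x44FA6000. The sketch below checks this by rebuilding the value from its fields. It assumes `float` is IEEE 754 single precision; `float_bits`, `two_pow`, and `decode_normalized` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw bit pattern of a float (assumes IEEE 754 single precision). */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: 2^e as a double, computed by repeated scaling
   so that the sketch does not depend on the math library. */
static double two_pow(int e) {
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r /= 2.0;
    return r;
}

/* Rebuild a normalized single-precision value as (-1)^s * M * 2^E.
   Only meaningful when exp is neither all zeros nor all ones. */
static double decode_normalized(uint32_t bits) {
    unsigned s    = bits >> 31;
    int      exp  = (int)((bits >> 23) & 0xFFu);
    uint32_t frac = bits & 0x7FFFFFu;
    int      E    = exp - 127;                         /* E = Exp - Bias */
    double   M    = 1.0 + (double)frac / 8388608.0;    /* implied leading 1; 2^23 = 8388608 */
    double   v    = M * two_pow(E);
    return s ? -v : v;
}
```

Here `float_bits(2003.0f)` gives 0x44FA6000, and `decode_normalized(0x44FA6000u)` gives back 2003.0.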
Denormalized Values
Condition
- exp = 000…0
Value
- Exponent value E = 1 - Bias
- Significand value M = 0.xxx…x
- No implied leading 1
Case 1 - exp = 000…0, frac = 000…0
- Represents value 0.0
- Two zeros (+0, -0)
Case 2 - exp = 000…0, frac != 000…0
- Numbers very close to 0.0
- For gradual underflow / Otherwise, the value su
- Gradual underflow: possible numeric values are spaced evenly near 0.0
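The denormalized rules can be checked with a short sketch. It assumes `float` is IEEE 754 single precision (Bias = 127, 23 frac bits) and that the machine does not flush denormalized values to zero; `bits_to_float`, `two_pow`, and `decode_denormalized` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a float from a raw 32-bit pattern. */
static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical helper: 2^e as a double, by repeated scaling. */
static double two_pow(int e) {
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r /= 2.0;
    return r;
}

/* Value of a denormalized single-precision pattern (exp = 000...0):
   (-1)^s * M * 2^E with E = 1 - Bias and M = 0.xxx...x. */
static double decode_denormalized(uint32_t bits) {
    unsigned s    = bits >> 31;
    uint32_t frac = bits & 0x7FFFFFu;
    double   M    = (double)frac / 8388608.0;   /* no implied leading 1; 2^23 = 8388608 */
    double   v    = M * two_pow(1 - 127);       /* E = 1 - Bias = -126 */
    return s ? -v : v;
}
```

With these helpers, consecutive patterns 0x00000001, 0x00000002, … decode to values that differ by the same step (2^-149), which is the even spacing near 0.0. The largest denormalized value plus one step equals 2^-126, the smallest normalized value, which is why E = 1 - Bias is used rather than 0 - Bias. The two zero patterns, 0x00000000 and 0x80000000, compare equal as floats.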
Special Values
Condition
- exp = 111…1
Case 1 - exp = 111…1, frac = 000…0
- Infinity
Case 2 - exp = 111…1, frac != 000…0
- Not-a-number
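Taken together, the conditions on exp and frac partition every single-precision bit pattern into one category. A minimal sketch of that decision, assuming the 8-bit exp and 23-bit frac layout of single precision; the enum `fp_kind` and the function `classify_bits` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Hypothetical category names for the cases listed in these notes. */
enum fp_kind {
    KIND_ZERO,
    KIND_DENORMALIZED,
    KIND_NORMALIZED,
    KIND_INFINITY,
    KIND_NAN
};

/* Classify a single-precision bit pattern from its exp and frac fields. */
static enum fp_kind classify_bits(uint32_t bits) {
    unsigned exp  = (bits >> 23) & 0xFFu;   /* 8 exp bits */
    uint32_t frac = bits & 0x7FFFFFu;       /* 23 frac bits */

    if (exp == 0xFFu)                       /* exp = 111...1: special values */
        return frac == 0 ? KIND_INFINITY : KIND_NAN;
    if (exp == 0)                           /* exp = 000...0: zero or denormalized */
        return frac == 0 ? KIND_ZERO : KIND_DENORMALIZED;
    return KIND_NORMALIZED;                 /* every other exp value */
}
```

The sign bit plays no role in the classification, so 0x7F800000 and 0xFF800000 are both infinity, and 0x00000000 and 0x80000000 are both zero.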