Floating-point

From ScienceZero


Almost all modern computers approximate real numbers using floating-point arithmetic as defined in the IEEE 754 standard.
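To make "approximate" concrete, here is a short Python illustration. It assumes Python's float type is an IEEE 754 double, which holds on essentially all current platforms. Converting a float to Decimal shows the exact binary value the literal 0.1 is rounded to.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, exposing the
# rounding error introduced when 0.1 was converted to binary floating point.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The small errors in 0.1 and 0.2 do not cancel, so the sum is not
# the same double as the one nearest to 0.3.
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004
```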


Single Precision

The IEEE 754 single-precision number requires 32 bits of storage. In the diagram below, bits are numbered from the most significant end, so bit 0 is the sign bit.

 0 1      8 9                     31
 S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
  • S - Sign bit
  • E - Exponent
  • F - Fraction
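A minimal Python sketch for pulling the three fields out of a value, assuming the standard struct module's "f" format is IEEE 754 single precision (true on common platforms). The helper name float32_fields is hypothetical. The shift counts use the usual convention of numbering bits from the least significant end, the reverse of the diagram. Values are rounded to the nearest single first, and struct raises OverflowError for finite values too large for single precision.

```python
import struct

def float32_fields(x):
    """Split x, rounded to single precision, into (S, E, F)."""
    # ">f" packs x as a big-endian IEEE 754 single, so the first byte
    # holds the sign bit and the top 7 exponent bits.
    bits = int.from_bytes(struct.pack(">f", x), "big")
    s = bits >> 31              # 1 sign bit
    e = (bits >> 23) & 0xFF     # 8 exponent bits
    f = bits & 0x7FFFFF         # 23 fraction bits
    return s, e, f

print(float32_fields(1.0))   # (0, 127, 0)
print(float32_fields(-2.0))  # (1, 128, 0)
print(float32_fields(0.1))   # (0, 123, 5033165)
```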


The value V of the 32-bit word is determined by its fields as follows:

  • If E = 255 and F is nonzero, then V = NaN ("Not a number")
  • If E = 255 and F is zero and S is 1, then V = -Infinity
  • If E = 255 and F is zero and S is 0, then V = Infinity
  • If 0<E<255 then V = (-1)**S * 2 ** (E-127) * (1.F) where "1.F" represents the binary number created by prefixing F with an implicit leading 1 and a binary point.
  • If E = 0 and F is nonzero, then V = (-1)**S * 2 ** (-126) * (0.F). These are "denormalized" (subnormal) values.
  • If E = 0 and F is zero and S is 1, then V = -0
  • If E = 0 and F is zero and S is 0, then V = 0
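Three worked decodings, with bit patterns chosen purely for illustration, show the normalized and denormalized rules in action (same formulas as above, written in LaTeX):

```latex
\begin{aligned}
\texttt{0xC0A00000} &: S = 1,\; E = 10000001_2 = 129,\; F = 0100\ldots0_2 \\
V &= (-1)^{1} \cdot 2^{129-127} \cdot 1.01_2 = -1 \cdot 4 \cdot 1.25 = -5 \\[4pt]
\texttt{0x00000001} &: S = 0,\; E = 0,\; F = 0\ldots01_2 \\
V &= 2^{-126} \cdot 0.0\ldots01_2 = 2^{-126} \cdot 2^{-23} = 2^{-149} \approx 1.4 \times 10^{-45} \\[4pt]
\texttt{0x7F7FFFFF} &: S = 0,\; E = 11111110_2 = 254,\; F = 1\ldots1_2 \\
V &= 2^{254-127} \cdot 1.1\ldots1_2 = (2 - 2^{-23}) \cdot 2^{127} \approx 3.40 \times 10^{38}
\end{aligned}
```

The second pattern is the smallest positive denormalized value and the third is the largest finite single-precision value.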

Double Precision

The IEEE 754 double-precision number requires 64 bits of storage.

 0 1         11 12                                                  63
 S EEEEEEEEEEE   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
  • S - Sign bit
  • E - Exponent
  • F - Fraction
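The same field extraction for double precision, again a sketch assuming Python floats are IEEE 754 doubles; float64_fields is a hypothetical helper. Python's built-in float.hex() makes a convenient cross-check: it prints the 52-bit fraction as 13 hexadecimal digits together with the unbiased exponent.

```python
import struct

def float64_fields(x):
    """Return (S, E, F) for a Python float, assumed to be an IEEE 754 double."""
    bits = int.from_bytes(struct.pack(">d", x), "big")
    # 1 sign bit, 11 exponent bits, 52 fraction bits
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

for x in (1.0, 0.1, 5e-324):
    s, e, f = float64_fields(x)
    # float.hex() shows the fraction as 13 hex digits and the unbiased
    # exponent: E - 1023 for normal values, -1022 for denormalized ones.
    print(x, (s, e, f"{f:013x}"), x.hex())
# 1.0 (0, 1023, '0000000000000') 0x1.0000000000000p+0
# 0.1 (0, 1019, '999999999999a') 0x1.999999999999ap-4
# 5e-324 (0, 0, '0000000000001') 0x0.0000000000001p-1022
```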


The value V of the 64-bit word is determined by its fields as follows:

  • If E = 2047 and F is nonzero, then V = NaN ("Not a number")
  • If E = 2047 and F is zero and S is 1, then V = -Infinity
  • If E = 2047 and F is zero and S is 0, then V = Infinity
  • If 0<E<2047 then V = (-1)**S * 2 ** (E-1023) * (1.F) where "1.F" represents the binary number created by prefixing F with an implicit leading 1 and a binary point.
  • If E = 0 and F is nonzero, then V = (-1)**S * 2 ** (-1022) * (0.F). These are "denormalized" (subnormal) values.
  • If E = 0 and F is zero and S is 1, then V = -0
  • If E = 0 and F is zero and S is 0, then V = 0
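The single- and double-precision rules differ only in the field widths, the bias (127 = 2**7 - 1 versus 1023 = 2**10 - 1), and the all-ones exponent (255 versus 2047), so one decoder can cover both. The Python sketch below is one way to write it; decode_ieee and bits_of are hypothetical helper names, and the cross-checks assume the platform's struct formats and floats are IEEE 754.

```python
import math
import struct
from fractions import Fraction

def decode_ieee(bits, exp_bits, frac_bits):
    """Decode an IEEE 754 binary bit pattern using the rules listed above.

    Use (exp_bits, frac_bits) = (8, 23) for single and (11, 52) for double.
    """
    bias = (1 << (exp_bits - 1)) - 1      # 127 for single, 1023 for double
    e_max = (1 << exp_bits) - 1           # 255 for single, 2047 for double
    s = bits >> (exp_bits + frac_bits)
    e = (bits >> frac_bits) & e_max
    f = bits & ((1 << frac_bits) - 1)
    sign = -1 if s else 1
    if e == e_max:                        # NaN or +/-Infinity
        return math.nan if f else sign * math.inf
    if e == 0 and f == 0:                 # +0 or -0
        return -0.0 if s else 0.0
    frac = Fraction(f, 1 << frac_bits)    # the value of 0.F, held exactly
    if e == 0:                            # denormalized: 2**(1 - bias) * 0.F
        value = frac * Fraction(2) ** (1 - bias)
    else:                                 # normalized: 2**(E - bias) * 1.F
        value = (1 + frac) * Fraction(2) ** (e - bias)
    # Every single or double value is exactly representable as a Python
    # float (an IEEE double), so this final conversion is exact.
    return float(sign * value)


def bits_of(x, fmt):
    """Bit pattern of x packed big-endian with struct format "f" or "d"."""
    return int.from_bytes(struct.pack(">" + fmt, x), "big")


# Cross-check against the platform's own conversions.
for x in (1.0, -2.5, 0.1, 5e-324, 2.2250738585072014e-308, 1.7976931348623157e308):
    assert decode_ieee(bits_of(x, "d"), 11, 52) == x
for pattern in (0x3F800000, 0xC0A00000, 0x00000001, 0x7F7FFFFF):
    expected = struct.unpack(">f", pattern.to_bytes(4, "big"))[0]
    assert decode_ieee(pattern, 8, 23) == expected
assert math.copysign(1.0, decode_ieee(bits_of(-0.0, "d"), 11, 52)) == -1.0
assert decode_ieee(0x7FF0000000000000, 11, 52) == math.inf
assert math.isnan(decode_ieee(0x7FF8000000000000, 11, 52))
```

Computing with Fraction rather than float avoids any intermediate rounding, so the rules can be checked exactly, including for the smallest denormalized values.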