Double-precision floating-point format is a computer number format that occupies 8 bytes (64 bits) in computer memory and represents a wide dynamic range of values by using floating point.
Computers with 32-bit storage locations use two memory locations to store a 64-bit double-precision number (a single storage location can hold a single-precision number). Double-precision floating-point format usually refers to binary64, as specified by the IEEE 754 standard, not to the 64-bit decimal format decimal64.
IEEE 754 double-precision binary floating-point format: binary64
Double-precision binary floating-point is a commonly used format on PCs, due to its wider range over single-precision floating point, in spite of its performance and bandwidth cost. As with single-precision floating-point format, it lacks precision on integer numbers when compared with an integer format of the same size. It is commonly known simply as double. The IEEE 754 standard specifies a binary64 as having:

Sign bit: 1 bit

Exponent width: 11 bits

Significand precision: 53 bits (52 explicitly stored)
This gives 15–17 significant decimal digits of precision. If a decimal string with at most 15 significant digits is converted to IEEE 754 double-precision representation and then converted back to a string with the same number of significant digits, then the final string should match the original. If an IEEE 754 double-precision number is converted to a decimal string with at least 17 significant digits and then converted back to double, then the final number must match the original.^{[1]}
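The two round-trip guarantees above can be checked with a short sketch. It assumes that Python's `float` is an IEEE 754 binary64 value, which holds on mainstream platforms; the helper names `roundtrip_decimal` and `roundtrip_double` are illustrative and not part of any library.

```python
def roundtrip_decimal(s: str, digits: int = 15) -> str:
    """Decimal string -> double -> decimal string with `digits` significant digits."""
    return f"{float(s):.{digits - 1}e}"


def roundtrip_double(x: float, digits: int = 17) -> float:
    """Double -> decimal string with `digits` significant digits -> double."""
    return float(f"{x:.{digits - 1}e}")


# A decimal string with 15 significant digits survives the trip through a double.
original = "1.23456789012345e+00"
assert roundtrip_decimal(original) == original

# A double survives the trip through a string with 17 significant digits ...
x = 0.1 + 0.2
assert roundtrip_double(x, 17) == x
# ... but 15 digits are not always enough to recover the same double.
assert roundtrip_double(x, 15) != x
```

The last assertion shows why 17 digits are needed in the double-to-string direction: with 15 digits, 0.1 + 0.2 prints as 0.3, which is a different double.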
The format is written with the significand having an implicit integer bit of value 1 (except for special data, see the exponent encoding below). With the 52 bits of the fraction (the stored part of the significand) appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, 53 log_{10}(2) ≈ 15.955). The bits are laid out as follows, from most significant to least significant: the sign bit (bit 63), the 11 exponent bits (bits 62 to 52), and the 52 fraction bits (bits 51 to 0).
The real value assumed by a given 64-bit double-precision datum with a given biased exponent e and a 52-bit fraction is

(-1)^{\text{sign}}(1.b_{51}b_{50}...b_{0})_2 \times 2^{e-1023}
or

(-1)^{\text{sign}}\left(1 + \sum_{i=1}^{52} b_{52-i} 2^{-i} \right)\times 2^{e-1023}
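The summation form can be evaluated bit by bit. The following is a minimal sketch, assuming Python's `float` is binary64; it uses exact rational arithmetic so that no rounding hides an error, and the names `value_from_bits` and `bits_of` are illustrative. It covers normal numbers only.

```python
import struct
from fractions import Fraction


def bits_of(x: float) -> int:
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def value_from_bits(bits: int) -> Fraction:
    """Evaluate (-1)^sign * (1 + sum_{i=1}^{52} b_{52-i} 2^-i) * 2^(e-1023)."""
    sign = bits >> 63
    e = (bits >> 52) & 0x7FF  # biased exponent
    if e in (0, 0x7FF):
        raise ValueError("the formula covers normal numbers only")
    # b[k] is fraction bit b_k; b_51 is the most significant stored bit.
    b = [(bits >> k) & 1 for k in range(52)]
    significand = 1 + sum(Fraction(b[52 - i], 2**i) for i in range(1, 53))
    return (-1) ** sign * significand * Fraction(2) ** (e - 1023)


# Fraction(x) is the exact rational value of the double x.
assert value_from_bits(bits_of(0.1)) == Fraction(0.1)
assert value_from_bits(0xC000000000000000) == -2
```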
Between 2^{52}=4,503,599,627,370,496 and 2^{53}=9,007,199,254,740,992 the representable numbers are exactly the integers. For the next range, from 2^{53} to 2^{54}, everything is multiplied by 2, so the representable numbers are the even ones, etc. Conversely, for the previous range from 2^{51} to 2^{52}, the spacing is 0.5, etc.
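The spacing between adjacent representable numbers in the range from 2^{n} to 2^{n+1} is 2^{n−52}, which as a fraction of the numbers in that range lies between 2^{−53} and 2^{−52}. Rounding to the nearest representable number introduces an absolute error of at most half the spacing, 2^{n−53}, so the maximum relative rounding error (the machine epsilon) is 2^{−53}.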
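The spacing rule can be checked directly. This sketch assumes Python 3.9 or later, where `math.ulp` returns the gap between a double and the next larger one; the helper `spacing` is an illustrative name.

```python
import math

# Between 2**52 and 2**53 the spacing is 1, so every integer is representable.
assert math.ulp(2.0**52) == 1.0
# Between 2**53 and 2**54 the spacing is 2, so odd integers are lost.
assert math.ulp(2.0**53) == 2.0
assert 2.0**53 + 1.0 == 2.0**53
# Between 2**51 and 2**52 the spacing is 0.5.
assert math.ulp(2.0**51) == 0.5


def spacing(n: int) -> float:
    """Spacing of doubles in [2**n, 2**(n+1)) for normal exponents n."""
    return 2.0 ** (n - 52)


# The rule holds over the whole normal exponent range.
assert all(math.ulp(2.0**n) == spacing(n) for n in range(-1022, 1024))
```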
The 11-bit width of the exponent allows the representation of numbers between 10^{−308} and 10^{308}, with full 15–17 decimal digits precision. By compromising precision, the subnormal representation allows even smaller values, down to about 5 × 10^{−324}.
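The limits of the range can be inspected as follows. This is a small sketch assuming Python's `float` is binary64; `sys.float_info` is part of the standard library, while the variable names are illustrative.

```python
import sys

largest = sys.float_info.max          # about 1.7976931348623157e308
smallest_normal = sys.float_info.min  # about 2.2250738585072014e-308
smallest_subnormal = 5e-324           # parses to 2**-1074

assert largest == (2 - 2**-52) * 2.0**1023
assert smallest_normal == 2.0**-1022
assert smallest_subnormal == 2.0**-1074

# Below the normal range precision degrades: the smallest subnormal
# carries a single significant bit, and halving it rounds to zero.
assert smallest_subnormal / 2 == 0.0
```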
Exponent encoding
The double-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 1023; this offset is also known as the exponent bias in the IEEE 754 standard. Examples of such representations would be:

E_{min} (1) = −1022

E (50) = −973

E_{max} (2046) = 1023
Thus, as defined by the offset-binary representation, in order to get the true exponent the exponent bias of 1023 has to be subtracted from the written exponent.
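The subtraction of the bias can be sketched in a few lines, assuming Python's `float` is binary64. The names `BIAS`, `biased_exponent` and `true_exponent` are illustrative.

```python
import struct

BIAS = 1023


def biased_exponent(x: float) -> int:
    """Return the 11-bit exponent field exactly as stored."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


def true_exponent(x: float) -> int:
    """Return the stored exponent minus the bias (meaningful for normal numbers)."""
    return biased_exponent(x) - BIAS


assert biased_exponent(1.0) == 1023 and true_exponent(1.0) == 0
assert biased_exponent(2.0**-1022) == 1     # E_min = -1022
assert true_exponent(2.0**-973) == -973     # stored as 50
assert biased_exponent(1.7976931348623157e308) == 2046  # E_max = 1023
```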
The exponents 000_{16} and 7ff_{16} have a special meaning:

000_{16} is used to represent a signed zero (if M=0) and subnormals (if M≠0); and

7ff_{16} is used to represent ∞ (if M=0) and NaNs (if M≠0),
where M is the fraction mantissa. All bit patterns are valid encodings.
Except for the above exceptions, the entire double-precision number is described by:
(-1)^{\text{sign}} \times 2^{\text{exponent} - \text{exponent bias}} \times 1.\text{mantissa}
In the case of subnormals (E=0) the double-precision number is described by:
(-1)^{\text{sign}} \times 2^{1 - \text{exponent bias}} \times 0.\text{mantissa}
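The normal, subnormal and special cases can be combined into one decoder. The following is a minimal sketch, assuming Python's `float` is binary64; the function name `decode` is illustrative, and all NaN payloads are collapsed into a single NaN for simplicity.

```python
import math


def decode(bits: int) -> float:
    """Decode a 64-bit pattern by the rules for each exponent class."""
    sign = -1.0 if bits >> 63 else 1.0
    e = (bits >> 52) & 0x7FF       # biased exponent
    m = bits & ((1 << 52) - 1)     # 52-bit fraction (mantissa) field
    if e == 0x7FF:
        # Infinity if the fraction is zero, otherwise a NaN.
        return sign * math.inf if m == 0 else math.nan
    if e == 0:
        # Signed zero or subnormal: 0.mantissa * 2**(1 - 1023).
        return sign * math.ldexp(m, -52 - 1022)
    # Normal number: 1.mantissa * 2**(e - 1023).
    return sign * math.ldexp((1 << 52) | m, e - 1023 - 52)


assert decode(0x3FF0000000000000) == 1.0
assert decode(0x0000000000000001) == 5e-324            # smallest subnormal
assert decode(0x0010000000000000) == 2.0**-1022        # smallest normal
assert decode(0xFFF0000000000000) == -math.inf
assert math.isnan(decode(0x7FF8000000000000))
assert math.copysign(1.0, decode(0x8000000000000000)) == -1.0  # negative zero
```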
Endianness
Although the ubiquitous x86 processors of today use little-endian storage for all types of data (integer, floating point, BCD), there have been a few historical machines where floating-point numbers were represented in big-endian form while integers were represented in little-endian form.^{[2]} There are old ARM processors that have a half little-endian, half big-endian floating-point representation for double-precision numbers: each of the two 32-bit words is stored in little-endian order, like integer registers, but the most significant word comes first. Because there have been many floating-point formats with no "network" standard representation for them, there is no formal standard for transferring floating-point values between diverse systems. It may therefore appear strange that the widespread IEEE 754 floating-point standard does not specify endianness.^{[3]} Theoretically, this means that even standard IEEE floating-point data written by one machine might not be readable by another. However, on modern standard computers (i.e., those implementing IEEE 754), one may in practice safely assume that the endianness is the same for floating-point numbers as for integers, making the conversion straightforward regardless of data type. (Small embedded systems using special floating-point formats may be another matter, however.)
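One way to avoid depending on the host byte order when exchanging doubles is to fix the order explicitly at both ends. The sketch below uses Python's standard `struct` module, where `<` and `>` select little-endian and big-endian layouts; choosing big-endian for interchange is an assumption made here for illustration, not something the standard mandates.

```python
import struct
import sys

x = 1.0
little = struct.pack("<d", x)  # least significant byte first
big = struct.pack(">d", x)     # most significant byte first

assert big.hex() == "3ff0000000000000"
assert little == big[::-1]

# On a typical modern machine the native layout of a double follows
# the integer byte order reported by sys.byteorder.
native = struct.pack("=d", x)
assert native == (little if sys.byteorder == "little" else big)

# Writer and reader agree on an explicit order, so the value
# survives regardless of either machine's native layout.
wire = struct.pack(">d", 0.1)
assert struct.unpack(">d", wire)[0] == 0.1
```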
Double-precision examples
3ff0 0000 0000 0000_{16} = 1
3ff0 0000 0000 0001_{16} ≈ 1.0000000000000002, the smallest number > 1
3ff0 0000 0000 0002_{16} ≈ 1.0000000000000004
4000 0000 0000 0000_{16} = 2
c000 0000 0000 0000_{16} = −2
0000 0000 0000 0001_{16} = 2^{−1022−52} = 2^{−1074}
≈ 4.9406564584124654 × 10^{−324} (Min subnormal positive double)
000f ffff ffff ffff_{16} = 2^{−1022} − 2^{−1022−52}
≈ 2.2250738585072009 × 10^{−308} (Max subnormal double)
0010 0000 0000 0000_{16} = 2^{−1022}
≈ 2.2250738585072014 × 10^{−308} (Min normal positive double)
7fef ffff ffff ffff_{16} = (1 + (1 − 2^{−52})) × 2^{1023}
≈ 1.7976931348623157 × 10^{308} (Max Double)
0000 0000 0000 0000_{16} = 0
8000 0000 0000 0000_{16} = −0
7ff0 0000 0000 0000_{16} = ∞
fff0 0000 0000 0000_{16} = −∞
3fd5 5555 5555 5555_{16} ≈ 1/3
By default, 1/3 rounds down, instead of up like single precision, because of the odd number of bits in the significand.
In more detail:
Given the hexadecimal representation 3FD5 5555 5555 5555_{16},
Sign = 0
Exponent = 3FD_{16} = 1021
Exponent Bias = 1023 (constant value; see above)
Significand = 5 5555 5555 5555_{16}
Value = 2^{(Exponent − Exponent Bias)} × 1.Significand – Note the Significand must not be converted to decimal here
= 2^{−2} × (15 5555 5555 5555_{16} × 2^{−52})
= 2^{−54} × 15 5555 5555 5555_{16}
= 0.333333333333333314829616256247390992939472198486328125
≈ 1/3
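The worked example can be reproduced with exact arithmetic. This is a sketch assuming Python's `float` is binary64; `Fraction` and `Decimal` from the standard library convert a double to its exact value, and the variable names are illustrative.

```python
import struct
from decimal import Decimal
from fractions import Fraction

bits = 0x3FD5555555555555
x = struct.unpack(">d", bits.to_bytes(8, "big"))[0]
assert x == 1 / 3  # the double nearest to 1/3 has exactly this pattern

exponent = (bits >> 52) & 0x7FF    # 0x3FD = 1021
fraction = bits & ((1 << 52) - 1)  # 0x5555555555555
assert exponent == 1021

# 2**(1021 - 1023) * (0x15555555555555 * 2**-52) = 2**-54 * 0x15555555555555
value = Fraction((1 << 52) | fraction, 2**54)
assert value == Fraction(x)
assert Decimal(x) == Decimal(
    "0.333333333333333314829616256247390992939472198486328125"
)

# The stored value lies slightly below 1/3, i.e. 1/3 was rounded down.
assert value < Fraction(1, 3)
```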
Execution speed with double-precision arithmetic
Using double-precision floating-point variables and mathematical functions (e.g., sin(), cos(), atan2(), log(), exp(), sqrt()) is generally slower than working with their single-precision counterparts. One area of computing where this is a particular issue is parallel code running on GPUs. For example, when using NVIDIA's CUDA platform on gaming cards, calculations with double precision take 3 to 24 times longer to complete than calculations using single precision.^{[4]}
Notes and references

[1] William Kahan (1 October 1987). "Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic".
[2] "Floating point formats".
[3] "pack – convert a list into a binary representation".
[4] http://www.tomshardware.com/reviews/geforcegtxtitangk110review,34383.html