In computing, half precision is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory.
In IEEE 754-2008 the 16-bit base-2 format is officially referred to as binary16. It is intended for storage (of many floating-point values where higher precision need not be stored), not for performing arithmetic computations.
Half-precision floating point is a relatively new binary floating-point format. Nvidia defined the half datatype in the Cg language, released in early 2002, and was the first to implement 16-bit floating point in silicon, with the GeForce FX, released in late 2002.^{[1]} ILM was searching for an image format that could handle a wide dynamic range, but without the hard drive and memory cost of the floating-point representations that are commonly used for floating-point computation (single and double precision).^{[2]} The hardware-accelerated programmable shading group led by John Airey at SGI (Silicon Graphics) invented the s10e5 data type in 1997 as part of the 'bali' design effort. This is described in a SIGGRAPH 2000 paper^{[3]} (see section 4.3) and further documented in US patent 7518615.^{[4]}
This format is used in several computer graphics environments including OpenEXR, JPEG XR, OpenGL, Cg, and D3DX. The advantage over 8-bit or 16-bit binary integers is that the increased dynamic range allows for more detail to be preserved in highlights and shadows for images. The advantage over 32-bit single-precision binary formats is that it requires half the storage and bandwidth (at the expense of precision and range).^{[2]}
IEEE 754 half-precision binary floating-point format: binary16
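The IEEE 754 standard specifies a binary16 as having the following format:

Sign bit: 1 bit

Exponent width: 5 bits

Significand precision: 11 bits (10 explicitly stored)

The format is laid out as follows, from the most significant bit to the least significant: sign (bit 15), exponent (bits 14 to 10), fraction (bits 9 to 0).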
The format is assumed to have an implicit lead bit with value 1 unless the exponent field is stored with all zeros. Thus only 10 bits of the significand appear in the memory format but the total precision is 11 bits. In IEEE 754 parlance, there are 10 bits of significand, but there are 11 bits of significand precision (log_{10}(2^{11}) ≈ 3.311 decimal digits).
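The layout described above can be illustrated with a minimal Python sketch that splits a 16-bit pattern into its three fields. The function name `split_fields` is hypothetical and chosen only for this illustration; the shifts and masks assume the sign in bit 15, the exponent in bits 14 to 10, and the stored significand bits in bits 9 to 0.

```python
def split_fields(bits):
    """Split a 16-bit pattern into (sign, stored exponent, stored fraction).

    Hypothetical helper for illustration; `bits` is an int in [0, 0xFFFF].
    """
    sign = (bits >> 15) & 0x1        # 1 sign bit
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits, still biased
    fraction = bits & 0x3FF          # 10 explicitly stored significand bits
    return sign, exponent, fraction


# 0x3C01 is the pattern 0 01111 0000000001
print(split_fields(0x3C01))
```

The implicit leading bit is not part of the returned fraction; whether it is 1 or 0 depends on whether the stored exponent is nonzero.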
Exponent encoding
The half-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 15; also known as exponent bias in the IEEE 754 standard.

E_{min} = 00001_{2} − 01111_{2} = −14

E_{max} = 11110_{2} − 01111_{2} = 15

Exponent bias = 01111_{2} = 15
Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 15 has to be subtracted from the stored exponent.
The stored exponents 00000_{2} and 11111_{2} are interpreted specially.
Exponent | Significand zero | Significand nonzero | Equation
00000_{2} | zero, −0 | subnormal numbers | (−1)^{signbit} × 2^{−14} × 0.significantbits_{2}
00001_{2}, ..., 11110_{2} | normalized value | normalized value | (−1)^{signbit} × 2^{exponent−15} × 1.significantbits_{2}
11111_{2} | ±infinity | NaN (quiet, signalling) |

The minimum strictly positive (subnormal) value is 2^{−24} ≈ 5.96 × 10^{−8}. The minimum positive normal value is 2^{−14} ≈ 6.10 × 10^{−5}. The maximum representable value is (2−2^{−10}) × 2^{15} = 65504.
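As a rough illustration of the three cases above (all-zero exponent, ordinary exponent, all-one exponent), the following Python sketch decodes a 16-bit pattern by hand. The names `decode_half` and `BIAS` are assumptions made for this example, not part of any standard API.

```python
import math

BIAS = 15  # exponent bias for binary16


def decode_half(bits):
    """Decode a 16-bit binary16 pattern (given as an int) into a Python float."""
    sign = -1.0 if (bits >> 15) & 0x1 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at -14.
        return sign * 2.0 ** -14 * (fraction / 1024)
    if exponent == 0x1F:
        # All-ones exponent: infinity if the fraction is zero, NaN otherwise.
        return sign * math.inf if fraction == 0 else math.nan
    # Normalized value: implicit leading 1, true exponent = stored - bias.
    return sign * 2.0 ** (exponent - BIAS) * (1 + fraction / 1024)


print(decode_half(0x0001))  # minimum positive subnormal, 2**-24
print(decode_half(0x0400))  # minimum positive normal, 2**-14
print(decode_half(0x7BFF))  # maximum finite value, 65504.0
```

The sketch does not distinguish quiet from signalling NaNs; it returns a single NaN for every nonzero fraction with an all-ones exponent.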
Half precision examples
These examples are given in bit representation of the floating-point value. This includes the sign bit, (biased) exponent, and significand.
0 01111 0000000000 = 1
0 01111 0000000001 = 1 + 2^{−10} = 1.0009765625 (smallest value greater than 1)
1 10000 0000000000 = −2
0 11110 1111111111 = 65504 (max half precision)
0 00001 0000000000 = 2^{−14} ≈ 6.10352 × 10^{−5} (minimum positive normal)
0 00000 1111111111 = 2^{−14} − 2^{−24} ≈ 6.09756 × 10^{−5} (maximum subnormal)
0 00000 0000000001 = 2^{−24} ≈ 5.96046 × 10^{−8} (minimum positive subnormal)
0 00000 0000000000 = 0
1 00000 0000000000 = −0
0 11111 0000000000 = infinity
1 11111 0000000000 = −infinity
0 01101 0101010101 = 0.333251953125 ≈ 1/3
By default, 1/3 rounds down, as it does in double precision, because the significand has an odd number of bits: the bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place.
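The bit patterns listed above can be cross-checked with Python's standard `struct` module, whose `e` format code packs and unpacks IEEE 754 binary16 values. The helper names `half_bits` and `half_value` are hypothetical and exist only for this sketch.

```python
import struct


def half_bits(x):
    """Round x to half precision and return its pattern as 'S EEEEE FFFFFFFFFF'."""
    (raw,) = struct.unpack("<H", struct.pack("<e", x))
    s = format(raw, "016b")
    return f"{s[0]} {s[1:6]} {s[6:]}"


def half_value(pattern):
    """Return the float encoded by a pattern written as 'S EEEEE FFFFFFFFFF'."""
    raw = int(pattern.replace(" ", ""), 2)
    (value,) = struct.unpack("<e", struct.pack("<H", raw))
    return value


print(half_bits(1 / 3))                  # 0 01101 0101010101
print(half_value("0 01101 0101010101"))  # 0.333251953125
```

Because `half_bits` starts from a double-precision argument, the result for 1/3 reflects rounding the double nearest to 1/3, which here gives the same pattern as rounding 1/3 itself.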
Precision limitations on decimal values in [0, 1]

Decimals between 2^{−24} (minimum positive subnormal) and 2^{−14} − 2^{−24} (maximum subnormal): fixed interval 2^{−24}

Decimals between 2^{−14} (minimum positive normal) and 2^{−13}: fixed interval 2^{−24}

Decimals between 2^{−13} and 2^{−12}: fixed interval 2^{−23}

Decimals between 2^{−12} and 2^{−11}: fixed interval 2^{−22}

Decimals between 2^{−11} and 2^{−10}: fixed interval 2^{−21}

Decimals between 2^{−10} and 2^{−9}: fixed interval 2^{−20}

Decimals between 2^{−9} and 2^{−8}: fixed interval 2^{−19}

Decimals between 2^{−8} and 2^{−7}: fixed interval 2^{−18}

Decimals between 2^{−7} and 2^{−6}: fixed interval 2^{−17}

Decimals between 2^{−6} and 2^{−5}: fixed interval 2^{−16}

Decimals between 2^{−5} and 2^{−4}: fixed interval 2^{−15}

Decimals between 2^{−4} and 2^{−3}: fixed interval 2^{−14}

Decimals between 2^{−3} and 2^{−2}: fixed interval 2^{−13}

Decimals between 2^{−2} and 2^{−1}: fixed interval 2^{−12}

Decimals between 2^{−1} and 1: fixed interval 2^{−11}

Decimals between 1 and 2: fixed interval 2^{−10} (1 + 2^{−10} is the smallest value greater than 1)
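The intervals listed above can be measured directly. The sketch below, which assumes Python's `struct` module with its `e` (binary16) format code, rounds a value to half precision and then steps to the next larger bit pattern. The name `half_spacing` is hypothetical, and the function is only meaningful for positive finite inputs below the maximum value.

```python
import struct


def half_spacing(x):
    """Gap between the half-precision value nearest x and the next larger one.

    Assumes 0 < x < 65504, so that adding 1 to the bit pattern yields
    the next representable finite value.
    """
    (raw,) = struct.unpack("<H", struct.pack("<e", x))
    (here,) = struct.unpack("<e", struct.pack("<H", raw))
    (above,) = struct.unpack("<e", struct.pack("<H", raw + 1))
    return above - here


print(half_spacing(1.0) == 2.0 ** -10)   # True
print(half_spacing(0.5) == 2.0 ** -11)   # True
```

Stepping through bit patterns works because, for positive values, consecutive binary16 patterns encode consecutive representable numbers, including across the boundary between subnormal and normal values.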
Precision limitations on integer values

Integers between 0 and 2048 can be exactly represented

Integers between 2049 and 4096 round to a multiple of 2 (even number)

Integers between 4097 and 8192 round to a multiple of 4

Integers between 8193 and 16384 round to a multiple of 8

Integers between 16385 and 32768 round to a multiple of 16

Integers between 32769 and 65519 round to a multiple of 32

Integers equal to or above 65520 are rounded to "infinity".
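The integer behaviour listed above can be reproduced with a short Python sketch built on the `struct` module's `e` format code. The helper name `round_to_half` is hypothetical. One assumption worth flagging: CPython's `struct.pack` raises `OverflowError` for values too large for binary16 rather than returning infinity, so the sketch maps that error to infinity to mimic the IEEE 754 result.

```python
import math
import struct


def round_to_half(x):
    """Round x to the nearest half-precision value, returned as a Python float."""
    try:
        (value,) = struct.unpack("<e", struct.pack("<e", float(x)))
    except OverflowError:
        # struct refuses values that would round to infinity;
        # substitute the infinity that IEEE 754 rounding would produce.
        value = math.copysign(math.inf, x)
    return value


print(round_to_half(2048))   # 2048.0, still exact
print(round_to_half(2049))   # 2048.0, tie rounds to the even significand
print(round_to_half(2051))   # 2052.0
print(round_to_half(65519))  # 65504.0
print(round_to_half(65520))  # inf
```

The results for 2049 and 2051 show the default round-half-to-even rule: each lies exactly halfway between two representable values and goes to the neighbour whose last significand bit is 0.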
References

[1] Nvidia

[2] http://www.openexr.com/about.html

[3] http://people.csail.mit.edu/ericchan/bib/pdf/p425peercy.pdf

[4] http://www.google.com/patents/US7518615
External links

Minifloats (in Survey of Floating-Point Formats)

OpenEXR site

Half precision constants from D3DX

OpenGL treatment of half precision

Fast Half Float Conversions

Analog Devices variant (four-bit exponent)

C source code to convert between IEEE double, single, and half precision can be found here

C# source code implementing a half-precision floating-point data type can be found here

Java source code for half-precision floating-point conversion

Half precision floating point for one of the extended GCC features
