I want to keep this post fairly brief, so I will only give minimal background on floating point numbers. If you need a refresher on floating point representation, I recommend starting with the Wikipedia entry on floating point, and for more detail about NVIDIA GPU floating point, check out this excellent white paper. The Wikipedia entry on denormal numbers is a good start for this post, so I’ll begin by paraphrasing it.

## What’s a denormal?

*Normal* floating point values have no leading zeros in the mantissa (or significand): the mantissa is normalized, and any leading zeros are moved into the exponent. *Subnormal numbers* (or *denormal* numbers) are floating point numbers for which this normalized representation would require an exponent too small to be represented. So unlike normal floating point numbers, subnormal numbers have leading zeros in the mantissa. This loses significant digits, but far fewer than if the value were flushed to zero on underflow. It allows what is known as “gradual underflow” when a result is very small, and helps avoid catastrophic errors such as division by zero: with gradual underflow, `x != y` guarantees that `x - y` is nonzero.
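These definitions can be checked on any host with IEEE 754 single-precision floats. The C++ sketch below is illustrative only. The helper names `is_subnormal`, `flush_to_zero`, and `check_gradual_underflow` are hypothetical. It assumes the compiler is not flushing subnormals itself (for example, no `-ffast-math`) and that assertions are enabled.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True if x is subnormal (denormal): nonzero, but smaller in magnitude
// than the smallest normal float.
bool is_subnormal(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// Emulates flush-to-zero (FTZ) on a single value: a subnormal becomes
// a zero of the same sign; everything else passes through unchanged.
float flush_to_zero(float x) {
    return is_subnormal(x) ? std::copysign(0.0f, x) : x;
}

// Executable documentation of the properties described above, assuming
// IEEE 754 binary32 floats and a host that keeps subnormals.
void check_gradual_underflow() {
    const float min_normal = std::numeric_limits<float>::min();           // 2^-126
    const float min_subnormal = std::numeric_limits<float>::denorm_min(); // 2^-149

    assert(!is_subnormal(min_normal));
    assert(is_subnormal(min_subnormal));

    // 2^-127 would need an exponent below -126, so it is stored with a
    // leading zero in the mantissa (a subnormal) instead of becoming zero.
    const float half_min = min_normal / 2.0f;
    assert(is_subnormal(half_min) && half_min > 0.0f);

    // Gradual underflow: two distinct normal values whose difference is subnormal.
    const float x = 1.5f * min_normal;
    const float y = min_normal;
    assert(x != y);
    assert(x - y != 0.0f);                 // 2^-127, kept as a subnormal
    assert(flush_to_zero(x - y) == 0.0f);  // under FTZ the difference vanishes,
                                           // so 1 / (x - y) would divide by zero
}
```

The last two assertions show the trade-off in one place: keeping the subnormal preserves the `x != y` implies `x - y != 0` guarantee, while flushing it trades that guarantee for speed.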

Denormal numbers can incur extra computational cost. The Wikipedia entry explains that some platforms implement denormal numbers in software, while others handle them in hardware. On NVIDIA GPUs starting with the Fermi architecture (Compute Capability 2.0 and higher), denormal numbers are handled in hardware for operations that go through the Fused Multiply-Add pipeline (including adds and multiplies), so these operations carry no performance overhead. But multi-instruction sequences such as square root and reciprocal square root (the latter is notable for my example later in this post) must do extra work and take a slower path for denormal values, including the cost of detecting them.
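To make that "extra work" concrete, here is a host-side C++ sketch of one way a multi-instruction reciprocal square root could honor subnormal inputs. It is not NVIDIA's implementation. `fast_rsqrt_ftz` is a hypothetical stand-in for an approximate hardware unit that treats subnormal inputs as zero. The 2^24 scale factor is simply one choice that moves every subnormal float into the normal range.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a fast approximate rsqrt unit that treats
// subnormal inputs as zero (so they return +inf). Emulated here with
// 1/sqrt on the host; the name is illustrative only.
float fast_rsqrt_ftz(float x) {
    if (x == 0.0f || std::fpclassify(x) == FP_SUBNORMAL) {
        return INFINITY;  // input treated as zero: 1/sqrt(0) = +inf
    }
    return 1.0f / std::sqrt(x);
}

// Sketch of the slow path: detect a subnormal input, scale it into the
// normal range, then undo the scaling on the result, using
//   rsqrt(x * 2^24) = rsqrt(x) * 2^-12  =>  rsqrt(x) = rsqrt(x * 2^24) * 2^12
float rsqrt_with_denormals(float x) {
    assert(x >= 0.0f);  // this sketch only handles non-negative inputs
    if (std::fpclassify(x) != FP_SUBNORMAL) {
        return fast_rsqrt_ftz(x);  // fast path: only the detection cost
    }
    // Scaling by powers of two is exact here: 2^-149 * 2^24 = 2^-125 is
    // normal, and the result is far below the float overflow threshold.
    const float scaled = std::ldexp(x, 24);
    return std::ldexp(fast_rsqrt_ftz(scaled), 12);
}
```

Even in this simplified form, every call pays for the classification check, and subnormal inputs pay for two additional scaling steps. Operations handled directly in the FMA pipeline avoid both costs.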