
3_RA

  • hrafnulf13
  • Oct 14, 2020
  • 3 min read

Updated: Oct 20, 2020

Floating point issues and limitations


In programming, a floating point number such as -123.45 * 10^(-6) is written as -123.45E-6 [7]. The floating-point representation of a number therefore has two parts: the first is a signed fixed-point number called the mantissa [4]; the second designates the position of the decimal (or binary) point and is called the exponent [4].


It can be written as M*B^E, where

  • M is the mantissa (or significand),

  • E is the exponent,

  • B is the base; in the decimal case B = 10, in the binary case B = 2 (see the sketch after this list).
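For the binary case, Python's math.frexp exposes exactly this mantissa/exponent decomposition of a machine float. A minimal sketch (added here for illustration, not part of the cited material):

```python
import math

x = -123.45e-6
m, e = math.frexp(x)      # decompose x as m * 2**e, with 0.5 <= |m| < 1
print(m, e)               # roughly -0.5056512 -12
print(m * 2 ** e == x)    # True: the decomposition reconstructs x exactly
```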

Below is the physical representation of the floating point number in the register (including the sign). A floating-point binary number is represented in the same way except that the base is 2 for the exponent.


As an example, a 32-bit word representing a floating-point number [7]:

| S     | E      | M       |
| 1 bit | 8 bits | 23 bits |


Representing:


(-1)^S * M * 2^E [7].


Note that [7]:

  • The implied base is 2 (not explicitly shown in the representation).

  • The exponent can be represented in signed 2's complement (but see the biased notation used below).

  • The implied binary point is between the exponent field E and the significand field M.

  • More bits in field E mean a larger range of representable values.

  • More bits in field M mean higher precision.

  • Zero is represented by all bits equal to 0.

So after normalizing the significand and biasing the exponent [7], the number is represented as:


(-1)^S * (1 + M) * 2^(E - Bias) [7]
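To make the layout concrete, here is a small Python sketch (added for illustration, not taken from the cited lecture) that extracts the S, E, M fields of a 32-bit float and rebuilds the value from the formula above, assuming a normalized number, the single-precision bias of 127, and the stored fraction bits interpreted as M/2^23:

```python
import struct

def decode_float32(x):
    """Split x into the S, E, M fields of its 32-bit IEEE 754 encoding."""
    bits = int.from_bytes(struct.pack('>f', x), 'big')   # the raw 32 bits
    s = (bits >> 31) & 0x1        # 1-bit sign
    e = (bits >> 23) & 0xFF       # 8-bit biased exponent
    m = bits & 0x7FFFFF           # 23-bit fraction (the leading 1 is implied, not stored)
    # Reconstruct a normalized value: (-1)^S * (1 + M/2^23) * 2^(E - 127)
    value = (-1) ** s * (1 + m / 2 ** 23) * 2 ** (e - 127)
    return s, e, m, value

print(decode_float32(-6.5))       # (1, 129, 5242880, -6.5), since -6.5 = -1.625 * 2^2
```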


However, several problems arise from floating-point arithmetic:


Rounding errors [10]. Since floating-point numbers have a limited number of digits, they cannot represent all real numbers exactly. When there are more digits than the format can store, the leftover ones are omitted and the number is rounded. There are three reasons why this can be important:

  • Too many significant digits. The advantage of floating point is that leading and trailing zeroes (within the range of the exponent) do not need to be stored. Even so, if there are still more digits than the significand can store, rounding becomes necessary; in other words, if a number requires more precision than the format can provide, some portion of that precision is sacrificed.

  • Periodic digits. Irreducible fractions whose denominator has a prime factor that does not occur in the base require an infinite number of digits that repeat periodically after a certain point. This can happen for very simple fractions, such as 1/3 in decimal or 1/10 in binary (see the sketch after this list).

  • Irrational numbers cannot be represented as a regular fraction, and in positional notation (no matter the base) they require an infinite number of non-recurring digits.
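A short Python demonstration of these rounding effects (a sketch added here, not part of the cited pages):

```python
from decimal import Decimal

# 1/10 has no finite binary expansion, so the nearest double is only close to 0.1.
print(Decimal(0.1))     # 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2)        # 0.30000000000000004
print(sum(0.1 for _ in range(10)) == 1.0)   # False: ten small rounding errors add up
```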


Comparison [11]. Due to rounding errors, most floating-point numbers end up being slightly imprecise. As long as this imprecision stays small, it can usually be ignored. However, it also means that numbers expected to be equal (e.g., the same result calculated via different methods) can differ slightly, so a naive equality test may fail.
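The usual workaround is to compare with a tolerance instead of with ==; a minimal Python sketch (added here, not from the cited guide):

```python
import math

a = 0.1 + 0.2
b = 0.3
print(a == b)                              # False, even though both "should" be 0.3
print(math.isclose(a, b, rel_tol=1e-9))    # True: compare with a relative tolerance
print(abs(a - b) <= 1e-9)                  # True: a simple absolute-tolerance check
```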


Error propagation [1-3, 12, 13]. Although the error in any single floating-point number is very small, even simple calculations can amplify it. In general:

  • Multiplication and division are “safe” operations

  • Addition and subtraction are dangerous:

    • When numbers of different magnitudes are involved, digits of the smaller-magnitude number are lost.

    • When numbers that are very close to each other are subtracted, the less significant digits of the result consist mostly of rounding error: the closer the original numbers were, the more digits are lost (see the sketch after this list).

  • The losses of precision can be inevitable and benign (the lost digits are insignificant for the final result) or catastrophic (the loss is magnified and distorts the result drastically).

  • When many calculations are chained together (iterative algorithms, etc.), it is important to keep this kind of problem in mind.

  • A calculation method can be stable (rounding errors tend to diminish) or unstable (rounding errors grow). Usually there are both stable and unstable ways to solve a problem.
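A classic instance of catastrophic cancellation, sketched in Python (the square-root example is an illustration chosen here, not taken from the cited sources):

```python
import math

x = 1e12
# Unstable: subtracting two nearly equal numbers cancels most of their digits.
naive = math.sqrt(x + 1) - math.sqrt(x)
# Stable: an algebraically equivalent form that avoids the subtraction entirely.
stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))
print(naive)    # about 5.000038e-07 -- the trailing digits are pure rounding error
print(stable)   # about 5.0e-07 -- accurate to roughly full double precision
```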




 

References


  1. https://www.soa.org/news-and-publications/newsletters/compact/2014/may/com-2014-iss51/losing-my-precision-tips-for-handling-tricky-floating-point-arithmetic/

  2. https://stackoverflow.com/questions/588004/is-floating-point-math-broken

  3. https://stackoverflow.com/questions/2100490/floating-point-inaccuracy-examples

  4. https://www.tutorialspoint.com/fixed-point-and-floating-point-number-representations

  5. http://www.toves.org/books/float/

  6. https://www.doc.ic.ac.uk/~eedwards/compsys/float/

  7. http://fourier.eng.hmc.edu/e85_old/lectures/arithmetic/node12.html

  8. https://www.volkerschatz.com/science/float.html

  9. https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

  10. https://floating-point-gui.de/errors/rounding/

  11. https://floating-point-gui.de/errors/comparison/

  12. https://floating-point-gui.de/errors/propagation/

  13. https://en.wikipedia.org/wiki/Loss_of_significance
