1.

Does NumPy Support NaN?

Answer»

nan, short for “not a number”, is a special floating-point value defined by the IEEE-754 specification, along with inf (infinity) and other values and behaviours. In theory, IEEE nan was specifically designed to address the problem of missing values, but in practice different platforms behave differently, which makes life more difficult. On some platforms, the presence of nan slows calculations by a factor of 10 to 100. For integer data, no nan value exists. Some platforms, notably older Crays and VAX machines, don’t support nan at all.
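As a rough illustration of the points above, the following sketch shows nan living in a float array, propagating through a sum, and being rejected by an integer array. The variable names are made up for this example, and it assumes an IEEE-754 platform and a reasonably recent NumPy.

```python
import numpy as np

# A float array can hold nan as a "missing value" marker.
values = np.array([1.0, np.nan, 3.0])

# nan propagates through ordinary arithmetic and reductions.
total = values.sum()

# np.isnan locates the nan entries element-wise.
missing = np.isnan(values)

# nan-aware reductions such as np.nansum skip the nan entries.
clean_total = np.nansum(values)

# Integer dtypes have no nan representation, so the assignment fails.
ints = np.array([1, 2, 3], dtype=np.int64)
try:
    ints[1] = np.nan
    int_accepts_nan = True
except ValueError:
    int_accepts_nan = False

print(total, missing, clean_total, int_accepts_nan)
```

This is why integer data with missing entries usually has to be converted to a float dtype, or handled with a separate mask, before nan can be used.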

Despite all these issues, NumPy (and SciPy) endeavor to support IEEE-754 behaviour (based on NumPy’s predecessor numarray). The most significant challenge is a lack of cross-platform support within Python itself. Because NumPy is written to take advantage of C99, which supports IEEE-754, it can side-step such issues internally, but users may still face problems when, for example, comparing values within the Python interpreter. In fact, NumPy currently assumes IEEE-754 behaviour of the underlying floats, a decision that may have to be revisited when the VAX community rises up in rebellion.
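One comparison pitfall can be sketched as follows, assuming IEEE-754 floats: nan never compares equal to anything, including itself, so an equality test will not find it. The names below are illustrative only.

```python
import math
import numpy as np

x = float("nan")

# Under IEEE-754, nan is not equal to itself.
same_value = (x == x)

# Use an explicit nan test instead of ==.
is_nan_py = math.isnan(x)

arr = np.array([0.0, np.nan])

# Comparing against np.nan with == matches nothing.
eq_mask = (arr == np.nan)

# np.isnan is the reliable element-wise test.
nan_mask = np.isnan(arr)

print(same_value, is_nan_py, eq_mask, nan_mask)
```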

Those wishing to avoid potential headaches will be interested in an alternative solution which has a long history in NumPy’s predecessors: masked arrays. Masked arrays are standard arrays with a second “mask” array of the same shape to indicate whether each value is present or missing. Masked arrays are the domain of the numpy.ma module, and continue the cross-platform Numeric/numarray tradition. See “Cookbook/Matplotlib/Plotting values with masked arrays” (TODO) for an example of how to avoid plotting missing data in Matplotlib. Despite their additional memory requirement, masked arrays are faster than nans on many floating point units.
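A minimal sketch of the masked-array approach, using integer data, where nan is not available. The sentinel value -999 and the variable names are assumptions made for this example, not something numpy.ma prescribes.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical integer readings where -999 marks a missing sample.
readings = np.array([10, -999, 30, 40])

# Build a masked array: entries equal to -999 are flagged as missing.
masked = ma.masked_equal(readings, -999)

# Reductions skip the masked entries.
mean = masked.mean()
count = masked.count()   # number of unmasked values

# filled() returns a plain ndarray with masked slots replaced.
filled = masked.filled(0)

print(masked.mask, mean, count, filled)
```

For float data that already contains nan, ma.masked_invalid can build the mask from the nan and inf entries instead of a sentinel.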



