fastermath

A library of optimized math functions targeted at 32-bit and 64-bit x86 Linux systems

Stars
24

fastermath

A library of math functions targeted at 32-bit and 64-bit x86 Linux systems.

The purpose of this library is to provide faster drop in replacements for selected functions of the standard math library 'libm'. These functions are written so they can be more optimized by compilers and all special case tests for increased consistency and accuracy have been removed. They are based on the corresponding implementations from the Cephes math library by Stephen L. Moshier. The code has been simplified perusing internal compiler facilities wherever possible and assuming little endian IEEE-754 single and double precision math.

Installation

This package is x86 and Linux specific and makes heavy use of advanced features of the GNU compiler toolchain and thus requires a somewhat recent GNU C compiler (tested with 4.4.x to 4.7.x) or a compatible Intel compiler (tested with 13.0.x).

To compile simply type "make ". This will read a configuration file in the directory config/ and will create an object directory for it and compile all binaries in it. The config file naming convention is: --.inc If you type "make" without any target will list all availalable configurations that are compatible with the current platform. Typing "make all" will try to compile all of them.

Compilation should produce two libraries (libfastermath.so & libfastermath.a), two utility programs (genspline & tester) and a shared object (fastermath.so) in the configuration specific object directory.

If the CPPFLAGS contains the flag -DLIBM_ALIAS the libraries contain entries which alias the internal functions directly to the lib counterparts.

Usage

The library contains math functions that are mostly equivalent with their counterparts in libm. They have the same name, but the string 'fm_' prepended. So renaming the functions calls, e.g. log(), with the library version, i.e. fm_log() will call this library instead of libm. The fastermath.so shared object would redirect function calls from the libm version to the fastermath variant when the shared object is activated using LD_PRELOAD.

How it works

The normal math functions contain a lot of tests for over- and underflow and go through a lot of pains to handle borderline cases and maximize accuracy. The functions in the fastermath library skip those, since in most cases, those are not needed, but const a significant amount CPU time. Similarly, the streamlined code can be optimized better by compilers. In the case of logarithms, it has been found that using a cubic spline interpolation is significantly faster than using the usual rational function expansion from math libraries like cephes or glibc.

How fast is it?

That depends on the compiler, the code and the CPU. Speedup on 64-bit CPUs is usually higher, speedup on Intel CPUs is often higher than on AMD CPUs. Exponentials are 1.5-3 times faster, logarithms 2-4 times faster. When you have well vectorizing code (like the tester in this package), the Intel compiler can vectorize the loops and then use the short vector math library (SVML), which can handle multiple function calls concurrently and achieve much higher speedup.