Simple C++ code to benchmark fast division algorithms
BSD-3-CLAUSE License
Simple C++ code to benchmark fast division algorithms relying on constant divisors.
The code is a companion to the paper Integer Division by Constants: Optimal Bounds in the sense that it illustrates the results. It should be viewed as a research artefact.
The code is made of a single header file (fast_division.h
).
Copy fast_division.h
in your C++ project. You can then use it as follows.
#include "fast_division/fast_division.h"
// for any compile-time constant 'divisor' we have:
assert(fast_division::divide32<divisor>::quotient(n) == n / divisor);
assert(fast_division::divide32<divisor>::remainder(n) == n % divisor)
assert(fast_division::divide32<divisor>::is_divisible(n) == (n % divisor == 0));
cmake -B build && cmake --build build
./build/benchmark/benchmark
On an Apple M1 processor with clang 12, we get:
./build/benchmark/benchmark
divisor = 19
std division : 0.91 ns/ops (+/- 3.8 %)
fast division : 0.31 ns/ops (+/- 1.0 %)
std remainder : 0.76 ns/ops (+/- 0.7 %)
fast remainder : 0.62 ns/ops (+/- 4.7 %)
divisor = 123456
std division : 0.84 ns/ops (+/- 0.7 %)
fast division : 0.31 ns/ops (+/- 1.1 %)
std remainder : 0.62 ns/ops (+/- 0.6 %)
fast remainder : 0.31 ns/ops (+/- 16.0 %)
divisor = 4096
std division : 0.58 ns/ops (+/- 1.8 %)
fast division : 0.58 ns/ops (+/- 1.8 %)
std remainder : 0.72 ns/ops (+/- 0.7 %)
fast remainder : 0.72 ns/ops (+/- 0.6 %)
On an AMD Zen 2 processor with GCC 10, we get:
$ ./build/benchmark/benchmark
divisor = 19
std division : 0.87 ns/ops (+/- 1.0 %)
fast division : 0.44 ns/ops (+/- 3.4 %)
std remainder : 1.15 ns/ops (+/- 0.1 %)
fast remainder : 0.82 ns/ops (+/- 0.3 %)
divisor = 123456
std division : 0.59 ns/ops (+/- 2.9 %)
fast division : 0.43 ns/ops (+/- 4.3 %)
std remainder : 0.76 ns/ops (+/- 10.5 %)
fast remainder : 0.64 ns/ops (+/- 1.0 %)
divisor = 4096
std division : 0.29 ns/ops (+/- 0.4 %)
fast division : 0.29 ns/ops (+/- 0.3 %)
std remainder : 0.29 ns/ops (+/- 0.4 %)
fast remainder : 0.29 ns/ops (+/- 0.4 %)
Results will vary depending on your compiler and processor. They also depend on the divisor.
The benchmark measures throughput, not latency.
For a practical library with performance claims, see fastmod and the manuscript: