Optimized Thread-Safe RefCounting modern C++ implementaion. 1.5x faster than trivial version.
MIT License
#define cnter counter
Unsafe
Ignore thread safety.Trivial
: Use atomics directly.Optimized
: Use 2 kinds of counter:
ref_cnter
s are in the same thread, we can use local cnter whose overhead is very small.I test the performance of different ref_cnter
s.
Options:
- LIBC++(LLVM)
- STD=C++20
- OPTIM: O3
- COMPILER: CLANG-8.0
Optimized version is 1.5x faster.
Acctually multi-threading performance is very hard to messure(as the overhead of creating a thread is a bit huge to the ref_cnter
's operations). So I tested it in single-thread case. But they're equal as the aim of benchmark is how many benefits we can take when we are "local".
Please add the
ref_cnter
class before using the following codes.
constexpr std::size_t sz = 5000;
static void opt_refcnt(benchmark::State& state) {
using namespace ganler::opt;
ref_cnter cnter;
ref_cnter cnter_;
benchmark::DoNotOptimize(cnter);
benchmark::DoNotOptimize(cnter_);
for (auto _ : state) {
{ // Test passed.
cnter = cnter_;
cnter_ = cnter;
}
}
}
BENCHMARK(opt_refcnt);
static void trivial_refcnt(benchmark::State& state) {
using namespace ganler::trivial;
ref_cnter cnter;
ref_cnter cnter_;
benchmark::DoNotOptimize(cnter);
benchmark::DoNotOptimize(cnter_);
for (auto _ : state) {
{ // Test passed.
cnter = cnter_;
cnter_ = cnter;
}
}
}
// Register the function as a benchmark
BENCHMARK(trivial_refcnt);
We can see that the assembly of the benchmark:
Note that
405b86
is related to the for-loop stuff.
Acctually I cannot understand the results of GCC 9.1. Optimized version is just 1.1x faster.
(Maybe gcc considered the optimization for local atomic operation).
(But the optimized version is as fast as the empty loop).