IO Buffering performance in Rust

Bart Massey

This code is originally by /u/bruce3434 on this Reddit thread. The fundamental issue was that dropping a BufWriter on top of StdoutLock sped the code up by a factor of 2 even though the writes contained no newlines. This Reddit comment explains what is going on; this codebase is the underlying code being measured.

glacial.rs uses unlocked Stdout. This is really slow due to all the locking.
slow.rs uses StdoutLock. This is still pretty slow, for reasons explained in the comment above.
fast.rs uses a BufWriter atop StdoutLock. This is the version that is 2× faster than the slow version.
speedy.rs uses a BufWriter atop a raw UNIX File. It is a little faster than the fast version, but is portable only to UNIX systems and has an unsafe in it.
turbo.c is the original inspiration and about the fastest, a C implementation authored by DEC05EBA. Its speedup tricks are used by the other fast versions here.
turbo.rs is a fairly straightforward port of turbo.c, which avoids standard library routines in favor of hand calculation. turbo.rs is about 30% slower than turbo.c.
lightning.cpp is a port of turbo.rs authored by DEC05EBA and contributed by Hossain Adnan. It uses a manual buffer. It is comparable in performance to turbo.c.
lightning.rs is a port of lightning.cpp contributed by Hossain Adnan. It uses a manual buffer currently backed by Vec<u8> along with POSIX write(). It's about 30% slower than turbo.rs.
ludicrous.rs is a version by DEC05EBA that uses a handmade buffer. It is about 10% slower than turbo.c.
serious.rs (not actually serious) is a C-like Rust implementation with tons of unsafe employing all the tricks. It is the same speed as turbo.c (currently insignificantly faster, actually), which is reasonable given that it's even uglier and no safer. "You can write FORTRAN in any language."
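
As a rough illustration of how the first three variants differ, here is a minimal sketch using only the standard library. It is not the code from glacial.rs, slow.rs, or fast.rs: the function names and the stand-in workload (printing the integers 0..n with no separators) are assumptions made for this example, and only the way the output handle is constructed is meant to mirror the descriptions above.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in workload: write the integers 0..n with no
/// separators (and no newlines) to any `Write` sink.
pub fn write_numbers<W: Write>(out: &mut W, n: u32) -> io::Result<()> {
    for i in 0..n {
        write!(out, "{}", i)?;
    }
    Ok(())
}

/// In the style of glacial.rs: write through `Stdout` directly, so each
/// write call acquires and releases the stdout lock.
pub fn unlocked(n: u32) -> io::Result<()> {
    let mut out = io::stdout();
    write_numbers(&mut out, n)
}

/// In the style of slow.rs: take the lock once and write through the
/// resulting `StdoutLock`.
pub fn locked(n: u32) -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    write_numbers(&mut out, n)
}

/// In the style of fast.rs: a `BufWriter` atop the `StdoutLock`, flushed
/// explicitly so that any write error is reported rather than lost on drop.
pub fn buffered(n: u32) -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    write_numbers(&mut out, n)?;
    out.flush()
}
```

Because write_numbers is generic over Write, the same loop can be pointed at any of the three handles, or at a Vec<u8> when checking the output. Calling buffered(1_000_000) from a main function and timing it against locked(1_000_000) is one way to try the comparison by hand; the numbers reported in this README come from the actual benchmark sources, not from this sketch.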

Many of these will run only on a POSIX system. I have tried them only on Linux.

Compiler choice matters for the faster C / C++ benchmarks here. clang / gcc and clang++ / g++ will give different results. By default clang and clang++ are used to increase comparability with Rust's LLVM toolchain.

Performance Comparison

To run the benchmarks:
- Install Hyperfine with cargo install hyperfine
- Build the Rust benchmarks with cargo build --release
- Say make bench
The results will be available in BENCH.md. Here are my results from 2022-11-29 on an AMD Ryzen 9 3900X with rustc 1.64.0 and clang / clang++ 14.0.6. They are not significantly different from the results obtained several years ago on older hardware.
To check that the benchmarks produce the same output say make check. The md5sums should match.