Fast integer compression in C using the StreamVByte codec
APACHE-2.0 License
StreamVByte is a new integer compression technique that applies SIMD instructions (vectorization) to Google's Group Varint approach. The net result is faster than other byte-oriented compression techniques.
The approach is patent-free, the code is available under the Apache License.
It includes fast differential coding.
It assumes a recent Intel processor (most Intel and AMD processors released after 2010) or an ARM processor with NEON instructions (which is almost all of them except for the tiny cores). Big-endian processors are unsupported at this time, but they are getting to be extremely rare.
The code should build using most standard-compliant C99 compilers. The provided makefile expects a Linux-like system. We have a CMake build.
For high performance, you should have either a 64-bit ARM processor or a 64-bit x64 system with SSE 4.1 support. SSE 4.1 was added to Intel processors in 2007 so it is almost certain that your Intel or AMD processor supports it.
This library is used by
See examples/example.c
for an example.
Short code sample:
// suppose that datain is an array of uint32_t integers
size_t compsize = streamvbyte_encode(datain, N, compressedbuffer); // encoding
// here the result is stored in compressedbuffer using compsize bytes
streamvbyte_decode(compressedbuffer, recovdata, N); // decoding (fast)
If the values are sorted, then it might be preferable to use differential coding:
// suppose that datain is an array of uint32_t integers
size_t compsize = streamvbyte_delta_encode(datain, N, compressedbuffer,0); // encoding
// here the result is stored in compressedbuffer using compsize bytes
streamvbyte_delta_decode(compressedbuffer, recovdata, N,0); // decoding (fast)
You have to know how many integers were coded when you decompress. You can store this information along with the compressed stream.
During decoding, the library may read up to STREAMVBYTE_PADDING
extra bytes
from the input buffer (these bytes are read but never used).
We expect a recent CMake. Please make sure that your version of CMake is up-to-date or you may need to adapt our instructions.
The cmake build system also offers a libstreamvbyte_static
static library
(libstreamvbyte_static
under linux) in addition to
libstreamvbyte
shared library (libstreamvbyte.so
under linux).
-DCMAKE_INSTALL_PREFIX:PATH=/path/to/install
is optional.
Defaults to /usr/local{include,lib}
cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX:PATH=/path/to/install \
-DSTREAMVBYTE_ENABLE_EXAMPLES=ON \
-DSTREAMVBYTE_ENABLE_TESTS=ON -B build
cmake --build build
# run the tests like:
ctest --test-dir build
cmake --install build
After building, you may run our benchmark as follows:
./build/test/perf
The benchmarks are not currently built under Windows.
make
./unit
You can install the library (as a dynamic library) on your machine if you have root access:
sudo make install
To uninstall, simply type:
sudo make uninstall
It is recommended that you try make dyntest
before proceeding.
You can try to benchmark the speed in this manner:
make perf
./perf
Make sure to run make test
before, as a sanity test.
We do not directly support signed integers, but you can use fast functions to convert signed integers to unsigned integers.
#include "streamvbyte_zigzag.h"
zigzag_encode(mysignedints, myunsignedints, number); // mysignedints => myunsignedints
zigzag_decode(myunsignedints, mysignedints, number); // myunsignedints => mysignedints
By default, Stream VByte uses 1, 2, 3 or 4 bytes per integer.
In the case where you expect many of your integers to be zero, you might try
the streamvbyte_encode_0124
and streamvbyte_decode_0124
which use
0, 1, 2, or 4 bytes per integer.
We specify the format as follows.
We do not store how many integers (count
) are compressed
in the compressed data per se. If you want to store
the data stream (e.g., to disk), you need to add this
information. It is intentionally left out because, in
applications, it is often the case that there are better
ways to store this count.
There are two streams:
We can interpret the control bytes as a sequence of 2-bit words. The first 2-bit word is made of the least significant 2 bits in the first byte, and so forth. There are four 2-bit words written in each byte.
Starting from the first 2-bit word, we have corresponding sequence in the data bytes, written in sequence from the beginning:
The data bytes are stored using a little-endian encoding.
Consider the following example:
control bytes: [0x40 0x55 ... ]
data bytes: [0x00 0x64 0xc8 0x2c 0x01 0x90 0x01 0xf4 0x01 0x58 0x02 0xbc 0x02 ...]
The first control byte is 0x40 or the four 2-bit words : 00 00 00 01
.
The second control byte is 0x55 or the four 2-bit words : 01 01 01 01
.
Thus the first three values are given by the first three bytes:
0x00, 0x64, 0xc8
(or 0, 100, 200 in base 10). The five next values are stored
using two bytes each: 0x2c 0x01, 0x90 0x01, 0xf4 0x01, 0x58 0x02, 0xbc 0x02
.
As little endian integers, these are to be interpreted as 300, 400, 500, 600, 700.
Thus, to recap, the sequence of integers (0,100,200,300,400,500,600,700) gets encoded as the 15 bytes 0x40 0x55 0x00 0x64 0xc8 0x2c 0x01 0x90 0x01 0xf4 0x01 0x58 0x02 0xbc 0x02
.
If the count
is not divisible by four, then we include a final partial group where we use zero 2-bit corresponding to no data byte.