zstd

Zstandard - Fast real-time compression algorithm

OTHER License

Downloads
142.7K
Stars
22.3K
Committers
349


zstd - Zstandard v1.5.6 - Chrome Edition

Published by Cyan4973 7 months ago

This release coincides with the deployment of Google Chrome 123, which introduces zstd-encoding for Web traffic as a preferred option for compressing dynamic content. Since web server support for zstd-encoding is still limited due to its novelty, we are publishing an updated Zstandard version to facilitate broader adoption.

New stable parameter ZSTD_c_targetCBlockSize

When zstd compression is used for large documents transferred over the Internet, the data is segmented into smaller blocks of up to 128 KB, allowing incremental updates. This is crucial for applications like Chrome that process parts of documents as they arrive. However, on slow or congested networks, transmission can stall briefly in the middle of a block, delaying updates. To mitigate such scenarios, libzstd introduces the new parameter ZSTD_c_targetCBlockSize, which divides blocks into even smaller segments in order to speed up delivery of the first bytes. Activating this feature incurs a cost, both in runtime (equivalent to -2% speed at level 8) and in compression efficiency (a decrease of <0.1%), but it offers an interesting latency reduction, notably beneficial in areas with less powerful network infrastructure.
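
For library users, here is a minimal sketch of enabling this parameter through the advanced API; the 4 KB target and level 8 are illustrative choices, not recommendations from this release:

#include <zstd.h>

size_t compress_with_small_blocks(void* dst, size_t dstCapacity,
                                  const void* src, size_t srcSize)
{
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 8);
    /* aim for compressed blocks of roughly 4 KB, to improve first-byte latency */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_targetCBlockSize, 4096);
    size_t const cSize = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
    ZSTD_freeCCtx(cctx);
    return cSize;   /* check with ZSTD_isError() */
}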

Granular binary size selection

libzstd provides build customization, including options to compile only the compression or decompression modules, minimizing binary size. Enhanced in v1.5.6 (source), it now allows for even finer control by enabling selective inclusion or exclusion of specific components within these modules. This advancement aids applications needing precise binary size management.

Miscellaneous Enhancements

This release also includes various minor enhancements and bug fixes to improve the user experience. Key updates include an expanded list of recognized compressed file suffixes for the --exclude-compressed flag, improving efficiency by skipping presumed incompressible content. Furthermore, compatibility has been broadened to include additional chipsets (sparc64, ARM64EC, risc-v) and operating systems (QNX, AIX, Solaris, HP-UX).

Change Log

api: Promote ZSTD_c_targetCBlockSize to Stable API by @felixhandte
api: new experimental ZSTD_d_maxBlockSize parameter, to reduce streaming decompression memory, by @terrelln
perf: improve performance of param ZSTD_c_targetCBlockSize, by @Cyan4973
perf: improved compression of arrays of integers at high compression, by @Cyan4973
lib: reduce binary size with selective build-time exclusion, by @felixhandte
lib: improved huffman speed on small data and linux kernel, by @terrelln
lib: accept dictionaries with partial literal tables, by @terrelln
lib: fix CCtx size estimation with external sequence producer, by @embg
lib: fix corner case decoder behaviors, by @Cyan4973 and @aimuz
lib: fix zdict prototype mismatch in static_only mode, by @ldv-alt
lib: fix several bugs in magicless-format decoding, by @embg
cli: add common compressed file types to --exclude-compressed by @daniellerozenblit (requested by @dcog989)
cli: fix mixing -c and -o commands with --rm, by @Cyan4973
cli: fix erroneous exclusion of hidden files with --output-dir-mirror by @felixhandte
cli: improved time accuracy on BSD, by @felixhandte
cli: better errors on argument parsing, by @KapJI
tests: better compatibility with older versions of grep, by @Cyan4973
tests: lorem ipsum generator as default content generator, by @Cyan4973
build: cmake improvements by @terrelln, @sighingnow, @gjasny, @JohanMabille, @Saverio976, @gruenich, @teo-tsirpanis
build: bazel support, by @jondo2010
build: fix cross-compiling for AArch64 with lld by @jcelerier
build: fix Apple platform compatibility, by @nidhijaju
build: fix Visual 2012 and lower compatibility, by @Cyan4973
build: improve win32 support, by @DimitriPapadopoulos
build: better C90 compliance for zlibWrapper, by @emaste
port: make: fat binaries on macos, by @mredig
port: ARM64EC compatibility for Windows, by @dunhor
port: QNX support by @klausholstjacobsen
port: MSYS2 and Cygwin makefile installation and test support, by @QBos07
port: risc-v support validation in CI, by @Cyan4973
port: sparc64 support validation in CI, by @Cyan4973
port: AIX compatibility, by @likema
port: HP-UX compatibility, by @likema
doc: Improved specification accuracy, by @elasota
bug: Fix and deprecate ZSTD_generateSequences (#3981), by @terrelln

Full change list (auto-generated)

New Contributors

Full Changelog: https://github.com/facebook/zstd/compare/v1.5.5...v1.5.6

zstd - Zstandard v1.5.5

Published by Cyan4973 over 1 year ago

Zstandard v1.5.5 Release Note

This is a quick fix release. The primary focus is to correct a rare corruption bug in high compression mode, detected by @danlark1. The probability of generating such a scenario by random chance is extremely low. It evaded months of continuous fuzzer tests, due to the number and complexity of simultaneous conditions required to trigger it. Nevertheless, @danlark1 from Google shepherds such a humongous amount of data that he managed to find a reproduction case (corruptions are detected thanks to the checksum), making it possible for @terrelln to investigate and fix the bug. Thanks!
While the probability might be very small, corruption issues are nonetheless very serious, so an update to this version is highly recommended, especially if you employ high compression modes (levels 16+).

When the issue was detected, there were a number of other improvements and minor fixes already in the making, hence they are also present in this release. Let’s detail the main ones.

Improved memory usage and speed for the --patch-from mode

v1.5.5 introduces memory-mapped dictionaries, by @daniellerozenblit, for both POSIX (#3486) and Windows (#3557).

This feature allows zstd to memory-map large dictionaries, rather than requiring them to be loaded into memory. This can make a pretty big difference for memory-constrained environments applying patches to large data sets.
It's mostly visible under memory pressure, since mmap will be able to release less-used memory and continue working.
But even when memory is plentiful, there are still measurable memory benefits, as shown in the graph below, especially when the reference turns out to be not completely relevant for the patch.

[Figure: --patch-from memory usage, with and without mmap]

This feature is automatically enabled for --patch-from compression/decompression when the dictionary is larger than the user-set memory limit. It can also be manually enabled/disabled using --mmap-dict or --no-mmap-dict respectively.

Additionally, @daniellerozenblit introduces significant speed improvements for --patch-from.

An I/O optimization in #3486 greatly improves --patch-from decompression speed on Linux, typically by +50% on large files (~1GB).

[Figure: --patch-from decompression speed, I/O optimization]

Compression speed is also taken care of, with a dictionary-indexing speed optimization introduced in #3545. It wildly accelerates --patch-from compression, typically doubling speed on large files (~1GB), sometimes even more depending on exact scenario.

[Figure: --patch-from compression speed optimization]

This speed improvement comes at a slight regression in compression ratio, and is therefore enabled only on non-ultra compression strategies.

Speed improvements of middle-level compression for specific scenarios

The row-hash match finder introduced in version 1.5.0 for levels 5-12 has been improved in version 1.5.5, enhancing its speed in specific corner-case scenarios.

The first optimization (#3426) accelerates streaming compression using ZSTD_compressStream on small inputs by removing an expensive table initialization step. This results in remarkable speed increases for very small inputs.

The following scenario measures compression speed of ZSTD_compressStream at level 9 for different sample sizes, on a Linux platform running an i7-9700K CPU.

| sample size | v1.5.4 (MB/s) | v1.5.5 (MB/s) | improvement |
| --- | --- | --- | --- |
| 100 | 1.4 | 44.8 | x32 |
| 200 | 2.8 | 44.9 | x16 |
| 500 | 6.5 | 60.0 | x9.2 |
| 1K | 12.4 | 70.0 | x5.6 |
| 2K | 25.0 | 111.3 | x4.4 |
| 4K | 44.4 | 139.4 | x3.2 |
| ... | ... | ... | ... |
| 1M | 97.5 | 99.4 | +2% |

The second optimization (#3552) speeds up compression of incompressible data by a large multiplier. This is achieved by increasing the step size and reducing the frequency of matching when no matches are found, with negligible impact on the compression ratio. It makes mid-level compression essentially inexpensive when processing incompressible data, typically, already compressed data (note: this was already the case for fast compression levels).

The following scenario measures compression speed of ZSTD_compress compiled with gcc-9 for a ~10MB incompressible sample, on a Linux platform running an i7-9700K CPU.

| level | v1.5.4 (MB/s) | v1.5.5 (MB/s) | improvement |
| --- | --- | --- | --- |
| 3 | 3500 | 3500 | not a row-hash level (control) |
| 5 | 400 | 2500 | x6.2 |
| 7 | 380 | 2200 | x5.8 |
| 9 | 176 | 1880 | x10 |
| 11 | 67 | 1130 | x16 |
| 13 | 89 | 89 | not a row-hash level (control) |

Miscellaneous

There are other welcome speed improvements in this package.

For example, @felixhandte managed to increase processing speed of small files by carefully reducing the number of system calls (#3479). This can easily translate into +10% speed when processing a lot of small files in batch.

The Seekable format received a bit of care. It's now much faster when splitting data into very small blocks (#3544). In an extreme scenario reported by @P-E-Meunier, it improves processing speed by x90. Even for more "common" settings, such as using 4KB blocks on some "normally" compressible data like enwik, it still provides a healthy x2 processing speed benefit. Moreover, @dloidolt merged an optimization that reduces the number of I/O seek() events during reads (decompression), which is also beneficial for speed.

The release is not limited to speed improvements, several loose ends and corner cases were also fixed in this release. For a more detailed list of changes, please take a look at the changelog.

Change Log

  • fix: fix rare corruption bug affecting the high compression mode, reported by @danlark1 (#3517, @terrelln)
  • perf: improve mid-level compression speed (#3529, #3533, #3543, @yoniko and #3552, @terrelln)
  • lib: deprecated bufferless block-level API (#3534) by @terrelln
  • cli: mmap large dictionaries to save memory, by @daniellerozenblit
  • cli: improve speed of --patch-from mode (~+50%) (#3545) by @daniellerozenblit
  • cli: improve i/o speed (~+10%) when processing lots of small files (#3479) by @felixhandte
  • cli: zstd no longer crashes when requested to write into write-protected directory (#3541) by @felixhandte
  • cli: fix decompression into block device using -o (#3584, @Cyan4973) reported by @georgmu
  • build: fix zstd CLI compiled with lzma support but not zlib support (#3494) by @Hello71
  • build: fix: cmake no longer requires v3.18 as minimum version (#3510) by @kou
  • build: fix MSVC+ClangCL linking issue (#3569) by @tru
  • build: fix zstd-dll, version of zstd CLI that links to the dynamic library (#3496) by @yoniko
  • build: fix MSVC warnings (#3495) by @embg
  • doc: updated zstd specification to clarify corner cases, by @Cyan4973
  • doc: document how to create fat binaries for macos (#3568) by @rickmark
  • misc: improve seekable format ingestion speed (~+100%) for very small chunk sizes (#3544) by @Cyan4973
  • misc: tests/fullbench can benchmark multiple files (#3516) by @dloidolt

Full change list (auto-generated)

New Contributors

Full Changelog: https://github.com/facebook/zstd/compare/v1.5.4...v1.5.5

zstd - Zstandard v1.5.4

Published by Cyan4973 over 1 year ago

Zstandard v1.5.4 is a pretty big release benefiting from one year of work, spread over more than 650 commits. It offers significant performance improvements across multiple scenarios, as well as new features (detailed below). There is also a crop of small bug fixes; a few of them, targeting the 32-bit mode, are important enough to make this release a recommended upgrade.

Various Speed improvements

This release has accumulated a number of scenario-specific improvements that cumulatively benefit a good portion of the installed base in one way or another.

Among the easier ones to describe, the repository has received several contributions for arm optimizations, notably from @JunHe77 and @danlark1. And @terrelln has improved decompression speed for non-x64 systems, including arm. The combination of this work is visible in the following example, using an M1 Pro (aarch64 architecture):

| cpu | function | corpus | v1.5.2 | v1.5.4 | Improvement |
| --- | --- | --- | --- | --- | --- |
| M1 Pro | decompress | silesia.tar | 1370 MB/s | 1480 MB/s | +8% |
| Galaxy S22 | decompress | silesia.tar | 1150 MB/s | 1200 MB/s | +4% |

Middle compression levels (5-12) receive some care too, with @terrelln improving the dispatch engine, and @danlark1 offering NEON optimizations. Exact speed-ups vary depending on platform, CPU, compiler, and compression level, though one can expect gains ranging from +1% to +10% depending on the scenario.

| cpu | function | corpus | v1.5.2 | v1.5.4 | Improvement |
| --- | --- | --- | --- | --- | --- |
| i7-9700k | compress -6 | silesia.tar | 110 MB/s | 121 MB/s | +10% |
| Galaxy S22 | compress -6 | silesia.tar | 98 MB/s | 103 MB/s | +5% |
| M1 Pro | compress -6 | silesia.tar | 122 MB/s | 130 MB/s | +6.5% |
| i7-9700k | compress -9 | silesia.tar | 64 MB/s | 70 MB/s | +9.5% |
| Galaxy S22 | compress -9 | silesia.tar | 51 MB/s | 52 MB/s | +1% |
| M1 Pro | compress -9 | silesia.tar | 77 MB/s | 86 MB/s | +11.5% |
| i7-9700k | compress -12 | silesia.tar | 31.6 MB/s | 31.8 MB/s | +0.5% |
| Galaxy S22 | compress -12 | silesia.tar | 20.9 MB/s | 22.1 MB/s | +5% |
| M1 Pro | compress -12 | silesia.tar | 36.1 MB/s | 39.7 MB/s | +10% |

Speed of the streaming compression interface has been improved by @embg in scenarios involving large files (where size is a multiple of the windowSize parameter). The improvement is mostly perceptible at high speeds (i.e. ~level 1). In the following sample, the measurement is taken directly at ZSTD_compressStream() function call, using a dedicated benchmark tool tests/fullbench.

| cpu | function | corpus | v1.5.2 | v1.5.4 | Improvement |
| --- | --- | --- | --- | --- | --- |
| i7-9700k | ZSTD_compressStream() -1 | silesia.tar | 392 MB/s | 429 MB/s | +9.5% |
| Galaxy S22 | ZSTD_compressStream() -1 | silesia.tar | 380 MB/s | 430 MB/s | +13% |
| M1 Pro | ZSTD_compressStream() -1 | silesia.tar | 476 MB/s | 539 MB/s | +13% |

Finally, dictionary compression speed has received a good boost by @embg. Exact outcome varies depending on system and corpus. The following result is achieved by cutting the enwik8 compression corpus into 1KB blocks, generating a dictionary from these blocks, and then benchmarking the compression speed at level 1.

| cpu | function | corpus | v1.5.2 | v1.5.4 | Improvement |
| --- | --- | --- | --- | --- | --- |
| i7-9700k | dictionary compress | enwik8 -B1K | 125 MB/s | 165 MB/s | +32% |
| Galaxy S22 | dictionary compress | enwik8 -B1K | 138 MB/s | 166 MB/s | +20% |
| M1 Pro | dictionary compress | enwik8 -B1K | 155 MB/s | 195 MB/s | +25% |

There are a few more scenario-specific improvements listed in the changelog section below.

I/O Performance improvements

The 1.5.4 release improves I/O performance of the zstd CLI, by using system buffers (macOS) and adding a new asynchronous I/O capability, enabled by default on large files (when threading is available). The user can also explicitly control this capability with the --[no-]asyncio flag. These new threads remove the need to block on I/O operations. The impact is mostly noticeable when decompressing large files (>= a few MBs), though exact outcome depends on environment and run conditions.
Decompression speed gets significant gains due to its single-threaded serial nature and the high speeds involved. In some cases we observe up to a 2x performance improvement (local Mac machines) and a wide +15-45% benefit on Intel Linux servers (see table for details).
On the compression side of things, we’ve measured up to 5% improvements. The impact is lower because compression is already partially asynchronous via the internal MT mode (see release v1.3.4).

The following table compares decompression speed for enwik8 and silesia.tar on several platforms, including some Skylake-era Linux servers and an M1 MacBook Pro. It compares v1.5.2 against v1.5.4 with asyncio off and on.

| platform | corpus | v1.5.2 | v1.5.4-no-asyncio | v1.5.4 | Improvement |
| --- | --- | --- | --- | --- | --- |
| Xeon D-2191A CentOS8 | enwik8 | 280 MB/s | 280 MB/s | 324 MB/s | +16% |
| Xeon D-2191A CentOS8 | silesia.tar | 303 MB/s | 302 MB/s | 386 MB/s | +27% |
| i7-1165g7 win10 | enwik8 | 270 MB/s | 280 MB/s | 350 MB/s | +27% |
| i7-1165g7 win10 | silesia.tar | 450 MB/s | 440 MB/s | 580 MB/s | +28% |
| i7-9700K Ubuntu20 | enwik8 | 600 MB/s | 604 MB/s | 829 MB/s | +38% |
| i7-9700K Ubuntu20 | silesia.tar | 683 MB/s | 678 MB/s | 991 MB/s | +45% |
| Galaxy S22 | enwik8 | 360 MB/s | 420 MB/s | 515 MB/s | +70% |
| Galaxy S22 | silesia.tar | 310 MB/s | 320 MB/s | 580 MB/s | +85% |
| MBP M1 | enwik8 | 428 MB/s | 734 MB/s | 815 MB/s | +90% |
| MBP M1 | silesia.tar | 465 MB/s | 875 MB/s | 1001 MB/s | +115% |

Support of externally-defined sequence producers

libzstd can now support external sequence producers via a new advanced registration function ZSTD_registerSequenceProducer() (#3333).
This API allows users to provide their own custom sequence producer which libzstd invokes to process each block. The produced list of sequences (literals and matches) is then post-processed by libzstd to produce valid compressed blocks.

This block-level offload API is a more granular complement of the existing frame-level offload API ZSTD_compressSequences() (introduced in v1.5.1). It offers an easier migration story for applications already integrated with libzstd: the user application continues to invoke the same compression functions ZSTD_compress2() or ZSTD_compressStream2() as usual, and transparently benefits from the specific properties of the external sequence producer. For example, the sequence producer could be tuned to take advantage of known characteristics of the input, to offer a better speed / ratio trade-off.
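
As a rough sketch of the registration flow (the producer below is a placeholder that simply declines every block, letting libzstd fall back to its internal matchfinders; see the experimental section of zstd.h for the exact typedef and the full contract):

#define ZSTD_STATIC_LINKING_ONLY   /* the sequence producer API is experimental */
#include <zstd.h>

/* Placeholder producer: returning ZSTD_SEQUENCE_PRODUCER_ERROR rejects the block,
 * so libzstd compresses it internally (when fallback is enabled). */
static size_t myProducer(void* state,
                         ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
                         const void* src, size_t srcSize,
                         const void* dict, size_t dictSize,
                         int compressionLevel, size_t windowSize)
{
    (void)state; (void)outSeqs; (void)outSeqsCapacity; (void)src; (void)srcSize;
    (void)dict; (void)dictSize; (void)compressionLevel; (void)windowSize;
    return ZSTD_SEQUENCE_PRODUCER_ERROR;
}

void setup(ZSTD_CCtx* cctx)
{
    ZSTD_registerSequenceProducer(cctx, NULL /* producer state */, myProducer);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableSeqProducerFallback, 1);
    /* then compress as usual with ZSTD_compress2() or ZSTD_compressStream2() */
}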

One scenario that becomes possible is to combine this capability with hardware-accelerated matchfinders, such as the Intel® QuickAssist accelerator (Intel® QAT) provided in server CPUs such as the 4th Gen Intel® Xeon® Scalable processors (previously codenamed Sapphire Rapids). More details to be provided in future communications.

Change Log

perf: +20% faster huffman decompression for targets that can't compile x64 assembly (#3449, @terrelln)
perf: up to +10% faster streaming compression at levels 1-2 (#3114, @embg)
perf: +4-13% for levels 5-12 by optimizing function generation (#3295, @terrelln)
perf: +3-11% compression speed for arm target (#3199, #3164, #3145, #3141, #3138, @JunHe77 and #3139, #3160, @danlark1)
perf: +5-30% faster dictionary compression at levels 1-4 (#3086, #3114, #3152, @embg)
perf: +10-20% cold dict compression speed by prefetching CDict tables (#3177, @embg)
perf: +1% faster compression by removing a branch in ZSTD_fast_noDict (#3129, @felixhandte)
perf: Small compression ratio improvements in high compression mode (#2983, #3391, @Cyan4973 and #3285, #3302, @daniellerozenblit)
perf: small speed improvement by better detecting STATIC_BMI2 for clang (#3080, @TocarIP)
perf: Improved streaming performance when ZSTD_c_stableInBuffer is set (#2974, @Cyan4973)
cli: Asynchronous I/O for improved cli speed (#2975, #2985, #3021, #3022, @yoniko)
cli: Change zstdless behavior to align with zless (#2909, @binhdvo)
cli: Keep original file if -c or --stdout is given (#3052, @dirkmueller)
cli: Keep original files when result is concatenated into a single output with -o (#3450, @Cyan4973)
cli: Preserve Permissions and Ownership of regular files (#3432, @felixhandte)
cli: Print zlib/lz4/lzma library versions with -vv (#3030, @terrelln)
cli: Print checksum value for single frame files with -lv (#3332, @Cyan4973)
cli: Print dictID when present with -lv (#3184, @htnhan)
cli: when stderr is not the console, disable status updates, but preserve final summary (#3458, @Cyan4973)
cli: support --best and --no-name in gzip compatibility mode (#3059, @dirkmueller)
cli: support for posix high resolution timer clock_gettime(), for improved benchmark accuracy (#3423, @Cyan4973)
cli: improved help/usage (-h, -H) formatting (#3094, @dirkmueller and #3385, @jonpalmisc)
cli: Fix better handling of bogus numeric values (#3268, @ctkhanhly)
cli: Fix input consists of multiple files and stdin (#3222, @yoniko)
cli: Fix tiny files passthrough (#3215, @cgbur)
cli: Fix for -r on empty directory (#3027, @brailovich)
cli: Fix empty string as argument for --output-dir-* (#3220, @embg)
cli: Fix decompression memory usage reported by -vv --long (#3042, @u1f35c, and #3232, @zengyijing)
cli: Fix infinite loop when empty input is passed to trainer (#3081, @terrelln)
cli: Fix --adapt doesn't work when --no-progress is also set (#3354, @terrelln)
api: Support for External Sequence Producer (#3333, @embg)
api: Support for in-place decompression (#3432, @terrelln)
api: New ZSTD_CCtx_setCParams() function, set all parameters defined in a ZSTD_compressionParameters structure (#3403, @Cyan4973)
api: Streaming decompression detects incorrect header ID sooner (#3175, @Cyan4973)
api: Window size resizing optimization for edge case (#3345, @daniellerozenblit)
api: More accurate error codes for busy-loop scenarios (#3413, #3455, @Cyan4973)
api: Fix limit overflow in compressBound and decompressBound (#3362, #3373, @Cyan4973) reported by @nigeltao
api: Deprecate several advanced experimental functions: streaming (#3408, @embg), copy (#3196, @mileshu)
bug: Fix corruption that rarely occurs in 32-bit mode with wlog=25 (#3361, @terrelln)
bug: Fix for block-splitter (#3033, @Cyan4973)
bug: Fixes for Sequence Compression API (#3023, #3040, @Cyan4973)
bug: Fix leaking thread handles on Windows (#3147, @animalize)
bug: Fix timing issues with cmake/meson builds (#3166, #3167, #3170, @Cyan4973)
build: Allow user to select legacy level for cmake (#3050, @shadchin)
build: Enable legacy support by default in cmake (#3079, @niamster)
build: Meson build script improvements (#3039, #3120, #3122, #3327, #3357, @eli-schwartz and #3276, @neheb)
build: Add aarch64 to supported architectures for zstd_trace (#3054, @ooosssososos)
build: support AIX architecture (#3219, @qiongsiwu)
build: Fix ZSTD_LIB_MINIFY build macro, which now reduces static library size by half (#3366, @terrelln)
build: Fix Windows issues with Multithreading translation layer (#3364, #3380, @yoniko) and ARM64 target (#3320, @cwoffenden)
build: Fix cmake script (#3382, #3392, @terrelln and #3252 @Tachi107 and #3167 @Cyan4973)
doc: Updated man page, providing more details for --train mode (#3112, @Cyan4973)
doc: Add decompressor errata document (#3092, @terrelln)
misc: Enable Intel CET (#2992, #2994, @hjl-tools)
misc: Fix contrib/ seekable format (#3058, @yhoogstrate and #3346, @daniellerozenblit)
misc: Improve speed of the one-file library generator (#3241, @wahern and #3005, @cwoffenden)

PR list (generated by Github)

New Contributors

Full Automated Changelog: https://github.com/facebook/zstd/compare/v1.5.2...v1.5.4

zstd - Zstandard v1.5.2

Published by felixhandte over 2 years ago

Zstandard v1.5.2 is a bug-fix release, addressing issues that were raised with the v1.5.1 release.

In particular, as a side-effect of the inclusion of assembly code in our source tree, binary artifacts were being marked as needing an executable stack on non-amd64 architectures. This release corrects that issue. More context is available in #2963.

This release also corrects a performance regression that was introduced in v1.5.0 that slows down compression of very small data when using the streaming API. Issue #2966 tracks that topic.

In addition there are a number of smaller improvements and fixes.

Full Changelist

New Contributors

Full Changelog: https://github.com/facebook/zstd/compare/v1.5.1...v1.5.2

zstd - Zstandard v1.5.1

Published by Cyan4973 almost 3 years ago

Notice: it has been brought to our attention that the v1.5.1 library might be built with an executable stack on non-x64 architectures, which could end up being flagged as problematic by some systems with strict security settings that disallow executable stacks. We are currently reviewing the issue. Be aware of it if you build libzstd for a non-x64 architecture.

Zstandard v1.5.1 is a maintenance release, bringing a good number of small refinements to the project. It also offers a welcome crop of performance improvements, as detailed below.

Performance Improvements

Speed improvements for fast compression (levels 1–4)

PRs #2749, #2774, and #2921 refactor single-segment compression for ZSTD_fast and ZSTD_dfast, which back compression levels 1 through 4 (as well as the negative compression levels). Speedups in the ~3-5% range are observed. In addition, the compression ratio of ZSTD_dfast (levels 3 and 4) is slightly improved.

Rebalanced middle compression levels

v1.5.0 introduced major speed improvements for mid-level compression (from 5 to 12), while preserving a roughly similar compression ratio. As a consequence, the speed scale became tilted towards higher speeds. Unfortunately, the differences between successive levels were no longer regular, and a large performance gap appeared just after the impacted range, between levels 12 and 13.

v1.5.1 tries to rebalance parameters so that compression levels can be roughly associated with their former speed budget. Consequently, v1.5.1 mid compression levels feature speeds closer to the former v1.4.9 ones (though still noticeably faster) and in exchange receive an improved compression ratio, as shown in the graphs below.

[Graph: comparing v1.4.9 vs v1.5.0 vs v1.5.1 on x64 (i7-9700k)]

[Graph: comparing v1.4.9 vs v1.5.0 vs v1.5.1 on arm64 (snapdragon 855)]

Note that, since middle levels only experience a rebalancing, save for some special cases, no significant performance differences between versions v1.5.0 and v1.5.1 should be expected: levels merely occupy different positions on the same curve. The situation is a bit different for fast levels (1-4), for which v1.5.1 delivers a small but consistent performance benefit on all platforms, as described in the previous paragraph.

Huffman Improvements

Our Huffman code was significantly revamped in this release. Both encoding and decoding speed were improved, and encoding speed for small inputs was improved even further. Speed is measured on the Silesia corpus by compressing at level 1, extracting the literals left over after compression, and then compressing and decompressing the literals from each block. Measurements are done on an Intel i9-9900K @ 3.6 GHz.

| Compiler | Scenario | v1.5.0 Speed | v1.5.1 Speed | Delta |
| --- | --- | --- | --- | --- |
| gcc-11 | Literal compression - 128KB block | 748 MB/s | 927 MB/s | +23.9% |
| clang-13 | Literal compression - 128KB block | 810 MB/s | 927 MB/s | +14.4% |
| gcc-11 | Literal compression - 4KB block | 223 MB/s | 321 MB/s | +44.0% |
| clang-13 | Literal compression - 4KB block | 224 MB/s | 310 MB/s | +38.2% |
| gcc-11 | Literal decompression - 128KB block | 1164 MB/s | 1500 MB/s | +28.8% |
| clang-13 | Literal decompression - 128KB block | 1006 MB/s | 1504 MB/s | +49.5% |

Overall impact on (de)compression speed depends on the compressibility of the data. Compression speed improves from 1-4%, and decompression speed improves from 5-15%.

PR #2722 implements the Huffman decoder in assembly for x86-64 with BMI2 enabled. We detect BMI2 support at runtime, so this speedup applies to all x86-64 builds running on CPUs that support BMI2. This improves Huffman decoding speed by about 40%, depending on the scenario. PR #2733 improves Huffman encoding speed by 10% for clang and 20% for gcc. PR #2732 drastically speeds up the HUF_sort() function, which speeds up Huffman tree building for compression. This is a significant speed boost for small inputs, measuring in at a 40% improvement for 4K inputs.

Binary Size and Build Speed

zstd binary size grew significantly in v1.5.0 due to the new code added for middle compression level speed optimizations. In this release we recover the binary size, and in the process also significantly speed up builds, especially with sanitizers enabled.

We measured the size of libzstd.a on x86-64, compiled with -O3. We regained 161 KB of binary size on gcc, and 293 KB on clang. Note that these binary sizes are listed for the whole library, optimized for speed over size. The decoder only, with size-saving options enabled, and compiled with -Os or -Oz, can be much smaller.

| Version | gcc-11 size | clang-13 size |
| --- | --- | --- |
| v1.5.1 | 1177 KB | 1167 KB |
| v1.5.0 | 1338 KB | 1460 KB |
| v1.4.9 | 1137 KB | 1151 KB |

Change log

Featured user-visible changes

  • perf: rebalanced compression levels, to better match intended speed/level curve, by @senhuang42 and @cyan4973
  • perf: faster huffman decoder, using x64 assembly, by @terrelln
  • perf: slightly faster high speed modes (strategies fast & dfast), by @felixhandte
  • perf: smaller binary size and faster compilation times, by @terrelln and @nolange
  • perf: new row64 mode, used notably at highest lazy2 levels 11-12, by @senhuang42
  • perf: faster mid-level compression speed in presence of highly repetitive patterns, by @senhuang42
  • perf: minor compression ratio improvements for small data at high levels, by @cyan4973
  • perf: reduced stack usage (mostly useful for Linux Kernel), by @terrelln
  • perf: faster compression speed on incompressible data, by @binhdvo
  • perf: on-demand reduced ZSTD_DCtx state size, using build macro ZSTD_DECODER_INTERNAL_BUFFER, at a small cost of performance, by @binhdvo
  • build: allows hiding static symbols in the dynamic library, using build macro, by @skitt
  • build: support for m68k (Motorola 68000's), by @cyan4973
  • build: improved AIX support, by @Helflym
  • build: improved meson unofficial build, by @eli-schwartz
  • cli : fix : forward mtime to output file, by @felixhandte
  • cli : custom memory limit when training dictionary (#2925), by @embg
  • cli : report advanced parameters information when compressing in very verbose mode (-vv), by @Svetlitski-FB
  • cli : advanced commands in the form --long-param= can accept negative value arguments, by @binhdvo

PR full list

New Contributors

Full Changelog: https://github.com/facebook/zstd/compare/v1.5.0...v1.5.1

zstd - Zstandard v1.5.0

Published by senhuang42 over 3 years ago

v1.5.0 is a major release featuring large performance improvements as well as API changes.

Performance

Improved Middle-Level Compression Speed

1.5.0 introduces a new default match finder for the compression strategies greedy, lazy, and lazy2 (which map to levels 5-12 for inputs larger than 256K). The optimization brings a massive improvement in compression speed with slight perturbations in compression ratio (< 0.5%) and equal or decreased memory usage.

Benchmarked with gcc, on an i9-9900K:

| level | silesia.tar speed delta | enwik7 speed delta |
| --- | --- | --- |
| 5 | +25% | +25% |
| 6 | +50% | +50% |
| 7 | +40% | +40% |
| 8 | +40% | +50% |
| 9 | +50% | +65% |
| 10 | +65% | +80% |
| 11 | +85% | +105% |
| 12 | +110% | +140% |

On heavily loaded machines with significant cache contention, we have internally measured even larger gains: 2-3x+ speed at levels 5-7. 🚀

The biggest gains are achieved on files typically larger than 128KB. On files smaller than 16KB, by default we revert back to the legacy match finder, which becomes the faster one. This default policy can be overridden manually: the new match finder can be forcibly enabled with the advanced parameter ZSTD_c_useRowMatchFinder, or through the CLI option --[no-]row-match-finder.
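
For library users, a minimal sketch of forcing the new match finder on through the advanced API; note that the parameter is experimental, and the name of the "enable" value has changed across releases (recent zstd.h revisions spell it ZSTD_ps_enable), so check your header:

#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_c_useRowMatchFinder is an experimental parameter */
#include <zstd.h>

void force_row_match_finder(ZSTD_CCtx* cctx)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 7);
    /* ZSTD_ps_enable is the name in recent zstd.h; older releases used a different enum name */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_useRowMatchFinder, ZSTD_ps_enable);
}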

Note: only CPUs that support SSE2 realize the full extent of this improvement.

Improved High-Level Compression Ratio

Improving compression ratio via block splitting is now enabled by default for high compression levels (16+). The amount of benefit varies depending on the workload. Compressing archives comprised of heavily differing files will see more improvement than compression of single files that don’t vary much entropically (like text files/enwik). At levels 16+, we observe no measurable regression to compression speed.

level 22 compression

| file | ratio 1.4.9 | ratio 1.5.0 | ratio % delta |
| --- | --- | --- | --- |
| silesia.tar | 4.021 | 4.041 | +0.49% |
| calgary.tar | 3.646 | 3.672 | +0.71% |
| enwik7 | 3.579 | 3.579 | +0.0% |

The block splitter can be forcibly enabled on lower compression levels as well with the advanced parameter ZSTD_c_splitBlocks. When forcibly enabled at lower levels, speed regressions can become more notable. Additionally, since more compressed blocks may be produced, decompression speed on these blobs may also see small regressions.

Faster Decompression Speed

The decompression speed of data compressed with large window settings (such as --long or --ultra) has been significantly improved in this version. The gains vary depending on compiler brand and version, with clang generally benefiting the most.

The following benchmark was measured by compressing enwik9 at level --ultra -22 (with a 128 MB window size) on a core i7-9700K.

| Compiler version | D. Speed improvement |
| --- | --- |
| gcc-7 | +15% |
| gcc-8 | +10% |
| gcc-9 | +5% |
| gcc-10 | +1% |
| clang-6 | +21% |
| clang-7 | +16% |
| clang-8 | +16% |
| clang-9 | +18% |
| clang-10 | +16% |
| clang-11 | +15% |

Average decompression speed for “normal” payload is slightly improved too, though the impact is less impressive. Once again, mileage varies depending on exact compiler version, payload, and even compression level. In general, a majority of scenarios see benefits ranging from +1 to +9%. There are also a few outliers here and there, from -4% to +13%. The average gain across all these scenarios stands at ~+4%.

Library Updates

Dynamic Library Supports Multithreading by Default

It was already possible to compile libzstd with multithreading support, but it required an explicit build-time action. By default, the make build script would build libzstd as a single-thread-only library.

This changes in v1.5.0.
Now the dynamic library (typically libzstd.so.1 on Linux) supports multi-threaded compression by default.
Note that this property is not extended to the static library (typically libzstd.a on Linux) because doing so would have impacted the build script of existing client applications (requiring them to add -pthread to their recipe), thus potentially breaking their build. In order to avoid this disruption, the static library remains single-threaded by default.
Luckily, this build disruption does not extend to the dynamic library, which can be built with multi-threading support while existing applications that link to libzstd.so and expect only single-thread capabilities remain completely unaffected.

The idea is that, starting from v1.5.0, applications can expect the dynamic library to support multi-threading should they need it, which will progressively lead to increased adoption of this capability over time.
That being said, since the locally deployed dynamic library may, or may not, support multi-threading compression, depending on local build configuration, it's always better to check this capability at runtime. For this goal, it's enough to check the return value when setting the parameter ZSTD_c_nbWorkers: if it results in an error, then multi-threading is not supported.
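
For instance, a minimal sketch of such a runtime check:

#include <zstd.h>

/* Returns 1 if the linked libzstd accepts a multi-threading request, 0 otherwise. */
int zstd_mt_supported(void)
{
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    size_t const ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 2);
    ZSTD_freeCCtx(cctx);
    return !ZSTD_isError(ret);
}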

Q: What if I prefer to keep the libraries in single-thread mode only?
The target make lib-nomt will ensure this outcome.

Q: Actually, I want both static and dynamic library versions to support multi-threading!
The target make lib-mt will generate this outcome.

Promotions to Stable

Moving up to the higher digit 1.5 signals an opportunity to extend the stable portion of zstd public API.
This update is relatively minor, featuring only a few non-controversial newcomers.

ZSTD_defaultCLevel() indicates which compression level is the default (applied when selecting level 0). It complements the existing ZSTD_minCLevel() and ZSTD_maxCLevel().
Similarly, ZSTD_getDictID_fromCDict() is a straightforward equivalent of the already promoted ZSTD_getDictID_fromDDict().
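
A small usage sketch of the newly stable entry points:

#include <stdio.h>
#include <zstd.h>

int main(void)
{
    printf("compression levels: min=%d, default=%d, max=%d\n",
           ZSTD_minCLevel(), ZSTD_defaultCLevel(), ZSTD_maxCLevel());
    return 0;
}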

Deprecations

Zstd-1.4.0 stabilized a new advanced API which allows users to pass advanced parameters to zstd. We’re now deprecating all the old experimental APIs that are subsumed by the new advanced API. They will be considered for removal in the next Zstd major release zstd-1.6.0. Note that only experimental symbols are impacted. Stable functions, like ZSTD_initCStream(), remain fully supported.

The deprecated functions are listed below, together with the migration. All the suggested migrations are stable APIs, meaning that once you migrate, the API will be supported forever. See the documentation for the deprecated functions for more details on how to migrate.
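
As an illustration (a sketch based on the migration guidance in zstd.h, not an excerpt from these notes), replacing the deprecated ZSTD_initCStream_srcSize() with the stable advanced API looks roughly like this:

#include <zstd.h>

/* before (deprecated): ZSTD_initCStream_srcSize(zcs, cLevel, pledgedSrcSize); */
/* after, using the stable advanced API: */
void init_stream(ZSTD_CCtx* cctx, int cLevel, unsigned long long pledgedSrcSize)
{
    ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, cLevel);
    ZSTD_CCtx_setPledgedSrcSize(cctx, pledgedSrcSize);
    /* then stream with ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_continue / ZSTD_e_end) */
}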

Header File Locations

Zstd has slightly re-organized the library layout to move all public headers to the top level lib/ directory. This is for consistency, so all public headers are in lib/ and all private headers are in a sub-directory. If you build zstd from source, this may affect your build system.

  • lib/common/zstd_errors.h has moved to lib/zstd_errors.h.
  • lib/dictBuilder/zdict.h has moved to lib/zdict.h.

Single-File Library

We have moved the scripts in contrib/single_file_libs to build/single_file_libs. These scripts, originally contributed by @cwoffenden, produce a single compilation-unit amalgamation of the zstd library, which can be convenient for integrating Zstandard into other source trees. This move reflects a commitment on our part to support this tool and this pattern of using zstd going forward.

Windows Release Artifact Format

We are slightly changing the format of the Windows release .zip files, to match our other release artifacts. The .zip files now bundle everything in a single folder whose name matches the archive name. The contents of that folder exactly match what was previously included in the root of the archive.

Signed Releases

We have created a signing key for the Zstandard project. This release and all future releases will be signed by this key. See #2520 for discussion.

Changelog

  • api: Various functions promoted from experimental to stable API: (#2579-#2581, @senhuang42)
    • ZSTD_defaultCLevel()
    • ZSTD_getDictID_fromCDict()
  • api: Several experimental functions have been deprecated and will emit a compiler warning (#2582, @senhuang42)
    • ZSTD_compress_advanced()
    • ZSTD_compress_usingCDict_advanced()
    • ZSTD_compressBegin_advanced()
    • ZSTD_compressBegin_usingCDict_advanced()
    • ZSTD_initCStream_srcSize()
    • ZSTD_initCStream_usingDict()
    • ZSTD_initCStream_usingCDict()
    • ZSTD_initCStream_advanced()
    • ZSTD_initCStream_usingCDict_advanced()
    • ZSTD_resetCStream()
  • api: ZSTDMT_NBWORKERS_MAX reduced to 64 for 32-bit environments (#2643, @Cyan4973)
  • perf: Significant speed improvements for middle compression levels (#2494, @senhuang42 & @terrelln)
  • perf: Block splitter to improve compression ratio, enabled by default for high compression levels (#2447, @senhuang42)
  • perf: Decompression loop refactor, speed improvements on clang and for --long modes (#2614 #2630, @Cyan4973)
  • perf: Reduced stack usage during compression and decompression entropy stage (#2522 #2524, @terrelln)
  • bug: Make the number of physical CPU cores detection more robust (#2517, @PaulBone)
  • bug: Improve setting permissions of created files (#2525, @felixhandte)
  • bug: Fix large dictionary non-determinism (#2607, @terrelln)
  • bug: Fix various dedicated dictionary search bugs (#2540 #2586, @senhuang42 @felixhandte)
  • bug: Fix non-determinism test failures on Linux i686 (#2606, @terrelln)
  • bug: Fix UBSAN error in decompression (#2625, @terrelln)
  • bug: Fix superblock compression divide by zero bug (#2592, @senhuang42)
  • bug: Ensure ZSTD_estimateCCtxSize*() monotonically increases with compression level (#2538, @senhuang42)
  • doc: Improve zdict.h dictionary training API documentation (#2622, @terrelln)
  • doc: Note that public ZSTD_free*() functions accept NULL pointers (#2521, @animalize)
  • doc: Add style guide docs for open source contributors (#2626, @Cyan4973)
  • tests: Better regression test coverage for different dictionary modes (#2559, @senhuang42)
  • tests: Better test coverage of index reduction (#2603, @terrelln)
  • tests: OSS-Fuzz coverage for seekable format (#2617, @senhuang42)
  • tests: Test coverage for ZSTD threadpool API (#2604, @senhuang42)
  • build: Dynamic library built multithreaded by default (#2584, @senhuang42)
  • build: Move zstd_errors.h and zdict.h to lib/ root (#2597, @terrelln)
  • build: Single file library build script moved to build/ directory (#2618, @felixhandte)
  • build: Allow ZSTDMT_JOBSIZE_MIN to be configured at compile-time, reduce default to 512KB (#2611, @Cyan4973)
  • build: Fixed Meson build (#2548, @SupervisedThinking & @kloczek)
  • build: ZBUFF_*() is no longer built by default (#2583, @senhuang42)
  • build: Fix excessive compiler warnings with clang-cl and CMake (#2600, @nickhutchinson)
  • build: Detect presence of md5 on Darwin (#2609, @felixhandte)
  • build: Avoid SIGBUS on armv6 (#2633, @bmwiedmann)
  • cli: --progress flag added to always display progress bar (#2595, @senhuang42)
  • cli: Allow reading from block devices with --force (#2613, @felixhandte)
  • cli: Fix CLI filesize display bug (#2550, @Cyan4973)
  • cli: Fix windows CLI --filelist end-of-line bug (#2620, @Cyan4973)
  • contrib: Various fixes for linux kernel patch (#2539, @terrelln)
  • contrib: Seekable format - Decompression hanging edge case fix (#2516, @senhuang42)
  • contrib: Seekable format - New seek table-only API (#2113 #2518, @mdittmer @Cyan4973)
  • contrib: Seekable format - Fix seek table descriptor check when loading (#2534, @foxeng)
  • contrib: Seekable format - Decompression fix for large offsets, (#2594, @azat)
  • misc: Automatically published release tarballs available on Github (#2535, @felixhandte)
zstd - Zstandard v1.4.9

Published by felixhandte over 3 years ago

This is an incremental release which includes various improvements and bug-fixes.

>2x Faster Long Distance Mode

Long Distance Mode (LDM) --long just got a whole lot faster thanks to optimizations by @mpu in #2483! These optimizations preserve the compression ratio but drastically speed up compression. It is especially noticeable in multithreaded mode, because the long distance match finder is not parallelized. Benchmarking with zstd -T0 -1 --long=31 on an Intel I9-9900K at 3.2 GHz we see:

| File | v1.4.8 MB/s | v1.4.9 MB/s | Improvement |
| --- | --- | --- | --- |
| silesia.tar | 308 | 692 | 125% |
| linux-versions* | 312 | 667 | 114% |
| enwik9 | 294 | 747 | 154% |

* linux-versions is a concatenation of the linux 4.0, 5.0, and 5.10 git archives.

New Experimental Decompression Feature: ZSTD_d_refMultipleDDicts

If the advanced parameter ZSTD_d_refMultipleDDicts is enabled, then multiple calls to ZSTD_DCtx_refDDict() will be honored in the corresponding DCtx. Example usage:

ZSTD_DCtx* dctx = ZSTD_createDCtx();
ZSTD_DCtx_setParameter(dctx, ZSTD_d_refMultipleDDicts, ZSTD_rmd_refMultipleDDicts);
ZSTD_DCtx_refDDict(dctx, ddict1);
ZSTD_DCtx_refDDict(dctx, ddict2);
ZSTD_DCtx_refDDict(dctx, ddict3);
...
ZSTD_decompress...

Decompression of multiple frames, each with their own dictID, is now possible with a single ZSTD_decompress call. As long as the dictID from each frame header references one of the dictIDs within the DCtx, then the corresponding dictionary will be used to decompress that particular frame. Note that this feature is disabled with a statically-allocated DCtx.

Changelog

  • bug: Use umask() to Constrain Created File Permissions (#2495, @felixhandte)
  • bug: Make Simple Single-Pass Functions Ignore Advanced Parameters (#2498, @terrelln)
  • api: Add (De)Compression Tracing Functionality (#2482, @terrelln)
  • api: Support References to Multiple DDicts (#2446, @senhuang42)
  • api: Add Function to Generate Skippable Frame (#2439, @senhuang42)
  • perf: New Algorithms for the Long Distance Matcher (#2483, @mpu)
  • perf: Performance Improvements for Long Distance Matcher (#2464, @mpu)
  • perf: Don't Shrink Window Log when Streaming with a Dictionary (#2451, @terrelln)
  • cli: Fix --output-dir-mirror's Rejection of ..-Containing Paths (#2512, @felixhandte)
  • cli: Allow Input From Console When -f/--force is Passed (#2466, @felixhandte)
  • cli: Improve Help Message (#2500, @senhuang42)
  • tests: Avoid Using stat -c on NetBSD (#2513, @felixhandte)
  • tests: Correctly Invoke md5 Utility on NetBSD (#2492, @niacat)
  • tests: Remove Flaky Tests (#2455, #2486, #2445, @Cyan4973)
  • build: Zstd CLI Can Now be Linked to Dynamic libzstd (#2457, #2454 @Cyan4973)
  • build: Avoid Using Static-Only Symbols (#2504, @skitt)
  • build: Fix Fuzzer Compiler Detection & Update UBSAN Flags (#2503, @terrelln)
  • build: Explicitly Hide Static Symbols (#2501, @skitt)
  • build: CMake: Enable Only C for lib/ and programs/ Projects (#2498, @concatime)
  • build: CMake: Use configure_file() to Create the .pc File (#2462, @lazka)
  • build: Add Guards for _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE (#2444, @indygreg)
  • build: Improve zlibwrapper Makefile (#2437, @Cyan4973)
  • contrib: Add recover_directory Program (#2473, @terrelln)
  • doc: Change License Year to 2021 (#2452 & #2465, @terrelln & @senhuang42)
  • doc: Fix Typos (#2459, @ThomasWaldmann)
zstd - Zstandard v1.4.8 - hotfix

Published by Cyan4973 almost 4 years ago

This is a minor hotfix for v1.4.7, where an internal buffer misalignment bug was detected by @bmwiedemann. The issue is of no consequence for x64 and arm64 targets, but could become a problem for CPUs relying on strict alignment, such as mips or older arm designs. Additionally, some targets, like 32-bit x86 CPUs, do not care much about alignment, but the code does, and will detect the misalignment and return an error code. Some other less common platforms, such as s390x, also seem to trigger the same issue.

While it's a minor fix, this update is nonetheless recommended.

zstd - Zstandard v1.4.7

Published by Cyan4973 almost 4 years ago

Note: this version features a minor bug, which can be present on systems other than x64 and arm64. Updating to v1.4.8 is recommended for all other platforms.

v1.4.7 unleashes several months of improvements across many axes, from performance to various fixes to new capabilities, a few of which are highlighted below. It's a recommended upgrade.

(Note: if you ever wondered what happened to v1.4.6, it’s an internal release number reserved for synchronization with Linux Kernel)

Improved --long mode

--long mode makes it possible to analyze vast quantities of data within a reasonable time and memory budget. The --long mode algorithm runs on top of the regular match finder, and both contribute to the final compressed outcome.
However, the fact that these two stages were working independently resulted in minor discrepancies at the highest compression levels, where the cost of each decision must be carefully monitored. For this reason, in situations where the input is not a good fit for --long mode (no large repetition at long distance), enabling it could reduce compression performance, even if by very little, compared to not enabling it (at high compression levels). This situation made it more difficult to "just always enable" the --long mode by default.
This is fixed in this version. For compression levels 16 and up, usage of --long will now never regress compared to compression without --long. This property made it possible to ramp up the --long mode contribution to the compression mix, improving its effectiveness.

The compression ratio improvements are most notable when --long mode is actually useful. In particular, --patch-from (which implicitly relies on --long) shows excellent gains from the improvements. We present some brief results here (tested on a MacBook Pro 16", i9).

[Figure: --long / --patch-from compression results, v1.4.5 vs v1.4.7]

Since --long mode is now always beneficial at high compression levels, it is now automatically enabled for any window size of 128MB and up.
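
For library users, a minimal sketch of the equivalent settings through the advanced API (a window log of 27, i.e. a 128 MB window, is used here purely as an illustration):

#include <zstd.h>

void enable_long_mode(ZSTD_CCtx* cctx)
{
    /* roughly what the CLI's --long=27 does: enable LDM with a 128 MB window */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);
}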

Faster decompression of small blocks

This release includes optimizations that significantly speed up decompression of small blocks and small data. The decompression speed gains will vary based on the block size according to the table below:

| Block Size | Decompression Speed Improvement |
| --- | --- |
| 1 KB | ~+30% |
| 2 KB | ~+30% |
| 4 KB | ~+25% |
| 8 KB | ~+15% |
| 16 KB | ~+10% |
| 32 KB | ~+5% |

These optimizations come from improving the process of reading the block header, and building the Huffman and FSE decoding tables. zstd’s default block size is 128 KB, and at this block size the time spent decompressing the data dominates the time spent reading the block header and building the decoding tables. But, as blocks become smaller, the cost of reading the block header and building decoding tables becomes more prominent.

CLI improvements

The CLI received several noticeable upgrades with this version.
To begin with, zstd can accept a new parameter through environment variable, ZSTD_NBTHREADS . It’s useful when zstd is called behind an application (tar, or a python script for example). Also, users which prefer multithreaded compression by default can now set a desired nb of threads with their environment. This setting can still be overridden on demand via command line.
A new command, --output-dir-mirror, makes it possible to compress a directory containing subdirectories (typically with the -r command), producing one compressed file per source file, and reproducing the directory tree in a selected destination directory.
There are various other improvements, such as more accurate warning and error messages, full equivalence between the conventions --long-command=FILE and --long-command FILE, fixes for confusion risks between stdin and the user prompt, or between console output and status messages, as well as a new short execution summary when processing multiple files, all cumulatively contributing to a nicer command line experience.

New experimental features

Shared Thread Pool

By default, each compression context can be set to use a maximum number of threads.
In complex scenarios, there might be multiple compression contexts, working in parallel, each using some number of threads. In such cases, it might be desirable to control the total number of threads used by all these compression contexts altogether.

This is now possible, by making all these compression contexts share the same threadpool. This capability is exposed through a new advanced function, ZSTD_CCtx_refThreadPool(), contributed by @marxin. See its documentation for more details.
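
A minimal sketch of sharing one pool between two contexts (the pool size and worker counts are illustrative):

#define ZSTD_STATIC_LINKING_ONLY   /* the thread pool API lives in the experimental section */
#include <zstd.h>

void share_one_pool(ZSTD_CCtx* cctx1, ZSTD_CCtx* cctx2)
{
    ZSTD_threadPool* const pool = ZSTD_createThreadPool(8);   /* 8 worker threads in total */
    ZSTD_CCtx_refThreadPool(cctx1, pool);
    ZSTD_CCtx_refThreadPool(cctx2, pool);
    ZSTD_CCtx_setParameter(cctx1, ZSTD_c_nbWorkers, 4);
    ZSTD_CCtx_setParameter(cctx2, ZSTD_c_nbWorkers, 4);
    /* ... compress with both contexts; call ZSTD_freeThreadPool(pool) only once
       no context references it anymore */
}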

Faster Dictionary Compression

This release introduces a new experimental dictionary compression algorithm, applicable to mid-range compression levels, employing strategies such as ZSTD_greedy, ZSTD_lazy, and ZSTD_lazy2. This new algorithm can be triggered by selecting the compression parameter ZSTD_c_enableDedicatedDictSearch during ZSTD_CDict creation (experimental section).

Benchmarks show the new algorithm providing significant compression speed gains :

| Level | Hot Dict | Cold Dict |
| --- | --- | --- |
| 5 | ~+17% | ~+30% |
| 6 | ~+12% | ~+45% |
| 7 | ~+13% | ~+40% |
| 8 | ~+16% | ~+50% |
| 9 | ~+19% | ~+65% |
| 10 | ~+24% | ~+70% |

We hope it will help make mid-level compression more attractive for dictionary scenarios. See the documentation for more details. Feedback is welcome!

New Sequence Ingestion API

We introduce a new entry point, ZSTD_compressSequences(), which makes it possible for users to define their own sequences, by whatever mechanism they prefer, and present them to this new entry point, which will generate a single zstd-compressed frame, based on provided sequences.

So for example, users can now feed to the function an array of externally generated ZSTD_Sequence:
[(offset: 5, matchLength: 4, litLength: 10), (offset: 7, matchLength: 6, litLength: 3), ...] and the function will output a zstd compressed frame based on these sequences.

This experimental API currently has several limitations (and its relevant parameters live in the "experimental" section). Notably, it currently ignores any repeat offsets provided, always recalculating them on the fly. Additionally, there is no way to force the use of certain zstd features, such as RLE or raw blocks.
If you are interested in this new entry point, please refer to zstd.h for more detailed usage instructions.
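
As a rough sketch (the prototype shown is the one documented in the experimental section of current zstd.h, which may differ slightly from the initial v1.4.7 draft; the sequence values are illustrative and must exactly describe the source, per the rules in zstd.h):

#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_compressSequences() is experimental */
#include <string.h>
#include <zstd.h>

size_t compress_from_sequences(void* dst, size_t dstCapacity)
{
    char src[20];
    memset(src, 'a', sizeof(src));
    /* one sequence: 4 literal bytes, then a 16-byte match at distance 1, covering all 20 bytes */
    ZSTD_Sequence seqs[1] = { { 1 /*offset*/, 4 /*litLength*/, 16 /*matchLength*/, 0 /*rep*/ } };
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    size_t const cSize = ZSTD_compressSequences(cctx, dst, dstCapacity,
                                                seqs, 1, src, sizeof(src));
    ZSTD_freeCCtx(cctx);
    return cSize;   /* check with ZSTD_isError() */
}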

Changelog

There are many other features and improvements in this release, and since we can’t highlight them all, they are listed below:

  • perf: stronger --long mode at high compression levels, by @senhuang42
  • perf: stronger --patch-from at high compression levels, thanks to --long improvements
  • perf: faster decompression speed for small blocks, by @terrelln
  • perf: faster dictionary compression at medium compression levels, by @felixhandte
  • perf: small speed & memory usage improvements for ZSTD_compress2(), by @terrelln
  • perf: minor generic decompression speed improvements, by @helloguo
  • perf: improved fast compression speeds with Visual Studio, by @animalize
  • cli : Set nb of threads with environment variable ZSTD_NBTHREADS, by @senhuang42
  • cli : new --output-dir-mirror DIR command, by @xxie24 (#2219)
  • cli : accept decompressing files with *.zstd suffix
  • cli : --patch-from can compress stdin when used with --stream-size, by @bimbashrestha (#2206)
  • cli : provide a condensed summary by default when processing multiple files
  • cli : fix : stdin input can no longer be confused with user prompt
  • cli : fix : console output no longer mixes stdout and status messages
  • cli : improve accuracy of several error messages
  • api : new sequence ingestion API, by @senhuang42
  • api : shared thread pool: control total nb of threads used by multiple compression jobs, by @marxin
  • api : new ZSTD_getDictID_fromCDict(), by @LuAPi
  • api : zlibWrapper only uses public API, and is compatible with dynamic library, by @terrelln
  • api : fix : multithreaded compression has predictable output even in special cases (see #2327) (issue not present on cli)
  • api : fix : dictionary compression correctly respects dictionary compression level (see #2303) (issue not present on cli)
  • api : fix : return dstSize_tooSmall error whenever appropriate
  • api : fix : ZSTD_initCStream_advanced() with static allocation and no dictionary
  • build: fix cmake script when employing path including spaces, by @terrelln
  • build: new ZSTD_NO_INTRINSICS macro to avoid explicit intrinsics
  • build: new STATIC_BMI2 macro for compile time detection of BMI2 on MSVC, by @Niadb (#2258)
  • build: improved compile-time detection of aarch64/neon platforms, by @bsdimp
  • build: Fix building on AIX 5.1, by @likema
  • build: compile paramgrill with cmake on Windows, requested by @mirh
  • build: install pkg-config file with CMake and MinGW, by @tonytheodore (#2183)
  • build: Install DLL with CMake on Windows, by @BioDataAnalysis (#2221)
  • build: fix : cli compilation with uclibc
  • misc: Improve single file library and include dictBuilder, by @cwoffenden
  • misc: Fix single file library compilation with Emscripten, by @yoshihitoh (#2227)
  • misc: Add freestanding translation script in contrib/freestanding_lib, by @terrelln
  • doc : clarify repcode updates in format specification, by @felixhandte
zstd - Zstandard v1.4.5

Published by Cyan4973 over 4 years ago

Zstd v1.4.5 Release Notes

This is a fairly important release which includes performance improvements and new major CLI features. It also fixes a few corner cases, making it a recommended upgrade.

Faster Decompression Speed

Decompression speed has been improved again, thanks to great contributions from @terrelln.
As usual, exact mileage varies depending on files and compilers.
For x64 CPUs, expect a speed bump of at least +5%, and up to +10% in favorable cases.
ARM CPUs receive more benefit, with speed improvements starting around +15% and reaching up to +50% for certain SoCs and scenarios (the ARM situation is more complex due to larger differences in SoC designs).

For illustration, some benchmarks run on a modern x64 platform using zstd -b, compiled with gcc v9.3.0:

|  | v1.4.4 | v1.4.5 |
| --- | --- | --- |
| silesia.tar | 1568 MB/s | 1653 MB/s |
| enwik8 | 1374 MB/s | 1469 MB/s |
| calgary.tar | 1511 MB/s | 1610 MB/s |

Same platform, using the clang v10.0.0 compiler:

|  | v1.4.4 | v1.4.5 |
| --- | --- | --- |
| silesia.tar | 1439 MB/s | 1496 MB/s |
| enwik8 | 1232 MB/s | 1335 MB/s |
| calgary.tar | 1361 MB/s | 1457 MB/s |

Simplified integration

Presuming a project needs to integrate libzstd's source code (as opposed to linking a pre-compiled library), the /lib source directory can be copy/pasted into the target project. Then the local build system must set up a few include directories. Some setups are automatically provided in prepared build scripts, such as the Makefile, but any other third-party build system must do it on its own.
This integration is now simplified, thanks to @felixhandte, by making all dependencies within /lib relative, meaning it's only necessary to set up include directories for the *.h header files that are directly included into the target project (typically zstd.h). Even that task can be circumvented by copy/pasting the *.h files into already established include directories.

Alternatively, if you are a fan of the one-file integration strategy, @cwoffenden has extended his one-file decoder script into a full-featured one-file compression library. The script create_single_file_library.sh will generate a file zstd.c, which contains all selected elements from the library (by default, compression and decompression). It's then enough to import just zstd.h and the generated zstd.c into the target project to access all included capabilities.

--patch-from

Zstandard CLI is introducing a new command line option --patch-from, which leverages existing compressors, dictionaries and long range match finder to deliver a high speed engine for producing and applying patches to files.

--patch-from is based on dictionary compression. It considers a previous version of a file as a dictionary, to better compress a new version of the same file. This operation preserves fast zstd speeds at lower compression levels. To this end, it also raises the previous maximum dictionary size from 32 MB to 2 GB, and automatically uses the long range match finder when needed (though it can also be manually overruled).
--patch-from can also be combined with multi-threading mode, at a very minimal compression ratio loss.

Example usage:

# create the patch
zstd --patch-from=<oldfile> <newfile> -o <patchfile>

# apply the patch
zstd -d --patch-from=<oldfile> <patchfile> -o <newfile>

Benchmarks:
We compared zstd to bsdiff, a popular industry grade diff engine. Our test corpus consisted of tarballs of different versions of source code from popular GitHub repositories. Specifically:

repos = {
    # ~31mb (small file)
    "zstd": {"url": "https://github.com/facebook/zstd", "dict-branch": "refs/tags/v1.4.2", "src-branch": "refs/tags/v1.4.3"},
    # ~273mb (medium file)
    "wordpress": {"url": "https://github.com/WordPress/WordPress", "dict-branch": "refs/tags/5.3.1", "src-branch": "refs/tags/5.3.2"},
    # ~1.66gb (large file)
    "llvm": {"url": "https://github.com/llvm/llvm-project", "dict-branch": "refs/tags/llvmorg-9.0.0", "src-branch": "refs/tags/llvmorg-9.0.1"}
}

--patch-from on level 19 (with chainLog=30 and targetLength=4kb) is comparable with bsdiff when comparing patch sizes.
[Figure: patch sizes, bsdiff vs zstd --patch-from level 19]

--patch-from greatly outperforms bsdiff in speed, even on its slowest setting (level 19), with an average speedup of ~7X. --patch-from is >200X faster on level 1 and >100X faster on level 3 (shown below) vs bsdiff, while still delivering patch sizes less than 0.5% of the original file size.

[Figures: patch creation speed, bsdiff vs zstd --patch-from]

And of course, there is no change to the fast zstd decompression speed.

Addendum :

After releasing --patch-from, the community made us aware of two other popular diff engines: SmartVersion and Xdelta. We ran some additional benchmarks for them, and here are our primary takeaways. All three tools are excellent diff engines with clear advantages (especially in speed) over the popular bsdiff. Patch sizes for both binary and text data produced by all three are fairly comparable, with Xdelta underperforming Zstd and SmartVersion only slightly [1]. For patch creation speed, Xdelta is the clear winner for text data and Zstd is the clear winner for binary data [2]. And for patch extraction speed (i.e. decompression), Zstd is fastest in all scenarios [3]. See the wiki for details.

--filelist=

Finally, --filelist= is a new CLI capability, which makes it possible to pass a list of files to operate upon from a file,
as opposed to listing all target files solely on the command line.
This makes it possible to prepare a list offline, save it into a file, and then provide the prepared list to zstd.
Another advantage is that this method circumvents command line size limitations, which can become a problem when operating on very large directories (a situation that typically arises with shell expansion).
In contrast, passing a very large list of filenames from within a file is free of any such size limitation.

Full List

  • perf: Improved decompression speed (x64 >+5%, ARM >+15%), by @terrelln
  • perf: Automatically downsizes ZSTD_DCtx when too large for too long (#2069, by @bimbashrestha)
  • perf: Improved fast compression speed on aarch64 (#2040, ~+3%, by @caoyzh)
  • perf: Small level 1 compression speed gains (depending on compiler)
  • fix: Compression ratio regression on huge files (> 3 GB) using high levels (--ultra) and multithreading, by @terrelln
  • api: ZDICT_finalizeDictionary() is promoted to stable (#2111)
  • api: new experimental parameter ZSTD_d_stableOutBuffer (#2094)
  • build: Generate a single-file libzstd library (#2065, by @cwoffenden)
  • build: Relative includes, no longer require -I flags for zstd lib subdirs (#2103, by @felixhandte)
  • build: zstd now compiles cleanly under -pedantic (#2099)
  • build: zstd now compiles with make-4.3
  • build: Support mingw cross-compilation from Linux, by @Ericson2314
  • build: Meson multi-thread build fix on windows
  • build: Some misc icc fixes backed by new ci test on travis
  • cli: New --patch-from command, create and apply patches from files, by @bimbashrestha
  • cli: --filelist= : Provide a list of files to operate upon from a file
  • cli: -b can now benchmark multiple files in decompression mode
  • cli: New --no-content-size command
  • cli: New --show-default-cparams command
  • misc: new diagnosis tool, checked_flipped_bits, in contrib/, by @felixhandte
  • misc: Extend largeNbDicts benchmark to compression
  • misc: experimental edit-distance match finder in contrib/
  • doc: Improved beginner CONTRIBUTING.md docs
  • doc: New issue templates for zstd
zstd - Zstandard v1.4.4

Published by Cyan4973 almost 5 years ago

This release includes some major performance improvements and new CLI features, which make it a recommended upgrade.

Faster Decompression Speed

Decompression speed has been substantially improved, thanks to @terrelln. Exact mileage obviously varies depending on files and scenarios, but the general expectation is a bump of about +10%. The benefit is considered applicable to all scenarios, and will be perceptible for most usages.

Some benchmark figures for illustration:

v1.4.3 v1.4.4
silesia.tar 1440 MB/s 1600 MB/s
enwik8 1225 MB/s 1390 MB/s
calgary.tar 1360 MB/s 1530 MB/s

Faster Compression Speed when Re-Using Contexts

In server workloads (characterized by very high compression volume of relatively small inputs), the allocation and initialization of zstd's internal data structures can become a significant part of the cost of compression. For this reason, zstd has long had an optimization (which we recommended for large-scale users, perhaps with something like this): when you provide an already-used ZSTD_CCtx to a compression operation, zstd tries to re-use the existing data structures, if possible, rather than re-allocate and re-initialize them.

Historically, this optimization could avoid re-allocation most of the time, but required an exact match of internal parameters to avoid re-initialization. In this release, @felixhandte removed the dependency on matching parameters, allowing the full context re-use optimization to be applied to effectively all compressions. Practical workloads on small data should expect a ~3% speed-up.
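
As an illustration, here is a minimal sketch (not taken from the release; the job arrays are hypothetical) of the recommended pattern: allocate one ZSTD_CCtx up front and feed every compression job through it, letting zstd recycle its internal buffers.

#include <zstd.h>

/* compress `numJobs` independent inputs through a single, re-used context */
size_t compress_all(const void* const* jobs, const size_t* jobSizes, size_t numJobs,
                    void* dst, size_t dstCapacity)
{
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();   /* allocated once, re-used below */
    if (cctx == NULL) return 0;

    size_t total = 0;
    for (size_t i = 0; i < numJobs; i++) {
        /* ZSTD_compressCCtx() re-uses the context's existing tables when possible */
        size_t const cSize = ZSTD_compressCCtx(cctx, dst, dstCapacity,
                                               jobs[i], jobSizes[i], 3);
        if (ZSTD_isError(cSize)) { total = 0; break; }
        total += cSize;
        /* a real application would write dst out before the next iteration */
    }
    ZSTD_freeCCtx(cctx);
    return total;
}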

In addition to improving average performance, this change also has some nice side-effects on the extremes of performance.

  • On the fast end, it is now easier to get optimal performance from zstd. In particular, it is no longer necessary to do careful tracking and matching of contexts to compressions based on detailed parameters (as discussed for example in #1796). Instead, straightforwardly reusing contexts is now optimal.
  • Second, this change ameliorates some rare, degenerate scenarios (e.g., high volume streaming compression of small inputs with varying, high compression levels), in which it was possible for the allocation and initialization work to vastly overshadow the actual compression work. These cases are up to 40x faster, and now perform in line with similar happy cases.

Dictionaries and Large Inputs

In theory, using a dictionary should always be beneficial. However, due to some long-standing implementation limitations, it can actually be detrimental. Case in point: by default, dictionaries are prepared to compress small data (where they are most useful). When this prepared dictionary is used to compress large data, there is a mismatch between the prepared parameters (targeting small data) and the ideal parameters (that would target large data). This can cause dictionaries to counter-intuitively result in a lower compression ratio when compressing large inputs.

Starting with v1.4.4, using a dictionary with a very large input will no longer be detrimental. Thanks to a patch from @senhuang42, whenever the library notices that input is sufficiently large (relative to dictionary size), the dictionary is re-processed, using the optimal parameters for large data, resulting in improved compression ratio.

The capability is also exposed, and can be manually triggered using ZSTD_dictForceLoad.
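
For illustration only, a hedged sketch of that manual trigger, assuming the experimental section of zstd.h (ZSTD_STATIC_LINKING_ONLY), where ZSTD_c_forceAttachDict and ZSTD_dictForceLoad are declared; error handling is omitted.

#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_c_forceAttachDict is experimental */
#include <zstd.h>

/* force the dictionary to be fully re-loaded, mirroring what the library
   now does automatically when the input is large relative to the dictionary */
size_t compress_large_with_dict(ZSTD_CCtx* cctx,
                                void* dst, size_t dstCapacity,
                                const void* src, size_t srcSize,
                                const void* dict, size_t dictSize)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_forceAttachDict, ZSTD_dictForceLoad);
    ZSTD_CCtx_loadDictionary(cctx, dict, dictSize);
    return ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
}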

New commands

zstd CLI extends its capabilities, providing new advanced commands, thanks to great contributions :

  • zstd generated files (compressed or decompressed) can now be automatically stored into a different directory than the source one, using the --output-dir-flat=DIR command, provided by @senhuang42.
  • It’s possible to inform zstd about the size of data coming from stdin. @nmagerko proposed 2 new commands, allowing users to provide the exact stream size (--stream-size=#) or an approximate one (--size-hint=#). Both only make sense when compressing a data stream from a pipe (such as stdin), since for a real file, zstd obtains the exact source size from the file system. Providing a source size allows zstd to better adapt internal compression parameters to the input, resulting in better performance and compression ratio. Additionally, providing the precise size makes it possible to embed this information in the compressed frame header, which also allows decoder optimizations.
  • In situations where the same directory content gets regularly compressed, with the intention to only compress new files not yet compressed, it’s necessary to filter the file list to exclude already compressed files. This process is simplified with the command --exclude-compressed, provided by @shashank0791. As the name implies, it simply excludes all compressed files from the list to process.

Single-File Decoder with Web Assembly

Let’s complete the picture with an impressive contribution from @cwoffenden. libzstd has long offered the capability to build only the decoder, in order to generate smaller binaries that can be more easily embedded into memory-constrained devices and applications.

@cwoffenden built on this capability and offers a script creating a single-file decoder, as an amalgamated variant of the reference Zstandard decoder. The package is completed with a nice build script, which compiles the one-file decoder into WASM code for embedding into a web application, and even tests it.

As a capability example, check out the awesome WebGL demo provided by @cwoffenden in /contrib/single_file_decoder/examples directory!

Full List

  • perf: Improved decompression speed, by > 10%, by @terrelln
  • perf: Better compression speed when re-using a context, by @felixhandte
  • perf: Fix compression ratio when compressing large files with small dictionary, by @senhuang42
  • perf: zstd reference encoder can generate RLE blocks, by @bimbashrestha
  • perf: minor generic speed optimization, by @davidbolvansky
  • api: new ability to extract sequences from the parser for analysis, by @bimbashrestha
  • api: fixed decoding of magic-less frames, by @terrelln
  • api: fixed ZSTD_initCStream_advanced() performance with fast modes, reported by @QrczakMK
  • cli: Named pipes support, by @bimbashrestha
  • cli: support for short tar extensions, by @stokito
  • cli: command --output-dir-flat=DIR, generates target files into requested directory, by @senhuang42
  • cli: commands --stream-size=# and --size-hint=#, by @nmagerko
  • cli: command --exclude-compressed, by @shashank0791
  • cli: faster -t test mode
  • cli: improved some error messages, by @vangyzen
  • cli: fix command -D dictionary on Windows
  • cli: fix rare deadlock condition within dictionary builder, by @terrelln
  • build: single-file decoder with emscripten compilation script, by @cwoffenden
  • build: fixed zlibWrapper compilation on Visual Studio, reported by @bluenlive
  • build: fixed deprecation warning for certain gcc version, reported by @jasonma163
  • build: fix compilation on old gcc versions, by @cemeyer
  • build: improved installation directories for cmake script, by Dmitri Shubin
  • pack: modified pkgconfig, for better integration into openwrt, requested by @neheb
  • misc: Improved documentation : ZSTD_CLEVEL, DYNAMIC_BMI2, ZSTD_CDict, function deprecation, zstd format
  • misc: fixed educational decoder : accept larger literals section, and removed UNALIGNED() macro
zstd - Zstandard v1.4.3

Published by felixhandte about 5 years ago

Dictionary Compression Regression

We discovered an issue in the v1.4.2 release, which can degrade the effectiveness of dictionary compression. This release fixes that issue.

Detailed Changes

  • bug: Fix Dictionary Compression Ratio Regression by @cyan4973 (#1709)
  • bug: Fix Buffer Overflow in v0.3 Decompression by @felixhandte (#1722)
  • build: Add support for IAR C/C++ Compiler for Arm by @joseph0918 (#1705)
  • misc: Add NULL pointer check in util.c by @leeyoung624 (#1706)
zstd - Zstandard v1.4.2

Published by felixhandte about 5 years ago

Legacy Decompression Fix

This release is a small one, that corrects an issue discovered in the previous release. Zstandard v1.4.1 included a bug in decompressing v0.5 legacy frames, which is fixed in v1.4.2.

Detailed Changes

  • bug: Fix bug in zstd-0.5 decoder by @terrelln (#1696)
  • bug: Fix seekable decompression in-memory API by @iburinoc (#1695)
  • bug: Close minor memory leak in CLI by @LeeYoung624 (#1701)
  • misc: Validate blocks are smaller than size limit by @vivekmig (#1685)
  • misc: Restructure source files by @ephiepark (#1679)
zstd - Zstandard v1.4.1

Published by felixhandte over 5 years ago

Maintenance

This release is primarily a maintenance release.

It includes a few bug fixes, including a fix for a rare data corruption bug, which could only be triggered in a niche use case, when doing all of the following: using multithreading mode, with an overlap size >= 512 MB, using a strategy >= ZSTD_btlazy, and compressing more than 4 GB. None of the default compression levels meet these requirements (not even --ultra ones).

Performance

This release also includes some performance improvements, among which the primary improvement is that Zstd decompression is ~7% faster, thanks to @mgrice.

See this comparison of decompression speeds at different compression levels, measured on the Silesia Corpus, on an Intel i9-9900K with GCC 9.1.0.

Level v1.4.0 v1.4.1 Delta
1 1390 MB/s 1453 MB/s +4.5%
3 1208 MB/s 1301 MB/s +7.6%
5 1129 MB/s 1233 MB/s +9.2%
7 1224 MB/s 1347 MB/s +10.0%
16 1278 MB/s 1430 MB/s +11.8%

Detailed list of changes

  • bug: Fix data corruption in niche use cases (huge inputs + multithreading + large custom window sizes + other conditions) by @terrelln (#1659)
  • bug: Fuzz legacy modes, fix uncovered bugs by @terrelln (#1593, #1594, #1595)
  • bug: Fix out of bounds read by @terrelln (#1590)
  • perf: Improved decoding speed by ~7% @mgrice (#1668)
  • perf: Large compression ratio improvement for small windowLog by @cyan4973 (#1624)
  • perf: Faster compression speed in high compression mode for repetitive data by @terrelln (#1635)
  • perf: Slightly improved compression ratio of level 3 and 4 (ZSTD_dfast) by @cyan4973 (#1681)
  • perf: Slightly faster compression speed when re-using a context by @cyan4973 (#1658)
  • api: Add parameter to generate smaller dictionaries by @tyler-tran (#1656)
  • cli: Recognize symlinks when built in C99 mode by @felixhandte (#1640)
  • cli: Expose cpu load indicator for each file on -vv mode by @ephiepark (#1631)
  • cli: Restrict read permissions on destination files by @chungy (#1644)
  • cli: zstdgrep: handle -f flag by @felixhandte (#1618)
  • cli: zstdcat: follow symlinks by @vejnar (#1604)
  • doc: Remove extra size limit on compressed blocks by @felixhandte (#1689)
  • doc: Improve documentation on streaming buffer sizes by @cyan4973 (#1629)
  • build: CMake: support building with LZ4 @leeyoung624 (#1626)
  • build: CMake: install zstdless and zstdgrep by @leeyoung624 (#1647)
  • build: CMake: respect existing uninstall target by @j301scott (#1619)
  • build: Make: skip multithread tests when built without support by @michaelforney (#1620)
  • build: Make: Fix examples/ test target by @sjnam (#1603)
  • build: Meson: rename options out of deprecated namespace by @lzutao (#1665)
  • build: Meson: fix build by @lzutao (#1602)
  • build: Visual Studio: don't export symbols in static lib by @scharan (#1650)
  • build: Visual Studio: fix linking by @absotively (#1639)
  • build: Fix MinGW-W64 build by @myzhang1029 (#1600)
  • misc: Expand decodecorpus coverage by @ephiepark (#1664)
zstd - Zstandard v1.4.0

Published by terrelln over 5 years ago

Advanced API

The main focus of the v1.4.0 release is the stabilization of the advanced API.

The advanced API provides a way to set specific parameters during compression and decompression in an API and ABI compatible way. For example, it allows you to compress with multiple threads, enable --long mode, set frame parameters, and load dictionaries. It is compatible with ZSTD_compressStream*() and ZSTD_compress2(). There is also an advanced decompression API that allows you to set parameters like maximum memory usage, and load dictionaries. It is compatible with the existing decompression functions ZSTD_decompressStream() and ZSTD_decompressDCtx().
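
As a rough sketch (using only stable v1.4.0 parameter names; error handling omitted, function name illustrative), setting parameters and compressing with the advanced API looks like this:

#include <zstd.h>

size_t compress_with_advanced_api(ZSTD_CCtx* cctx,
                                  void* dst, size_t dstCapacity,
                                  const void* src, size_t srcSize)
{
    /* sticky parameters: they remain set on the context for subsequent calls */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);                   /* requires a ZSTD_MULTITHREAD build */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);  /* equivalent of --long mode */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, 1);                /* frame parameter */
    return ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
}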

The old streaming functions are all compatible with the new API, and the documentation provides the equivalent function calls in the new API. For example, see ZSTD_initCStream(). The stable functions will remain supported, but the functions in the experimental sections, like ZSTD_initCStream_usingDict(), will eventually be marked as deprecated and removed in favor of the new advanced API.
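
For instance, a call to ZSTD_initCStream(zcs, level) maps onto the advanced API roughly as follows (a sketch based on the header documentation; error checks omitted):

#include <zstd.h>

/* old streaming initialization */
void init_old(ZSTD_CStream* zcs, int level)
{
    ZSTD_initCStream(zcs, level);
}

/* advanced-API equivalent: reset the session, then set sticky parameters
   (also call ZSTD_CCtx_refCDict(cctx, NULL) if a dictionary was previously attached) */
void init_new(ZSTD_CCtx* cctx, int level)
{
    ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);
}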

The examples have all been updated to use the new advanced API. If you have questions about how to use the new API, please refer to the examples, and if they are unanswered, please open an issue.

Performance

Zstd's fastest compression level just got faster! Thanks to ideas from Intel's igzip and @gbtucker, we've made level 1, zstd's fastest strategy, 6-8% faster in most scenarios. For example on the Silesia Corpus with level 1, we see 0.2% better compression compared to zstd-1.3.8, and these performance figures on an Intel i9-9900K:

Version C. Speed D. Speed
1.3.8 gcc-8 489 MB/s 1343 MB/s
1.4.0 gcc-8 532 MB/s (+8%) 1346 MB/s
1.3.8 clang-8 488 MB/s 1188 MB/s
1.4.0 clang-8 528 MB/s (+8%) 1216 MB/s

New Features

A new experimental function ZSTD_decompressBound() has been added by @shakeelrao. It is useful when decompressing zstd data in a single shot that may or may not have the decompressed size written into the frame. It is exact when the decompressed size is written into the frame, and a tight upper bound (within 128 KB) otherwise, as long as ZSTD_e_flush and ZSTD_flushStream() aren't used. When ZSTD_e_flush is used, in the worst case the bound can be very large, but this isn't a common scenario.
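
A small sketch of the intended usage (ZSTD_decompressBound() lives in the experimental section, so ZSTD_STATIC_LINKING_ONLY is required; the function name below is illustrative):

#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_decompressBound() is experimental */
#include <stdlib.h>
#include <zstd.h>

/* returns a malloc'd buffer holding the decompressed data, or NULL on error */
void* decompress_whole_input(const void* src, size_t srcSize, size_t* outSize)
{
    unsigned long long const bound = ZSTD_decompressBound(src, srcSize);
    if (bound == ZSTD_CONTENTSIZE_ERROR) return NULL;   /* invalid frame(s) */

    void* const dst = malloc((size_t)bound);
    if (dst == NULL) return NULL;

    size_t const dSize = ZSTD_decompress(dst, (size_t)bound, src, srcSize);
    if (ZSTD_isError(dSize)) { free(dst); return NULL; }

    *outSize = dSize;
    return dst;
}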

The parameter ZSTD_c_literalCompressionMode and the CLI flag --[no-]compress-literals allow users to explicitly enable and disable literal compression. By default literals are compressed with positive compression levels, and left uncompressed for negative compression levels. Disabling literal compression boosts compression and decompression speed, at the cost of compression ratio.
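
On the library side, a minimal sketch of forcing literals to stay uncompressed (the parameter sits in the experimental section at the time of this release, hence ZSTD_STATIC_LINKING_ONLY):

#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_c_literalCompressionMode is experimental */
#include <zstd.h>

void disable_literal_compression(ZSTD_CCtx* cctx)
{
    /* trades some compression ratio for faster compression and decompression */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_literalCompressionMode, ZSTD_lcm_uncompressed);
}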

Detailed list of changes

  • perf: Improve level 1 compression speed in most scenarios by 6% by @gbtucker and @terrelln
  • api: Move the advanced API, including all functions in the staging section, to the stable section
  • api: Make ZSTD_e_flush and ZSTD_e_end block for maximum forward progress
  • api: Rename ZSTD_CCtxParam_getParameter to ZSTD_CCtxParams_getParameter
  • api: Rename ZSTD_CCtxParam_setParameter to ZSTD_CCtxParams_setParameter
  • api: Don't export ZSTDMT functions from the shared library by default
  • api: Require ZSTD_MULTITHREAD to be defined to use ZSTDMT
  • api: Add ZSTD_decompressBound() to provide an upper bound on decompressed size by @shakeelrao
  • api: Fix ZSTD_decompressDCtx() corner cases with a dictionary
  • api: Move ZSTD_getDictID_*() functions to the stable section
  • api: Add ZSTD_c_literalCompressionMode flag to enable or disable literal compression by @terrelln
  • api: Allow compression parameters to be set when a dictionary is used
  • api: Allow setting parameters before or after ZSTD_CCtx_loadDictionary() is called
  • api: Fix ZSTD_estimateCStreamSize_usingCCtxParams()
  • api: Setting ZSTD_d_maxWindowLog to 0 means use the default
  • cli: Ensure that a dictionary is not used to compress itself by @shakeelrao
  • cli: Add --[no-]compress-literals flag to enable or disable literal compression
  • doc: Update the examples to use the advanced API
  • doc: Explain how to transition from old streaming functions to the advanced API in the header
  • build: Improve the Windows release packages
  • build: Improve CMake build by @hjmjohnson
  • build: Build fixes for FreeBSD by @lwhsu
  • build: Remove redundant warnings by @thatsafunnyname
  • build: Fix tests on OpenBSD by @bket
  • build: Extend fuzzer build system to work with the new clang engine
  • build: CMake now creates the libzstd.so.1 symlink
  • build: Improve Meson build by @lzutao
  • misc: Fix symbolic link detection on FreeBSD
  • misc: Use physical core count for -T0 on FreeBSD by @cemeyer
  • misc: Fix zstd --list on truncated files by @kostmo
  • misc: Improve logging in debug mode by @felixhandte
  • misc: Add CirrusCI tests by @lwhsu
  • misc: Optimize dictionary memory usage in corner cases
  • misc: Improve the dictionary builder on small or homogeneous data
  • misc: Fix spelling across the repo by @jsoref
zstd - Zstandard v1.3.8

Published by Cyan4973 almost 6 years ago

Advanced API

v1.3.8's main focus is the stabilization of the advanced API.

This API has been in the making for more than a year, and makes it possible to trigger advanced features, such as multithreading, --long mode, or detailed frame parameters, in a straightforward and extensible manner. Some examples are provided in this blog entry.
To make this vision possible, the advanced API relies on sticky parameters, which can be stacked on top of each other in any order. This makes it possible to introduce new features in the future without breaking the API or ABI.

This API has provided a good experience in our infrastructure, and we hope it will prove easy to use and efficient in your applications. Nonetheless, before being branded "stable", this proposal must spend a last round in the "staging area", in order to generate comments and feedback from new users. It's planned to be labelled "stable" by v1.4.0, which is expected to be the next release, depending on received feedback.

The experimental section still contains a lot of prototypes which are largely redundant with the new advanced API. Expect them to become deprecated, and later dropped in some future release. Transitioning to the newer advanced API is therefore highly recommended.

Performance

Decoding speed has been improved again, primarily for some specific scenarios : frames using large window sizes (--ultra or --long), and cold dictionaries. The cold dictionary scenario is expected to become more important in the near future, as solutions relying on thousands of dictionaries simultaneously will be deployed.

The higher compression levels get a slight compression ratio boost, mostly visible for small (<256 KB) and large (>32 MB) data streams. This change benefits asymmetric scenarios (compress once, decompress many times), typically targeting level 19.

New features

A noticeable addition, @terrelln introduces the --rsyncable mode to zstd. Similar to gzip --rsyncable, it generates a compressed frame which is friendly to rsync in case of limited changes : a difference in the input data will only impact a small localized amount of compressed data, instead of everything from the position onward due to cascading impacts. This is useful for very large archives regularly updated and synchronized over long distance connections (as an example, compressed mailboxes come to mind).

The method used by zstd preserves the compression ratio very well, introducing only very tiny losses due to synchronization points, meaning it's no longer a sacrifice to use --rsyncable. Here is an example on silesia.tar, using default compression level :

compressor   normal     --rsyncable   Ratio diff.   time
gzip         68235456   68778265      -0.795%       7.92s
zstd         66829650   66846769      -0.026%       1.17s

Speaking of compression level : it's now possible to use the environment variable ZSTD_CLEVEL to influence the default compression level. This can prove useful in situations where it's not possible to provide command line parameters, typically when zstd is invoked "under the hood" by some calling process.

Lastly, anyone interested in embedding a small zstd decoder into a space-constrained application will be interested in a new set of build macros introduced by @felixhandte, which make it possible to selectively turn off decoder features to reduce binary size even further. Final binary size will of course vary depending on target assembler and compiler, but in preliminary testing on x64, it helped reduce the decoder size by a factor of 3 (from ~64 KB down to ~20 KB).

Detailed list of changes

  • perf: better decompression speed on large files (+7%) and cold dictionaries (+15%)
  • perf: slightly better compression ratio at high compression modes
  • api : finalized advanced API, last stage before "stable" status
  • api : new --rsyncable mode, by @terrelln
  • api : support decompression of empty frames into NULL (used to be an error) (#1385)
  • build: new set of build macros to generate a minimal size decoder, by @felixhandte
  • build: fix compilation on MIPS32, reported by @clbr (#1441)
  • build: fix compilation with multiple -arch flags, by @ryandesign
  • build: highly upgraded meson build, by @lzutao
  • build: improved buck support, by @obelisk
  • build: fix cmake script : can create debug build, by @pitrou
  • build: Makefile : grep works on both colored consoles and systems without color support
  • build: fixed zstd-pgo target, by @bmwiedemann
  • cli : support ZSTD_CLEVEL environment variable, by @yijinfb (#1423)
  • cli : --no-progress flag, preserving final summary (#1371), by @terrelln
  • cli : ensure destination file is not source file (#1422)
  • cli : clearer error messages, notably when input file not present
  • doc : clarified zstd_compression_format.md, by @ulikunitz
  • misc: fixed zstdgrep, returns 1 on failure, by @lzutao
  • misc: NEWS renamed as CHANGELOG, in accordance with fb.oss policy
zstd - Zstandard regression testing data

Published by terrelln almost 6 years ago

Zstandard regression testing data

zstd - Zstandard v1.3.7

Published by Cyan4973 about 6 years ago

This is a minor fix release building upon v1.3.6.

The main reason we publish this new version is that @indygreg detected an important compression ratio regression for a specific scenario (compressing with a dictionary at level 9 or 10 for small data, or 11 - 12 for large data). We don't anticipate this scenario to be common : dictionary compression is still rare, most users prefer fast modes (levels <= 3), and a few rare ones use strong modes (levels 15-19), so "middle compression" is an extreme rarity.
But just in case some users do, we publish this release.

A few other minor things were ongoing and are therefore bundled.

Decompression speed might be slightly better with clang, depending on exact target and version. We could observe as much as 7% speed gains in some cases, though in other cases, it's rather in the ~2% range.

The integrated backtrace functionality in the cli has been updated : its presence can be more easily controlled by invoking the BACKTRACE build macro. The automatic detector is more restrictive, and release mode now builds without it by default. We want to be sure the default make compiles without any issue on most platforms.

Finally, the list of man pages has been completed with documentation for zstdless and zstdgrep, by @samrussell .

Detailed list of changes

  • perf: slightly better decompression speed on clang (depending on hardware target)
  • fix : ratio for dictionary compression at levels 9 and 10, reported by @indygreg
  • build: no longer build backtrace by default in release mode; restrict further automatic mode
  • build: control backtrace support through build macro BACKTRACE
  • misc: added man pages for zstdless and zstdgrep, by @samrussell
zstd - Zstandard v1.3.6 "Database Edition"

Published by Cyan4973 about 6 years ago

Zstandard v1.3.6 release is focused on intensive dictionary compression for database scenarios.

This is a new environment we are experimenting with. The success of dictionary compression on small data, of which databases tend to store plenty, led to increased adoption, and we now see scenarios where literally thousands of dictionaries are being used simultaneously, with permanent generation or update of new dictionaries.

To face these new conditions, v1.3.6 brings a few improvements to the table :

  • A brand new, faster dictionary builder, by @jenniferliu, under guidance from @terrelln. The new builder, named fastcover, is about 10x faster than our previous default generator, cover, while suffering only negligible accuracy losses (<1%). It's effectively an approximate version of cover, which trades accuracy for the benefit of speed and memory. The new dictionary builder is so effective that it has become our new default dictionary builder (--train). The slower but higher-quality generator remains accessible using the --train-cover command.

Here is an example, using the "github user records" public dataset (about 10K records of about 1K each) :

builder algorithm                              generation time   compression ratio
fast cover (v1.3.6 --train)                    0.9 s             x10.29
cover (v1.3.5 --train)                         10.1 s            x10.31
High accuracy fast cover (--train-fastcover)   6.6 s             x10.65
High accuracy cover (--train-cover)            50.5 s            x10.66
  • Faster dictionary decompression under memory pressure, when using thousands of dictionaries simultaneously. The new decoder is able to detect cold vs hot dictionary scenarios, and adds clever prefetching decisions to minimize memory latency. It typically improves decoding speed by ~+30% (vs v1.3.5).

  • Faster dictionary compression under memory pressure, when using a lot of contexts simultaneously. The new design, by @felixhandte, considerably reduces memory usage when compressing small data with dictionaries, which is the main scenario found in databases. The sharp memory usage reduction makes it easier for CPU caches to manage multiple contexts in parallel. Speed gains scale with the number of active contexts, as shown in the graph below :
    [Figure: dictionary compression, speed vs number of active contexts]

    Note that, in real-life environments, benefits appear even sooner, since CPU caches tend to be used by multiple other processes / threads at the same time, instead of being monopolized by a single synthetic benchmark.

Other noticeable improvements

A new command, --adapt, makes it possible to pipe gigantic amounts of data between servers (typically for backup scenarios), and let the compressor automatically adjust compression level based on perceived network conditions. When the network becomes slower, zstd will use the available time to compress more, and accelerate again when bandwidth permits. It reduces the need to "pre-calibrate" speed and compression level, and is a good simplification for system administrators. It also results in gains for both dimensions (better compression ratio and better speed) compared to the more traditional "fixed" compression level strategy.
This is still early days for this feature, and we are eager to get feedback on its usages. We know it works better in fast bandwidth environments for example, as adaptation itself becomes slow when bandwidth is slow. This is something that will need to be improved. Nonetheless, in its current incarnation, --adapt already proves useful for several datacenter scenarios, which is why we are releasing it.

Advanced users will be pleased by the expansion of an existing tool, tests/paramgrill, which has been refined by @georgelu. This tool explores the space of advanced compression parameters, to find the best possible set of compression parameters for a given scenario. It takes as input a set of samples and a set of constraints, and works its way towards better and better compression parameters respecting the constraints.

Example :

./paramgrill --optimize=cSpeed=50M dirToSamples/*   # requires minimum compression speed of 50 MB/s
optimizing for dirToSamples/* - limit compression speed 50 MB/s

(...)

/*   Level  5   */       { 20, 18, 18,  2,  5,  2,ZSTD_greedy  ,  0 },     /* R:3.147 at  75.7 MB/s - 567.5 MB/s */   # best level satisfying constraint
--zstd=windowLog=20,chainLog=18,hashLog=18,searchLog=2,searchLength=5,targetLength=2,strategy=3,forceAttachDict=0

(...)

/* Custom Level */       { 21, 16, 18,  2,  6,  0,ZSTD_lazy2   ,  0 },     /* R:3.240 at  53.1 MB/s - 661.1 MB/s */  # best custom parameters found
--zstd=windowLog=21,chainLog=16,hashLog=18,searchLog=2,searchLength=6,targetLength=0,strategy=5,forceAttachDict=0   # associated command arguments, can be copy/pasted for `zstd`

Finally, documentation has been updated, to reflect wording adopted by IETF RFC 8478 (Zstandard Compression and the application/zstd Media Type).

Detailed changes list

  • perf: much faster dictionary builder, by @jenniferliu
  • perf: faster dictionary compression on small data when using multiple contexts, by @felixhandte
  • perf: faster dictionary decompression when using a very large number of dictionaries simultaneously
  • cli : fix : no longer overwrites destination when source does not exist (#1082)
  • cli : new command --adapt, for automatic compression level adaptation
  • api : fix : block api can be streamed with > 4 GB, reported by @catid
  • api : reduced ZSTD_DDict size by 2 KB
  • api : minimum negative compression level is defined, and can be queried using ZSTD_minCLevel() (#1312).
  • build: support Haiku target, by @korli
  • build: Legacy format read support is now limited to v0.5+ by default. This can be changed at compile time with the macro ZSTD_LEGACY_SUPPORT.
  • doc : zstd_compression_format.md updated to match wording in IETF RFC 8478
  • misc: tests/paramgrill, a parameter optimizer, by @GeorgeLu97
zstd - Zstandard v1.3.5 "Dictionary Edition"

Published by Cyan4973 over 6 years ago

Zstandard v1.3.5 is a maintenance release focused on dictionary compression performance.

Compression is generally associated with the act of willingly requesting the compression of some large source. However, within datacenters, compression brings its best benefits when completed transparently. In such scenarios, it's actually very common to compress a large number of very small blobs (individual messages in a stream or log, or records in a cache or datastore, etc.). Dictionary compression is a great tool for these use cases.

This release makes dictionary compression significantly faster for these situations, when compressing small to very small data (inputs up to ~16 KB).

Dictionary compression : speed vs input size

The above image plots the compression speeds at different input sizes for zstd v1.3.4 (red) and v1.3.5 (green), at levels 1, 3, 9, and 18.
The benchmark data was gathered on an Intel Xeon CPU E5-2680 v4 @ 2.40GHz. The benchmark was compiled with clang-7.0, with the flags -O3 -march=native -mtune=native -DNDEBUG. The file used in the results shown here is the osdb file from the Silesia corpus, cut into small blocks. It was selected because it performed roughly in the middle of the pack among the Silesia files.

The new version saves substantial initialization time, which is increasingly important as the average size to compress becomes smaller. The impact is even more perceptible at higher levels, where initialization costs are higher. For larger inputs, performance remains similar.

Users can expect to measure substantial speed improvements for inputs smaller than 8 KB, and up to 32 KB depending on the context. The expected speed-up ranges from none (large, incompressible blobs) to many times faster (small, highly compressible inputs). Real-world gains of up to 15x have been observed.

Other noticeable improvements

The compression levels have been slightly adjusted, taking into consideration the higher top speed of level 1 since v1.3.4, and making level 19 a substantially stronger compression level while preserving the 8 MB window size limit, hence keeping an acceptable memory budget for decompression.

It's also possible to select the content of libzstd by modifying macro values at compilation time. By default, libzstd contains everything, but its size can be made substantially smaller by removing support for the dictionary builder, or legacy formats, or deprecated functions. It's even possible to build a compression-only or a decompression-only library.

Detailed changes list

  • perf: much faster dictionary compression, by @felixhandte
  • perf: small quality improvement for dictionary generation, by @terrelln
  • perf: improved high compression levels (notably level 19)
  • mem : automatic memory release for long duration contexts
  • cli : fix : overlapLog can be manually set
  • cli : fix : decoding invalid lz4 frames
  • api : fix : performance degradation for dictionary compression when using advanced API, by @terrelln
  • api : change : clarify ZSTD_CCtx_reset() vs ZSTD_CCtx_resetParameters(), by @terrelln
  • build: select custom libzstd scope through control macros, by @GeorgeLu97
  • build: OpenBSD support, by @bket
  • build: make and make all are compatible with -j
  • doc : clarify zstd_compression_format.md, updated for IETF RFC process
  • misc: pzstd compatible with reproducible compilation, by @lamby

Known bug

zstd --list does not work with non-interactive tty.
This issue is fixed in the dev branch.

Package Rankings
Top 14.05% on Cocoapods.org
Top 14.77% on Pypi.org
Top 1.44% on Conda-forge.org
Top 6.42% on Swiftpackageindex.com
Top 8.97% on Repo1.maven.org
Top 0.51% on Pkg.adelielinux.org
Top 3.49% on Proxy.golang.org
Top 4.03% on Anaconda.org
Top 0.92% on Formulae.brew.sh