Benchmarking compression programs

Author: MaskRay
August 31, 2025 15:00

tl;dr https://gist.github.com/MaskRay/74cdaa83c1f44ee105fcebcdff0ba9a7 is a single-file Ruby program that downloads and compiles multiple compression utilities, then benchmarks their compression and decompression performance on a specified input file, and finally generates an HTML file with scatter charts. Scroll to the end to view example HTML pages.

Compression algorithms can be broadly categorized into three groups based on their typical compression ratio and decompression speed:

  • Low ratio, high speed: lz4, snappy, Oodle Selkie.
  • Medium ratio, medium speed: zlib, zstd, brotli, Oodle Kraken.
  • High ratio, low speed: LZMA, bzip2, bzip3, bsc, zpaq, kanzi, Oodle Leviathan.

Low ratio: Codecs in this category prioritize speed above all else. The compression and decompression speeds are comparable. They are designed to decompress so quickly that they don't introduce a noticeable delay when reading data from storage like solid-state drives. These codecs typically produce byte-aligned output and often skip the final step of entropy encoding, which, while crucial for high compression, is computationally intensive. They are excellent choices for applications where latency is critical, such as kernel features like zswap.

Medium ratio: This is the sweet spot for many tasks. The codecs achieve better compression ratios by employing entropy encoding, usually Huffman coding.

zstd has emerged as a clear leader, gaining popularity and effectively supplanting older codecs like the venerable DEFLATE (zlib).

High ratio: They are designed to squeeze every last bit of redundancy out of the data, often at the cost of significantly longer compression and decompression times, and large memory usage. They are perfect for archival purposes or data distribution where the files are compressed once and decompressed infrequently. Codecs typically have 3 important components:

  • Transforms: Codecs typically implement strong transforms to increase redundancy, even very specific ones like branch/call/jump filters for machine code.
  • Prediction model: This model anticipates the next piece of data based on what has already been processed.
  • Entropy encoding: Traditional codecs use an arithmetic encoder, which is increasingly replaced by the more efficient range variant of Asymmetric Numeral Systems (rANS).

Some projects apply neural network models, such as Recurrent Neural Networks, Long Short-Term Memory, and Transformers, to the prediction model. They are usually very slow.
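To make the prediction-model component concrete, here is a toy order-1 byte model in Ruby. It is purely illustrative and far simpler than what real codecs use; a real high-ratio compressor would feed such probabilities into an arithmetic or rANS coder rather than printing them:

# Toy order-1 prediction model: count how often each byte follows each other byte.
counts = Hash.new { |h, k| h[k] = Hash.new(0) }
prev = -1 # sentinel context for the start of the stream
File.binread(ARGV.fetch(0)).each_byte do |b|
  counts[prev][b] += 1
  prev = b
end

# Probability the model assigns to `byte` when the previous byte was `context`.
def prob(counts, context, byte)
  total = counts[context].values.sum
  total.zero? ? 0.0 : counts[context][byte].to_f / total
end

# e.g. how predictable is 'u' after 'q' in English text?
puts prob(counts, 'q'.ord, 'u'.ord)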


This categorization is loose. Many modern programs offer a wide range of compression levels that allow them to essentially span multiple categories. For example, a high-level zstd compression can achieve a ratio comparable to xz (a high-compression codec) by using more RAM and CPU. While zstd's compression speed or ratio is generally lower, its decompression speed is often much faster than that of xz.
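As a rough way to see this trade-off on one machine, here is a minimal Ruby sketch (independent of the benchmark script below) that times zstd -19 and xz -9 on a single file and prints the resulting ratios; the levels and flags are just examples:

require 'benchmark'

input = ARGV.fetch(0) # e.g. enwik8
[['zstd -19 -T1 -f -k', "#{input}.zst"],
 ['xz -9 -T1 -f -k', "#{input}.xz"]].each do |cmd, out|
  time = Benchmark.realtime { system("#{cmd} #{input}") or abort "#{cmd} failed" }
  printf "%-22s %7.2fs  ratio %.2f\n", cmd, time, File.size(input).to_f / File.size(out)
end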

Benchmarking

I want to benchmark the single-worker performance of a few compression programs:

  • lz4: Focuses on speed over compression ratio. It seems Pareto superior to Google's Snappy.
  • zstd: Gained significant traction and obsoleted many existing codecs. Its LZ77 variant uses three recent match offsets like LZX. For entropy encoding, it employs Huffman coding for literals and 2-way interleaved Finite State Entropy for Huffman weights, literal lengths, match lengths, and offset codes. The large alphabet of literals makes Huffman a good choice, as compressing them with FSE provides little gain for a speed cost. However, other symbols have a small range, making them a sweet spot for FSE. zstd works on multiple streams at the same time to utilize instruction-level parallelism. zstd is supported by the Accept-Encoding: zstd HTTP header.
  • brotli: Uses a combination of LZ77, a 2nd-order context model, Huffman coding, and a static dictionary. The decompression speed is similar to gzip with a higher ratio. At lower levels, its performance is overshadowed by zstd. Compared with DEFLATE, it employs a larger sliding window (from 16KiB-16B to 16MiB-16B) and a smaller minimum match length (2 instead of 3). It has a predefined dictionary that works well for web content (but feels less elegant) and supports 120 transforms. brotli is supported by the Accept-Encoding: br HTTP header (a small negotiation check follows this list).
  • bzip3: Combines BWT, RLE, and LZP and uses an arithmetic encoder.
  • xz: LZMA2 with a few filters. The filters must be enabled explicitly, e.g. xz --x86 --lzma2=preset=9 applies the x86 branch/call/jump filter before LZMA2 when compressing machine code.
  • lzham: Provides a compression ratio similar to LZMA but with faster decompression. Compression is slightly slower. The build system is not well-polished for Linux. I have forked it, fixed stdint.h build errors, and installed lzhamtest. The command line program lzhamtest should really be renamed to lzham.
  • kanzi: There are a wide variety of transforms and entropy encoders, unusual for a compression program. For enwik8 compression it's Pareto superior to xz, but decompression is slower.
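Since both zstd and brotli can be negotiated over HTTP, a quick way to check which encoding a server picks is to send the Accept-Encoding header yourself. A minimal sketch; example.com is a placeholder and may not support either codec:

require 'net/http'

uri = URI('https://example.com/')
Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  req = Net::HTTP::Get.new(uri)
  req['Accept-Encoding'] = 'zstd, br, gzip'
  res = http.request(req)
  puts res['Content-Encoding'] # whichever codec the server chose, if any
end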

I'd like to test lzham (not updated for a few years), but I'm having trouble getting it to compile due to a cstdio header issue.

Many modern compressors are parallel by default. I have to disable this behavior by using options like -T1. Still, zstd uses a worker thread for I/O overlap, but I don't bother with --single-thread.

To ensure fairness, each program is built with consistent compiler optimizations, such as -O3 -march=native.
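One way a script can enforce this is to pass the flags through the environment when invoking each build system. An illustrative sketch; the gist's actual build steps may differ:

# Pass uniform optimization flags to a compressor's build via the environment.
# The directory name and make invocation are placeholders.
flags = '-O3 -march=native'
system({ 'CFLAGS' => flags, 'CXXFLAGS' => flags }, 'make', '-C', 'lz4', '-j8') or abort 'build failed'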

Below is a Ruby program that downloads and compiles multiple compression utilities, then benchmarks their compression and decompression performance on a specified input file, and finally generates an HTML file with scatter charts. The program has several notable features:

  • Adding new compressors is easy: just modify COMPRESSORS (a sketch follows this list).
  • Benchmark results are cached in files named cache_$basename_$digest.json, allowing reuse of previous runs for the same input file.
  • Adding a new compression level does not invalidate existing benchmark results for other levels.
  • The script generates an HTML file with interactive scatter charts. Each compressor is assigned a unique, deterministic color based on a hash of its name (using the hsl function in CSS).
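A stripped-down sketch of how such a script can be organized; the names and structure here are illustrative and the actual gist may differ:

require 'digest'

# Hypothetical shape of a COMPRESSORS entry: adding a program means adding a key here.
COMPRESSORS = {
  'zstd' => {
    levels: [1, 3, 9, 19],
    compress:   ->(lvl, src, dst) { "zstd -#{lvl} -T1 -f -o #{dst} #{src}" },
    decompress: ->(src, dst)      { "zstd -d -T1 -f -o #{dst} #{src}" },
  },
}

# Benchmark results are cached per input file: cache_$basename_$digest.json.
def cache_path(input)
  digest = Digest::SHA256.file(input).hexdigest[0, 16]
  "cache_#{File.basename(input)}_#{digest}.json"
end

# Deterministic per-compressor color for the scatter chart, via CSS hsl().
def color_for(name)
  hue = Digest::MD5.hexdigest(name).to_i(16) % 360
  "hsl(#{hue}, 65%, 45%)"
end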

The single-file Ruby program is available at https://gist.github.com/MaskRay/74cdaa83c1f44ee105fcebcdff0ba9a7

Limitation

A single run might not be representative.

Running the executable incurs initialization overhead, which would be amortized in a library setup. However, a library setup would make updating libraries more difficult.
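One possible mitigation, not implemented in the script, is to repeat each measurement and keep the fastest run:

require 'benchmark'

# Run a command n times and keep the fastest wall-clock time,
# so one-off noise and cold caches matter less.
def best_of(n, cmd)
  (1..n).map { Benchmark.realtime { system(cmd, out: File::NULL) or abort "#{cmd} failed" } }.min
end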

Demo

ruby bench.rb enwik8
# The first iframe below

ruby bench.rb clang
# The second iframe below

Many programs exhibit a stable decompression speed (uncompressed size / decompression time). There is typically a slightly higher decompression speed at higher compression levels. If you think of the compressed content as a form of "byte code", a more highly compressed file means there are fewer bytes for the decompression algorithm to process, resulting in faster decompression. Some programs, like zpaq and kanzi, use different algorithms that can result in significantly different decompression speeds.

xz -9 doesn't use parallelism on the two files under ~100 MiB because their uncompressed size is smaller than the default block size for level 9 (in threaded mode the block size defaults to three times the LZMA2 dictionary size, i.e. 192 MiB at -9).

From install/include/lzma/container.h:

For each thread, about 3 * block_size bytes of memory will be allocated. This may change in later liblzma versions. If so, the memory usage will probably be reduced, not increased.

[iframe: benchmark results for enwik8]
[iframe: benchmark results for clang]