LZSSE and Lizard · Aras’ website
Introduction and index of this series is here.
Some people asked whether I’ve tested LZSSE or Lizard.
I have not! But I’ve been aware of them for years. So here’s a short post, testing them on “my” data set. Note that, at least currently,
neither of these compressors seems to be actively developed or updated.
LZSSE and Lizard, without data filtering
Here they are on Windows (VS2022, Ryzen 5950X). Zstd and LZ4 are also included for comparison, as faint dashed lines:
For LZSSE I’ve tested the LZSSE8 variant, since that’s what the readme says to generally use.
The “zero” compression level here is the “fast” compressor; other levels are the “optimal” compressor. Compression levels beyond 5 do not seem
to buy much ratio, but get much slower to compress. On this machine, on this data set, it does not look competitive:
compression ratio is similar to LZ4; decompression is a bit slower, compression a lot slower.
For Lizard (née LZ5), there really are four different compression algorithms in there
(fastLZ4, LIZv1, fastLZ4 + Huffman, LIZv1 + Huffman). I’ve not tested the Huffman variants since they cannot easily co-exist with Zstd
in the same build (symbol redefinitions). The fastLZ4 is shown as lizard1x here, and LIZv1 is shown as lizard2x.
lizard1x (i.e. Lizard compression levels 10..19) seems to be pretty much the same as LZ4. Maybe it was faster than LZ4 back in
2019, but since then LZ4 gained some performance improvements?
lizard2x is interesting: better compression ratio than LZ4, a bit slower decompression speed. It sits in the middle between Zstd and LZ4
in the ratio versus decompression speed space.
What about Mac?
The above charts are on x64 architecture, with the Visual Studio compiler. How about a Mac (with a Clang compiler)? But first, we need
to get LZSSE working there, since it is very much written with raw SSE4.1 intrinsics and no fallback or other platform paths.
Luckily, just dropping sse2neon.h into the project and making a
tiny change in the LZSSE source makes it just work on an Apple M1 platform.
With that out of the way, here’s the chart on Apple M1 Max with Clang 14:
Here lzsse8 and lizard1x do get ahead of LZ4 in terms of decompression performance. lizard1x is about 40% faster than LZ4 at
decompression at the same compression ratio. LZSSE is “a bit” faster (but its compression performance is still a lot slower than LZ4).
LZSSE and Lizard, with data filtering and chunking
If there’s anything we’ve learned so far in this whole series, it is that “filtering” the data before compression can increase the
compression ratio a lot (which in turn can speed up both compression and decompression, due to the data being easier or smaller). So let’s do
that!
Windows case, all compressors with the “split bytes, delta” filter from part 7,
and each 1MB block compressed independently (see part 8):
Well, neither LZSSE nor Lizard is very good here: LZ4 with filtering is faster than either of them, with a slightly better compression ratio
too. If you’d want a higher compression ratio, you’d reach for filtered Zstd.
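To make the setup concrete, here is a rough sketch of a “split bytes, delta” filter applied to independently compressed 1MB blocks. It is not the code from parts 7 and 8: the 4-byte stride (one float32 per element), the helper names and the use of zlib as the compressor are all assumptions for illustration, and the data length is assumed to be a multiple of the stride.

```python
import struct
import zlib

CHUNK_SIZE = 1024 * 1024  # 1MB blocks, each compressed independently

def split_bytes_delta(chunk: bytes, stride: int) -> bytes:
    """Gather byte k of every element into its own stream, then delta-encode."""
    assert len(chunk) % stride == 0
    # Byte 0 of all elements, then byte 1 of all elements, and so on.
    split = b"".join(chunk[k::stride] for k in range(stride))
    out = bytearray(len(split))
    prev = 0
    for i, b in enumerate(split):
        out[i] = (b - prev) & 0xFF  # difference from the previous byte, mod 256
        prev = b
    return bytes(out)

def unsplit_bytes_delta(filtered: bytes, stride: int) -> bytes:
    """Inverse filter: prefix-sum the deltas, then interleave the streams back."""
    split = bytearray(len(filtered))
    prev = 0
    for i, d in enumerate(filtered):
        prev = (prev + d) & 0xFF
        split[i] = prev
    count = len(split) // stride
    out = bytearray(len(split))
    for k in range(stride):
        out[k::stride] = split[k * count:(k + 1) * count]
    return bytes(out)

def compress_chunked(data: bytes, stride: int) -> list:
    # zlib is a stand-in for LZ4 / Zstd / LZSSE / Lizard.
    return [zlib.compress(split_bytes_delta(data[i:i + CHUNK_SIZE], stride))
            for i in range(0, len(data), CHUNK_SIZE)]

def decompress_chunked(blocks: list, stride: int) -> bytes:
    return b"".join(unsplit_bytes_delta(zlib.decompress(b), stride) for b in blocks)

# Hypothetical input: about 1.2MB of float32 values, so it spans two blocks.
data = struct.pack("<300000f", *(i * 0.25 for i in range(300000)))
blocks = compress_chunked(data, stride=4)
assert decompress_chunked(blocks, stride=4) == data
```

Because every block is filtered and compressed on its own, blocks can be processed in parallel and decompressed in any order, at the cost of the compressor not seeing matches across block boundaries.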
On a Mac things are a bit more interesting for the lzsse8 case; it can get ahead of filtered LZ4 decompression performance at the expense of some
compression ratio loss:
I’ve also tested on Windows (same Ryzen 5950X) but using the Clang 15 compiler. Neither LZSSE nor Lizard is on the Pareto frontier here:
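For reference, a compressor setting is “on the Pareto frontier” when no other setting is at least as good on both axes (ratio and speed) and better on one. A small sketch of that check follows; the `pareto_frontier` helper and the points A to E are made up for illustration and are not measurements from this post.

```python
def pareto_frontier(points):
    """Keep points that no other point beats on both ratio and speed.

    Each point is (name, ratio, speed); higher is better for both.
    """
    frontier = []
    best_speed = float("-inf")
    # Walk from the highest ratio down: a point survives only if it is
    # faster than every point with a higher (or equal) ratio seen so far.
    for name, ratio, speed in sorted(points, key=lambda p: (-p[1], -p[2])):
        if speed > best_speed:
            frontier.append(name)
            best_speed = speed
    return frontier

# Made-up numbers, for illustration only.
points = [
    ("A", 2.0, 3.0),
    ("B", 2.0, 2.5),  # same ratio as A but slower: dominated
    ("C", 3.0, 1.0),
    ("D", 1.5, 2.0),  # lower ratio and slower than A: dominated
    ("E", 1.2, 5.0),
]
print(pareto_frontier(points))  # ['C', 'A', 'E']
```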
Conclusions
On my data set, neither LZSSE nor Lizard is very competitive against (filtered or unfiltered) LZ4 or Zstd. They might have been several
years ago when they were developed, but since then both LZ4 and Zstd have received several speedup optimizations.
Lizard levels 10-19, without any data filtering, do get ahead of LZ4 in decompression performance, but only on Apple M1.
LZSSE is “basically LZ4” in terms of decompression performance, but the compressor is much slower (fair, the project says as much in the readme).
Curiously enough, where LZSSE does get ahead of LZ4 is on an Apple M1, a platform it is not even supposed to work on out of the box 🙂
Maybe next time I’ll finally look at lossy floating point compression. Who knows!