Zstandard v1.5.5
This is a quick fix release. The primary focus is to correct a rare corruption bug in high compression mode, detected by @danlark1. The probability of generating such a scenario by random chance is extremely low. It evaded months of continuous fuzzer tests, due to the number and complexity of simultaneous conditions required to trigger it. However, @danlark1 from Google shepherds such a huge quantity of data that he managed to detect a reproduction case (corruptions are detected thanks to the checksum), making it possible for @terrelln to investigate and fix the bug. Thanks!
While the probability may be very small, corruption issues are nonetheless very serious, so an update to this version is highly recommended, especially if you employ high compression modes (levels 16+).
When the issue was detected, there were a number of other improvements and minor fixes in the making, hence they are also present in this release. Let's now detail the main ones.
Improved memory usage and speed for the --patch-from mode
v1.5.5 introduces memory-mapped dictionaries, by @daniellerozenblit, for both posix #3486 and windows #3557.
This feature allows zstd to memory-map large dictionaries, rather than requiring to load them into memory. This can make a pretty big difference for memory-constrained environments running patches on large data sets.
It's mostly visible under memory pressure, since mmap will be able to release less-used memory and continue working.
But even when memory is plentiful, there are still measurable memory benefits, as shown in the graph below, notably when the reference turns out to be not completely relevant for the patch.
This feature is automatically enabled for --patch-from compression and decompression when the dictionary is larger than the user-set memory limit. It can also be manually enabled or disabled using --mmap-dict or --no-mmap-dict respectively.
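For readers driving libzstd directly rather than the CLI, the same idea can be approximated by memory-mapping the reference file yourself and handing it to the compressor as a prefix dictionary. The sketch below is a minimal, hedged illustration of that pattern on POSIX systems; it is not the CLI's actual implementation, and error handling is trimmed for brevity.

```c
/* Minimal sketch (not zstd's CLI code): mmap a large reference file and use it
 * as a prefix dictionary for a patch-like compression, on POSIX systems. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <zstd.h>

static size_t compress_with_mmapped_reference(const char* refPath,
                                               const void* src, size_t srcSize,
                                               void* dst, size_t dstCapacity)
{
    int const fd = open(refPath, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    /* Map the reference read-only: the OS can page it in and out on demand,
     * instead of the process holding a full private copy in memory. */
    void* const ref = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    /* Long-distance matching helps when the reference is large. */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    /* Reference the mapped bytes directly (no copy) for the next compression. */
    ZSTD_CCtx_refPrefix(cctx, ref, (size_t)st.st_size);

    size_t const cSize = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);

    ZSTD_freeCCtx(cctx);
    munmap(ref, (size_t)st.st_size);
    close(fd);
    return cSize;   /* check with ZSTD_isError() in real code */
}
```

Because ZSTD_CCtx_refPrefix references the mapped bytes without copying them, the working set stays bounded by what the kernel actually keeps paged in.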
Additionally, @daniellerozenblit introduces significant speed improvements for --patch-from.
An I/O optimization in #3486 greatly improves --patch-from decompression speed on Linux, typically by +50% on large files (~1GB).
Compression speed is also taken care of, with a dictionary-indexing speed optimization introduced in #3545. It wildly accelerates --patch-from compression, typically doubling speed on large files (~1GB), sometimes even more depending on the exact scenario.
This speed improvement comes at a slight regression in compression ratio, and is therefore not enabled for very high compression strategies (>= ZSTD_btultra), in order to preserve their higher compression ratios.
Speed improvements of middle-level compression for specific scenarios
The row-hash match finder introduced in version 1.5.0 for levels 5-12 has been improved in version 1.5.5, boosting its speed in specific corner-case scenarios.
The first optimization (#3426) speeds up streaming compression using ZSTD_compressStream on small inputs by removing an expensive table initialization step. This results in remarkable speed increases for very small inputs.
The following scenario measures compression speed of ZSTD_compressStream at level 9 for different sample sizes on a linux platform running an i7-9700k cpu.
| sample size | v1.5.4 (MB/s) | v1.5.5 (MB/s) | improvement |
|---|---|---|---|
| 100 | 1.4 | 44.8 | x32 |
| 200 | 2.8 | 44.9 | x16 |
| 500 | 6.5 | 60.0 | x9.2 |
| 1K | 12.4 | 70.0 | x5.6 |
| 2K | 25.0 | 111.3 | x4.4 |
| 4K | 44.4 | 139.4 | x3.2 |
| … | … | … | |
| 1M | 97.5 | 99.4 | +2% |
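To make the measured path concrete, here is a minimal, hedged sketch of compressing one small in-memory input through the one-pass streaming API at level 9 (the configuration benchmarked above); buffer sizes and the single-call structure are illustrative assumptions, and error checks are omitted for brevity.

```c
/* Minimal sketch: compress a small in-memory input with the streaming API
 * (ZSTD_compressStream) at level 9, as in the benchmark above. */
#include <stdio.h>
#include <zstd.h>

int main(void)
{
    const char msg[] = "a small input of a few hundred bytes ...";
    char dst[512];

    ZSTD_CStream* const zcs = ZSTD_createCStream();
    ZSTD_initCStream(zcs, 9);                        /* compression level 9 */

    ZSTD_inBuffer  input  = { msg, sizeof(msg), 0 };
    ZSTD_outBuffer output = { dst, sizeof(dst), 0 };

    /* Feed the whole (small) input, then flush the frame epilogue. */
    ZSTD_compressStream(zcs, &output, &input);
    size_t const remaining = ZSTD_endStream(zcs, &output);
    if (remaining != 0) { /* dst too small; enlarge and call again in real code */ }

    printf("compressed %zu bytes into %zu bytes\n", sizeof(msg), output.pos);
    ZSTD_freeCStream(zcs);
    return 0;
}
```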
The second optimization (#3552) speeds up compression of incompressible data by a large multiplier. This is achieved by increasing the step size and reducing the frequency of match searches when no matches are found, with negligible impact on the compression ratio. It makes mid-level compression essentially inexpensive when processing incompressible data, typically already-compressed data (note: this was already the case for fast compression levels).
The following scenario measures compression speed of ZSTD_compress compiled with gcc-9 for a ~10MB incompressible sample on a linux platform running an i7-9700k cpu.
| level | v1.5.4 (MB/s) | v1.5.5 (MB/s) | improvement |
|---|---|---|---|
| 3 | 3500 | 3500 | not a row-hash level (control) |
| 5 | 400 | 2500 | x6.2 |
| 7 | 380 | 2200 | x5.8 |
| 9 | 176 | 1880 | x10 |
| 11 | 67 | 1130 | x16 |
| 13 | 89 | 89 | not a row-hash level (control) |
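A benchmark along those lines can be reproduced with the one-shot API; the sketch below is a hedged approximation (pseudo-random bytes stand in for incompressible data, and clock()-based timing is a simplification), not the exact harness used for the table above.

```c
/* Minimal sketch: time ZSTD_compress() on a ~10MB buffer of pseudo-random
 * (hence essentially incompressible) bytes at a chosen level. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zstd.h>

int main(void)
{
    size_t const srcSize = 10u << 20;                              /* ~10 MB */
    char* const src = malloc(srcSize);
    char* const dst = malloc(ZSTD_compressBound(srcSize));
    for (size_t i = 0; i < srcSize; i++) src[i] = (char)rand();    /* incompressible-ish */

    int const level = 9;
    clock_t const start = clock();
    size_t const cSize = ZSTD_compress(dst, ZSTD_compressBound(srcSize),
                                       src, srcSize, level);
    double const secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    if (!ZSTD_isError(cSize))
        printf("level %d: %zu -> %zu bytes, %.0f MB/s\n",
               level, srcSize, cSize, (double)srcSize / (1u << 20) / secs);

    free(src); free(dst);
    return 0;
}
```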
Miscellaneous
There are other welcome speed improvements in this package.
For example, @felixhandte managed to increase the processing speed of small files by carefully reducing the number of system calls (#3479). This can easily translate into +10% speed when processing a large number of small files in batch.
The Seekable format received a bit of care. It's now much faster when splitting data into very small blocks (#3544). In an extreme scenario reported by @P-E-Meunier, it improves processing speed by x90. Even for more "common" settings, such as using 4KB blocks on some "normally" compressible data like enwik, it still provides a healthy x2 processing speed benefit. Additionally, @dloidolt merged an optimization that reduces the number of I/O seek() events during reads (decompression), which is also beneficial for speed.
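For context, random access into a seekable archive goes through the contrib seekable-format API (contrib/seekable_format/zstd_seekable.h). The sketch below is a hedged illustration of a ranged read, with signatures as I understand them from that header and error handling omitted; treat it as an approximation, not the optimized I/O path merged by @dloidolt.

```c
/* Minimal sketch: read an arbitrary byte range out of a seekable .zst archive,
 * using the API from contrib/seekable_format/zstd_seekable.h (not part of the
 * core library; signatures assumed from that header). */
#include <stdio.h>
#include "zstd_seekable.h"

int main(int argc, char** argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.zst\n", argv[0]); return 1; }
    FILE* const fin = fopen(argv[1], "rb");

    ZSTD_seekable* const zs = ZSTD_seekable_create();
    ZSTD_seekable_initFile(zs, fin);        /* reads the seek table */

    /* Decompress 4 KB starting at (uncompressed) offset 1 MB: only the frames
     * covering that range need to be read and decompressed. */
    char buf[4096];
    size_t const got = ZSTD_seekable_decompress(zs, buf, sizeof(buf), 1u << 20);
    printf("read %zu bytes\n", got);    /* check with ZSTD_isError() in real code */

    ZSTD_seekable_free(zs);
    fclose(fin);
    return 0;
}
```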
The release is not limited to speed improvements: several loose ends and corner cases were also fixed in this release. For a more detailed list of changes, I'll invite you to have a look at the changelog.
Change Log
- fix: fix rare corruption bug affecting the high compression mode, reported by @danlark1 (#3517, @terrelln)
- perf: improve mid-level compression speed (#3529, #3533, #3543, @yoniko and #3552, @terrelln)
- lib: deprecated bufferless block-level API (#3534) by @terrelln
- cli: mmap large dictionaries to save memory, by @daniellerozenblit
- cli: improve speed of --patch-from mode (~+50%) (#3545) by @daniellerozenblit
- cli: improve i/o speed (~+10%) when processing lots of small files (#3479) by @felixhandte
- cli: zstd no longer crashes when requested to write into write-protected directory (#3541) by @felixhandte
- cli: fix decompression into block device using -o (#3584, @Cyan4973) reported by @georgmu
- build: fix zstd CLI compiled with lzma support but not zlib support (#3494) by @Hello71
- build: fix cmake does not require 3.18 as minimum version (#3510) by @kou
- build: fix MSVC+ClangCL linking issue (#3569) by @tru
- build: fix zstd-dll, version of zstd CLI that links to the dynamic library (#3496) by @yoniko
- build: fix MSVC warnings (#3495) by @embg
- doc: updated zstd specification to clarify corner cases, by @Cyan4973
- doc: document how to create fat binaries for macos (#3568) by @rickmark
- misc: improve seekable format ingestion speed (~+100%) for very small chunk sizes (#3544) by @Cyan4973
- misc: tests/fullbench can benchmark multiple files (#3516) by @dloidolt
Full change list (auto-generated)
New Contributors
Full Changelog: v1.5.4...v1.5.5