Intel® QuickAssist Technology Zstandard Plugin, an External Sequence Producer for Zstandard


2023-08-16 17:52:56

Posted on behalf of:

Author: Brian Will

Contributors: David Qian, Abhishek Khade, Joel Schuetze


Zstandard (zstd) is one of the most popular lossless compression algorithms/codecs in use today because of its exceptional speed in decompression and compression while achieving impressive compression ratios. It is a very flexible format, allowing for adaptation to many types of data and applications. At a high level, the algorithm is a two-stage process. The first stage finds matches or repetition in the data that result in possible areas of replacement with a more condensed representation, specifically a dictionary coder (e.g., LZ77). The output of this stage is a number of sequences, each of which specifies an offset to a match, a match length, and potentially a literal length. The second stage encodes these output sequences using Finite State Entropy encoding (FSE) or Huffman encoding.
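To make the first stage concrete, here is a minimal sketch of a greedy LZ77-style match search in C. It is not zstd's match finder: the `ToySeq` type, the `toy_find_sequences` name, the brute-force quadratic search, and the minimum match length of 3 are all assumptions chosen for readability.

```c
#include <stddef.h>

/* Hypothetical sequence record for illustration; not zstd's own type. */
typedef struct {
    size_t lit_length;   /* literal bytes copied before the match */
    size_t match_length; /* bytes repeated from earlier in the input */
    size_t offset;       /* distance back to the start of the match */
} ToySeq;

#define TOY_MIN_MATCH 3  /* assumed threshold below which a match is not worth emitting */

/* Greedy brute-force search: at each position, look for the longest earlier match.
 * Returns the number of sequences written, or (size_t)-1 if `out` is too small.
 * Trailing literals are emitted as a final sequence with match_length == 0. */
size_t toy_find_sequences(const unsigned char *src, size_t n,
                          ToySeq *out, size_t cap)
{
    size_t count = 0, pos = 0, lit_start = 0;
    while (pos < n) {
        size_t best_len = 0, best_off = 0;
        for (size_t cand = 0; cand < pos; cand++) {
            size_t len = 0;
            /* The match may run past `pos` (overlap), which LZ77 permits. */
            while (pos + len < n && src[cand + len] == src[pos + len]) len++;
            if (len > best_len) { best_len = len; best_off = pos - cand; }
        }
        if (best_len >= TOY_MIN_MATCH) {
            if (count == cap) return (size_t)-1;
            out[count++] = (ToySeq){ pos - lit_start, best_len, best_off };
            pos += best_len;
            lit_start = pos;
        } else {
            pos++; /* no useful match: this byte stays a literal */
        }
    }
    if (lit_start < n) {
        if (count == cap) return (size_t)-1;
        out[count++] = (ToySeq){ n - lit_start, 0, 0 };
    }
    return count;
}
```

For the 10-byte input "abcabcabcx", this yields two sequences: three literals followed by a 6-byte match at offset 3, then one trailing literal. The second stage (FSE or Huffman) would then encode those records and the literal bytes.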

The topic of this blog is a new feature added in zstd v1.5.4, which allows an external implementation of a sequence producer to be injected into the zstd pipeline. This enables the use of Intel® QuickAssist Technology (Intel® QAT), which can deliver up to 3.2x better throughput, a 3.8x reduction in P99 latency, and 3.3x better performance per watt when compared to zstd for compression. With these improvements, it is expected that Intel QAT will open new breakthrough use cases where compression can now be leveraged for workloads where it would not have been feasible previously.

Intel QAT will be an external sequence producer for zstd, improving performance while exposing the functionality to applications through the familiar zstd interface.

External Sequence Producers

An external sequence producer searches an input buffer for matching bytes in the input set. These matches are represented as a list of `ZSTD_Sequence`s capturing information on:

  • distance to matching sequence
  • match length
  • whether the match is a literal
  • rep code information
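As a sketch of how such a list is interpreted, the code below replays sequences against a literal buffer to rebuild the original bytes. The `DemoSequence` struct is a local mirror of the four `ZSTD_Sequence` fields (offset, litLength, matchLength, rep) so that the example compiles without zstd.h; the function name and the choice to ignore `rep` are assumptions made for this illustration.

```c
#include <stddef.h>

/* Local mirror of the ZSTD_Sequence field layout; an assumption so this builds standalone. */
typedef struct {
    unsigned int offset;      /* distance back to the match */
    unsigned int litLength;   /* literal bytes preceding the match */
    unsigned int matchLength; /* bytes to copy from `offset` bytes back */
    unsigned int rep;         /* repeat-offset code; ignored in this sketch */
} DemoSequence;

/* Rebuilds the original bytes from a literal stream plus sequences.
 * Returns bytes written, or (size_t)-1 on malformed input or overflow. */
size_t demo_apply_sequences(const DemoSequence *seqs, size_t nseqs,
                            const unsigned char *literals, size_t nlits,
                            unsigned char *dst, size_t cap)
{
    size_t out = 0, lit = 0;
    for (size_t i = 0; i < nseqs; i++) {
        const DemoSequence *s = &seqs[i];
        if (s->litLength > nlits - lit || s->litLength > cap - out) return (size_t)-1;
        for (unsigned int k = 0; k < s->litLength; k++) dst[out++] = literals[lit++];
        if (s->matchLength == 0) continue; /* literals-only sequence */
        if (s->offset == 0 || s->offset > out || s->matchLength > cap - out) return (size_t)-1;
        /* Byte-by-byte copy so overlapping matches (offset < matchLength) work. */
        for (unsigned int k = 0; k < s->matchLength; k++) {
            dst[out] = dst[out - s->offset];
            out++;
        }
    }
    return out;
}
```

With literals "abcx" and the sequences {offset 3, litLength 3, matchLength 6} and {litLength 1, no match}, the function writes the 10 bytes "abcabcabcx".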

We'll also cover details of the interfaces that support external sequence producers, including an implementation for Intel QAT.

Zstandard sequence producer registration function

In zstd v1.5.4, the block-level sequence producer interface was introduced; this allows an external 'plugin' to be invoked per block of data and respond with a set of sequences (literals and matches) for that data. The interface includes a registration function:

void ZSTD_registerSequenceProducer(
    ZSTD_CCtx* cctx,
    void* sequenceProducerState,
    ZSTD_sequenceProducer_F* sequenceProducer);

The function that produces sequences for zstd has the following signature:

#define ZSTD_SEQUENCE_PRODUCER_ERROR ((size_t)(-1))
typedef size_t ZSTD_sequenceProducer_F (
    void* sequenceProducerState,
    ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
    const void* src, size_t srcSize,
    const void* dict, size_t dictSize,
    int compressionLevel,
    size_t windowSize);

This allows an external implementation to register with a ZSTD_CCtx, to be called for each block to compress, and to be passed an opaque state. The state can be used as a location for the sequence producer to maintain any transactional information for this instance.
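The callback contract can be sketched with the simplest possible producer: one that finds no matches and reports the whole block as literals. The code below uses a local `DemoSeq` mirror of `ZSTD_Sequence` and a local error macro so that it builds without zstd.h; the state struct, the function name, and the refusal to handle dictionaries are hypothetical. The convention that a final sequence with offset 0 and matchLength 0 carries the trailing literals is an assumption to verify against the zstd.h documentation.

```c
#include <stddef.h>

/* Local stand-ins so the sketch builds without zstd.h (assumptions, not zstd's definitions). */
#define DEMO_SEQUENCE_PRODUCER_ERROR ((size_t)(-1))
typedef struct { unsigned int offset, litLength, matchLength, rep; } DemoSeq;

/* Hypothetical per-context state: the opaque pointer handed back on every call. */
typedef struct { size_t blocks_seen; } DemoProducerState;

/* Same parameter order as the signature above. Returns the number of
 * sequences written, or the error value so the caller can react. */
size_t demo_literals_only_producer(void *state,
                                   DemoSeq *outSeqs, size_t outSeqsCapacity,
                                   const void *src, size_t srcSize,
                                   const void *dict, size_t dictSize,
                                   int compressionLevel, size_t windowSize)
{
    DemoProducerState *st = (DemoProducerState *)state;
    (void)src; (void)dict; (void)compressionLevel; (void)windowSize;

    /* This sketch does not handle dictionaries and needs room for one sequence. */
    if (dictSize != 0 || outSeqsCapacity < 1) return DEMO_SEQUENCE_PRODUCER_ERROR;

    if (st != NULL) st->blocks_seen++;   /* bookkeeping kept in the opaque state */

    outSeqs[0].offset = 0;               /* no match */
    outSeqs[0].matchLength = 0;
    outSeqs[0].litLength = (unsigned int)srcSize; /* whole block as literals */
    outSeqs[0].rep = 0;
    return 1;
}
```

With the real headers, the same function written against `ZSTD_Sequence` would be passed to `ZSTD_registerSequenceProducer` together with a pointer to the state. The sequence producer API is part of zstd's experimental interface, so `ZSTD_STATIC_LINKING_ONLY` has to be defined before including zstd.h.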

By providing this registration capability, the sequence producer is used behind the standard zstd interfaces, allowing applications to continue using zstd in the same way, with interfaces such as `ZSTD_compress2()` or `ZSTD_compressStream2()`. While the API is compatible with all zstd APIs that respect advanced parameters, there are some limitations.

QAT Zstandard Plugin sequence producer API

QAT-ZSTD-Plugin provides acceleration of sequence production using Intel QAT through the zstd APIs. The producer function to be registered is provided by `qatSequenceProducer`.

size_t qatSequenceProducer(
void *sequenceProducerState,
ZSTD_Sequence *outSeqs, size_t outSeqsCapacity,
const void *src, size_t srcSize,
const void *dict, size_t dictSize,
int compressionLevel,
size_t windowSize);

Intel QAT will take the input data stream and search for sequences in hardware, returning a set of output sequences. These are then processed by the zstd library, which encodes them and constructs zstd-formatted data. Intel QAT will improve throughput and decrease latency for a set of zstd's compression levels. The plugin is only applicable to compression operations; it does not support decompression. Not all features of the plugin API are currently supported.

In the context of Intel QAT, the state variable passed in with the API will be used for storing details of the device being used for acceleration and its configuration and capabilities. As such, this state variable is managed externally to zstd through the producer plugin using the functions `QZSTD_createSeqProdState` and `QZSTD_freeSeqProdState`.

Intel QAT as a device needs to be started/stopped using the functions `QZSTD_startQatDevice` and `QZSTD_stopQatDevice` prior to registration of the sequence producer. This flow is captured on the plugin repo page, Integration of Intel QAT sequence producer into an Application.

These are the changes required to integrate the QAT-ZSTD-Plugin into your application. Future iterations will be more transparent.


Functional calling sequence to integrate Intel® QAT-Zstd Plugin into your application.
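The ordering of those calls can be sketched without QAT hardware by using stand-in stubs. Everything prefixed `stub_` below is a placeholder written for this example, and the return types are assumptions: the real `QZSTD_*` functions and `qatSequenceProducer` are declared in the plugin's header and may differ. Only the order of the steps is taken from the description above.

```c
#include <stddef.h>
#include <string.h>

/* Trace buffer recording the order in which the stand-in steps run. */
static char trace[128];
static void note(const char *step) { strcat(trace, step); strcat(trace, ";"); }

/* Stand-ins for the plugin and zstd calls named in the text (assumed shapes). */
static int   stub_startQatDevice(void)      { note("start"); return 0; }   /* QZSTD_startQatDevice */
static void *stub_createSeqProdState(void)  { static int state; note("create"); return &state; } /* QZSTD_createSeqProdState */
static void  stub_registerProducer(void *s) { (void)s; note("register"); } /* registration on the ZSTD_CCtx */
static void  stub_compress(void)            { note("compress"); }          /* e.g. ZSTD_compress2 */
static void  stub_freeSeqProdState(void *s) { (void)s; note("free"); }     /* QZSTD_freeSeqProdState */
static void  stub_stopQatDevice(void)       { note("stop"); }              /* QZSTD_stopQatDevice */

/* Runs the lifecycle once and returns the recorded order of steps. */
const char *demo_lifecycle(void)
{
    trace[0] = '\0';
    if (stub_startQatDevice() != 0) return trace;  /* device must be up first */
    void *state = stub_createSeqProdState();       /* per-context producer state */
    if (state != NULL) {
        stub_registerProducer(state);              /* attach producer + state to the context */
        stub_compress();                           /* application keeps using the zstd API */
        stub_freeSeqProdState(state);              /* release state after compression */
    }
    stub_stopQatDevice();                          /* stop the device last */
    return trace;
}
```

Calling `demo_lifecycle()` records the steps in the order start, create, register, compress, free, stop, which mirrors the flow described above: the device is started before registration, and the state is owned by the application rather than by zstd.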

Performance results

To compare performance data between a software implementation and acceleration with Intel QAT, a benchmark utility was developed that submits requests using the `ZSTD_compress2` interface. The utility, QAT-ZSTD plugin benchmark, allows for setting the number of threads of execution, the block size in which to compress the input file, the compression level, and several other parameters for changing configuration details.

For these measurements, the command line used was:

./benchmark -m${mode} -l1 -t${threads} -c${blocksize} -L${compression_level} -E2 <input_file>
  • mode: defines whether the operations are run purely in software ("0") or use Intel® QAT for acceleration ("1").
  • threads: the number of pthreads used for submitting compression requests concurrently.
  • blocksize: the input file is chunked into blocks of the specified size, which are submitted to the zstd API. The size can be given in KBs or MBs, e.g., 16K is 16,384.
  • compression_level: the level value passed in on the zstd compression API.
  • input_file: for these benchmarks, the Silesia Corpus is the input data set.
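The block size arithmetic can be illustrated with a small helper. This is not the benchmark's own parser: the function names, the accepted suffixes (K and M as powers of 1024), and the treatment of invalid input are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parses sizes such as "16K" or "1M" into bytes.
 * Returns 0 for input it does not understand. */
size_t demo_parse_block_size(const char *text)
{
    char *end = NULL;
    unsigned long long value;

    if (text[0] < '0' || text[0] > '9') return 0;  /* must start with a digit */
    value = strtoull(text, &end, 10);
    if (value == 0) return 0;
    if (*end == 'K' || *end == 'k') { value *= 1024ULL; end++; }
    else if (*end == 'M' || *end == 'm') { value *= 1024ULL * 1024ULL; end++; }
    if (*end != '\0') return 0;                    /* trailing junk */
    return (size_t)value;
}

/* Number of requests needed to cover a file when it is chunked into blocks. */
size_t demo_block_count(size_t fileSize, size_t blockSize)
{
    return blockSize == 0 ? 0 : (fileSize + blockSize - 1) / blockSize;
}
```

For example, `demo_parse_block_size("16K")` gives 16384, and a 40,000-byte file chunked at that size needs three requests, the last one partially filled.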

QAT ZSTD Plugin Sequence Producer compared to zstd-1.5.4

The following measurements were taken using a block size of 16KB with Intel QAT HW configured for its best compression ratio.

Intel QAT delivers up to 3.2x higher throughput compared to zstd compression level 5 and 2.5x compared to zstd level 4. For the Silesia Corpus, data compression ratios are:

  • QAT-ZSTD level 9: 2.76
  • zstd level 4: 2.74
  • zstd level 5: 2.77

Ratio is calculated as the input size divided by the output size from compression.
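As a worked instance of that definition, the ratio can be applied to a single 16KB request. The 2.76 figure is the corpus-level ratio reported above; applying it to one block is only illustrative, since individual blocks compress better or worse than the average.

```latex
\text{ratio} = \frac{\text{input size}}{\text{output size}},
\qquad
\text{output size} \approx \frac{16384\ \text{bytes}}{2.76} \approx 5936\ \text{bytes}
```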

Intel QAT reaches a peak throughput of 11.15 gigabytes per second. If an application requires further performance, zstd software can be used to continue scaling, all while using the same interface from the application.


Compression throughput performance (MB/s) for 16KB requests, comparing Zstandard v1.5.5 vs. Intel® QAT-Zstd Plugin across various core combinations.

If your application is latency sensitive, Intel QAT can reduce P99 request latency to ~1/4 that of zstd in this configuration and maintain a flat latency. QAT-ZSTD delivers up to 3.8x lower P99 latency compared to zstd level 5 and 3.2x compared to zstd level 4.


P99 16KB request latency (microseconds) comparing Zstandard v1.5.5 vs. Intel® QAT-Zstd Plugin across various core combinations.

A detailed system configuration is below.

Power comparison

The savings we have covered in throughput and latency also come with a power-saving component. Intel QAT acceleration is able to deliver up to 3.3x better performance per watt when compared to zstd level 5 software alone.



Performance per watt comparison for QAT ZSTD plugin and Zstandard v1.5.5 software for 16KB requests

Data is collected using the same utility; details are captured in the configuration section below.

If we view this from the perspective of cores saved:


Core savings when using Intel® QAT-Zstd Plugin vs. Zstandard v1.5.5 for 16KB request sizes.

This represents a core savings of 73% for comparable throughput and compression ratios when using Intel QAT acceleration. This translates into acceleration, with Intel QAT taking 90W less wall power when compared to Zstandard software. This is a significant savings in platform power while providing additional cores for other application workloads to use.


The addition of the sequence producer interface to zstd allows for the acceleration of one of the more costly operations in the compression pipeline, specifically searching for matching byte strings. Intel QAT is able to provide HW acceleration of this function, delivering throughput improvements of up to 3.2x over zstd SW and a P99 latency reduction of 3.8x for a comparable compression ratio, all while delivering 3.3x better performance per watt.

Tight integration into zstd allows the application to access Intel QAT acceleration while programming to the same zstd interfaces. Applications will be able to easily access Intel QAT acceleration, with a future path to transparent integration.

Intel QAT provides tangible value for applications in throughput, latency, and power, leading to an overall Total Cost of Ownership benefit for applications requiring compression performance. The QAT ZSTD Plugin will continue adding features to improve compression ratio (dictionary support) and performance.

Want to learn more? Please visit the following:


  • Intel® 4xxx (Intel® QuickAssist Technology (Gen 4))
  • Intel® Xeon® Platinum 8470N Processor
  • Memory configuration:
    • DDR5 4800 MT/s
    • 32 GB * 16 DIMMs
  • QAT Driver:
  • Hyper-Thread enabled
  • OS (Kernel):
  • BIOS:
    • EGSDCRB1.SYS.9409.P01.2211280753
    • SpeedStep (Pstates) disabled
    • Turbo Mode disabled
  • QAT configuration:
    • ServicesEnabled – dc
    • NumberCyInstances – 0
    • NumberDcInstances – 1-64
    • SVM Disabled & ATS Disabled
  • Test file
  • Software
  • Benchmark tool: Included with Intel® QAT ZSTD plugin
    • Disable searchForExternalRepcodes
    • Test command: `./benchmark -m${mode} -l1 -t${threads} -c${blocksize} -L${compression_level} -E2`
    • Example: numactl -C 1-18,105-122 ./test/benchmark -l2 -t36 -L9 -c16K -m1 -E2 silesia.concat

Configuration Details:

Test by Intel as of 05/22/23. 1-node, 2x Intel(R) Xeon(R) Platinum 8470N, 52 cores, HT On, Turbo On, Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS EGSDCRB1.SYS.9409.P01.2211280753, microcode 0x2b000161, 2x 223.6G KINGSTON SUV400S37240G, 1x 447.1G INTEL SSDSC2BB480G7, 1x 240M Disk, Ubuntu 22.04.1 LTS, 5.15.0-56-generic, GCC 11.3.0, QAT ZSTD: v1.5.5, QAT-ZSTD-Plugin: v0.0.1, QAT Driver:QAT20.L.1.0.40

Notices and Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software, or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
