How we applied advanced fuzzing techniques to cURL

2024-03-01 09:09:54

By Shaun Mirani

Near the end of 2022, Trail of Bits was hired by the Open Source Technology Improvement Fund (OSTIF) to perform a security assessment of the cURL file transfer command-line utility and its library, libcurl. The scope of our engagement included a code review, a threat model, and the subject of this blog post: an engineering effort to analyze and improve cURL’s fuzzing code.

We’ll discuss several elements of this process, including how we identified important areas of the codebase lacking coverage, and then modified the fuzzing code to hit these missed areas. For example, by setting certain libcurl options during fuzzer initialization and introducing new seed files, we doubled the line coverage of the HTTP Strict Transport Security (HSTS) handling code and quintupled it for the Alt-Svc header. We also expanded the set of fuzzed protocols to include WebSocket and enabled the fuzzing of many new libcurl options. We’ll conclude this post by explaining some more sophisticated fuzzing techniques the cURL team could adopt to increase coverage even further, bring fuzzing to the cURL command line, and reduce inefficiencies intrinsic to the current test case format.

How is cURL fuzzed?

OSS-Fuzz, a free service provided by Google for open-source projects, serves as the continuous fuzzing infrastructure for cURL. It supports C/C++, Rust, Go, Python, and Java codebases, and uses the coverage-guided libFuzzer, AFL++, and Honggfuzz fuzzing engines. OSS-Fuzz adopted cURL on July 1, 2017, and the incorporated code lives in the curl-fuzzer repository on GitHub, which was our focus for this part of the engagement.

The repository contains the code (setup scripts, test case generators, harnesses, etc.) and corpora (the sets of initial test cases) needed to fuzz cURL and libcurl. It’s designed to fuzz individual targets, which are protocols supported by libcurl, such as HTTP(S), WebSocket, and FTP. curl-fuzzer downloads the latest copy of cURL and its dependencies, compiles them, and builds binaries for these targets against them.

Each target takes a specially structured input file, processes it using the appropriate calls to libcurl, and exits. Associated with each target is a corpus directory that contains interesting seed files for the protocol to be fuzzed. These files are structured using a custom type-length-value (TLV) format that encodes not only the raw protocol data, but also specific fields and metadata for the protocol. For example, the fuzzer for the HTTP protocol includes options for the version of the protocol, custom headers, and whether libcurl should follow redirects.

First impressions: HSTS and Alt-Svc

We’d been tasked with analyzing and improving the fuzzer’s coverage of libcurl, the library providing curl’s internals. The obvious first question that came to mind was: what does the current coverage look like? To answer this, we wanted to peek at the latest coverage data given in the reports periodically generated by OSS-Fuzz. After some poking around at the URL for the publicly accessible oss-fuzz-coverage Google Cloud Storage bucket, we were able to find the coverage reports for cURL (for future reference, you can get there through the OSS-Fuzz introspector page). Here’s a report from September 28, 2022, at the start of our engagement.

Reading the report, we quickly noticed that several source files were receiving almost no coverage, including some files that implemented security features or were responsible for handling untrusted data. For instance, hsts.c, which provides functions for parsing and handling the Strict-Transport-Security response header, had only 4.46% line coverage, 18.75% function coverage, and 2.56% region coverage after over five years on OSS-Fuzz.

The file responsible for processing the Alt-Svc response header, altsvc.c, was similarly coverage-deficient.

An investigation of the fuzzing code revealed why these numbers were so low. The first problem was that the corpora directory was missing test cases that included the Strict-Transport-Security and Alt-Svc headers, which meant there was no way for the fuzzer to quickly jump into testing these areas of the codebase for bugs; it would have to use coverage feedback to construct these test cases by itself, which is usually a slow(er) process.

The second issue was that the fuzzer never set the CURLOPT_HSTS option, which instructs libcurl to use an HSTS cache file. As a result, HSTS was never enabled during runs of the fuzzer, and most code paths in hsts.c were never hit.

The final impediment to achieving good coverage of HSTS was an issue with its specification, which tells user agents to ignore the Strict-Transport-Security header when sent over unencrypted HTTP. However, this creates a problem in the context of fuzzing: from the perspective of our fuzzing target, which never stood up an actual TLS connection, every connection was unencrypted, and Strict-Transport-Security was always ignored. For Alt-Svc, libcurl already included a workaround to relax the HTTPS requirement for debug builds when a certain environment variable was set (although curl-fuzzer didn’t set this variable). So, resolving this issue was just a matter of adding a similar feature for HSTS to libcurl and ensuring that curl-fuzzer set all necessary environment variables.

Our changes to address these issues were as follows:

  1. We added seed files for Strict-Transport-Security and Alt-Svc to curl-fuzzer (ee7fad2).
  2. We enabled CURLOPT_HSTS in curl-fuzzer (0dc42e4).
  3. We added a check to allow debug builds of libcurl to bypass the HTTPS restriction for HSTS when the CURL_HSTS_HTTP environment variable is set, and we set the CURL_HSTS_HTTP and CURL_ALTSVC_HTTP environment variables in curl-fuzzer (6efb6b1 and 937597c).

The day after our changes were merged upstream, OSS-Fuzz reported a significant bump in coverage for both files.

A little over a year of fuzzing later (on January 29, 2024), our three fixes had doubled the line coverage for hsts.c and nearly quintupled it for altsvc.c.

Sowing the seeds of bugs

Exploring curl-fuzzer further, we saw a number of other opportunities to boost coverage. One low-hanging fruit we spotted was the set of seed files found in the corpora directory. While libcurl supports numerous protocols (some of which surprised us!) and features, not all of them were represented as seed files in the corpora. This is important: as we alluded to earlier, a comprehensive set of initial test cases, touching on as much major functionality as possible, acts as a shortcut to achieving coverage and significantly cuts down on the time spent fuzzing before bugs are found.

The functionality we created new seed files for, in the hope of promoting new coverage, included (ee7fad2):

  • CURLOPT_LOGIN_OPTIONS: Sets protocol-specific login options for IMAP, LDAP, POP3, and SMTP
  • CURLOPT_XOAUTH2_BEARER: Specifies an OAuth 2.0 Bearer Access Token to use with HTTP, IMAP, LDAP, POP3, and SMTP servers
  • CURLOPT_USERPWD: Specifies a username and password to use for authentication
  • CURLOPT_USERAGENT: Specifies the value of the User-Agent header
  • CURLOPT_SSH_HOST_PUBLIC_KEY_SHA256: Sets the expected SHA256 hash of the remote server’s host public key for an SSH connection
  • CURLOPT_HTTPPOST: Sets POST request data. curl-fuzzer had been using only the CURLOPT_MIMEPOST option to achieve this, while the similar but deprecated CURLOPT_HTTPPOST option wasn’t exercised. We also added support for this older method.

Certain other CURLOPTs, as with CURLOPT_HSTS in the previous section, made more sense to set globally in the fuzzer’s initialization function. These included:

  • CURLOPT_COOKIEFILE: Points to a filename to read cookies from. It also enables fuzzing of the cookie engine, which parses cookies from responses and includes them in future requests.
  • CURLOPT_COOKIEJAR: Allows fuzzing the code responsible for saving in-memory cookies to a file
  • CURLOPT_CRLFILE: Specifies the certificate revocation list file to read for TLS connections

Where to go from here

As we started to understand more about curl-fuzzer’s internals, we drew up several strategic recommendations to improve the fuzzer’s efficacy that the timeline of our engagement didn’t allow us to implement ourselves. We presented these recommendations to the cURL team in our final report, and expand on a few of them below.

Dictionaries

Dictionaries are a feature of libFuzzer that can be especially useful for the text-based protocols spoken by libcurl. The dictionary for a protocol is a file enumerating the strings that are interesting in the context of the protocol, such as keywords, delimiters, and escape characters. Providing a dictionary to libFuzzer may improve its search speed and lead to the faster discovery of new bugs.

curl-fuzzer already takes advantage of this feature for the HTTP target, but currently supplies no dictionaries for the numerous other protocols supported by libcurl. We recommend that the cURL team create dictionaries for these protocols to boost the fuzzer’s speed. This may be a good use case for an LLM; ChatGPT can generate a starting point dictionary in response to the following prompt (replace <PROTOCOL> with the name of the target protocol):

A dictionary can be used to guide the fuzzer. A dictionary is passed as a file to the fuzzer. The simplest input accepted by libFuzzer is an ASCII text file where each line consists of a quoted string. Strings can contain escaped byte sequences like "\xF7\xF8". Optionally, a key-value pair can be used like hex_value="\xF7\xF8" for documentation purposes. Comments are supported by starting a line with #. Write me an example dictionary file for a <PROTOCOL> parser.
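For readers who would rather generate a dictionary from a curated token list than from an LLM, here is a minimal Python sketch that renders tokens in the quoted-string format described in the prompt. The SMTP token list and the names dict_entry and build_dictionary are illustrative assumptions, not part of curl-fuzzer.

```python
def dict_entry(token: bytes, name: str = "") -> str:
    """Render one token as a line of a libFuzzer dictionary file."""
    out = []
    for b in token:
        ch = chr(b)
        if ch in ('"', "\\"):
            out.append("\\" + ch)        # quote and backslash are backslash-escaped
        elif 0x20 <= b < 0x7F:
            out.append(ch)               # printable ASCII is written as-is
        else:
            out.append(f"\\x{b:02X}")    # any other byte becomes a \xNN escape
    quoted = '"' + "".join(out) + '"'
    return f"{name}={quoted}" if name else quoted


# Illustrative SMTP tokens only; a real dictionary would be curated per protocol.
SMTP_TOKENS = {
    "crlf": b"\r\n",
    "ehlo": b"EHLO ",
    "mail_from": b"MAIL FROM:<",
    "rcpt_to": b"RCPT TO:<",
    "data": b"DATA",
    "end_of_data": b"\r\n.\r\n",
    "quit": b"QUIT",
}


def build_dictionary(tokens: dict) -> str:
    """Return the full text of a dictionary file, one entry per line."""
    lines = ["# Example libFuzzer dictionary (illustrative SMTP tokens)"]
    lines += [dict_entry(value, name) for name, value in tokens.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_dictionary(SMTP_TOKENS), end="")
```

The output can be saved to a file and handed to a libFuzzer target with the -dict= flag.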

argv fuzzing

During our first engagement with curl, one of us joked, “Have we tried curl AAAAAAAAAA… yet?” There turned out to be a lot of wisdom behind this quip; it spurred us to fuzz curl’s command-line interface (CLI), which yielded several vulnerabilities (see our blog post, cURL audit: How a joke led to significant findings).

This CLI fuzzing was performed using AFL++’s argv-fuzz-inl.h header file. The header defines macros that allow a target program to build the argv array containing command-line arguments from fuzzer-provided data on standard input. We recommend that the cURL team use this feature from AFL++ to continuously fuzz cURL’s CLI (implementation details can be found in the blog post linked above).
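To make the idea concrete, the following Python sketch models how a flat fuzzer input can be turned into an argument list. It follows the convention we understand the AFL++ header to document (arguments are NUL-delimited, two consecutive NULs end the list, and a lone 0x02 byte stands for an empty argument), but it is a simplified model and not the header’s actual C code. The function name and the max_params limit are hypothetical.

```python
def argv_from_fuzz_input(data: bytes, max_params: int = 100) -> list:
    """Split NUL-delimited fuzzer input into an argv-style list of strings."""
    args = []
    for chunk in data.split(b"\x00"):
        if chunk == b"" or len(args) >= max_params:
            break                    # an empty chunk means two NULs in a row: stop
        if chunk == b"\x02":
            chunk = b""              # a lone 0x02 encodes an empty argument
        args.append(chunk.decode("latin-1"))   # latin-1 maps every byte to one char
    return args


if __name__ == "__main__":
    sample = b"curl\x00-H\x00X-Test: 1\x00http://example.com/\x00\x00"
    print(argv_from_fuzz_input(sample))
```

Because every mutation of the input bytes maps to some argument list, the fuzzer can explore flag combinations and malformed option values without any knowledge of the CLI’s grammar.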

Structure-aware fuzzing

One of curl-fuzzer’s weaknesses is intrinsic to the way it currently structures its inputs, which is with a custom type-length-value (TLV) format. A TLV scheme (or something similar) can be useful for fuzzing a project like libcurl, which supports a wealth of global and protocol-specific options and parameters that need to be encoded in test cases.

However, the brittleness of this binary format makes the fuzzer inefficient. This is because libFuzzer has no idea about the structure that inputs are supposed to adhere to. curl-fuzzer expects input data in a strict format: a 2-byte field for the record type (of which only 52 were valid at the time of our engagement), a 4-byte field for the length of the data, and finally the data itself. Because libFuzzer doesn’t take this format into account, most of the mutations it generates wind up being invalid at the TLV-unpacking stage and have to be thrown out. Google’s fuzzing guidance warns against using TLV inputs for this reason.

As a result, the coverage feedback used to guide mutations toward interesting code paths performs much worse than it would if we dealt only with raw data. In fact, libcurl may contain bugs that will never be found with the current naive TLV strategy.

So, how can the cURL team address this issue while keeping the flexibility of a TLV format? Enter structure-aware fuzzing.

The idea with structure-aware fuzzing is to assist libFuzzer by writing a custom mutator. At a high level, the custom mutator’s job consists of just three steps:
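The brittleness is easy to reproduce in a small model. The Python sketch below packs and unpacks records with a 2-byte type and a 4-byte length, then applies crude byte-level mutations and counts how many results no longer parse. The big-endian byte order, the type IDs 1 through 52, and the naive_mutate helper are assumptions made for illustration; they are stand-ins for curl-fuzzer’s real record types and for libFuzzer’s real mutation engine.

```python
import random
import struct

HEADER = struct.Struct(">HI")      # assumed big-endian: 2-byte type, 4-byte length
VALID_TYPES = set(range(1, 53))    # hypothetical IDs standing in for the 52 valid types


def pack_tlv(records):
    """Serialize (type, value) pairs into one TLV byte string."""
    return b"".join(HEADER.pack(t, len(v)) + v for t, v in records)


def unpack_tlv(buf):
    """Return a list of (type, value) pairs, or None if the input is malformed."""
    records, pos = [], 0
    while pos < len(buf):
        if len(buf) - pos < HEADER.size:
            return None                      # truncated header
        rec_type, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if rec_type not in VALID_TYPES or length > len(buf) - pos:
            return None                      # unknown type, or length overruns the input
        records.append((rec_type, buf[pos:pos + length]))
        pos += length
    return records


def naive_mutate(buf, rng):
    """Crude stand-in for byte-level mutation: flip a bit, insert a byte, or delete a byte."""
    i = rng.randrange(len(buf))
    op = rng.choice(("flip", "insert", "delete"))
    if op == "flip":
        return buf[:i] + bytes([buf[i] ^ (1 << rng.randrange(8))]) + buf[i + 1:]
    if op == "insert":
        return buf[:i] + bytes([rng.randrange(256)]) + buf[i:]
    return buf[:i] + buf[i + 1:]


if __name__ == "__main__":
    seed = pack_tlv([(1, b"http://example.com/"), (2, b"HTTP/1.1 200 OK\r\n\r\n")])
    rng = random.Random(0)
    trials = 10_000
    rejected = sum(unpack_tlv(naive_mutate(seed, rng)) is None for _ in range(trials))
    print(f"{rejected}/{trials} mutated inputs were rejected by the TLV parser")
```

Any insertion or deletion shifts the bytes that follow without updating the length fields, which is why structure-unaware mutations so often fail at the unpacking stage.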

  1. Try to unpack the input data coming from libFuzzer as a TLV.
  2. If the data can’t be parsed into a valid TLV, instead of throwing it away, return a syntactically correct dummy TLV. This can be anything, as long as it can be successfully unpacked.
  3. If the data does constitute a valid TLV, mutate the fields parsed out in step 1 by calling the LLVMFuzzerMutate function. Then, serialize the mutated fields and return the resultant TLV.

With this approach, no time is wasted discarding inputs because every input is valid; the mutator only ever creates correctly structured TLVs. Performing mutations at the level of the decoded data (rather than at the level of the encoding scheme) allows better coverage feedback, which leads to a faster and more effective fuzzer.

An open issue on curl-fuzzer proposes several changes, including an implementation of structure-aware fuzzing, but there hasn’t been any movement on it since 2019. We strongly recommend that the cURL team revisit the subject, as it has the potential to significantly improve the fuzzer’s ability to find bugs.
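The three mutator steps can be modeled in a few lines of Python. This is a sketch under stated assumptions, not curl-fuzzer code: a real implementation would be a C or C++ LLVMFuzzerCustomMutator that calls LLVMFuzzerMutate on the decoded payload. Here, mutate_value is a simple stand-in for LLVMFuzzerMutate, the big-endian layout and type IDs 1 through 52 are assumed, and DUMMY is a hypothetical fallback record.

```python
import random
import struct

HEADER = struct.Struct(">HI")      # assumed layout: 2-byte type, 4-byte length
VALID_TYPES = set(range(1, 53))    # hypothetical stand-ins for the valid type IDs
DUMMY = [(1, b"")]                 # hypothetical minimal record that always unpacks


def pack_tlv(records):
    return b"".join(HEADER.pack(t, len(v)) + v for t, v in records)


def unpack_tlv(buf):
    """Return a list of (type, value) pairs, or None if the input is malformed."""
    records, pos = [], 0
    while pos < len(buf):
        if len(buf) - pos < HEADER.size:
            return None
        rec_type, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if rec_type not in VALID_TYPES or length > len(buf) - pos:
            return None
        records.append((rec_type, buf[pos:pos + length]))
        pos += length
    return records


def mutate_value(value, rng, max_len):
    """Stand-in for LLVMFuzzerMutate: overwrite, insert, or drop one payload byte."""
    value = bytearray(value)
    op = rng.choice(("overwrite", "insert", "drop"))
    if op == "insert" or not value:
        if len(value) < max_len:
            value.insert(rng.randrange(len(value) + 1), rng.randrange(256))
    elif op == "overwrite":
        value[rng.randrange(len(value))] = rng.randrange(256)
    else:
        del value[rng.randrange(len(value))]
    return bytes(value)


def custom_mutator(data, max_size, seed):
    """Always return a well-formed TLV (assumes max_size is at least one header)."""
    rng = random.Random(seed)
    records = unpack_tlv(data)                     # step 1: try to unpack
    if not records:
        return pack_tlv(DUMMY)                     # step 2: fall back to a valid dummy
    i = rng.randrange(len(records))                # step 3: mutate one decoded field
    rec_type, value = records[i]
    budget = max_size - (len(data) - len(value))   # bytes available for this payload
    records[i] = (rec_type, mutate_value(value, rng, budget))
    return pack_tlv(records)                       # re-serialize with a correct length
```

Because the length field is recomputed during serialization, the output parses no matter what the payload mutation did, and the fuzzer’s effort goes into exploring payload contents instead of rediscovering the framing.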

Our 2023 follow-up

At the end of 2023, we had the chance to revisit cURL and its fuzzing code in another audit supported by OSTIF. Stay tuned for the highlights of our follow-up work in a future blog post.
