On Web-Security and -Insecurity: Insecure Features in PDFs

In 2019, we published attacks on PDF Signatures and PDF Encryption. During our research and while studying the related work, we found many blog posts, talks, and papers focusing on malicious PDFs causing some damage. However, there was no systematic analysis of all possible dangerous features supported by PDFs, only isolated exploits and attack concepts.
We decided to fill this gap and systematize the ways in which legitimate PDF features can be used to do bad stuff. We define four attack categories: Denial of Service, Information Disclosure, Data Manipulation, and Code Execution.
Our evaluation reveals that 26 of 28 popular PDF processing applications are vulnerable to at least one attack. You can download all malicious PDFs here. You can also find more technical details in our NDSS’21 paper.
This is joint work of Jens Müller, Dominik Noss, Christian Mainka, Vladislav Mladenov, and Jörg Schwenk.
Dangerous Paths: Overview
To identify attack vectors, we systematically surveyed which potentially dangerous features exist in the PDF specification. We created a comprehensive list of all PDF Actions that can be called. This list contains 18 different actions that we carefully studied. From these, we selected the actions that allow access to a file handle and may therefore be abused for dangerous features such as URL invocation or writing to files. Having a list of security-sensitive actions, we proceeded by investigating all objects and related events that can trigger these actions.
We identified four PDF objects which allow calling arbitrary actions (Page, Annotation, Field, and Catalog). Most objects offer multiple alternatives for this purpose. For example, the Catalog object defines the OpenAction or additional actions (AA) events. Each event can launch any sequence of PDF actions, for example, Launch, Thread, etc. JavaScript actions can also be embedded within documents. This opens a new area for attacks: for example, new annotations can be created that carry actions which once again lead to accessing file handles.
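As a rough illustration of the survey idea, the following Python sketch counts event keys and action types in the raw bytes of a file. The function name `count_action_markers` and the name lists are assumptions made for this example (the names themselves are the ones mentioned in this article), and the approach is deliberately crude: it only sees uncompressed objects and does not handle names obfuscated with hex escapes, so it is not how the evaluation was performed.

```python
import re
from collections import Counter

# Names mentioned in this article; the lists are illustrative, not exhaustive.
EVENT_KEYS = (b"OpenAction", b"AA")
ACTION_TYPES = (b"Launch", b"Thread", b"JavaScript", b"URI",
                b"SubmitForm", b"ImportData", b"GoToR", b"GoToE")

# A PDF name ends at whitespace, a delimiter, or the end of the data.
NAME_END = rb"(?![^\s()<>\[\]{}/%])"


def count_action_markers(pdf_bytes: bytes) -> Counter:
    """Count event keys and `/S /<ActionType>` pairs in raw, uncompressed bytes."""
    counts = Counter()
    for key in EVENT_KEYS:
        pattern = re.compile(rb"/" + re.escape(key) + NAME_END)
        counts[key.decode()] += len(pattern.findall(pdf_bytes))
    for action in ACTION_TYPES:
        # Action dictionaries declare their type as /S /<Name>.
        pattern = re.compile(rb"/S\s*/" + re.escape(action) + NAME_END)
        counts[action.decode()] += len(pattern.findall(pdf_bytes))
    return counts
```

A non-zero count is only a hint that a document deserves a closer look with a real PDF parser; a zero count proves nothing, because the relevant objects may sit inside compressed streams.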
Denial of Service
The goal of the denial of service class of attacks is to force PDF processing applications to consume all available resources (i.e., computing time or memory), or to crash, when they open a specially crafted PDF document. We identified two variants: Infinite Loop and Deflate Bomb.
Infinite Loop
This variant induces an endless loop, causing the program execution to get stuck.
The PDF standard allows various elements of the document structure to reference themselves, or other elements of the same type.
- Action loop: PDF actions allow specifying a Next action to be performed, thereby resulting in “action cycles”.
- ObjStm loop: Object streams may extend other object streams, which allows the crafting of a document with cycles.
- Outline loop: PDF documents may contain an outline. Its entries, however, can refer to themselves or each other.
- Calculations: PDF defines “Type 4” calculator functions, for example, to transform colors. Processing hard-to-solve mathematical formulas may lead to high CPU demands.
- JavaScript: Finally, in case the PDF application processes scripts within documents, infinite loops can be induced.
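The reference-based variants above (action, object stream, and outline loops) all come down to a cycle in a graph of object references. As a minimal sketch of how a processor could guard against them, the following Python function checks such a graph for cycles before following it. The input format, a dictionary mapping an object number to the object numbers it references, is an assumption for this example; extracting that mapping from a real PDF is out of scope here.

```python
from typing import Dict, List


def has_reference_cycle(refs: Dict[int, List[int]]) -> bool:
    """Return True if following the references can loop forever.

    `refs` maps an object number to the object numbers it points to,
    e.g. the targets of an action's Next entry (hypothetical input format).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[int, int] = {}
    for root in refs:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GRAY
        # Iterative depth-first search, so a long chain cannot exhaust the
        # Python call stack.
        stack = [(root, iter(refs.get(root, ())))]
        while stack:
            node, children = stack[-1]
            for child in children:
                state = color.get(child, WHITE)
                if state == GRAY:
                    return True  # back edge: child is on the current path
                if state == WHITE:
                    color[child] = GRAY
                    stack.append((child, iter(refs.get(child, ()))))
                    break
            else:
                color[node] = BLACK  # all children explored
                stack.pop()
    return False
```

For example, `has_reference_cycle({1: [2], 2: [1]})` reports a cycle, while a plain chain such as `{1: [2], 2: [3], 3: []}` does not. The Calculations and JavaScript variants are not reference cycles and would need a different guard, such as an execution time limit.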
Deflate Bomb
Data amplification attacks based on malicious zip archives are well-known. The first publicly documented DoS attack using a “zip bomb” was conducted in 1996 against a Fidonet BBS administrator. However, not only zip files but also stream objects within PDF documents can be compressed using various algorithms such as Deflate to reduce the overall file size.
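One common defence against this kind of amplification is to cap the amount of output a decompressor may produce. The sketch below uses Python's standard `zlib` module and assumes the stream data is a zlib-wrapped Deflate stream; the function name `inflate_bounded`, the exception class, and the limit are choices made for this example, not part of any PDF library.

```python
import zlib


class DecompressionLimitExceeded(Exception):
    """Raised when a stream expands beyond the configured limit."""


def inflate_bounded(data: bytes, max_output: int) -> bytes:
    """Decompress `data`, refusing to produce more than `max_output` bytes."""
    decompressor = zlib.decompressobj()
    # Ask for one byte more than allowed: receiving it proves the stream is
    # larger than the limit, without ever inflating the rest of it.
    output = decompressor.decompress(data, max_output + 1)
    if len(output) > max_output:
        raise DecompressionLimitExceeded(
            f"stream expands beyond {max_output} bytes"
        )
    return output
```

With this guard, a stream of ten million zero bytes, which compresses to a few kilobytes, is rejected once the limit is crossed instead of being fully expanded in memory.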
Information Disclosure
The goal of this class of attacks is to track the usage of a document by silently invoking a connection to the attacker’s server once the file is opened, or to leak PDF document form data, local files, or NTLM credentials to the attacker.
URL Invocation
PDF documents that silently “phone home” should be considered privacy-invasive. They can be used, for example, to deanonymize reviewers, journalists, or activists behind a shared mailbox. The attack’s goal is to open a backchannel to an attacker-controlled server once the PDF file is opened by the victim.
The possibility of malicious URI resolving in PDF documents was introduced by Hamon [1], who gave an evaluation for URI and SubmitForm actions in Acrobat Reader. We extend their analysis to all standard PDF features that allow opening a URL, such as ImportData, Launch, GoToR, and JavaScript.
Form Data Leakage
Documents can contain forms to be filled out by the user, a feature introduced with PDF version 1.2 in 1996 and used on a daily basis for routine office tasks, such as travel authorization or vacation requests. The idea of this attack is as follows: The victim downloads a form (a PDF document which contains form fields) from an attacker-controlled source and fills it out on the screen, for example, in order to print it. The form is manipulated by the attacker in such a way that it silently sends the input data to the attacker’s server.
Local File Leakage
The PDF standard defines various methods to embed external files into a document or otherwise access files on the host’s file system, as documented below.
- External streams: Documents can contain stream objects (e.g., images) to be included from external files on disk.
- Reference XObjects: This feature allows a document to import content from another (external) PDF document.
- Open Prepress Interface: Before printing a document, local files can be defined as low-resolution placeholders.
- Forms Data Format (FDF): Interactive form data can be stored in, and auto-imported from, external FDF files.
- JavaScript functions: The Adobe JavaScript reference enables documents to read data from or import local files.
If a malicious document managed first to read files from the victim’s disk and second to send them back to the attacker, such behavior would arguably be critical.
Credential Theft
In 1997, Aaron Spangler posted a vulnerability in Windows NT on the Bugtraq mailing list [2]:
Any client program can trigger a connection to a rogue SMB server. If the server requests authentication, Windows will automatically try to log in with a hash of the user’s credentials. Such captured NTLM hashes allow for efficient offline cracking and can be re-used in pass-the-hash or relay attacks to authenticate under the user’s identity. In April 2018, Check Point Research [3] showed that similar attacks can be performed with malicious PDF files. They found that the target of GoToR and GoToE actions can be set to \\attacker.com\dummyfile, thereby leaking credentials in the form of NTLM hashes.
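A viewer or filter could refuse to follow action targets that point at a network share. The following sketch shows one narrow way to recognise such targets in Python. The function name `is_unc_path` is hypothetical, the check treats forward slashes like backslashes as an assumption about how lenient path handling may be, and it intentionally ignores other remote forms such as the `\\?\UNC\` prefix.

```python
def is_unc_path(target: str) -> bool:
    """Return True if `target` looks like a \\\\host\\share style network path."""
    # Treat forward slashes like backslashes, so //host/share is caught too.
    normalized = target.replace("/", "\\")
    if not normalized.startswith("\\\\"):
        return False
    host = normalized[2:].split("\\", 1)[0]
    # "\\.\" and "\\?\" are local device and long-path prefixes, not hosts.
    return host not in ("", ".", "?")
```

Used on the target from the report above, `is_unc_path(r"\\attacker.com\dummyfile")` returns `True`, while a relative file name such as `report.pdf` or a drive path does not match.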
Data Manipulation
This attack class deals with the capabilities of malicious documents to silently modify form data, to write to local files on the host’s file system, or to show different content depending on the application that is used to open the document.
Form Modification
The idea of this attack is as follows: Similar to Form Data Leakage attacks, the victim obtains a harmless-looking PDF document from an attacker-controlled source, for example, a remittance slip or a tax form. The goal of the attacker is to dynamically, and without the victim’s knowledge, manipulate form field data.
File Write Access
The PDF standard enables documents to submit form data to external webservers. Technically, the webserver’s URL is defined using a PDF File Specification. This ambiguity in the standard may be interpreted by implementations in such a way that they allow documents to submit PDF form data to a local file, thereby writing to this file.
Content Masking
The goal of this attack is to craft a document that renders differently, depending on the applied PDF interpreter. This can be used, for example, to show different content to different reviewers, or to trick content filters (AI-based machines as well as human content moderators), plagiarism detection software, or search engines, which index a different text than the one shown to users when opening the document.
- Stream confusion: It is unclear how content streams are parsed if their Length value does not match the offset of the endstream marker, or if syntax errors are introduced.
- Object confusion: An object can overlay another object. The second object may not be processed if it has a duplicate object number, if it is not listed in the XRef table, or if other structural syntax errors are introduced.
- Document confusion: A PDF file can contain yet another document (e.g., as embedded file), multiple XRef tables, etc., which results in ambiguities on the structural level.
- PDF confusion: Objects before the PDF header or after an EOF marker may be processed by implementations, introducing ambiguities in the outer document structure.
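The last case, PDF confusion, is the easiest to check for mechanically. The following Python sketch reports whether data precedes the `%PDF-` header or follows the last `%%EOF` marker. The function name `outer_structure_report` and the report keys are assumptions for this example. Multiple `%%EOF` markers also occur in legitimate, incrementally updated files, so the marker count is informational rather than a verdict.

```python
def outer_structure_report(pdf_bytes: bytes) -> dict:
    """Summarise what surrounds the header and the final EOF marker."""
    marker = b"%%EOF"
    header_offset = pdf_bytes.find(b"%PDF-")
    last_eof = pdf_bytes.rfind(marker)
    # Everything after the last marker, ignoring surrounding whitespace.
    trailing = b"" if last_eof < 0 else pdf_bytes[last_eof + len(marker):].strip()
    return {
        "header_offset": header_offset,      # -1: no header, > 0: data before it
        "eof_markers": pdf_bytes.count(marker),
        "trailing_bytes": len(trailing),     # non-whitespace bytes after last EOF
    }
```

A file that starts with the header and ends with a single EOF marker yields `header_offset == 0` and `trailing_bytes == 0`; any other result points to content that different interpreters may or may not process.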
Code Execution
The goal of this attack is to execute attacker-controlled code. This can be achieved by silently launching an executable file, embedded within the document, to infect the host with malware. The PDF specification defines the Launch action, which allows documents to launch arbitrary applications. The file to be launched can be specified by a local path, a network share, a URL, or a file embedded within the PDF document itself.
Evaluation
Out of 28 tested applications, 26 are vulnerable to at least one attack.