Parsing the .DS_Store file format | Sebastian Neef
About two years ago I came across a .DS_Store file and wanted to extract its information (e.g. file names). After researching the file format and its security implications, as well as writing a parser for it, I would like to share my (limited) knowledge and the parser in Go / Python with the world.
You might want to continue reading if you are interested in how the file works and how it helped to reveal several .sql / .db / .swp / .tgz files on websites from the Alexa Top 1M.
Let us begin with a small introduction to this blogpost and how I came to look at a file format from Apple. If you want to jump directly into the technical stuff, skip to the next section.
While conducting research focused on sensitive files on webservers about two years ago, I came across a file called .DS_Store. Back then I wrote and used a tool in Go to scan the Alexa Top 1M for different security issues, but there were no parsers in that language for the .DS_Store file format. I found one in Perl and a couple in Python, but none of them worked properly or could reliably parse the set of files that I had obtained. Furthermore, I wanted to call a class/function that extracts the interesting information directly from Go without having to use external programs.
Therefore, I thought that re-implementing a parser for this file format in Go would be a nice exercise and learning experience, because I was still quite new to that language. The resulting code ended up on GitHub: Gehaxelt – Go DS_Store (Don’t look at this if you know how to write Go 😉 ). Unfortunately, I did not comment the code much during development, but it eventually worked™ 🙂
At the 34C3 conference in Leipzig last year, a colleague and I decided to catch up with the research on this file format again. We have finished it by now and I felt like I should share my knowledge with the rest of the world. Being unable to fully understand the code that I had written two years ago, and therefore unable to explain the file format, I started to dig into the details again and decided to re-implement the parser in Python!
I am still missing some tiny parts of the specification/details that I managed to understand some years ago, but the new parser should have the same functionality and I will try to give an introduction to the file format.
Before we start with the parsing of a .DS_Store file, let me tell you a bit about it. You might have received the (hidden) file on a USB stick from a colleague with macOS or seen it somewhere else. Apple’s operating system creates this file in apparently all directories to store meta information about its contents. In fact, it contains the names of all files (and also directories) in that folder. The equivalent on Microsoft Windows might be considered the desktop.ini or Thumbs.db.
Due to the fact that .DS_Store is prefixed with a dot, it is hidden from macOS’ Finder, so Mac users might not be aware of its existence. Furthermore, the file format is proprietary and not much documentation about it is available online.
I am not the first to write a parser for this kind of file, and I do not want to claim that, but writing a parser for it was a good learning experience. The following resources helped me to learn and understand its format:
I recommend reading all three of them to get a rough understanding of the file before continuing!
Anyway, let’s start: I will use an example .DS_Store file to explain its structure. The parsers use the same method to process the file.
The file is in big-endian format and begins with a header of 36 bytes:
The first 4-byte integer is always 0x01 and apparently used as an alignment, which is why other references define the header as the 32 bytes after that. Anyway, the four blue bytes are the magic bytes (0x42756431).
The two pink blocks are a 4-byte integer (0x1000) defining the position (offset) in the file of a root block that contains information about other pieces that we will parse later. Both offset values must be identical or the file should be considered invalid. In between is the green 4-byte integer (0x800) indicating the size of the aforementioned root block.
The remaining 16 grey bytes have not been reversed yet and are considered unknown data, so the parser can simply skip them.
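The header layout described above can be sketched in a few lines of Python. This is only an illustration, not the actual parser code: the function name parse_header and the synthetic example bytes are made up here, and the check on the leading 0x01 assumes that value really is constant.

```python
import struct

MAGIC = 0x42756431  # the four magic bytes from the text

def parse_header(data):
    """Return (root_offset, root_size) from the 36-byte big-endian header."""
    if len(data) < 36:
        raise ValueError("too short for a .DS_Store header")
    # Five big-endian 4-byte integers:
    # alignment, magic, root offset, root size, root offset (repeated)
    alignment, magic, offset1, size, offset2 = struct.unpack_from(">5I", data, 0)
    if alignment != 0x01:
        raise ValueError("unexpected alignment value")
    if magic != MAGIC:
        raise ValueError("magic bytes do not match")
    if offset1 != offset2:
        raise ValueError("root block offsets differ, file considered invalid")
    # The remaining 16 bytes are unknown data and simply skipped.
    return offset1, size

# Synthetic header built from the example values in the text
example = struct.pack(">5I", 1, MAGIC, 0x1000, 0x800, 0x1000) + bytes(16)
print(parse_header(example))  # (4096, 2048)
```

Raising on a mismatch keeps the later steps simple, because they can rely on a single, validated root offset.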
Root block
Now that the basic information about the root block is in our hands, we can focus on its contents between 0x1004 and 0x1804. Note that we use the previously obtained position 0x1000 with an additional block alignment of 0x04.
The information about the file names is stored in a tree-like structure where the root block contains important metadata about the tree and its other blocks. In general, the metadata can be split into three different sections:
- Offsets
- Tables of content
- Free list
Offsets
The offsets section contains information about the offsets of the tree’s (leaf) blocks in the file. These blocks store the actual information like file names etc. and the offsets are needed to traverse the tree.
The blue integer (0x03) tells us how many offsets we need to read after we have skipped another four grey bytes that appear to always be zero. The following twelve green bytes are the three 4-byte integers that should be added to an offsets list:
0x0000100B
0x00000045
0x00000209
The order is important, because we will later access the values by their index in the list. These offsets are the tree’s block positions in the file. The rest of the section is padded with zeroes (pink bytes) and the padding is aligned to go up to the next multiple of 256 entries (1024 bytes). In our case the padding goes up to 0x140c, because it equals 0x1000 + 3*4 bytes for the three integers (skipped/count/skipped) + 3*4 bytes for the three offsets + (256 entries - 3 entries)*4 bytes of padding.
Therefore, the next section will start at 0x140c.
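To make the arithmetic concrete, here is a small Python sketch of reading the offsets section. The name parse_offsets and the synthetic buffer are assumptions for illustration; the sketch also assumes that a count which is already an exact multiple of 256 needs no extra padding, which the example file cannot confirm.

```python
import struct

def parse_offsets(data, root_offset):
    """Read the offsets list of the root block.

    Returns (offsets, position of the next section).
    """
    pos = root_offset + 4                      # block alignment of 0x04
    count, = struct.unpack_from(">I", data, pos)
    pos += 4
    pos += 4                                   # four bytes that appear to always be zero
    offsets = list(struct.unpack_from(">%dI" % count, data, pos))
    # The section is padded up to the next multiple of 256 entries.
    padded_entries = ((count + 255) // 256) * 256
    pos += padded_entries * 4
    return offsets, pos

# Synthetic buffer reproducing the example: count 3 at 0x1004, three offsets, zero padding
data = (bytes(0x1004)
        + struct.pack(">II", 3, 0)
        + struct.pack(">3I", 0x100B, 0x45, 0x209)
        + bytes(253 * 4))
offsets, next_pos = parse_offsets(data, 0x1000)
print([hex(o) for o in offsets], hex(next_pos))  # ['0x100b', '0x45', '0x209'] 0x140c
```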
Tables of content
After the offsets, the tables of content section follows. It usually contains at least one table named DSDB with the value 0x01. This particular table references the ID of the first block that we will traverse.
The pink bytes are the padding from the offsets section and the TOC starts at 0x140c with four blue bytes representing the count of TOCs to parse. In our case this is only one (0x01).
It is followed by a single green byte indicating the length of the TOC’s name, which is 0x04. The TOC’s name can be retrieved from the yellow marked bytes as an ASCII string. After the name, the purple 4-byte integer is the TOC’s value.
It is recommended to store the TOC in a dictionary, so that we can query it later.
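A dictionary-based TOC reader could look roughly like the following. This is a sketch under the layout described above; parse_toc and the synthetic buffer are hypothetical names and data, not taken from the published parsers.

```python
import struct

def parse_toc(data, pos):
    """Read the tables of content starting at pos.

    Returns (dict mapping name -> value, position of the next section).
    """
    count, = struct.unpack_from(">I", data, pos)
    pos += 4
    toc = {}
    for _ in range(count):
        name_len = data[pos]                   # single byte: length of the name
        pos += 1
        name = data[pos:pos + name_len].decode("ascii")
        pos += name_len
        value, = struct.unpack_from(">I", data, pos)
        pos += 4
        toc[name] = value
    return toc, pos

# Synthetic buffer: one TOC named DSDB with value 0x01, placed at 0x140c
data = bytes(0x140C) + struct.pack(">I", 1) + b"\x04DSDB" + struct.pack(">I", 1)
toc, next_pos = parse_toc(data, 0x140C)
print(toc, hex(next_pos))  # {'DSDB': 1} 0x1419
```

Later steps can then simply look up toc["DSDB"] to find the ID of the first block to traverse.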
Free list
The last section is the free list, where unused or free blocks of the tree can be stored. In practice, I have not used any values of that list to retrieve the file names, but it might be useful elsewhere.
It consists of n=0..31 buckets with the dictionary’s key being 2^n.
In our example the free list starts at 0x1419. For each bucket a blue 4-byte integer is read. This integer then represents the number of offsets that we need to read.
From the hexdump above we see that the first five buckets from 0x1419 to 0x142d have zero elements. The sixth bucket after 0x142d has a value of 0x02 and therefore two elements, 0x00000020 and 0x00000060.
After the complete iteration of the loop, the resulting free list should look similar to this:
{
1: [],
2: [],
4: [],
8: [],
16: [],
32: [32, 96],
64: [],
128: [128],
256: [256],
512: [],
1024: [1024],
2048: [2048, 6144],
4096: [],
8192: [8192],
16384: [16384],
32768: [32768],
65536: [65536],
131072: [131072],
262144: [262144],
524288: [524288],
1048576: [1048576],
2097152: [2097152],
4194304: [4194304],
8388608: [8388608],
16777216: [16777216],
33554432: [33554432],
67108864: [67108864],
134217728: [134217728],
268435456: [268435456],
536870912: [536870912],
1073741824: [1073741824],
2147483648: []
}
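The loop over the 32 buckets might be sketched like this in Python. The function name parse_freelist is made up, and the synthetic buffer only reproduces the first six buckets of the example and leaves the remaining ones empty, so its result is not identical to the dictionary above.

```python
import struct

def parse_freelist(data, pos):
    """Read the 32 free list buckets starting at pos.

    Returns (dict mapping 2**n -> list of offsets, position after the list).
    """
    freelist = {}
    for n in range(32):
        count, = struct.unpack_from(">I", data, pos)   # number of offsets in this bucket
        pos += 4
        entries = list(struct.unpack_from(">%dI" % count, data, pos))
        pos += 4 * count
        freelist[2 ** n] = entries
    return freelist, pos

# Synthetic buffer: five empty buckets, a sixth with two entries, the rest empty
parts = []
for n in range(32):
    if n == 5:
        parts.append(struct.pack(">3I", 2, 0x20, 0x60))
    else:
        parts.append(struct.pack(">I", 0))
data = b"".join(parts)

freelist, end = parse_freelist(data, 0)
print(freelist[32])  # [32, 96]
```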
After parsing all three sections, we are done with the root block and can proceed with the tree.
Tree
As I said earlier, the information is organized in a tree-like structure. This tree has to be traversed to obtain the file names or other information stored in the .DS_Store file.
Block IDs and offsets
I explained that the TOC contains the block ID and, specifically, that the DSDB TOC references the first block that we will traverse by its ID. In our example the ID was 0x01.
We use the ID as the index into our previously computed offsets list to obtain an address: offsets[0x01] => 0x00000045
However, we cannot simply use the data at the location of 0x00000045, because the real offset and size of the block are encoded within this value:
- 2^k, with k being the 5 least-significant bits, is the block’s size. It must not be lower than 32 bytes.
- It becomes the block’s offset when the 5 bits are set to zero.
With our example address 0x00000045 and some bit operation magic, we get the following results:
- offset: int(0x00000045) >> 0x5 << 0x5 = 0x40
- size: 1 << (int(0x00000045) & 0x1f) = 0x20
Our example block with ID 0x01 will therefore start at 0x40 + 0x4 = 0x44 and be 0x20 bytes long.
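The bit operations can be wrapped in a tiny helper. The following Python sketch uses a made-up function name, decode_address, and runs it on the example address as well as on the 0x00000209 entry from the offsets list.

```python
def decode_address(address):
    """Split an entry of the offsets list into (offset, size)."""
    offset = (address >> 5) << 5      # clear the 5 least-significant bits
    size = 1 << (address & 0x1F)      # 2^k with k = the 5 least-significant bits
    return offset, size

offset, size = decode_address(0x00000045)
print(hex(offset), hex(size))   # 0x40 0x20
print(hex(offset + 0x4))        # 0x44, where the block starts in the file

offset, size = decode_address(0x00000209)
print(hex(offset), hex(size))   # 0x200 0x200
```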
Traversing the tree
To get the file names, we need to traverse the tree from its root block. As previously described, it is referenced by the block ID in the DSDB TOC and starts at 0x44.
This block contains exactly five integers, of which the pink one is the most interesting, because it contains the block-ID of the first block with actual data. The other integers are:
- green: Levels of internal blocks (0x00)
- yellow: Records in the tree (0x06)
- blue: Blocks in the tree (0x01)
- brown: Always the same value (0x1000)
Using the parsed block-ID 0x02 we can traverse the tree using recursion and extract the file names.
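Reading these five integers could be sketched as follows. Note the assumption: the text does not state the order of the fields, so this sketch assumes the pink block-ID comes first, followed by the other four in the order listed above. The function name and dictionary keys are made up for illustration.

```python
import struct

def parse_tree_root(data, block_start):
    """Read the five 4-byte integers of the tree's root block.

    Assumed field order: block-ID of the first data block, levels,
    records, blocks, constant value.
    """
    first_id, levels, records, blocks, constant = struct.unpack_from(">5I", data, block_start)
    return {
        "first_block_id": first_id,
        "levels": levels,
        "records": records,
        "blocks": blocks,
        "constant": constant,
    }

# Synthetic block at 0x44 with the example values
data = bytes(0x44) + struct.pack(">5I", 0x02, 0x00, 0x06, 0x01, 0x1000)
root = parse_tree_root(data, 0x44)
print(root["first_block_id"], root["records"])  # 2 6
```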
The data block’s address is offsets[0x02] => 0x00000209, which becomes:
- offset: int(0x00000209) >> 0x5 << 0x5 = 0x200
- size: 1 << (int(0x00000209) & 0x1f) = 0x200
Knowing that the data block will start at 0x204 in the file, we continue with the following hexdump:
A block starts with two important integers:
- pink: Block mode (0x00)
- green: Record count (0x06)
If the mode is 0x00 then it is immediately followed by count records.
Otherwise, count pairs of next-block-ID|record follow, where the traverse function can be called recursively with the next-block-ID.
However, our example block is in mode 0x00 and therefore only 0x06 records need to be parsed.
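The two cases can be expressed as a short recursive function. The Python sketch below follows only the description above and is not the published parser: traverse, decode_address, toy_record and rec are hypothetical names, and toy_record is a deliberately simplified stand-in that reads just a length and a UTF-16 name, leaving out everything else a real record contains.

```python
import struct

def decode_address(address):
    """Split an offsets entry into (offset, size)."""
    return (address >> 5) << 5, 1 << (address & 0x1F)

def traverse(data, offsets, block_id, read_record):
    """Collect names from the block with the given ID and its child blocks.

    read_record(data, pos) must return (name, position after the record).
    """
    offset, _size = decode_address(offsets[block_id])
    pos = offset + 4                              # block alignment of 0x04
    mode, count = struct.unpack_from(">II", data, pos)
    pos += 8
    names = []
    for _ in range(count):
        if mode != 0:
            # Pair of next-block-ID|record: descend into the child first.
            next_id, = struct.unpack_from(">I", data, pos)
            pos += 4
            names += traverse(data, offsets, next_id, read_record)
        name, pos = read_record(data, pos)
        names.append(name)
    return names

def toy_record(data, pos):
    """Simplified stand-in record: 4-byte length + UTF-16 (big-endian) name only."""
    length, = struct.unpack_from(">I", data, pos)
    end = pos + 4 + 2 * length
    return data[pos + 4:end].decode("utf-16-be"), end

def rec(name):
    """Build a toy record for the demo."""
    return struct.pack(">I", len(name)) + name.encode("utf-16-be")

# One block in mode 0x00 with two toy records, stored at offset 0x40 (address 0x46)
block = struct.pack(">II", 0, 2) + rec("flag") + rec("static")
data = bytes(0x44) + block
print(traverse(data, [0, 0x46], 1, toy_record))  # ['flag', 'static']
```

Passing the record reader in as a parameter keeps the traversal logic separate from the record format, which is handled in the next section.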
A Record
Let’s take a look at what the records inside a block look like.
A record starts with the length (blue 4-byte integer) of the following UTF-16 file name, which is 2*length bytes long (yellow bytes).
After the file name, a brown 4-byte integer structure-ID (that I am not sure how it is used) and a pink 4-byte string structure-type follow. Depending on the structure type, a different number of bytes has to be skipped before reaching the end of the current record. An exhaustive list of structure types can be found here.
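As a rough illustration, the fixed part of a record could be read like this in Python. The sketch stops after the structure type and does not skip the type-dependent payload, since that requires the list of structure types. It assumes the UTF-16 name is big-endian like the rest of the file; the function name and the example structure-ID 0x12345678 are placeholders, not real values from a file.

```python
import struct

def parse_record_header(data, pos):
    """Read file name, structure-ID and structure-type of a record.

    Returns (name, structure_id, structure_type, position after the type).
    The type-dependent data that follows is NOT skipped here.
    """
    length, = struct.unpack_from(">I", data, pos)     # name length in UTF-16 code units
    pos += 4
    name = data[pos:pos + 2 * length].decode("utf-16-be")
    pos += 2 * length
    structure_id, = struct.unpack_from(">I", data, pos)
    pos += 4
    structure_type = data[pos:pos + 4].decode("ascii")
    pos += 4
    return name, structure_id, structure_type, pos

# Synthetic record for the file name "flag" with placeholder ID and type
record = (struct.pack(">I", 4) + "flag".encode("utf-16-be")
          + struct.pack(">I", 0x12345678) + b"bool")
name, sid, stype, pos = parse_record_header(record, 0)
print(name, hex(sid), stype, pos)  # flag 0x12345678 bool 20
```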
Once the parser finishes, a list of six file names should be the result:
- favicon.ico
- flag
- static
- templates
- weak.py
- weak.wsgi
Code
I am going to share the code that I have written over time, but please do not expect bug-free, perfect code. As I said in the beginning, I am not the first to try to write a parser; the code is based on the work of others and might not be feature-complete. Bugfixes and PRs are always welcome!
If you are brave enough to look at it (or even use it!) then here are the links:
If you just want to try to parse a .DS_Store to see its contents, then you can also use the webservice that I am providing here:
Known Issues
While developing the code and writing the blogpost, I discovered some issues in the implementation and parsing logic, which I would like to discuss briefly. Maybe you can find a fix?
Root block offset
I came across at least one .DS_Store file where the root block’s offset from the initial header parsing was off by four bytes. This resulted in an incorrectly parsed offsets list as well as TOC. However, this seemed to be a rare occurrence and I am not sure how or why it occurred.
Incorrect file name length
Another issue that I encountered was that the file name length within a record had a wrong value. For example, it appeared as 0x0a (10 * 2 bytes), but the UTF-16 file name was actually > 20 bytes long. Usually, this resulted in an unmatched structure type and an error. Setting the correct length with a hex editor usually fixed the issue, but I am not sure how that clearly wrong length got there.
However, I have implemented a brute-force-like approach to resolve this issue: re-reading the next two bytes of the file name until a known structure-ID appears. This mostly fixed the issue, but it does not feel like the best approach.
Until now I have only discussed the structure and contents of .DS_Store files, but this is a security-related blog, and I promised to address the security implications: YES, this file has some security implications if it is uploaded to webservers!
In my opinion, the juicy parts of the .DS_Store file are the file names that it contains. macOS creates a .DS_Store file in almost all folders and you won’t even notice it, because it is prefixed with a dot and Finder won’t show dotfiles by default. Each file in a directory has an entry in the directory’s .DS_Store file.
Information disclosure (of sensitive files)
With the Internetwache.org project that I am a part of, we scanned the Alexa Top 1M domains for this file in their document root. It turns out that sensitive files are exposed and potentially accessible through the existence of that file. We discovered file names that indicated the existence of full document root backups, databases, configuration files, swap/temporary files and even private keys!
You can find a detailed blogpost about the methodology and the results on internetwache.org’s English blog.
How to check and protect yourself?
An important thing that needs to be clarified is that the file names stored in a .DS_Store file only represent the contents of a directory on a local macOS-based system. This, however, means that the files we have found on the internet must have been (unknowingly) uploaded by someone. On the other hand, it also means that not all file names necessarily exist on the server!
The upload might happen if…
- you committed the file to your version control system (e.g. git/svn/etc.) and pulled the repo’s contents on the server.
- you upload the files using rsync/sftp/etc. without excluding/removing them first.
- (the server runs on Mac :D)
If you feel like checking your webserver now, I would recommend running the following command (on a Linux system):
cd /var/www/ # wherever your webserver's document root is
find . -type f -iname "*.DS_Store*"
This command searches through all folders in the /var/www directory for files that have .DS_Store in their name and prints them.
If you find any files that are not supposed to be there, you might be leaking some file names. You can delete the files by appending -delete to the previous command.
Additionally, you can harden your webserver to deny access to those files.
Apache
Add the following block to your httpd.conf:
<Files ~ "\.DS_Store$">
Order allow,deny
Deny from all
</Files>
Nginx
Put the following lines into your server block:
location ~ \.DS_Store$ {
deny all;
}
Let me finish with a short conclusion about the things that I (or we?) have learned from this blogpost. Writing a parser for a (proprietary) file format allowed me to learn about the internals of file formats and how these can be structured and parsed. Unfortunately, the .DS_Store format is not fully open, so some features are missing and some bugs cannot be explained/fixed. Another important thing that I have learned: Commenting your code is a MUST if you want to understand it a couple of years later. Especially when it is a more “complex” piece of software like a file format parser 😉
Furthermore, I hopefully convinced you to check your webserver for any .DS_Store files lying around that might expose some sensitive files. If you cannot find any, you should still add a configuration rule to deny access to those files and check back with your developers that these files will not be committed or uploaded anywhere in the first place!
Last but not least, I would like to thank the people who did the hard reversing part and published their notes online! Otherwise I would not have managed to have my fun with the file format 🙂
-=-