Making a PDF that’s bigger than Germany – alexwlchan

2024-01-31 16:47:47

I was browsing social media this morning, and I saw a claim I’ve seen go past a few times now – that there’s a maximum size for a PDF document:

Some version of this has been floating around the Internet since 2007, probably earlier. This tweet is pretty emblematic of posts about this claim: it’s stated as pure fact, with no supporting evidence or explanation. We’re meant to just accept that a single PDF can only cover about half the area of Germany, and we’re not given any reason why 381 kilometres is the magic limit.

I started wondering: has anybody made a PDF this big? How hard would it be? Can you make a PDF that’s even bigger?

A few years ago I did some silly noodling into PostScript, the precursor to PDF, and it was a lot of fun. I’ve never actually dived into the internals of PDF, and this seems like a good opportunity.

Let’s dig in.

Where does the claim come from?

These posts are often accompanied by a “well, actually” where people in the replies explain that this is a limitation of a particular PDF reader app, not a limitation of PDF itself. They usually link to something like the Wikipedia article for PDF, which explains:

Page dimensions are not limited by the format itself. However, Adobe Acrobat imposes a limit of 15 million by 15 million inches, or 225 trillion in² (145,161 km²).[2]

If you follow the reference link, you find the specification for PDF 1.7, where an appendix item explains in more detail (emphasis mine):

In PDF versions earlier than PDF 1.6, the size of the default user space unit is fixed at 1/72 inch. In Acrobat viewers earlier than version 4.0, the minimum allowed page size is 72 by 72 units in default user space (1 by 1 inch); the maximum is 3240 by 3240 units (45 by 45 inches). In Acrobat versions 5.0 and later, the minimum allowed page size is 3 by 3 units (approximately 0.04 by 0.04 inch); the maximum is 14,400 by 14,400 units (200 by 200 inches).

Beginning with PDF 1.6, the size of the default user space unit may be set with the UserUnit entry of the page dictionary. Acrobat 7.0 supports a maximum UserUnit value of 75,000, which gives a maximum page dimension of 15,000,000 inches (14,400 * 75,000 * 1 ⁄ 72). The minimum UserUnit value is 1.0 (the default).

15 million inches is exactly 381 kilometres, matching the number in the original tweet. And although this limit first appeared in PDF 1.6, it’s a limit of “version 7” of Adobe Acrobat, not of the PDF format. This is probably where the original claim comes from.
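The arithmetic behind the 381 km figure is easy to check. This short sketch uses only the numbers from the specification excerpt above (a 14,400-unit page, a UserUnit of 75,000, and 1/72 inch per unit), plus the exact definition of an inch as 2.54 cm; the constant and function names are just illustrative.

```python
# Numbers taken from the PDF 1.7 specification excerpt quoted above.
MAX_MEDIABOX_UNITS = 14_400   # Acrobat 5.0+ maximum page side, in units
MAX_USER_UNIT = 75_000        # Acrobat 7.0 maximum UserUnit value
UNITS_PER_INCH = 72           # default user space unit is 1/72 inch
METRES_PER_INCH = 0.0254      # an inch is exactly 2.54 cm


def max_page_side_inches() -> float:
    """Largest page side Acrobat 7.0 allows, in inches."""
    return MAX_MEDIABOX_UNITS * MAX_USER_UNIT / UNITS_PER_INCH


inches = max_page_side_inches()
km = inches * METRES_PER_INCH / 1000   # inches -> metres -> kilometres
area_km2 = km * km                     # the page is square

print(f"{inches:,.0f} inches = {km:,.0f} km per side")
print(f"area = {area_km2:,.0f} km^2")
```

This gives 15,000,000 inches, or 381 km per side, and an area of about 145,161 km², the same figure as in the Wikipedia quote.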

What if we make a PDF that exceeds these “maximum” values?

The inner structure of PDFs

I’ve never dived into the internals of a PDF document – I’ve occasionally glimpsed some bits in a hex editor, but I’ve never really understood how they work. If I’m going to be futzing around for fun, this is a good opportunity to learn how to edit the PDF directly, rather than going through a library.

I found a good article which explains the internal structure of a PDF, and combined with asking ChatGPT a few questions, I was able to learn enough to write some simple files by hand.

I know that PDFs support a huge number of features, so this is probably a gross oversimplification, but this is the mental picture I created:

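[Diagram: the layout of a PDF file, from top to bottom: the %PDF-1.6 version header; a list of objects (object 1, object 2, up to object N); the xref table; the trailer; startxref; and the %%EOF marker.]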

The start and end of a PDF file always follow the same pattern: a version number (%PDF-1.6) and an end-of-file marker (%%EOF).

After the version number comes a long list of objects. There are lots of types of objects, for all the various things you can find in a PDF, including the pages, the text, and the graphics.

After that list comes the xref or cross-reference table, which is a lookup table for the objects. It points to all the objects in the file: it tells you that object 1 is 10 bytes after the start, object 2 is after 20 bytes, object 3 is after 30 bytes, and so on. By looking at this table, a PDF reading app knows how many objects there are in the file, and where to find them.

The trailer contains some metadata about the overall document, like how many entries the xref table has, which object is the root of the document, and whether it’s encrypted.

Finally, the startxref value is a pointer to the start of the xref table. This is where a PDF reading app begins: it works backwards from the end of the file until it finds the startxref value, then it can go and read the xref table and learn about all the objects.
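To make that reading order concrete, here’s a minimal sketch of the first thing a reader does: search backwards for `startxref`, jump to the byte offset that follows it, and read the object offsets out of the xref table. It assumes a simple file like the hand-written example that follows, with a single uncompressed xref table; real PDFs can also use cross-reference streams and incremental updates, which this ignores. The function name `read_xref` is hypothetical.

```python
import re


def read_xref(data: bytes) -> dict[int, int]:
    """Return {object number: byte offset} for the in-use objects.

    Assumes one classic xref table (no xref streams and no
    incremental updates), as in a small hand-written PDF.
    """
    # Work backwards from the end of the file to the last `startxref`.
    pos = data.rfind(b"startxref")
    if pos == -1:
        raise ValueError("no startxref marker found")
    match = re.match(rb"startxref\s+(\d+)", data[pos:])
    if match is None:
        raise ValueError("startxref is not followed by an offset")
    xref_offset = int(match.group(1))

    # The offset should land exactly on the `xref` keyword.
    if not data.startswith(b"xref", xref_offset):
        raise ValueError("startxref does not point at an xref table")

    lines = data[xref_offset:].splitlines()
    offsets: dict[int, int] = {}
    i = 1  # skip the `xref` line itself
    # Each subsection starts with "<first object number> <entry count>".
    while i < len(lines) and re.fullmatch(rb"\d+ \d+\s*", lines[i]):
        first, count = map(int, lines[i].split())
        for n in range(count):
            offset, _generation, kind = lines[i + 1 + n].split()
            if kind == b"n":  # "n" = in use, "f" = free
                offsets[first + n] = int(offset)
        i += 1 + count
    return offsets
```

The loop stops at the first line that isn’t a subsection header, which in these simple files is the `trailer` keyword.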

With this knowledge, I was able to write my first PDF by hand. If you save this code into a file named myexample.pdf, it should open and show a page with a red square in a PDF reading app:


%PDF-1.6

% The first object.  The start of every object is marked by:
%     <object number> <generation number> obj
% (The generation number is used for versioning, and is usually 0.)
% This is object 1, so it starts as `1 0 obj`.  The second object will
% start with `2 0 obj`, then `3 0 obj`, and so on.  The end of each object
% is marked by `endobj`.
% This is a "stream" object that draws a shape.  First I specify the
% length of the stream (54 bytes).  Then I choose a color as an
% RGB value (`1 0 0 RG` = red), then I set a line width (`5 w`) and
% finally I give it a series of coordinates for drawing the square.
% `m` moves to the starting point, each `l` draws a line, and the final
% `s` closes the path back to the start and strokes it:
%     (100, 100) ----> (200, 100)
%                          |
%     [s = stroke]         |
%         ^                |
%         |                |
%         |                v
%     (100, 200) <---- (200, 200)
1 0 obj
<<
	/Length 54
>>
stream
1 0 0 RG
5 w
100 100 m
200 100 l
200 200 l
100 200 l
s
endstream
endobj

% The second object.
% This is a "Page" object that defines a single page.  It contains a
% single object: object 1, the red square.  This is the line `1 0 R`.
% The "R" means "Reference", and `1 0 R` is saying "look at object number 1
% with generation number 0" -- and object 1 is the red square.
% It also points to a "Pages" object that contains the information about
% all the pages in the PDF -- this is the reference `3 0 R`.
2 0 obj
<<
	/Type /Page
	/Parent 3 0 R
	/MediaBox [0 0 300 300]
	/Contents 1 0 R
>>
endobj

% The third object.
% This is a "Pages" object that contains information about the different
% pages.  The `2 0 R` is a reference to the "Page" object, defined above.
3 0 obj
<<
	/Type /Pages
	/Kids [2 0 R ]
	/Count 1
>>
endobj

% The fourth object.
% This is a "Catalog" object that provides the main structure of the PDF.
% It points to a "Pages" object that contains information about the
% different pages -- this is the reference `3 0 R`.
4 0 obj
<<
	/Type /Catalog
	/Pages 3 0 R
>>
endobj

% The xref table.  This is a lookup table for all the objects.
% The first entry is special: object 0 is always marked free (`f`) with
% generation number 65535.  The remaining entries give the byte offset of
% each object I created, counted from the start of the file.
xref
0 5
0000000000 65535 f
0000000851 00000 n
0000001396 00000 n
0000001655 00000 n
0000001934 00000 n

% The trailer.  This contains some metadata about the PDF.  Here there
% are two entries, which tell us that:
%   - There are 5 entries in the `xref` table (object 0 plus objects 1 to 4).
%   - The root of the document is object 4 (the "Catalog" object)
trailer
<<
	/Size 5
	/Root 4 0 R
>>

% The startxref marker tells us that we can find the xref table 2196 bytes
% after the start of the file.
startxref
2196

% The end-of-file marker.
%%EOF

I played with this file for a while, just doing simple things like adding extra shapes, changing how the shapes looked, and putting different shapes on different pages. I tried for a while to get text working, but that was a bit beyond me.

It quickly became apparent why nobody writes PDFs by hand – it got very fiddly to redo all the lookup tables! But I’m glad I did it; manipulating all the PDF objects and their references really helped me feel like I understand the basic model of PDFs. I opened some “real” PDFs created by other apps, and they have many more objects and types of object – but now I could at least follow some of what’s going on.
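One way to avoid redoing the lookup tables by hand is to let a script count the bytes. This is a minimal sketch, not how the file above was made: it rebuilds the same four objects (without the comments), records where each one starts, and writes the xref table, trailer and startxref offset from those numbers. The function `build_square_pdf` and its `width`, `height` and `user_unit` parameters are hypothetical names for illustration.

```python
from typing import Optional


def build_square_pdf(width: int = 300, height: int = 300,
                     user_unit: Optional[float] = None) -> bytes:
    """Build a one-page PDF with a red square, computing xref offsets."""
    # Same drawing commands as the hand-written file: red stroke color,
    # 5-unit line width, then a closed, stroked square.
    content = b"1 0 0 RG\n5 w\n100 100 m\n200 100 l\n200 200 l\n100 200 l\ns"

    page = f"/Type /Page /Parent 3 0 R /MediaBox [0 0 {width} {height}]"
    if user_unit is not None:
        # /UserUnit was added in PDF 1.6, matching the header below.
        page += f" /UserUnit {user_unit}"
    page += " /Contents 1 0 R"

    # Objects 1 to 4, in order: content stream, Page, Pages, Catalog.
    objects = [
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        f"<< {page} >>".encode("ascii"),
        b"<< /Type /Pages /Kids [2 0 R] /Count 1 >>",
        b"<< /Type /Catalog /Pages 3 0 R >>",
    ]

    out = bytearray(b"%PDF-1.6\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    size = len(objects) + 1  # objects 1..4 plus the free object 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # each entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 4 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Saving the result is one line, for example `pathlib.Path("myexample.pdf").write_bytes(build_square_pdf())`, and because the offsets are recomputed on every run, changing the shapes or the page size no longer means fixing the xref table by hand.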

With this newfound ability to edit PDFs by hand, how can I create monstrously big ones?

Changing the page size: /MediaBox and /UserUnit

Within a PDF, the size of each page is set on the individual “Page” objects – this allows different pages to be different sizes. We’ve already seen this once:

<<
	/Type /Page
	/Parent 3 0 R
	/MediaBox [0 0 300 300]
	/Contents 1 0 R
>>

Here, the MediaBox is setting the width and height of the page – in this case, a square of 300 × 300 units. The default unit size is 1/72 inch, so each side of the page is 300 ÷ 72 = 4.17 inches. And indeed, if I open this PDF in Adobe Acrobat, that’s what it reports:


Screenshot of Acrobat’s ‘Document Properties’ panel, showing the page size of 4.17 x 4.17 in.

By changing the MediaBox value, we can make the page bigger. For example, if we change the value to 600 600, Acrobat says it’s now 8.33 x 8.33 in. Nice!

We can increase it all the way to 14400 14400, the max allowed by Acrobat, and then it says the page is now 200.00 x 200.00 in. (You get a warning if you try to push past that limit.)

But 200 inches is far short of 381 kilometres – and that’s because we’re using the default unit of 1/72 inch. We can increase the unit size by adding a /UserUnit value. For example, setting the value to 2 will double the page in both dimensions:

<<
	/Type /Page
	/Parent 3 0 R
	/MediaBox [0 0 14400 14400]
	/UserUnit 2
	/Contents 1 0 R
>>

And now Acrobat reports the size of the page as 400.00 x 400.00 in.
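The sizes Acrobat reports follow a simple rule: a MediaBox side in units, multiplied by UserUnit (default 1.0), divided by 72 units per inch. This helper is a sketch of that arithmetic as the specification describes it, not of Acrobat’s internals; `page_size_inches` is a hypothetical name, and it takes the MediaBox in its usual `[llx lly urx ury]` order.

```python
from typing import Sequence, Tuple

UNITS_PER_INCH = 72  # the default user space unit is 1/72 inch


def page_size_inches(media_box: Sequence[float],
                     user_unit: float = 1.0) -> Tuple[float, float]:
    """Convert a /MediaBox [llx lly urx ury] to (width, height) in inches.

    /UserUnit scales the size of every unit, so it multiplies both sides.
    """
    llx, lly, urx, ury = media_box
    width_units = abs(urx - llx)
    height_units = abs(ury - lly)
    return (width_units * user_unit / UNITS_PER_INCH,
            height_units * user_unit / UNITS_PER_INCH)


for box, unit in [([0, 0, 300, 300], 1.0),
                  ([0, 0, 600, 600], 1.0),
                  ([0, 0, 14400, 14400], 1.0),
                  ([0, 0, 14400, 14400], 2.0)]:
    w, h = page_size_inches(box, unit)
    print(f"MediaBox {box}, UserUnit {unit}: {w:.2f} x {h:.2f} in")
```

The four cases print 4.17, 8.33, 200.00 and 400.00 inches per side, the same values Acrobat showed for these pages.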

If we crank it all the way up to the maximum of UserUnit 75000, Acrobat now reports the size of our page as 15,000,000.00 x 15,000,000.00 in – 381 km along each side, matching the original claim. If you’re curious, you can download the PDF.

If you try to create a page with a larger size, either by increasing the MediaBox or UserUnit values, Acrobat just ignores it. It keeps saying that the size of a page is 15 million inches, even if the page metadata says it’s larger. (And if you increase the UserUnit past 75000, this happens silently – there’s no warning or error to suggest the size of the page is being capped.)

This probably isn’t an issue – I don’t think the UserUnit value is widely used in practice. I found one Stack Overflow answer saying as much, and I couldn’t find any examples of it online. The built-in macOS Preview app doesn’t even support it – it completely ignores the value, and treats all PDFs as if the unit size is 1/72 inch.

But unlike Acrobat, the Preview app doesn’t have an upper limit on what we can put in MediaBox. It’s perfectly happy for me to write a width which is a 1 followed by twelve 0s:

Screenshot of Preview’s Document inspector, showing the page size of 352777777777.78 x 10.59 cm.

If you’re curious, that width is roughly the distance between the Earth and the Moon. I’d have to get my ruler to check, but I’m pretty sure that’s bigger than Germany.
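As a rough check of the Moon comparison, this sketch converts a width of 10^12 units (the “1 followed by twelve 0s”) at the default 1/72 inch per unit, since Preview ignores UserUnit. The average Earth to Moon distance of about 384,400 km is a well-known approximate figure, not something measured from the PDF.

```python
UNITS_PER_INCH = 72          # Preview treats every unit as 1/72 inch
METRES_PER_INCH = 0.0254     # an inch is exactly 2.54 cm
EARTH_MOON_KM = 384_400      # approximate average Earth to Moon distance

width_units = 10**12         # a 1 followed by twelve 0s
width_km = width_units / UNITS_PER_INCH * METRES_PER_INCH / 1000

print(f"{width_km:,.0f} km, about {width_km / EARTH_MOON_KM:.0%} "
      f"of the average distance to the Moon")
```

That works out to roughly 352,778 km, a little over nine tenths of the way to the Moon.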

I could keep going. And I did. Eventually I ended up with a PDF that Preview claimed is bigger than the entire universe – approximately 37 trillion light years square. Admittedly it’s mostly empty space, but so is the universe. If you’d like to play with that PDF, you can get it here.

Please don’t try to print it.
