Making a PDF that's larger than Germany

jl6 · 2024-02-01T07:59:34

PDF is a fabulous format. I mean, it’s an awful format in so many ways, technically speaking, but the net effect of having a self-contained static file in your custody stands in blissful contrast to the user-hostile dynamic/SaaS website that can be taken away at a moment’s notice. PDF/A is the true PDF - it strips out most of the dangerous cruft.

Anyway, if you like weird PDF hijinks, here’s a polyglot PDF/A CSV file that is also its own original soundtrack as a polyglot Amiga soundtracker mod:

https://www.lab6.com/6

ourmandave · 2024-02-01T12:00:52

The "in your custody" part is important, when Amazon starts yoinking books from your account.

https://www.nytimes.com/2009/07/18/technology/companies/18am...

financypants · 2024-02-01T16:29:14

I buy all my books paperback, even if I listen to them on audible, for posterity’s sake.

shiftpgdn · 2024-02-01T17:22:59

Do you buy them after you’ve finished listening?

addminztrator · 2024-02-04T05:32:32

You guys buy books? I only buy the ones I keep coming back to after I've read them through libgen

amelius · 2024-02-01T10:23:38

PDF is an executable file. Many people are worried about running Javascript but still use PDF files without problems.

JKCalhoun · 2024-02-01T13:38:54

For better or worse, the years I spent working on Preview for Apple (and PDFKit) I felt bad that our (Apple's) PDF implementation was far short of Adobe's.

Radars would show up with PDFs attached, "Preview Does Not Display 3D Image in PDF Like Acrobat" or similar. And I would feel so ... inadequate.

PDFKit could render and capture basic annotations ... and that was about it. We could show you forms, allow editing, but if the PDF had Javascript that would add two fields and put the sum in a third field I had to shrug and say, "Oh well." The effort of hoisting a JavaScript interpreter/runtime was beyond my skillset anyway.

But then I kind of came to see our subset of PDF support as a kind of feature. It's true, we left out the kitchen sink. Adobe was/is clearly interested in putting everything into PDF.

And I mean, as pointed out here, at least you could open a PDF in Preview and not worry about any Javascript executing. ;-)

peter_l_downs · 2024-02-01T16:44:48

If it makes you feel any better, Preview is by far the best PDF viewer and editor (I use it for signatures and adding text) I've ever used. I like that the PDF previews in Finder are instant and accurate. I like that it shows as much PDF and as little UI/menubar as possible. I like that it never asks me to upgrade or log in. The search tools work well. I can stitch PDFs together (if I google how to, always forget) and pull certain pages out as their own files.

For all of the PDFs I've ever encountered, Preview has been sufficient and capable. Thank you for your hard work!

JKCalhoun · 2024-02-01T21:29:16

I thought Acrobat had ugly UI — stacks and stacks of toolbars for example (this, BTW, about a decade ago — I haven't launched Reader in some time so can't speak to the current UI).

I met one of the engineers from Adobe and said as much — as politely as I could. He said, yeah, we're modeling our UI on Office.

I saw in an instant that they wanted to be seen as a peer, a co-tool, to the Microsoft suite and it all made sense to me.

B1FF_PSUVM · 2024-02-01T18:11:57

> If it makes you feel any better, Preview is by far the best PDF viewer and editor

Seconded. At least most pleasant to use for most things, and never balked at anything I needed to see, fortunately.

amluto · 2024-02-01T17:56:41

And somehow Acrobat (current paid version from Creative Cloud) is the worst PDF form-filling option.

gruturo · 2024-02-01T17:42:07

Thank you, thank you, THANK YOU for not having put all that cruft in, and by Apple's sheer size, effectively discouraging many from producing and circulating those abominations.

Adobe has an awful track record of security (how many exploits in the past 25 years were in Acrobat (not the PDF spec, the actual Acrobat software) and in Flash?) but PDF is an amazing gift to the world, and, thanks to people like you, effectively safer than how Adobe designed it :))

Unfortunately I have the full Acrobat on my work computer, mandated by my employer, sigh, but that's another story.

eirikbakke · 2024-02-01T15:22:05

When I ordered an official PDF copy of my college diploma, the order form had an option to enable "tracking" in the PDF file. Sure enough, when the recipient opened the PDF file (and when I tried it myself on a different machine), I got a notification from the company that generated the PDF...

SpaghettiCthulu · 2024-02-01T16:50:19

That's horrific! I had no idea that was even a feature of PDFs!

layer8 · 2024-02-01T18:18:20

PDFs are roughly on par with web pages feature-wise, including JavaScript or other actions that execute on load. Adobe did this, of course, to stave off the competition from the early web. Nowadays, PDF readers disable most of that by default (if they even support it).

venusenvy47 · 2024-02-01T14:38:44

Is it really executable on the OS? Doesn't it require a native application to run it on an OS?

afiori · 2024-02-01T15:36:57

No, they are not executable by the OS (generally).

Formats are on a gradient between "completely code" and "completely data" and PDFs are quite close to the "completely code" extreme; I guess this is what the parent meant.

shp0ngle · 2024-02-01T10:27:56

Yeah but the javascript can only do things inside the pdf.

muskypirate · 2024-02-01T12:42:42

It is not about JS. Look into BadPDF as an example.

jk3000 · 2024-02-01T10:55:45

Famous last words :)

berkes · 2024-02-01T10:52:15

What javascript can escape my browser, (edit: or an HTML page) for example?

lukan · 2024-02-01T12:43:47

XMLHttpRequest to send anything the site knows anywhere.

And row hammer, to breach the sandbox.

https://en.wikipedia.org/wiki/Row_hammer

berkes · 2024-02-01T14:39:09

Row hammer is an exploit. It wasn't "by design".

While that may technically be "escaping the sandbox" it's a different case, because it was never meant to work, will be fixed and often is fixed.

afiori · 2024-02-01T15:41:57

Almost every "escaping the sandbox" is due to some kind of bug.

Sure if the PDF standard exposed a "globalThis.runBlobAsNativeExecutable" function it would be worse, but it is still escaping the sandbox.

grotorea · 2024-02-01T14:09:44

Are the non-browser PDF readers more vulnerable? Do most even execute the Javascript?

kevincox · 2024-02-01T15:27:26

I would expect so simply because browsers are fairly hardened pieces of software. Adobe Acrobat is decently hardened but it seems to be far behind browsers.

It is worth noting that Chromium and later Firefox both added PDF viewers that live inside the browser sandbox. They are essentially web-apps that render the PDF. When I worked at Google they strongly recommended using Chrome for opening PDF files because they felt much more comfortable about its security and sandboxing than other PDF readers.

Another perspective is that you are likely browsing the internet anyway. In fact you likely got the PDF by visiting a website. So you have already exposed a huge attack surface (your browser) to a possibly hostile adversary. It is better to expose them to the same attack surface again (plus whatever security the PDF reader itself provides) than to give them a fresh new attack surface.

shp0ngle · 2024-02-01T10:26:39

about pdf/a... until recently there was not even an easy way to figure out if a pdf is really pdf/a; now there is (verapdf) and it's a crazy complex piece of software

and maybe I'm wrong but the only way to convert arbitrary pdf to pdf/a with open source software is to convert it to postscript and back with ghostscript - which is affero licensed... with all the possible problems it entails. (there is old version that is just gpl, works on most pdfs but is 15 years old or such.)

i needed to deal with pdf/a in a previous job... was not fun.

account42 · 2024-02-05T12:45:31

> which is affero licensed... with all the possible problems it entails

And what problems are those? Besides making it harder to leech off the community without giving back of course.

Elzair · 2024-02-01T16:06:16

You could use the pdfium library as an alternative to Ghostscript.

shp0ngle · 2024-02-01T20:58:23

I don't think there is a way to do that with pdfium. I have tried to look at it and failed. Maybe I didn't look hard enough.

rqtwteye · 2024-02-01T13:13:46

“PDF is a fabulous format”

I will never forgive the pain PDF caused me when I worked on a project to parse millions of PDF files from various sources. Just reconstructing paragraphs was a huge effort not even mentioning parsing tables. I think we should do better for something that’s basically a standard. PDF manuals also suck big time.

JKCalhoun · 2024-02-01T13:31:54

PDF is supposed to be a printer format, not a word processing document format. While I too would love to nail down a PDF subset to be a standard (for example requiring the accessibility tags that make text extraction easy) perhaps trying to create a hybrid format, one that satisfies both printers and resizable windows, is already an impossible goal.

(I've always had to keep my love of PDF a secret from fellow nerds. But here's another secret, I like printing documents out from time to time.)

da_chicken · 2024-02-01T13:51:21

I really appreciate what PDF can accomplish, but I also really dislike that it turns into a black box. There really ought to be something that can describe a document structure and also describe document layout in a durable and portable manner. In the range of XML/JSON <-> HTML+CSS <-> PDF <-> PS <-> RAW, it really does feel like there's something missing between HTML and PDF.

And it can't be LaTeX, because the document shouldn't be a programming language at all. "The document is a program" has proven itself to be a terrible scheme overall.

layer8 · 2024-02-01T18:22:55

PDF includes optional document structure information. Most PDF creation software chooses to not generate it, though.

JKCalhoun · 2024-02-01T16:24:23

ePub is kind of trying to be that? Or maybe that hews too close to HTML.

It can reflow but tries to paginate HTML ... the way printing a web page tries to paginate HTML, ha ha.

grotorea · 2024-02-01T14:16:16

I wonder a bit if we wouldn't have an easier time extracting data, resizing pages etc if we sent HTML files instead of PDF. Are even half of PDFs printed at all?

niels_bom · 2024-02-03T20:27:19

Did you go the “display it, then OCR what’s displayed” route as a last ditch effort?

martin_a · 2024-02-01T09:08:22

> PDF/A is the true PDF

As someone working in the graphic industry, I'd say PDF/X is the true PDF, but ymmv. :-)

throwaway290 · 2024-02-01T08:56:46

Does it do anything else? Maybe pwn me via my PDF viewer? ;)

TeMPOraL · 2024-02-01T09:43:39

It contains Bitcoin hashes, rendering them one by one as it mines them.

nayuki · 2024-02-01T04:35:40

I'll analyze PNG for comparison. The largest width and height is 2147483647 (2^31 - 1). Using the pHYs chunk (physical pixel dimensions), the lowest density we can specify is 1 pixel per metre. So, 2 billion metres (2 gigametres) is somewhat bigger than the diameter of the sun at 1.39 Gm. https://en.wikipedia.org/wiki/Orders_of_magnitude_(length)#g...

Using the sCAL chunk (physical scale) would allow extremely large dimensions because it uses ASCII floating-point.
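
The pHYs arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in the comment: the constant names are made up for illustration, and the sun's diameter is the 1.39 Gm figure quoted above.

```python
# Largest width/height a PNG can declare: a 31-bit positive integer.
MAX_PNG_PIXELS = 2**31 - 1
# Lowest density from the comment: 1 pixel per metre in the pHYs chunk.
MIN_PIXELS_PER_METRE = 1
# Sun's diameter in metres, as quoted in the comment (1.39 Gm).
SUN_DIAMETER_M = 1.39e9

# Physical width of the largest PNG at the lowest density.
max_width_m = MAX_PNG_PIXELS / MIN_PIXELS_PER_METRE
ratio = max_width_m / SUN_DIAMETER_M

print(f"{max_width_m / 1e9:.3f} Gm, about {ratio:.2f}x the sun's diameter")
```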

lifthrasiir · 2024-02-01T06:41:10

> Using the sCAL chunk (physical scale) would allow extremely large dimensions because it uses ASCII floating-point.

AFAIK sCAL is more about the image's subject, not the image itself. A 1:10,000,000 scale world map would be < 10 m wide according to pHYs, but it will be ~40,000 km wide according to sCAL.

jdlyga · 2024-02-01T01:16:55

You know you're reading a good technical article when it measures pdf width in kilometers

apapapa · 2024-02-01T07:15:51

microSD cards can contain millions of miles in a very small space...

geraldhh · 2024-02-01T13:49:08

or billions of feet

coldpie · 2024-02-01T15:22:40

A truly unfathomable quantity of toes on a single SD card!

macropin · 2024-02-01T02:27:37

Reminds me of this PDF I created more than a decade ago from a Postscript implementation of the game of life. Seems it still works, but causes MacOS preview to crash. https://andrewcutler.net/docs/joke/life.pdf

maleldil · 2024-02-01T14:23:33

It doesn't cause Preview to crash on Sonoma. FWIW, I can't see any animation, just the final state, while Firefox's PDF reader does show some animation. Skim has the same behaviour as Preview but doesn't show the grid.

JKCalhoun · 2024-02-01T13:43:14

Whew! Didn't crash on my Mac OS — just a static Game of Life render. This machine is still on Monterey FWIW.

remoquete · 2024-02-01T08:40:34

This post reminds me of Umberto Eco's intellectual divertissements. More specifically, this fantastic piece, "On the Impossibility of Drawing a Map of the Empire on a Scale of 1 to 1."

https://s3.amazonaws.com/arena-attachments/881694/cb6119367b...

alberto_ol · 2024-02-01T13:54:49

Umberto Eco quoted Jorge Luis Borges:

https://en.wikipedia.org/wiki/On_Exactitude_in_Science

B1FF_PSUVM · 2024-02-01T18:18:33

Speaking of Borges, sometimes I'm sadly reminded of his spoof of categorization schemes ("Animals: those that belong to the Emperor, embalmed ones," etc. ) : https://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevole...

remoquete · 2024-02-01T14:00:54

Yes, though it'd be perhaps more accurate to say it expanded upon the theme, as Wikipedia says.

tremarley · 2024-02-01T01:43:33

“But unlike Acrobat, the Preview app doesn’t have an upper limit on what we can put in MediaBox. It’s perfectly happy for me to write a width which is a 1 followed by twelve 0s:

Screenshot of Preview’s Document inspector, showing the page size of 352777777777.78 x 10.59 cm. If you’re curious, that width is approximately the distance between the Earth and the Moon. I’d have to get my ruler to check, but I’m pretty sure that’s larger than Germany.”

The size of every planet in our solar system, put next to each other, can fit in this doc with room to spare

svantana · 2024-02-01T17:29:07

By my counting, that document is ~ 373 km^2, which is much smaller than Germany. It turns out the ruler was needed after all

croes · 2024-02-01T05:28:49

You have one 7 too many. 352777777777.78cm are 3,527,777.7777778km.
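
For anyone who wants to redo the arithmetic in this subthread, here is a rough Python check. It assumes the default unit of 1/72 inch mentioned elsewhere in the thread, a MediaBox width of a 1 followed by twelve 0s, and 2.54 cm per inch; `units_to_km` is a hypothetical helper, not part of any PDF library.

```python
UNITS_PER_INCH = 72   # assumption: default PDF unit of 1/72 inch
CM_PER_INCH = 2.54
CM_PER_KM = 100_000


def units_to_km(units):
    """Convert a length in default PDF units to kilometres."""
    return units / UNITS_PER_INCH * CM_PER_INCH / CM_PER_KM


# Width computed from a MediaBox value of 1 followed by twelve 0s.
width_km = units_to_km(1e12)
# Width exactly as printed in the quoted screenshot caption.
quoted_km = 352777777777.78 / CM_PER_KM
# Area using the quoted width and the quoted 10.59 cm height.
area_km2 = quoted_km * (10.59 / CM_PER_KM)

print(f"from MediaBox: {width_km:,.2f} km")
print(f"as quoted:     {quoted_km:,.2f} km")
print(f"quoted area:   {area_km2:,.1f} km^2")
```

The first figure is ten times smaller than the second, which is consistent with the "one 7 too many" remark, and the area comes out near 373 km^2 for the width as quoted.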

MichaelZuo · 2024-02-01T02:30:26

Now I wonder how large of a file size would such a PDF be if it wasn't empty space...

pas · 2024-02-01T02:54:29

pdf supports vector graphics! or it can be just a lot of "a" characters, it supports compression/repeat, right?

mrb · 2024-02-01T00:37:54

Fun experiment alexwlchan! Two small mistakes in your post: you write "15,000,000,000.00 in" and "that the size of a page is 15 billion inches", but it should be 15 million.

You said you had difficulty formatting text. Here is a "hello world" pdf that just has these two words on a page: copy and paste this text (stripping leading spaces on each line) and save it in a .pdf file. Basically in order to write text you have to define a font (object 5) and then a stream with a Tf command to use the font, a Td command to position the text, and a Tj command to write it.

    %PDF-1.2
    1 0 obj
    <<
     /Type /Catalog
     /Pages 2 0 R
    >>
    endobj
    2 0 obj
    <<
     /Type /Pages
     /Kids [ 3 0 R ]
     /Count 1
     /MediaBox
     [ 0 0 612 792 ]
    >>
    endobj
    3 0 obj
    <<
     /Type /Page
     /Parent 2 0 R
     /Resources 4 0 R
     /Contents 6 0 R
    >>
    endobj
    4 0 obj
    <<
     /ProcSet[/PDF/Text]
     /Font <<
      /F1 5 0 R
     >>
    >>
    endobj
    5 0 obj
    <<
     /Type /Font
     /Subtype /Type1
     /BaseFont /Times-Roman
    >>
    endobj
    6 0 obj
    <<
     /Length 52
    >>
    stream
    BT
    /F1 48 Tf
    185 400 Td
    (Hello World)Tj
    ET
    endstream
    endobj
    trailer
    <<
     /Root 1 0 R
    >>
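
If you would rather not count bytes by hand, a short script can generate the same file with a computed /Length and an xref table. This is a minimal sketch, not a general PDF writer: `build_pdf` is a hypothetical helper, it mirrors the six objects above, and it assumes the text contains no parentheses or backslashes (those would need escaping in a PDF string).

```python
def build_pdf(text="Hello World"):
    """Return (pdf_bytes, object_offsets) for a one-page hello-world PDF."""
    # Content stream: select font F1 at 48pt, position, draw the text.
    content = f"BT\n/F1 48 Tf\n185 400 Td\n({text})Tj\nET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [ 3 0 R ] /Count 1 /MediaBox [ 0 0 612 792 ] >>",
        b"<< /Type /Page /Parent 2 0 R /Resources 4 0 R /Contents 6 0 R >>",
        b"<< /ProcSet [/PDF /Text] /Font << /F1 5 0 R >> >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>",
        # /Length is computed from the stream instead of hard-coded.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.2\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for object 0 (free list head)
    for offset in offsets:
        out += b"%010d %05d n \n" % (offset, 0)  # 20 bytes per entry
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out), offsets


pdf_bytes, object_offsets = build_pdf()
```

Writing `pdf_bytes` to a file opened in binary mode (for example `hello.pdf`) should give something a viewer can open without the "damaged" warning, though that has not been checked against every reader.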

alexwlchan · 2024-02-01T06:33:55

Argh! I knew I was going to make a numerical mistake somewhere, thanks for spotting it. Correction will be up shortly. Thanks for spotting it! :D

And thanks for the text example! This looks like what I was trying, but clearly I had a mistake somewhere.

dingensundso · 2024-02-01T07:21:31

Spotted another math mistake: > The default unit size is 1/72 inch, so the page is 300 × 72 = 4.17 inches.

whartung · 2024-02-01T00:43:11

Is the xref at the end of a PDF required or not? Seems like it is in the spec.

jchw · 2024-02-01T01:15:15

By the spec, yes. Some PDF readers will parse it anyway, some will not. In my experience, depending on the renderer, the xref table can be varying degrees of malformed before things go wrong. Edge's old PDF reader (the one before Acrobat and after PDFium) for example seemed to tolerate just about anything, falling back to the latest version of objects if the xref table was broken. There are also other mistakes you can make: for example, the xref table requires carriage returns (each entry in the table is supposed to be an exact number of bytes), but some PDF readers will still interpret the xref table even if the carriage returns are missing.

whartung · 2024-02-01T01:23:29

As I understand it, the xref entries don’t require a carriage return, but they require a fixed line length. If you don’t want to use a CR, you can pad with a space.

So CR/LF, space/LF, and space/CR are all valid endings.

jchw · 2024-02-01T01:43:38

Yep:[1]

> The byte offset in the decoded stream shall be a 10-digit number, padded with leading zeros if necessary, giving the number of bytes from the beginning of the file to the beginning of the object. It shall be separated from the generation number by a single SPACE. The generation number shall be a 5-digit number, also padded with leading zeros if necessary. Following the generation number shall be a single SPACE, the keyword n, and a 2-character end-of-line sequence consisting of one of the following: SP CR, SP LF, or CR LF. Thus, the overall length of the entry shall always be exactly 20 bytes

This is interesting. Never actually saw anything other than CRLF in practice, even inside of PDF files that otherwise were LF-only.

[1]: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard... page 41
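
The quoted rule is strict enough to check with a regular expression. The following is a sketch only: `parse_xref_entry` is a hypothetical helper, and accepting the keyword `f` (used for free entries) alongside `n` is an addition that goes beyond the sentence quoted above.

```python
import re

# 10-digit offset, SPACE, 5-digit generation, SPACE, keyword,
# then a 2-character end-of-line: SP CR, SP LF, or CR LF.
XREF_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])( \r| \n|\r\n)")


def parse_xref_entry(entry: bytes):
    """Return (offset, generation, in_use) or raise ValueError."""
    match = XREF_ENTRY.fullmatch(entry)
    if match is None:
        raise ValueError(f"not a valid 20-byte xref entry: {entry!r}")
    offset, generation, keyword, _eol = match.groups()
    return int(offset), int(generation), keyword == b"n"


# All three permitted line endings parse to the same result.
for eol in (b" \r", b" \n", b"\r\n"):
    print(parse_xref_entry(b"0000000017 00000 n" + eol))
```

An entry that ends in a bare LF is only 19 bytes long, so it fails the match, which is the case the lenient readers mentioned above apparently tolerate.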

mrb · 2024-02-01T01:00:26

It is required according to the standard. But in practice most PDF viewers don't care. They may complain the PDF is "damaged" or "no valid xref was found", but they will render it perfectly fine.

wodenokoto · 2024-02-01T04:55:55

> "15,000,000,000.00 in" and "that the size of a page is 15 billion inches", but it should be 15 million.

Can you help me count zeroes? Why is it million and not billion?

jetrink · 2024-02-01T05:20:59

The numerical and word versions are equal, but they're both wrong. 15 billion inches is close to the distance from the Earth to the Moon.

croes · 2024-02-01T05:22:39

>If we crank it all the way up to the maximum of UserUnit 75000, Acrobat now reports the size of our page as 15,000,000,000.00 x 15,000,000,000.00 in – 381 km along both sides, matching the original claim. If you’re curious, you can download the PDF.

15 billion inches are 381,000km. The original claim is the limit is 15 million inches.

mrb · 2024-02-01T05:48:42

He put too many zeroes. It should be "15,000,000.00 in" or 15 million.
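
The million-versus-billion question is easy to settle numerically. A small sketch, assuming only the exact definition of 0.0254 metres per inch; `inches_to_km` is a hypothetical helper name.

```python
METRES_PER_INCH = 0.0254  # exact by definition


def inches_to_km(inches):
    """Convert inches to kilometres."""
    return inches * METRES_PER_INCH / 1000


million_km = inches_to_km(15_000_000)       # 15 million inches
billion_km = inches_to_km(15_000_000_000)   # 15 billion inches

print(f"15 million inches = {million_km:,.0f} km")
print(f"15 billion inches = {billion_km:,.0f} km")
```

Only the 15 million figure lands on the 381 km side length quoted from the article; 15 billion inches is a thousand times longer.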

puck · 2024-02-05T08:58:37

> Please don’t try to print it.

How will I know if I can fold it more than 7 times, though?

whartung · 2024-02-01T00:51:10

It seems germane at this point to paraphrase Steven Wright.

“I have a map of the United States. It’s actual size.”

galaxyLogic · 2024-02-01T05:55:57

I have the map of US in my cell-phone.

I'm somewhat confused by its directions however when I look at the map and want to go somewhere. Is the top-part of the map where I'm moving? Or is the top-part North?

Seems it is not North and that is confusing because maps I've seen before have North at the top always.

If I turn 90 degrees, the map turns around. But I thought it was I who turned around.

And if I stop, the map cannot know where I'm going because I'm not going anywhere. So it is almost like I have to start moving before the map can tell me where to turn.

Or if I hold the smart-phone in front of my eyes the top of the map is towards the sky. Am I supposed to look at the map from above?

What are some good tactics on how to use Google-map on your cell-phone?

flexagoon · 2024-02-01T10:07:17

There are two modes in Google Maps - one shows the map in a fixed rotation (north on top by default, but you can rotate the map with two fingers), the other mode automatically rotates the map based on what direction you're facing. *Facing*, not moving, so you don't actually have to walk for it to determine the direction.

You can switch between the modes by clicking a compass icon

wongarsu · 2024-02-01T13:17:19

Part of the confusion might be that it's pointing in the direction the phone is facing. Which is kind of obvious, but notably doesn't work if you put your phone in an upright phone holder, as many people do in their car.

galaxyLogic · 2024-02-02T06:45:00

> pointing in the direction the phone is facing

Do you mean where the top edge of the phone is directed towards when I hold the phone so that its display is pointing towards the sky?

Or where the backside of the phone is directed at if I hold it upright in front of my face?

galaxyLogic · 2024-02-02T06:48:56

Thanks for the tip about the modes

TowerTall · 2024-02-01T06:08:26

I really hate that too. You are in an intersection and the voice says "Drive north for x miles/km". What is wrong with "turn right and drive for x miles/km"? I normally have zero clue in what direction north is, especially when I am in a location I have never been before. I drive a bike and have the phone in my pocket and can therefore not see any arrow that the app might display. I only have the audio to navigate from.

roxgib · 2024-02-01T10:19:54

It will do that if it doesn't already know what direction you're travelling, which is usually because you've just activated navigation and you aren't moving yet. Unless I happen to know which direction north is or which way is towards my destination, I'll just pick a random direction and it will adjust the route if I guessed wrong.

thaumasiotes · 2024-02-01T12:25:45

> You are in an intersection and the voice says "Drive north for x miles/km".

Does that really happen? I have never experienced it. How do they tell which way is north?

Highway 101 runs through San Jose pretty much due east/west, but because it also runs up to San Francisco, it is officially a north-south highway. So you check your position on the map and you're traveling due east along an east/west road. Is that "north"? (Of course not. It's "south".)

jeffhuys · 2024-02-01T08:56:23

That’s Google maps for you. Try another one, most have way better voice cues (amongst other things!).

robertlagrant · 2024-02-01T09:38:50

That's odd. My Google Maps tells me to turn left or right. It doesn't use compass directions.

TeMPOraL · 2024-02-01T09:53:04

> What are some good tactics on how to use Google-map on your cell-phone?

For navigation?

1. Don't activate navigation. It's broken six ways to Sunday, and burns through battery like there's no tomorrow. Use route preview instead (i.e. the step after searching, but before activating the voice nav proper).

2. Use your fingers to rotate the map so it always faces the same way you're going.

3. If confused, recenter and press the compass so it rotates to have North at the top, and continue from there.

Now FWIW, I use Google Maps when navigating on foot/scooter, or as a pilot in the car. If I were a driver... I'd probably buy TomTom or whatever nav that's not shit.

p1mrx · 2024-02-01T08:46:59

If you want North to be up, tap the compass icon.

iggldiggl · 2024-02-01T08:02:22

Umberto Eco also has something to say on that subject – On the Impossibility of Drawing a Map of the Empire on a Scale of 1 to 1:

https://s3.amazonaws.com/arena-attachments/881694/cb6119367b...

fuzztester · 2024-02-01T01:22:23

I have a map of the Universe. Dunno, it keeps expanding ...........................................,............................................................................................................................

venusenvy47 · 2024-02-01T14:41:58

I always liked his related joke: "I want to get a tattoo of myself on my entire body only 2" taller.".

https://scomedy.com/quotes/10779

0134340 · 2024-02-01T01:57:29

>Please don’t try to print it.

Sounds like a print bomb waiting to happen. Last time I had a printer it was next to impossible to cancel a print job on Windows. Back when people had wifi printers that were open or ill-secured, those were fun times.

matheusmoreira · 2024-02-01T04:33:01

> it was next to impossible to cancel a print job on Windows

It's still impossible. The only reliable method I've found consists of turning the printer off and then deleting the print job in the queue. Only way to get Windows to actually delete it. Doesn't work unless the printer is sitting right next to me, of course. I have no idea why this is so hard.

tazjin · 2024-02-01T14:04:05

Some ~12 years ago, I was debugging POS integration with a receipt printer and accidentally sent garbage postscript to the receipt printer, which printed it out verbatim.

Stopping it was impossible. Power cycling that printer had absolutely no effect. It wrote the unfinished print job to some kind of persistent memory, and by god it was going to finish it.

It went through something like 2 1/2 rolls of receipt paper (yes it dutifully awaited the new rolls and then just continued) and due to the thermal printing process it smelled very odd, and I had quite a few metres of raw Postscript afterwards to decorate a wall with.

askvictor · 2024-02-01T09:57:19

And sometimes windows won't delete it from the print queue as it can't talk to the printer. Fun times.

foreigner · 2024-02-01T07:42:45

Only slightly more reliable method: unplug the printer and throw the computer out the window.

kome · 2024-02-01T09:48:24

when i read

> Please don’t try to print it.

my first reaction has been: you are not my mom. >:-)

bombcar · 2024-02-01T12:17:42

Just click “scale to fit on one page”.

poulsbohemian · 2024-01-31T23:49:54

About 30 years ago I interviewed to be a summer intern at Microsoft, and one of the interviewers asked a question very similar to this but regarding Excel. This is the kind of topic that never gets old for understanding a person’s curiosity and ability to dissect the potential issues.

danbruc · 2024-02-01T09:04:04

So what is the actual limit, if any? I just had a quick look at ISO 32000-2:2020 [1] and think the answer is none, or implementation-dependent if you want. In the file format a media box is a rectangle, a rectangle is an array of four numbers, and a number is either an integer or a real. Numbers are represented as strings, so there is no a priori limit on their range and there seem to be no requirements on the minimum or maximum range of values an implementation has to support. The appendix only says that IEEE 754 is a commonly used format to represent reals and that this might impose limits.

[1] https://developer.adobe.com/document-services/docs/assets/5b...

gr33nq · 2024-01-31T23:35:17

Coincidentally, I just finished watching a video that explored the same topic of massive and unique PDF files: https://www.youtube.com/watch?v=ZvVNRRQjDh8

NooneAtAll3 · 2024-02-01T00:42:07

yeah, I went here to post it as well

not every time one can see a whole game integrated into a PDF!

drewcoo · 2024-02-01T00:13:18

Please, nobody tell Randall Munroe!

Because I would literally spend days worth of time scrolling.

yannis · 2024-02-01T03:51:58

And of course you can try and produce this pdf using TeX. In this post https://tex.stackexchange.com/a/27482/963 I created a pdf of 15283 pages (lettersize) filled with lorem ipsum text and without the program running out of memory.

anotheraccount9 · 2024-02-01T00:48:21

For a moment I wasn't sure if I wanted to click on the link.

chaxor · 2024-02-01T02:32:06

On an only slightly related note: is there any good way to check PDFs for malware/executables?

If I'm stuck with an attempt at it, the best I can think of is opening in a new QEMU or docker with no Internet access, but that's 1) a fair bit of work to check something, and 2) probably not even that secure. Using some cli tool, like xxx, bat, or ranger, that does some processing to extract the text and looking at just that feels more secure - but I know it really isn't.

What is a simple tool to "clean" PDFs? An ML tool that does QEMU/docker/no-net to extract the content, turns that into game, and saves a typst/latex template with it would probably be the best possible outcome - but that's a decent (yet potentially very lucrative) task.

peddling-brink · 2024-02-01T05:06:56

For analysis, I’ve used Didier’s tools. If you just want a safe way to open it, upload it to a cloud storage provider which destructively renders the pdf. Box or Google drive should work.

https://blog.didierstevens.com/programs/pdf-tools/

worewood · 2024-02-01T03:41:14

What do you mean by "PDFs with malware/executables"?

If you're talking about embedded active content within them, then a reader application can just ignore/not run it.

If you're talking about a crafted PDF that exploits, let's say, font rendering bugs inside the reader, then it's near impossible. Keep your applications updated.

arunsivadasan · 2024-02-01T08:08:54

There is a Chrome addon called SquareX https://chromewebstore.google.com/detail/kapjaoifikajdcdehfd... The founder is pretty reputed in the Cybersecurity field.

flexagoon · 2024-02-01T10:09:17

There are some pdf readers that protect you against those things.

On Android, for example, there is the GrapheneOS Pdf Viewer [1]. Its readme has a pretty good explanation of how it works.

1: https://github.com/GrapheneOS/PdfViewer

qwertox · 2024-02-01T00:55:41

It also screams buffer overflow.

maxerickson · 2024-02-01T00:58:48

PDF readers are probably mostly pretty hardened against "naive" non-conforming content.

kirubakaran · 2024-02-01T01:41:04

> probably mostly pretty hardened

Quite possibly perhaps that might be true-ish to some extent, I think, but take that with a grain of salt, I'm not an expert, that's just my wild guess :-p

maxerickson · 2024-02-01T02:05:26

It's pretty ridiculous to peel that off the following qualifier.

Readers have been aggressively attacked for a long time. It's certainly not impossible that some basic demonstration PDF will cause an issue, but it's probably not reasonable to expect it.

oneseven · 2024-02-01T01:20:43

Slightly tangential: if you are hacking on PDFs, manually or otherwise, this is an incredibly useful tool: https://pdfcpu.io/ (not the author, just a user)

vendiddy · 2024-02-01T12:21:54

Thanks for this. Any other tools that are useful when hacking on PDFs? I need to do a lot of programmatic PDF manipulation at work.

jakey_bakey · 2024-02-01T10:25:05

I love the texture of your website, it's like nice tactile wallpaper.

JKCalhoun · 2024-02-01T13:47:08

Looks like it's: https://alexwlchan.net/theme/white-waves-transparent.png

kepano · 2024-01-31T23:41:53

I cannot let this opportunity go by without quoting On Exactitude in Science by Borges in its entirety

". . . In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography."

https://en.wikipedia.org/wiki/On_Exactitude_in_Science

staplung · 2024-02-01T00:14:31

Or a portion of one of its inspirations: Lewis Carroll's Sylvie and Bruno Concluded

  "We actually made a map of the country, on the scale of a mile to the mile!"

  "Have you used it much?" I enquired.

  "It has never been spread out, yet," said Mein Herr: "the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well."

defrost · 2024-02-01T02:40:40

Also Carroll, from The Hunting of the Snark

    He had bought a large map representing the sea,

    Without the least vestige of land

    And the crew were much pleased when they found it to be

    A map they could all understand.


    “What’s the good of Mercator’s North Poles and Equators, Tropics, Zones, and Meridian Lines?”
    
    So the Bellman would cry

    and the crew would reply

    “They are merely conventional signs!


    “Other maps are such shapes, with their islands and capes!

    But we’ve got our brave Captain to thank

    (So the crew would protest) that he’s bought us the best

    A perfect and absolute blank!”

iggldiggl · 2024-02-01T08:04:16

And Umberto Eco expanded on that with On the Impossibility of Drawing a Map of the Empire on a Scale of 1 to 1:

https://s3.amazonaws.com/arena-attachments/881694/cb6119367b...

jancsika · 2024-02-01T04:53:09

There are some funny lines from They Might Be Giants' "Women and Men" that run along the same lines:

Women and men have crossed the ocean,

They now begin to pour

Out from the boat and up the shore.

Two by two they enter the jungle,

And soon they number more,

Three by three as well as four by four.

Soon the stream of people gets wider,

Then it becomes a river,

River becomes an ocean,

Carrying ships that bear

Women and men.

**

Borges: map of an area gets so detailed it becomes the same size as the area.

TMBG: creatures multiply and ultimately overrun an area so fully that their group behavior recreates the ecology of the area they took over

msarris · 2024-02-01T06:28:04

So I guess the question is, how did she figure out the size of the entire universe?

moritzwarhier · 2024-02-01T18:31:10

This is mentioned prominently in the German Wikipedia article about the PDF format. Was wondering if this needs a (200x).

Sad to see the article quote some random Xitter post as "source".

JackSlateur · 2024-02-01T08:32:10

Long story short: the original tweet confuses PDF (the file format) with Adobe Acrobat (the PDF reader): the 381 km × 381 km figure is an Acrobat limit, not a PDF limit

Funny document, still
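
For anyone wondering where a figure like 381 km comes from, here is a rough sketch of the arithmetic. The two limits below (14,400 units per side and a largest UserUnit of 75,000) are my recollection of Acrobat's documented implementation limits, so treat them as assumptions rather than as something stated in this thread.

```python
# Assumed Acrobat implementation limits (not limits of the PDF format itself).
MAX_UNITS_PER_SIDE = 14_400   # largest page side, in default user space units
MAX_USER_UNIT = 75_000        # assumed largest UserUnit multiplier Acrobat accepts

POINTS_PER_INCH = 72          # one default user space unit is 1/72 inch
METRES_PER_INCH = 0.0254


def max_side_inches(user_unit=1):
    """Largest page side in inches for a given UserUnit multiplier."""
    return MAX_UNITS_PER_SIDE * user_unit / POINTS_PER_INCH


def inches_to_km(inches):
    """Convert inches to kilometres."""
    return inches * METRES_PER_INCH / 1000


plain_inches = max_side_inches()                 # without UserUnit
scaled_inches = max_side_inches(MAX_USER_UNIT)   # with the assumed largest UserUnit
scaled_km = inches_to_km(scaled_inches)

print(f"Without UserUnit: {plain_inches:.0f} in per side")
print(f"With UserUnit:    {scaled_inches:,.0f} in per side, about {scaled_km:.0f} km")
```

Under those assumptions the cap works out to a square roughly 381 km on a side, which is a statement about one reader and not about what the file format can describe.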

latexr · 2024-02-01T12:39:35

> Long story short: the original tweet

I don’t think the tweet is relevant at all and it’s a disservice to this post to feature it that prominently in a summary. A more interesting conclusion is that PDF files can have dimensions larger than the Universe, and an example is provided.

mr_mitm · 2024-02-01T09:17:32

Interesting, I always thought Acrobat was the reference implementation of PDF.

markussss · 2024-01-31T23:25:38

This was a fun read! Thank you

tkgally · 2024-02-01T00:35:57

I second that comment! That was the most enjoyably nerdy thing I’ve read in quite a while.

wiradikusuma · 2024-02-01T04:01:01

I open the PDF in Google Chrome on a Mac. When I Ctrl+P, the dialog says it's 1 Page. I don't try to print it, but I think it will not consume more than 1 page?

Also, PDF preview in Chrome simply shows it like a normal PDF, but Preview seems confused (gray background instead of white)?

justsomehnguy · 2024-02-01T07:34:46

> I don't try to print it

Well, you can even without consuming a single sheet: just print to PDF.

> Preview seems confused (gray background instead of white)?

It tries to render it and fit in the preview.

DontBreakAlex · 2024-02-01T08:44:35

Wait, pdf files aren't binary ?!

gaazoh · 2024-02-01T09:40:20

I just had the exact same reaction! So I opened a random PDF I had lying around, and yes, it's mostly a text format. Some (most) objects are binary data streams, but some are also text data. Likewise, objects may or may not be compressed; compressed streams are obviously binary data. But the file structure is text, some objects are XML, and you can figure out quite a lot of stuff just by looking at a PDF in a text editor. It might not even be that long: the single-page PDF I just looked at is just over 1500 lines long, so I can definitely scroll through it manually (although offsets are in bytes, not lines, which makes them not very useful for manual lookup).
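
To make the "file structure is text" point concrete, here is a minimal sketch that assembles a blank one-page PDF from plain ASCII strings. The function name `build_minimal_pdf` and the 200-point page size are made up for illustration; the byte offsets in the cross-reference table are the part that makes hand-editing awkward.

```python
def build_minimal_pdf(width_pt=200, height_pt=200):
    """Assemble a blank one-page PDF from ASCII text (illustrative sketch)."""
    # Three objects: the catalog, the page tree, and a single page.
    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width_pt} {height_pt}] >>",
    ]

    out = "%PDF-1.4\n"
    offsets = []
    for number, body in enumerate(objects, start=1):
        # The xref table records where each object starts, in bytes.
        offsets.append(len(out.encode("ascii")))
        out += f"{number} 0 obj\n{body}\nendobj\n"

    xref_offset = len(out.encode("ascii"))
    out += f"xref\n0 {len(objects) + 1}\n"
    out += "0000000000 65535 f \n"           # entry 0 is always the free-list head
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n"   # each entry is exactly 20 bytes

    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
    out += f"startxref\n{xref_offset}\n%%EOF\n"
    return out.encode("ascii")


pdf_bytes = build_minimal_pdf()
print(pdf_bytes.decode("ascii"))
```

Writing `pdf_bytes` to a file should give something a reader can open, though I have only reasoned through the structure here and not tried it in every viewer.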

roxgib · 2024-02-01T10:26:19

I was surprised that the underlying format doesn't implement compression (though I assume objects can be compressed). Perhaps I shouldn't be surprised since I often get text only PDFs with unreasonably large sizes.

whoisthemachine · 2024-02-01T00:13:19

While the Germany PDF actually scrolls pretty quickly at 100% zoom (makes one realize just how much text is read in a day), the Universe one is pretty fun: Firefox's PDF reader at 100% zoom obviously doesn't budge the scrollbar at all.

pitherpather · 2024-02-01T00:48:36

Hackaday soon: Synchronizing a treadmill to a pdf the size of Germany.

Obligatory?: The pdf is not the territory.

ipsum2 · 2024-02-01T02:43:12

Chrome's PDF reader reports the file size as disappointingly 200.00 × 200.00 in (square)

denysvitali · 2024-02-01T07:17:47

I can't wait for people to start rendering their CVs with this trick >:)

roxgib · 2024-02-01T10:28:07

I'm a bit disappointed in myself that it didn't occur to me to submit my CV in A3.

xanth · 2024-02-01T02:00:36

Related CGP Grey video comparing metric paper sizes to comparable objects, from the Planck scale to the galactic[1]; could have been an XKCD comic

[1]: https://www.youtube.com/watch?v=pUF5esTscZI

_bax · 2024-02-01T10:05:24

An idea to send a DDOS attack to company LAN printers

mglz · 2024-02-01T11:09:17

Wouldn't they just check their available paper, notice there is no AGermany size and give up?

lovegrenoble · 2024-02-01T15:03:06

How many viruses it could contain )

poulpy123 · 2024-02-01T11:01:33

What does "larger than Germany" mean for a document file?

654wak654 · 2024-02-01T11:04:20

PDFs have physical dimensions in them (I think most document formats do), so you can literally define a square of 5x5cm in the document for example.
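
As a rough illustration of that: page sizes in a PDF are given in units of 1/72 inch by default, so a 5x5 cm square turns into a `/MediaBox` entry like the one built below. The helper names `cm_to_points` and `media_box` are hypothetical, just for this sketch.

```python
POINTS_PER_INCH = 72   # default PDF user space unit is 1/72 inch
CM_PER_INCH = 2.54


def cm_to_points(cm):
    """Convert centimetres to PDF points."""
    return cm / CM_PER_INCH * POINTS_PER_INCH


def media_box(width_cm, height_cm):
    """Format a /MediaBox entry for a page of the given physical size."""
    width_pt = cm_to_points(width_cm)
    height_pt = cm_to_points(height_cm)
    return f"/MediaBox [0 0 {width_pt:.2f} {height_pt:.2f}]"


print(media_box(5, 5))
```

So "larger than Germany" just means those numbers, interpreted as a physical size, describe a page bigger than the country.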

RecycledEle · 2024-02-01T09:18:47

"Your Scientists Were So Preoccupied With Whether Or Not They Could, They Didn’t Stop To Think If They Should."

--Ian Malcolm in the original Jurassic Park film

user2342 · 2024-02-01T12:26:03

"Please don’t try to print it." :-)

codeflo · 2024-02-01T02:06:23

I take offense to that diagram; Germany should refuse to be covered by a PDF that's not in proper DIN format.

In theory, DIN paper sizes go all the way from subatomic to the size of the universe. It seems like A(-39) is barely too small to cover Germany's land mass, but A(-40) should be more than sufficient. That's 882 x 1247 km if I didn't miscalculate.
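
A quick sketch to check that arithmetic, using the idealised A-series formula (A0 has an area of 1 m² and a side ratio of 1:√2, and each step to a lower number doubles the area). Real DIN sizes are rounded to whole millimetres, so this is an approximation, and the Germany extents in the comments are approximate figures I am assuming for comparison.

```python
def din_a_size_m(n):
    """Idealised width and height in metres of DIN A(n); n may be negative."""
    width = 2 ** (-0.25 - n / 2)
    height = 2 ** (0.25 - n / 2)
    return width, height


def describe(n):
    """Return a one-line summary of A(n) in kilometres."""
    width_m, height_m = din_a_size_m(n)
    width_km, height_km = width_m / 1000, height_m / 1000
    return f"A({n}): {width_km:.0f} x {height_km:.0f} km, {width_km * height_km:,.0f} km^2"


# Germany is very roughly 640 km east-west by 876 km north-south (assumed figures).
for n in (-39, -40):
    print(describe(n))
```

With those assumed extents, A(-39) comes out slightly too narrow while A(-40) covers the country comfortably, which matches the 882 x 1247 km figure.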

Cacti · 2024-02-01T06:44:35

Oh here we go again.

groestl · 2024-02-01T07:24:49

That's actually quite funny, especially because Germany almost has portrait DIN format.

Cacti · 2024-02-01T20:24:01

:) I think maybe the joke was a hair too subtle for this crowd.

Simon_ORourke · 2024-02-01T07:16:36

[flagged]

bowsamic · 2024-02-01T07:45:06

was? More like is. They're overlooking one right now, because of the last one they did.