There's actually a good practical use-case for this: you're building software that needs to detect PDF files (as a smaller detail of what it does, not its primary purpose) and you want to include a tiny one in your unit tests.
I don’t know what you expected. File is just there to give a good guess at a file’s format. There are a ton of reasons why this problem is hard, and there are reasons to make “file” less accurate in order to make the implementation simpler and more secure.
I'm curious why you needed the absolute smallest PDF file for testing? As an intellectual exercise, golfing the PDF format sounds like a bit of fun, but given that https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pd... is 13KB, I would think you could load that in a test suite on pretty much any reasonable, non-embedded platform. And, why would you want to load it on an embedded platform, anyway?
Testing near/at boundary conditions is generally good practice.
A minimal size file could easily catch some cases where your code assumes some sort of structure (some flag, header, metadata structure, etc.) exists when it's possible it actually doesn't.
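As a rough illustration of that kind of boundary testing, here is a minimal sketch of a PDF detector exercised with very small inputs. The function name `looks_like_pdf` and the 1024-byte search window are assumptions made for this example (some readers are lenient about leading bytes before the header); this is not how `file` or any particular library does it.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes, search_window: int = 1024) -> bool:
    """Return True if the PDF magic lies fully within the first search_window bytes.

    The window size is an assumption for this sketch, not a rule taken from the thread.
    """
    return PDF_MAGIC in data[:search_window]


# Boundary-style inputs: each one is about as small as it can be while
# still exercising a different assumption in the detector.
cases = {
    "empty": (b"", False),
    "truncated_magic": (b"%PDF", False),
    "magic_only": (b"%PDF-", True),
    "leading_junk": (b"\n\n%PDF-1.4\n", True),
    "other_format": (b"GIF89a", False),
}

for name, (data, expected) in cases.items():
    assert looks_like_pdf(data) is expected, name
```

A detector that indexes into a fixed header offset or expects a trailer would trip over the "empty" or "magic_only" cases, which is exactly what tiny fixtures are good at surfacing.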
Not exactly on topic, but this reminds me of somewhere around the year 2000, when I had to produce "documentation" in a hurry, in under an hour before delivering and deploying some systems. I think I used Dia, but am not sure anymore.
Anyways, all essential information in DIN A4 landscape mode, nice diagrams with network structure, IP numbers and so on, ready.
Now what? Remember it was around the year 2000, what to use?
Floppy disks of course! Saved it, looked at it and thought it had gone wrong somehow because it listed as 4KB only.
Used another floppy, low-level formatted with fdformat to be sure, taking minutes, hurry, hurry! Saved again.
4KB!? WT..?
Booted up another system, loaded the PDF with different readers, worked.
Shrugged and hoped it worked at the customer's site also, which it did, they even said it looked nice and clear.
Can anyone explain to me why PDF persists as the most common document format for "official" correspondence? It's an absolute dog-vomit of a format, just unbelievably overwrought and unfriendly. I wince every time I have to sign one, or, god forbid, actually fill in some form.
Is the explanation really as lame as "They were there first and it stuck"?
PostScript is built into high-end printing hardware, and PDF does a good job at encapsulating PostScript, which gives you high reliability that the thing on the screen is going to print out the same way that it appears on the screen. Adobe has traditionally had a dominant role in both font technology and licensing (Type 1, OTF) and creative tools (Illustrator: a .ai file is a PDF), and so both the creation and consumption sides of the ecosystem have coalesced around a common format whose semantics carry through predictably (though not simply!) from one end to another.
What makes PDF particularly challenging is that there are so many broken PDFs out there which must be tolerated and so many legacy fonts, images and formats which have accumulated in the format over time - JPEG 2000 anybody?
Most complaints I see about PDF though are usually “why can’t it just wrap lines” and the answer is that there’s a zillion ways to do that and we’re supposed to be representing the output of that process as a visual artifact, not the input, as HTML does, because the use case is printing and stability of the output is non-negotiable.
Because it is the only format with mass adoption that makes presentation reasonably stable. Plus it's the format with the only reasonable authoring tools.
Personally, I enjoy working with DjVu files very much, but you only find those as pirated ebooks.
What do you use for form filling? During job applications last year nearly all the PDF forms were just images, or vector boxes that you couldn't fill ...
In terms of a "minimal valid raster image", it's a raster image of 1px × 1px; in terms of a "minimal valid vector image", it's a vector image with a single dot OR a single line segment.
But I can't imagine what a "minimal valid raster PDF" means.
Technically, the %PDF doesn't have to occur at byte 0. But, like many others inferred, that pushes onward toward real world structure (vs. an academically correct but useless PDF).
If you want a fully covered example it'll need a trailer, an xref and at least one obj reference. ...and there are TWO flavors of those (linearized and not). ...and Flate-encoded and not.
So, for a test harness you'd actually want a collection of small files.
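One member of such a collection could be generated rather than checked in. The following is a minimal sketch, assuming an uncompressed, non-linearized file with a header, two indirect objects, an xref table and a trailer; `build_tiny_pdf` is a hypothetical helper name, and the sketch makes no claim that every reader will open the result.

```python
def build_tiny_pdf() -> bytes:
    """Assemble a small, uncompressed, non-linearized PDF skeleton.

    Hypothetical helper for test fixtures: a catalog plus an empty page
    tree, followed by a classic xref table and trailer.
    """
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [] /Count 0 >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        # Record the byte offset of each object for the xref table.
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    # Each xref entry is exactly 20 bytes, including the end-of-line.
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    pdf = build_tiny_pdf()
    print(len(pdf), "bytes")
```

Variants for the other flavors mentioned above (linearized, Flate-encoded streams) would need their own builders or hand-made fixtures; this sketch only covers the plainest case.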
I did that here with tiny images in JPEG, PNG and GIF: https://github.com/simonw/datasette-render-images/blob/maste...