"Will You Be Able to Read This Article in 1,000 Years?"
Barring some significant advancement in Sid-Meier's-Alpha-Centauri-style technologies like longevity vaccines or brain uploading, probably not.
Even if I do survive, will I still be speaking this dialect of English? Will I even be speaking English at all?
But back to the main point of the article: saving the article as a PDF strikes me as the wrong approach, since binary formats are likely to be the first to die off. I'd be much more confident in the article being stored more or less as it is now: as plain-text markup, style information, and some code executable by just about any modern JavaScript implementation, accompanied by the texts of the relevant standards (for HTML, CSS, ECMAScript/JavaScript, etc.), so that anyone could then (theoretically) piece everything together and display it as it was intended to be displayed.
Lots of very old software from the beginning of computing history has been preserved in source code form precisely because plaintext is so robust; even with encoding differences, it's been possible to convert these documents (even if doing so has been a manual process). For the same reason, I'd expect free-and-open-source software to have significant preservation advantages over proprietary software (of the closed-source variety, at least): its source is copied so widely that there's a greater chance of some copy surviving.
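To make the "encoding differences" point concrete, here is a minimal sketch of such a conversion, assuming the old text was stored in EBCDIC code page 037 (a codec Python ships as `cp037`). The sample string and the `to_utf8` helper are made up for illustration; a real recovery job would first have to work out which encoding was actually used.

```python
# Hypothetical sample: the bytes an EBCDIC (code page 037) system
# might have written for a short line of text.
legacy_bytes = "HELLO, WORLD".encode("cp037")


def to_utf8(raw: bytes, source_encoding: str = "cp037") -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


recovered = to_utf8(legacy_bytes)

# The raw bytes differ between the two encodings, but the text is the same.
print(legacy_bytes.hex())
print(recovered.decode("utf-8"))
```

The conversion is a simple byte-for-character table lookup, which is roughly why plaintext can survive a change of platform so cheaply.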
Flip this around and look backwards: how much exists from 1000BC–1000AD (using the "1,000 years to 3,000 years" figure cited by Cerf)? In popular knowledge, not much: Sun Tzu and Confucius, the Bible, the Kama Sutra, Homer, Virgil, Beowulf, Aristotle, Plato, Egyptian hieroglyphics. Machiavelli, Chaucer, and their peers around the time of Gutenberg are all "too late", in the 14th century or after.
These works survived because they were (presumably) important. Cultural understanding is important to anthropologists and shows up as well (e.g., graffiti at Pompeii). But how much is Crash Bandicoot going to matter in 3000AD? How wide will the cultural divide be? (What's a "computer"? What's a "button"?) What language will they speak?
Thus, the concern to me is less how we'll save everything and more which things are important to save. After all, the Iliad doesn't stand out among zettabytes of video games, cat pictures, and Kardashian re-runs.
Interesting aside: the Department of Energy is facing a similar, simpler challenge with the Yucca Mountain nuclear waste site. Namely, how do you create signage that will work (survive, be understandable) for 10,000 years? (http://www.salon.com/2002/05/10/yucca_mountain/)
Well, it's also about the durability of the information medium. Clay tablets have survived remarkably well, and they give us a better view into ancient Mesopotamia (particularly the Ur III dynasty) than later centuries' records kept on papyrus or strips of wood.
Very true. And the medium (among other factors, like literacy) has a self-selecting component: you're not going to take the time (money, etc.) to make clay tablets out of everything. There wasn't a clay-tablet-Twitter back in the Neo-Sumerian Empire.
As modern media become a) cheap, b) ubiquitous, and c) durable, there's no reason to "edit" ourselves. Maybe that's a good thing: it would give future anthropologists a treasure trove of information. But again, my concern is that we'll end up with so much quantity that the quality data/recordings/thinking get subsumed, and future humans think we worship cats. (All those photos! And captions! And videos! I for one bow to our feline overlords.)
The only reason we have anything from 1,000 years ago is that people deliberately conserved it - copying, translating and actively passing things down the generations - for example:
Media:
Oral history/stories -> early written versions on tablets/papyrus -> vellum/scrolls/hand copied books -> printed books -> digital.
Language:
Aramaic/Ancient Greek -> Latin/Greek -> Modern English
If the original source media is very long-lived, you can skip some steps, but you still need to actively conserve and translate things - not just between old human languages, but from old media to new.
This process is obviously very lossy - we have only a tiny fraction of the literature from 1,000 years ago - namely the intersection of the stuff that people cared enough about to transcribe many times over and the transcribed pieces that physically survived the ravages of time and history.
Expecting things to just survive over the long term without anyone doing this - without anyone caring enough to do this - is probably both idealistic and unrealistic.
No, I don't think I'll be able to read anything in 1000 years, not given my life expectancy.
More seriously, on one hand, we have pretty good redundancy, since it's so easy to copy things.
On the other hand, the media we use to store information are probably not as durable as stone or even paper, and in the worst-case scenario, depending on the format used, minor corruption can make the data completely worthless.
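A rough sketch of what "depending on the format used" can mean in practice: flip a single bit in a plain ASCII text and in a zlib-compressed copy of the same text, then compare the damage. The sample sentence, the `flip_bit` helper, and the byte positions are arbitrary choices for illustration; other formats and other damage positions will behave differently.

```python
import zlib


def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with one bit inverted (simulated corruption)."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


text = ("The quick brown fox jumps over the lazy dog. " * 20).encode("ascii")

# Plain text: one flipped bit damages exactly one character.
damaged_plain = flip_bit(text, 100)
wrong_chars = sum(a != b for a, b in zip(text, damaged_plain))
print("characters damaged in plain text:", wrong_chars)

# Compressed copy: one flipped bit in the middle of the stream.
compressed = zlib.compress(text)
damaged_compressed = flip_bit(compressed, len(compressed) // 2)
try:
    recovered = zlib.decompress(damaged_compressed)
    print("decompressed, matches original:", recovered == text)
except zlib.error as exc:
    recovered = None
    print("decompression refused:", exc)
```

In the plain-text case a reader can still guess the damaged word from context; in the compressed case a standard decoder will typically either reject the stream outright or produce output that no longer matches the original.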
Add to that the fact that future archaeologists will have to reverse-engineer the whole stack from hardware to software if they want to have a chance to make something of the (probably corrupt) data they have on hand. It probably puts the bar even higher than understanding Egyptian hieroglyphs.
Oh, and also add to that the signal-to-noise ratio of the data we produce today and the sheer amount of it. It would be interesting to see what picture future historians paint of our times with what they have on hand.
People have been talking about this problem for decades. The only information from this era likely to survive is what's stored in plain text or another easy-to-interpret format. Do you really think people 500 years from now will be able to watch the Blu-ray of Paul Blart: Mall Cop 2? It's more likely that they'll be able to create a C compiler and build some of the free software of today.
One of the explicit goals of the Document Foundation is to be able to answer this question "yes". LibreOffice already does better on ancient Word documents than Word 2013, and the Document Liberation Project is all about the old formats.
Hmm, taken literally, this is a perfect example of Betteridge's Law, as I won't be around in 1,000 years to read anything. Rather than just the usual discussion of bit rot, I personally wonder whether humans will still exist in 1,000 years and, if we do, whether we will still read.