If we created X amount of data in 2003 and the rate doubled every year since, then 7 years later we're creating 128X as much data per year, which works out to roughly X every 3 days.
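A quick back-of-envelope check (the annual-doubling assumption is mine; it's the only way I can see to get 128X over 7 years):

    # Assumes data creation doubles every year from a 2003 baseline of X per year.
    years = 7
    growth = 2 ** years            # 2**7 = 128, i.e. 128x the 2003 rate
    days_per_X = 365.0 / growth    # days needed now to produce one 2003-year's worth
    print(growth, round(days_per_X, 1))   # 128 2.9 -> roughly "X every 3 days"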
Has everyone forgotten about logistic growth? There is probably a ceiling to most growth; wouldn't it make more sense to ask where it lies? http://en.wikipedia.org/wiki/Logistic_function
Edit: if you think of the logistic function as modeling the rate at which we create information, this is one possible "story": initially there's slow growth while the technology is immature. Then the technology picks up, and there's the exponential phase we're seeing right now. That settles into a nice linear trend as the tech matures. Finally, we hit either natural (e.g., satiation) or technological limitations, and that slows the rate of growth. At the very limit, we're still creating information, but at a constant rate. The rate of growth might be near 0, but the rate of production is still bloody high.
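A minimal sketch of that story, with made-up parameters chosen only to show the shape of the curve (none of these numbers are real data-volume figures):

    import math

    # Logistic model of the *rate* of information creation: r(t) = L / (1 + e^(-k*(t - t0)))
    L, k, t0 = 1000.0, 0.5, 2020   # ceiling rate, steepness, inflection year (all invented)

    def rate(t):
        return L / (1 + math.exp(-k * (t - t0)))

    for year in (1990, 2010, 2020, 2030, 2060):
        print(year, round(rate(year), 1))
    # Slow growth early, roughly exponential before t0, near-linear around t0,
    # and finally the rate flattens out near L -- growth ~0, production still huge.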
Extrapolating these kinds of relationships back to the dawn of time seems a bit tricky to me - I would guess we're really seeing something like an S curve.
"23 Exabytes of information was recorded and replicated in 2002. We now record and transfer that much information every 7 days."
Call me crazy, but that sounds every bit as sensational to me. Seems like all this article is doing is getting overly picky with some throwaway, oft-repeated trivia stat. Who cares what the exact numbers are? The purpose of the statement remains the same.
That said, the title could have been much improved.
It's not so much nitpicky IMO as light-heartedly correcting the facts.
I think that counts as intellectually stimulating :)
These numbers are incomprehensibly big. Nobody can even begin to mentally picture them in a reasonable fashion. The only part of the statement that is really important is the meaning, not the details of the numbers. That meaning is "we are creating shittons of data really really fast, faster than ever before". If you're getting hung up on the accuracy of the numbers used to express this, then you're missing the point, and I wonder why you don't have something better to do than get worked up about it...
EDIT: Furthermore, according to the "more accurate" statement from the article, we're creating 23 Exabytes in 7 days, not 5. Read it again: "_23 Exabytes_ of information was recorded and replicated in 2002. We now record and transfer _that much_ information every _7 days_."
I think a better quote would be "By 2003, mankind had generated a shitload of information. Now we generate a shitload every day."
If you care enough about numbers and can figure out how to say it more accurately... more power to you I guess? Doesn't change the effectiveness of the quote though.
Sorry to be nitpicking... :)
There was a 50x growth between now and the single year of 2002. Schmidt's quote implies roughly 260,000X growth between now and the average rate of data creation over the past 5000 years (rough arithmetic below).
In order to create an actual comparison of the Schmidt rate vs. the real rate, you need a 'real' number for the amount of data created from the start of recorded history until 2002, something that nobody has volunteered. Nobody is disputing that Schmidt's numbers were inaccurate; the number he provides for this is almost certainly wrong, but we don't know by how much. Extrapolating out from data from the single year of 2002 is flawed; it is foolish to think that the growth rates of data in 2002 CE and 2002 BCE are the same without data to suggest so.
If we accept that there has been a constant rate of growth of data creation, then your conclusion is valid.
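For reference, here's the rough arithmetic behind those ratios, assuming ~365 days per year and taking "recorded history" to be 5000 years (illustrative only; the real totals are anyone's guess):

    # Article's correction: 23 EB in all of 2002 vs. 23 EB every 7 days now.
    article_growth = 365.0 / 7                       # ~52x, i.e. the "50x" figure
    # Schmidt-style comparison: one whole-of-history total vs. the same total every N days.
    recorded_history_days = 5000 * 365
    growth_vs_7_days = recorded_history_days / 7.0   # ~260,000x
    growth_vs_2_days = recorded_history_days / 2.0   # ~900,000x (Schmidt's "every 2 days")
    print(round(article_growth), round(growth_vs_7_days), round(growth_vs_2_days))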
Edit: I can't link to the middle of a Silverlight presentation, but if you visit http://research.microsoft.com/apps/tools/tuva/ , click the middle (with the picture of Feynman), click Lecture 5, and skip to 17:00 in, you can see the incident. But you really should watch all of them, especially the one on Symmetry.
Weather isn't data; a measurement of temperature would be data...
[Not that I am trying to imply that other things aren't data - but I don't think weather itself is data although you can, of course, have weather data!]
Edit: "There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days."
Obviously there would be plenty of questions, ranging from the most basic (a CD released in 1990 contains 'x' MB of data on it... but there's also the sleeve design, printed track listing etc...) to more complex (how do you measure the data of a painting in computer terms?), but I'm sure if someone were to set about trying to work this out, they could think up (albeit debatable) definitions.
* I have no substantiation for this, but it seems about right: a JPEG is a megabyte or two, a single movement of a symphony is 10 or 20 MB, and a short movie at reasonably high quality is easily 100 MB.
Worst case: http://articles.cnn.com/2007-10-17/us/monalisa.mystery_1_leo...
150,000 dots per inch, 13 light spectrums, including ultraviolet and infrared, at say 12 bits per spectrum, times the largest painting in the world (12,384,000 square inches):

150,000 x 150,000 dots/sq in x 13 spectrums x 12 bits x 12,384,000 sq in ≈ 4.35 x 10^19 bits ≈ 4.71 exabytes
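A quick sanity check of that worst-case figure (reading "exabyte" in the binary, 2^60-byte sense, which is what makes the result come out to ~4.71; the scan parameters themselves are just the guesses above):

    dpi = 150000                  # dots per inch, so dpi**2 dots per square inch
    spectrums = 13                # light spectrums captured, incl. UV and IR
    bits_per_spectrum = 12
    area_sq_in = 12384000         # claimed area of the largest painting, in square inches

    total_bits = dpi**2 * spectrums * bits_per_spectrum * area_sq_in
    exabytes = total_bits / 8.0 / 2**60   # bits -> bytes -> binary exabytes
    print(round(exabytes, 3))             # ~4.713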
The number might still have been made up, but let's not forget that Schmidt might have some sources of information not available to the public, for example the server stats from Google and YouTube.
True, not as catchy as the dawn of time, but still mighty impressive. And in fairness to their outgoing CEO, Google didn't cache much data at the dawn of time (or even in the '80s), so it can't have been that important.
My point is that a lot of this "information" is ephemeral and not really all that important in the long run.
Reply to the edit: If you stretch the timeline long enough, nothing is of any importance.