

Eric Schmidt’s "5 Exabytes" Quote is a Load of Crap - robertjmoore
http://themetricsystem.rjmetrics.com/eric-schmidts-5-exabytes-quote-is-a-load-of-c

======
ajays
The figure I've heard is that the _data_ generated doubles every year (here,
"data" can mean web pages, logs, transactions, etc.). Therefore, it follows
that every year we create roughly as much data as in all the previous years
combined (sum_{i=0}^{n-1} 2^i = 2^n - 1).

If we created X amount of data in 2003, then, 7 years later, we're creating
128X as much data; which roughly works out to X every 3 days.
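
A minimal sketch of that arithmetic, assuming yearly output really does double every year from a base amount X in 2003. The function names are made up for illustration, and the doubling rate is the commenter's premise, not a measured figure.

```python
# Assumption: data created per year doubles every year, starting from
# X (normalized to 1.0) in the base year 2003.
DAYS_PER_YEAR = 365


def yearly_output(years_since_base: int, base: float = 1.0) -> float:
    """Data created in one year, in units of the base year's output X."""
    return base * 2 ** years_since_base


def cumulative_before(years_since_base: int) -> float:
    """Total data created in all years before the given one (base year included)."""
    return sum(yearly_output(i) for i in range(years_since_base))


def days_to_create_base_amount(years_since_base: int) -> float:
    """Days needed, at the later year's rate, to create one X."""
    return DAYS_PER_YEAR / yearly_output(years_since_base)


if __name__ == "__main__":
    print(yearly_output(7))                         # 128.0
    print(cumulative_before(7))                     # 127.0
    print(round(days_to_create_base_amount(7), 2))  # 2.85
```

Note that this compares against the single year 2003, not against everything created up to 2003; under the same doubling premise the cumulative total is only about twice the latest year's output.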

~~~
Retric
It's still BS. A consumer level analog camera captures around 1GB of data with
each picture. _The peak of film sales was in 1999 when 800 million rolls of
film were sold and 25 billion images were captured and printed_ which works
out to around 25 EB of data just from analog cameras in 1999.
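
The estimate above can be checked with a short sketch. The 1 GB per image figure is the commenter's assumption and is kept as a parameter, the helper name is hypothetical, and decimal units (1 EB = 10^18 bytes) are assumed.

```python
# Decimal units assumed: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
BYTES_PER_GB = 10 ** 9
BYTES_PER_EB = 10 ** 18

ROLLS_SOLD_1999 = 800 * 10 ** 6   # rolls of film, as quoted above
IMAGES_1999 = 25 * 10 ** 9        # images captured and printed, as quoted above


def total_exabytes(image_count: int, gb_per_image: float) -> float:
    """Total data in exabytes for a number of images of a given size."""
    return image_count * gb_per_image * BYTES_PER_GB / BYTES_PER_EB


if __name__ == "__main__":
    # Sanity check on the quoted figures: about 31 images per roll.
    print(IMAGES_1999 / ROLLS_SOLD_1999)      # 31.25
    # Assumption from the comment: about 1 GB of data per analog picture.
    print(total_exabytes(IMAGES_1999, 1.0))   # 25.0
```

The result scales linearly with the per-image size, so the 25 EB figure stands or falls with the 1 GB assumption.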

~~~
pjscott
Your example also neatly illustrates that not all data is equally meaningful.
You lose a lot of information when you do JPEG or MP3 compression, but it
doesn't _feel_ like you're losing much of value.

~~~
billswift
Until you try to enlarge that JPEG later.

------
burgerbrain
_Based on the primary sources I’ve been able to piece together, the more
accurate (but far less sensational) quote would be:

"23 Exabytes of information was recorded and replicated in 2002. We now record
and transfer that much information every 7 days."_

Call me crazy, but that sounds every bit as sensational to me. Seems like
all this article is doing is getting overly picky with some throwaway,
oft-repeated trivia stat. Who cares what the exact numbers are? The purpose of the
statement remains the same.

~~~
anonymous246
Oh really? Schmidt's claim overstates reality by 5000 times and you don't
care? Link is to an (awesome) online calculator showing how I arrived at this.

[http://instacalc.com/?d=&c=ZGF5c19zaW5jZV9oaXN0b3J5X3N0Y...](http://instacalc.com/?d=&c=ZGF5c19zaW5jZV9oaXN0b3J5X3N0YXJ0ID0gNTAwMCozNjV8ZGF5c19pbl8yMDAyID0gMzY1fG51bV9kYXlzX2Zvcl81X2V4YWJ5dGVzX3RvZGF5ID0gN3xzY2htaWR0X2dyb3d0aCA9IGRheXNfc2luY2VfaGlzdG9yeV9zdGFydCAvIG51bV9kYXlzX2Zvcl81X2V4YWJ5dGVzX3RvZGF5fHJlYWxfZ3Jvd3RoID0gZGF5c19pbl8yMDAyIC8gbnVtX2RheXNfZm9yXzVfZXhhYnl0ZXNfdG9kYXl8c2NobWlkdF9leGFnZ2VyYXRpb24gPSBzY2htaWR0X2dyb3d0aCAvIHJlYWxfZ3Jvd3RofGV4YWdlcnJhdGlvbl9vcmRlcnNfb2ZfbWFnbml0dWRlID0gM3x8fA&s=ssssssssss&v=0.9)
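
One way a factor of 5000 can arise is sketched below. It assumes recorded history spans about 5,000 years (an assumption made here because it reproduces the stated factor) and uses the article's "every 7 days" figure. All names are illustrative.

```python
DAYS_PER_YEAR = 365


def growth_factor(baseline_days: float, days_now: float) -> float:
    """How many times faster the same amount is created now than in the baseline period."""
    return baseline_days / days_now


HISTORY_YEARS = 5000   # assumed length of recorded history
DAYS_NOW = 7           # "every 7 days", from the article's corrected quote

# Growth implied by "all of history up to 2003" as the baseline.
schmidt_growth = growth_factor(HISTORY_YEARS * DAYS_PER_YEAR, DAYS_NOW)
# Growth implied by "the single year 2002" as the baseline.
real_growth = growth_factor(DAYS_PER_YEAR, DAYS_NOW)

exaggeration = schmidt_growth / real_growth

if __name__ == "__main__":
    print(round(exaggeration))  # 5000
```

The number of days cancels out of the ratio, so the factor is just the assumed length of history in years divided by the one-year baseline.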

~~~
burgerbrain
Yeah, that's right.

These are numbers incomprehensibly big. Nobody can even begin to mentally
picture these in a reasonable fashion. The only part of the statement that is
really important is the meaning, not the details of the numbers. That meaning
is _"we are creating shittons of data really really fast, faster than ever
before"_. If you're getting hung up on the accuracy of the numbers used to
express this, then you're missing the point, and I wonder why you don't have
something better to do than get worked up about it...

EDIT: Furthermore, according to the "more accurate" statement from the
article, we're creating 23 Exabytes in 7 days, not 5. Read it again: _"_23
Exabytes_ of information was recorded and replicated in 2002. We now record
and transfer _that much_ information every _7 days_."_

~~~
turboneat
To me, the difference between "23 exabytes every 7 days" and "5 exabytes every
3 days" is pretty irrelevant.

I think a better quote would be "By 2003, mankind had generated a shitload of
information. Now we generate a shitload every day."

~~~
anonymous246
Using shitload as a unit of data: sarcasm or serious?

~~~
burgerbrain
It's not a unit, it's a magnitude or count. Used in similar situations as "a
lot". It's neither 'sarcastic' nor 'serious', but rather 'casual'.

------
joubert
Interesting statistic: _It has been said that 78% of all statistics are made
up._

~~~
frobozz
And people are 86.3% more likely to believe a statistic if it has a decimal
point in it.

~~~
sp332
Richard Feynman was giving a lecture and, as usual, had gone off-script. He
mentioned some historical event, but got the last digit of the date wrong
(like 1951 instead of 1957 or something). He said, "Hey! 3 significant digits
is pretty good for a theoretical physicist!" :-)

Edit: I can't link to the middle of a Silverlight presentation, but if you
visit <http://research.microsoft.com/apps/tools/tuva/> , click the middle
(with the picture of Feynman), click Lecture 5, and skip to 17:00 in, you can
see the incident. But you really should watch all of them, especially the one
on Symmetry.

Dang Silverlight!

------
corin_
I wonder if anyone would be able to calculate the amount of data created in
the last two millennia... and if so, how.

~~~
simias
I guess the first issue would be to define what "data" is. Speech is data,
weather is data...

~~~
arethuza
"Data (plural of "datum") are typically the results of measurements"

Weather isn't data, a measurement of temperature would be data...

<http://en.wikipedia.org/wiki/Data>

[Not that I am trying to imply that other things aren't data - but I don't
think weather itself is data although you can, of course, have weather data!]

~~~
zb
The quote said "information", not data.

Edit: "There were 5 Exabytes of information created between the dawn of
civilization through 2003, but that much information is now created every 2
days."

------
fxj
information is not all equal. recording from /dev/random is not valuable
information even though it fills up disk space. the value of information
depends very much on the context.

------
Tichy
A lot might have happened since 2002. People with digital cameras take a lot
of pictures, for example. YouTube is booming. Lots of devices generate
automatic data feeds, for example location tracking from mobile phones,
clickstreams on the internet.

The number might still have been made up, but let's not forget that Schmidt
might have some sources of information not available to the public, for example
the server stats from Google and YouTube.

------
dvdt
How timely! I was actually at a Google recruiting event/tech talk today at my
university, where a Google engineer repeated this quote to us. Fittingly, he
also misquoted it and said that 5 exabytes of data are created every day,
instead of every two days as in the original quote. I looked at him askance
for a moment due to the absurdity of the number--thanks for clearing it up!

------
JacobAldridge
"We now create and replicate as much data in one week, as we did in one year,
just a decade ago."

True, not as catchy as the dawn of time, but still mighty impressive. And in
fairness to their outgoing CEO, Google didn't cache much data at the dawn of
time (or even in the '80s), so it can't have been _that_ important.

------
HyprMusic
Perhaps the figures he was given were based entirely on computer data - and he
quoted them to sound like all data?

------
patrickgzill
My tummy rumbled and I burped at 9:22AM EST this morning. Now that I have
posted this: is that a piece of information?

My point is that a lot of this "information" is ephemeral and not really all
that important in the long run.

~~~
vinutheraj
Now it is.

Reply to the edit: If you stretch the timeline long enough, nothing is of any
importance.

------
maeon3
Yes, we are creating more data now than in the history of mankind. However, the
ratio of (quality stored data / total data) has gone down with the ease of
storage. Most of the "data" is for entertainment.

------
JonnieCache
Lies, damn lies, and clichés.

