

Most data isn’t "big" and businesses are wasting money pretending it is - donohoe
http://qz.com/81661/most-data-isnt-big-and-businesses-are-wasting-money-pretending-it-is/

======
rpedela
I completely agree. I have always hated the phrase "big data". It is a
meaningless marketing phrase. As the article points out, very few IT companies
have single data sets at terabyte scale or larger. Although the total size of
their data may be in the terabytes or higher, the size of any one analyzed
data set is usually in the megabytes or gigabytes.

You know who actually does "big data"? Physicists. There are some telescopes
that generate terabytes of data per second. There are other scientific fields
which generate very large, single data sets such as genomics, proteomics,
climate, etc.

~~~
twic
Even genomics isn't really that big. The human genome is three billion base
pairs; that's 750 MB if packed as tightly as can be. The largest genome (that
of _Paris japonica_) is 150 billion base pairs, 37.5 GB.

You might be storing multiples of that amount of data if you are doing a
shotgun sequence assembly, or looking at polymorphisms, but then you're likely
to have no more than 10 copies, and i can't imagine you'd have more than 100
copies. For the human genome, 100 copies would be 75 GB, which is still only
about four times the size of Team Fortress 2.
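
The arithmetic above can be checked with a rough sketch. It assumes 2 bits per base pair (four possible bases, no quality scores or ambiguity codes) and decimal units (1 GB = 10**9 bytes); `packed_bytes` is a hypothetical helper name, not something from a genomics library.

```python
# Back-of-the-envelope genome storage.
# Assumption: each base (A, C, G, T) is packed into 2 bits, nothing else stored.
BITS_PER_BASE = 2


def packed_bytes(base_pairs: int, copies: int = 1) -> int:
    """Bytes needed to store `copies` genomes of `base_pairs` bases each."""
    total_bits = base_pairs * BITS_PER_BASE * copies
    return (total_bits + 7) // 8  # round up to whole bytes


HUMAN = 3_000_000_000             # base pairs, figure from the comment above
PARIS_JAPONICA = 150_000_000_000  # base pairs, figure from the comment above

print(packed_bytes(HUMAN) / 1e6)           # 750.0 (MB)
print(packed_bytes(PARIS_JAPONICA) / 1e9)  # 37.5 (GB)
print(packed_bytes(HUMAN, 100) / 1e9)      # 75.0 (GB)
```

Real sequencing files are typically much larger than this, since they carry more than the packed bases; the sketch only reproduces the tightest-packing estimate.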

~~~
rpedela
A single genome, true. But what if you wanted to compare 100,000 genomes of
people with heart disease to find common genes for heart disease? Now you are
at ~72 TB, which I would consider big data. As far as I am aware, not enough
human genomes have been sequenced to do that yet. But it is coming (I hope).
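
As a sanity check on that figure, here is a minimal sketch assuming the 750 MB packed size per human genome quoted earlier in the thread. The constant names are made up for illustration.

```python
# Assumption: 750 MB per human genome, packed at 2 bits per base pair.
BYTES_PER_GENOME = 750_000_000
GENOMES = 100_000

total_bytes = BYTES_PER_GENOME * GENOMES

print(total_bytes / 10**12)  # decimal terabytes (10**12 bytes each)
print(total_bytes / 2**40)   # binary tebibytes (2**40 bytes each)
```

That works out to 75 TB in decimal units and roughly 68 TiB in binary units, so ~72 TB is in the right range either way.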

~~~
comrade_ogilvy
If you were doing blind correlations between all the raw data at once, it
would be 72 TB.

There are ~30k genes (encoded protein sequences).

100,000 human individuals may or may not have that many deltas per gene.

The data is naturally structured. There is no reason to believe that looking
at 72TB at once is useful. You would start by looking for correlations with
particular genes. (Even correlations with random genes are merely 30,000
simple problems.)

This kind of work can be done on a single multicore computer with a lot of
disk space and memory, if you have the right tools.

------
na85
I always thought "Big Data" referred to the industry, similar to the terms
"Big Pharma" and "Big Tobacco": i.e., very large firms with very large budgets
that focus on data science/storage, pharmaceuticals, or cancer sticks.

Using "Big Data" as a noun to refer to actual data sets offends my grammatical
senses.

------
dweinus
Most data AREN'T big.

~~~
scribu
Actually, no; "data" can be either plural or singular.

In this case, you can replace "data" with "information", and "most
information isn't big" definitely sounds grammatical.

Source: <http://www.merriam-webster.com/dictionary/data>

~~~
dweinus
They may both be accepted, but as your link notes, the singular is avoided by
most publications, and I think for good reason. The singular is not logical.
In the case of "information", the word is already in singular form and
describes a group, much like the word "flock" is a singular noun for a group
of birds. On the other hand, "data" is the plural form of "datum". It follows
that describing a group of data using a singular construction should require a
different word or phrase, such as "data set".

