

"Computing resources for genome data will soon exceed those of Twitter and YouTube" - samuell
http://www.nature.com/news/genome-researchers-raise-alarm-over-big-data-1.17912

======
samuell
Arvados.org to the rescue:
[https://twitter.com/peteramstutz/status/626395473704845315](https://twitter.com/peteramstutz/status/626395473704845315)
:)

(Yes, it is one of the most promising solutions to the problem)

------
tetron
The Arvados project [https://arvados.org/](https://arvados.org/) is an open
source scale-out storage and compute platform designed to handle very large
datasets, such as those in genomics.

------
ende
Genomics data is only 'big data' until you have an alignment. After that, the
raw data can be archived or even deleted. Most secondary data, such as variant
calls and expression data, is not large at all. The only real problem in this
field is that bench biologists tend to rush headfirst into sequencing without
involving IT early in the planning process. The tools already exist; it's the
communication that is lacking.

~~~
abetusk
Are you sure about that? What happens when you have a million-person cohort
that you want to analyze?
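A rough back-of-the-envelope sketch can make the scale question concrete. The Python below uses illustrative assumptions only: a 3.1 Gbp genome sequenced at 30x coverage, about 1 byte per base for compressed reads, and about 5 million variants per genome at a hypothetical 10 bytes per compressed record. None of these figures come from the article or the commenters, so treat the output as an order-of-magnitude guess.

```python
# Back-of-the-envelope storage estimate for a sequencing cohort.
# Every constant below is an illustrative assumption, not a measured value.

GENOME_LENGTH_BP = 3.1e9   # approximate haploid human genome length
COVERAGE = 30              # assumed sequencing depth
RAW_BYTES_PER_BASE = 1.0   # assumed compressed size of one base plus its quality score
VARIANTS_PER_GENOME = 5e6  # assumed number of variant sites per genome
BYTES_PER_VARIANT = 10.0   # hypothetical compressed size of one variant record


def raw_bytes(n_samples: int) -> float:
    """Estimated size of the raw reads for n_samples genomes."""
    return n_samples * GENOME_LENGTH_BP * COVERAGE * RAW_BYTES_PER_BASE


def variant_bytes(n_samples: int) -> float:
    """Estimated size of the per-sample variant calls for n_samples genomes."""
    return n_samples * VARIANTS_PER_GENOME * BYTES_PER_VARIANT


def human(n_bytes: float) -> str:
    """Format a byte count using decimal (SI) units."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB", "EB"):
        if n_bytes < 1000:
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:.1f} ZB"


if __name__ == "__main__":
    # Compare a single genome, a mid-sized study, and a million-person cohort.
    for n in (1, 1_000, 1_000_000):
        print(f"{n:>9} samples: raw {human(raw_bytes(n))}, "
              f"variants {human(variant_bytes(n))}")
```

Under those assumptions a single genome comes to roughly 93 GB of raw reads against about 50 MB of variant calls, which is consistent with the point that secondary data is small per sample. At a million samples, though, the variant calls alone reach about 50 TB and the raw reads about 93 PB, which is the scale the question above is getting at.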

