"Computing resources for genome data will soon exceed those of Twitter and YouTube"

samuell · on July 29, 2015

Arvados.org to the rescue: https://twitter.com/peteramstutz/status/626395473704845315 :)

(Yes, it is one of the most promising solutions to the problem)

tetron · on July 29, 2015

The Arvados project https://arvados.org/ is an open-source, scale-out storage and compute platform designed to address the needs of very large datasets, such as genomics data.

ende · on July 29, 2015

Genomics data is only 'big data' until you have an alignment. After that, the raw data can be archived or even deleted. Most secondary data, such as variants and expression data, is not large at all. The only real problem in this field is that bench biologists tend to rush headfirst into sequencing without involving IT early in the planning process. The tools already exist; it's the communication that is lacking.

abetusk · on July 29, 2015

Are you sure about that? What happens when you get to a cohort of a million that you want to analyze?