You can get some of the data from the 1000 Genomes Project directly from Amazon, so you don't need to pay to download it. There's about 200TB of data there (so far).
http://aws.amazon.com/1000genomes/
What I'm working on is mapping those short sequences (reads of 50-75 bases) to the genome and then looking for either mutations or expression levels (how many of those reads map to a particular location). There are a couple of ways to do the mapping, but most tools these days use either a big hash table or a Burrows-Wheeler transform.
http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transfo...
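To make the two approaches a bit more concrete, here's a toy sketch in Python. It is not how any particular aligner does it: the function names are made up for illustration, it only handles exact matches (real reads have mismatches and indels), and the sorted-rotations BWT is only practical for tiny strings, not a whole genome.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Hash table approach: map every k-mer in the genome to its start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_read_hash(read, genome, index, k):
    """Look up the read's first k-mer as a seed, then verify the whole read.

    Exact matching only; returns every genome position where the read fits.
    """
    hits = []
    for pos in index.get(read[:k], []):
        if genome[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations ('$' marks the end)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text, pattern):
    """Backward search: count exact matches of pattern using only the BWT."""
    # first[c] = how many characters in the text sort strictly before c
    counts = {}
    for ch in bwt_text:
        counts[ch] = counts.get(ch, 0) + 1
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # [lo, hi) is the range of sorted rotations that start with the
    # suffix of the pattern processed so far.
    lo, hi = 0, len(bwt_text)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        # Naive rank query; real tools precompute these counts.
        lo = first[ch] + bwt_text[:lo].count(ch)
        hi = first[ch] + bwt_text[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo
```

With a made-up genome like "ACGTACGTTACG" and k=4, map_read_hash("ACGTT", ...) finds the seed "ACGT" at two places but only keeps the one where the full read matches. The BWT version answers the same kind of question without storing the k-mer table, which is roughly why it's attractive when the reference is genome-sized.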
And that's all just to get the data that you can then do something else with (gene expression, variation modeling, etc.).