You can get some of the data from the 1000 Genomes Project directly from Amazon, so you don't need to pay to download it. There's about 200TB of data there (so far).
What I'm working on is mapping those short sequences (50-75 bases) to the genome and then looking for either mutations or expression levels (how many of those reads map to a particular location). There are a couple of ways to do the mapping, but most tools these days use either a big hash table or a Burrows-Wheeler transform.
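To give a flavor of the Burrows-Wheeler approach (this is just the core idea behind tools like Bowtie and BWA, not their actual implementation), here's a toy Python sketch with a naive suffix sort and a made-up reference; real aligners use compressed indexes and tolerate mismatches:

    # Toy Burrows-Wheeler / FM-index exact matching; the reference string is made up.
    def bwt(text):
        """Burrows-Wheeler transform via a naive suffix array; text must end in '$'."""
        sa = sorted(range(len(text)), key=lambda i: text[i:])
        return "".join(text[i - 1] for i in sa)

    def backward_search(bw, pattern):
        """Count exact occurrences of pattern using LF-mapping over the BWT."""
        chars = sorted(set(bw))
        C, total = {}, 0                 # C[c] = number of characters in bw smaller than c
        for c in chars:
            C[c] = total
            total += bw.count(c)
        occ = {c: [0] * (len(bw) + 1) for c in chars}   # occ[c][i] = count of c in bw[:i]
        for i, ch in enumerate(bw):
            for c in chars:
                occ[c][i + 1] = occ[c][i] + (ch == c)
        lo, hi = 0, len(bw)              # current suffix-array interval [lo, hi)
        for c in reversed(pattern):      # extend the match one character at a time
            if c not in C:
                return 0
            lo = C[c] + occ[c][lo]
            hi = C[c] + occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo

    reference = "GATTACAGATTACA$"
    print(backward_search(bwt(reference), "ATTAC"))   # -> 2 occurrences

The point of the transform is that you only ever need the (compressible) BWT string plus some rank tables, so the whole human genome index fits in a few GB of RAM.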
Well, the raw output of a typical so-called "next gen sequencing" machine (which is actually very current gen) is around 1TB (at least, for the ones we used here).
That's the raw file, though; once processed (but not yet analyzed) I believe the sizes are around 50 to 100GB (but that's not really what I work on, so don't quote me fully on this).
The next steps vary depending on what exactly you want to do, but they usually involve alignment (basically, trying to tie sequences of DNA together by their ends by seeing if they "fit").
Essentially you sequence tons of short bits of DNA and then either fit them together (assemble) or fit them to a reference (align); there's a toy sketch of the assembly idea after the link. You can find example data sets in the Short Read Archive:
http://www.ncbi.nlm.nih.gov/sra/
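For the "fit them together by their ends" part, a toy greedy-overlap sketch in Python (made-up, error-free reads; real assemblers use overlap or de Bruijn graphs and have to cope with sequencing errors and repeats):

    # Greedily merge the pair of reads with the longest suffix/prefix overlap
    # until nothing overlaps any more. Only meant to show the principle.
    def overlap(a, b, min_len=3):
        """Length of the longest suffix of a that matches a prefix of b."""
        best = 0
        for k in range(min_len, min(len(a), len(b)) + 1):
            if a.endswith(b[:k]):
                best = k
        return best

    def greedy_assemble(reads, min_len=3):
        reads = list(reads)
        while len(reads) > 1:
            best = (0, None, None)
            for i, a in enumerate(reads):
                for j, b in enumerate(reads):
                    if i != j:
                        k = overlap(a, b, min_len)
                        if k > best[0]:
                            best = (k, i, j)
            k, i, j = best
            if k == 0:                      # no overlaps left; stop merging
                break
            merged = reads[i] + reads[j][k:]
            reads = [r for n, r in enumerate(reads) if n not in (i, j)] + [merged]
        return reads

    reads = ["GATTACAGAT", "ACAGATTTCA", "ATTTCAGGGA"]
    print(greedy_assemble(reads))           # -> ['GATTACAGATTTCAGGGA']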
CloudBurst (a Hadoop-based aligner) has a good description of one such algorithm:
http://sourceforge.net/apps/mediawiki/cloudburst-bio/index.p...
The algorithms can get much more sophisticated, though, and there are a number of open and closed source implementations... I only link this one because of the quality of the figure.
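The hash-table style of mapping mentioned upthread boils down to a seed-and-extend scheme, which is roughly the kind of thing CloudBurst spreads across MapReduce. A toy single-machine version (made-up reference and read, length-4 seeds, mismatches only, no indels or reverse complements):

    # Seed-and-extend with an in-memory k-mer index; real aligners use longer
    # seeds, spaced seeds, and handle indels, quality scores and both strands.
    K = 4

    def build_index(reference, k=K):
        """Map every k-mer of the reference to the positions where it occurs."""
        index = {}
        for i in range(len(reference) - k + 1):
            index.setdefault(reference[i:i + k], []).append(i)
        return index

    def align(read, reference, index, k=K, max_mismatches=1):
        """Look up the read's first k-mer as a seed, then verify the full read."""
        hits = []
        for pos in index.get(read[:k], []):
            window = reference[pos:pos + len(read)]
            if len(window) < len(read):
                continue
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.append((pos, mismatches))
        return hits

    reference = "ACGTGATTACAGGCTAGATTACATT"
    index = build_index(reference)
    print(align("GATTACAG", reference, index))   # -> [(4, 0), (16, 1)]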
The data sets we work with in my group can be up to 400GB of compressed text for the reads from a single individual.
Another example from biology with a similar computational profile would be searching through a huge number of mass spectrometer outputs to identify the components in a new sample.