Essentially, you sequence tons of short bits of DNA and then either fit them together (assemble) or fit them to a reference (align). You can find example data sets in the Short Read Archive: http://www.ncbi.nlm.nih.gov/sra/
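To make the "assemble" half concrete, here is a toy greedy overlap assembler: repeatedly merge the two reads with the longest suffix/prefix overlap until nothing overlaps. It's a sketch under big simplifying assumptions (error-free reads, all from one strand, no repeats), and the names `overlap`, `greedy_assemble` and `min_len` are made up for illustration. Real assemblers generally use overlap graphs or de Bruijn graphs rather than this greedy loop.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len` long), or 0 if there is none."""
    start = 0
    while True:
        # Look for b's first min_len chars somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Earliest hit whose tail is a prefix of b gives the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Hypothetical greedy assembler: returns the resulting contig(s)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: several separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Every pass compares all pairs of reads, which is fine for four reads and hopeless for billions; that scaling problem is a big part of why this kind of work ends up on clusters.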

Cloudburst (a Hadoop-based aligner) has a good description of an algorithm: http://sourceforge.net/apps/mediawiki/cloudburst-bio/index.p... Aligners can get much more sophisticated, though, and there are a number of open- and closed-source implementations; I only link this one because of the quality of its figure.
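Roughly, the CloudBurst-style approach is seed-and-extend: if a read may have at most k mismatches, cut it into k+1 non-overlapping seeds, and by the pigeonhole principle at least one seed must match the reference exactly. The map step emits seeds keyed by their sequence (from both reads and reference), the shuffle groups identical seeds, and the reduce step extends each read/reference pairing and checks it. Below is a single-machine simulation of that map/shuffle/reduce flow. It's a hypothetical sketch, not CloudBurst's code: it only counts mismatches (no indels, no reverse complement), and all function names are made up.

```python
from collections import defaultdict


def map_reference(ref: str, s: int):
    # Emit every length-s substring of the reference with its position.
    for pos in range(len(ref) - s + 1):
        yield ref[pos:pos + s], ("ref", pos)


def map_read(read_id: str, read: str, s: int):
    # Emit non-overlapping length-s seeds of the read with their offsets.
    for off in range(0, len(read) - s + 1, s):
        yield read[off:off + s], ("read", read_id, off)


def shuffle(pairs):
    # Group emitted values by seed sequence, like Hadoop's shuffle/sort.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_seed(values, reads: dict[str, str], ref: str, max_mm: int):
    # Pair every read seed with every reference hit for the same seed,
    # then "extend" by comparing the whole read at the implied start.
    ref_hits = [v[1] for v in values if v[0] == "ref"]
    read_hits = [(v[1], v[2]) for v in values if v[0] == "read"]
    for read_id, off in read_hits:
        read = reads[read_id]
        for ref_pos in ref_hits:
            start = ref_pos - off
            if start < 0 or start + len(read) > len(ref):
                continue  # alignment would run off the reference
            window = ref[start:start + len(read)]
            mm = sum(1 for a, b in zip(read, window) if a != b)
            if mm <= max_mm:
                yield read_id, start, mm


def align_reads(reads: dict[str, str], ref: str, max_mm: int):
    """Return a set of (read_id, ref_start, mismatches) hits."""
    s = min(len(r) for r in reads.values()) // (max_mm + 1)
    if s < 1:
        raise ValueError("reads too short for this many mismatches")
    pairs = list(map_reference(ref, s))
    for read_id, read in reads.items():
        pairs.extend(map_read(read_id, read, s))
    hits = set()  # several seeds can find the same alignment
    for values in shuffle(pairs).values():
        hits.update(reduce_seed(values, reads, ref, max_mm))
    return hits


ref = "ATTAGACCTGCCGGAATAC"
reads = {"r1": "GACCTGCC", "r2": "GACCAGCC", "r3": "TTTTTTTT"}
print(sorted(align_reads(reads, ref, max_mm=1)))
# [('r1', 4, 0), ('r2', 4, 1)]
```

The nice property for Hadoop is that each seed group can be reduced independently, so the expensive extend step spreads across machines without coordination.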

The data sets we work with in my group can be up to 400 GB of compressed text for the reads from a single individual.

Another example from biology with a similar computational profile would be searching through a huge number of mass spectrometer outputs to identify the components in a new sample.