
Solving the genome puzzle - xaverius
http://plus.maths.org/issue55/features/sequencing/index.html
======
daeken
You know, I never really thought about it before, but there really isn't a
whole lot of data in the human genome. There are approximately 3B base pairs
in it, which sounds like a lot, but think of it in terms of binary data. DNA
is composed of 4 bases, so that's base 4; each base takes 2 bits, so converting
to base 2 gives you approximately 6 billion bits of data. While that's a lot
of data to sift through by hand, it's not a lot of data for a computer to deal
with. Obviously, there's a lot we simply don't understand yet, but I wonder how
much your average hacker can bring to the table. After all, that's only 5.5GB
to toss around -- nothing, these days.
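The 2-bits-per-base arithmetic can be made concrete with a minimal packing sketch. It assumes the sequence contains only uppercase A, C, G and T (real assembly files also contain N for unknown bases, which this ignores), and the A/C/G/T to 0..3 mapping and the helper names `pack`, `unpack` and `genome_bytes` are hypothetical choices for illustration, not any standard file format.

```python
# Minimal sketch: store DNA at 2 bits per base, four bases per byte.
# Assumption: input is uppercase A/C/G/T only; the 0..3 mapping is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i goes into byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        BITS_TO_BASE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


def genome_bytes(base_pairs: int) -> int:
    """Storage needed at 2 bits per base, rounded up to whole bytes."""
    return (2 * base_pairs + 7) // 8


if __name__ == "__main__":
    seq = "GATTACA"
    packed = pack(seq)
    print(len(packed), unpack(packed, len(seq)))  # 2 GATTACA
    print(genome_bytes(3_000_000_000) / 1e9, "GB")  # 0.75 GB
```

At 2 bits per base, 3 billion base pairs come to 750,000,000 bytes, i.e. 0.75 GB rather than 5.5GB.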

~~~
philh
> That's only 5.5GB

6 billion bits = 0.75 GB. I wonder how compressible it is, on top of that.

Another thought: two humans are about 99.9% identical, so a diff is only about
0.1% of 0.75 GB, or 0.75 MB. Multiply by an estimated world population of 6.79
billion: roughly 5 billion MB, or 5 petabytes. Google has enough hard drives to
store the firmware of every human being in the world.
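The back-of-envelope estimate above can be reproduced in a few lines. This sketch takes the comment's assumptions as given: a diff costs about 0.1% of the 2-bit genome, and the population is 6.79 billion. A real diff would also have to record where each difference sits, which this ignores. The function name `storage_estimate` is hypothetical.

```python
# Back-of-envelope sketch of the comment's estimate.
# Assumption (from the comment): a per-person diff costs ~0.1% of the
# 2-bit-per-base genome; position bookkeeping overhead is ignored.


def storage_estimate(base_pairs: int, diff_fraction: float, population: int):
    """Return (per-person diff bytes, total bytes for everyone)."""
    genome = base_pairs * 2 // 8  # bytes at 2 bits per base
    diff = genome * diff_fraction
    return diff, diff * population


if __name__ == "__main__":
    diff, total = storage_estimate(3_000_000_000, 0.001, 6_790_000_000)
    print(f"per-person diff: {diff / 1e6:.2f} MB")  # 0.75 MB
    print(f"everyone: {total / 1e15:.2f} PB")  # 5.09 PB
```

The result, about 5.09 PB, matches the "roughly 5 petabytes" figure.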

~~~
bh42
Well... proteins play an important role in how DNA is expressed. And you get
your initial set of proteins, like your DNA, from your parents (mom mostly),
so they set the starting conditions. Then during your life what you do has a
methylation effect on your DNA. Thus the total information that makes up a
human is actually quite a bit bigger than the information contained in his or
her DNA.

------
BahUnfair
This is awesome. I've encountered this bioinformatics stuff before in a
cluttered textbook and didn't follow it. Reading this at least gave me a basic
grasp of it. Should come in handy when it gets introduced in my course.

------
puredemo
I wonder how large the genome would be if all the "junk DNA" was stripped out.

~~~
nostoc
Most "junk DNA" is not junk. Think of it as metadata...

