
This is what my thesis is about (I titled it "Rehabilitating Junk DNA"). It isn't that complicated. Your resonance comment is gibberish, sorry. The DNA is a giant tape in base 4 (ATGC). There are programs written at random places on the tape in a 64-symbol encoding (i.e. triplets of quaternary digits, 4^3 = 64 codons). This is "coding sequence", about 1% of the genome (itself 3 gigabases, so 30 Mbases of coding sequence). A tape head (polymerase) comes and reads these out into RNA. The RNA is like a flash copy that is sent to a compiler (ribosome) that translates it into a working protein molecule (a sequence built from 20 amino acids, with the mapping from the 64 codons given by the "genetic code"). There are control sequences proximal (~within 30k bases) to each program that tell the tape head when and how much to read each program into flash (i.e., transcribe RNA). These control sequences are in the 99% non-coding part of the genome. In my thesis I estimated that about another 1% is actual control sequence. The rest may be random noise, we'll see.
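To make the tape analogy concrete, here is a minimal Python sketch of the read-out path described above: copy a stretch of DNA into RNA, then translate codon by codon. The function names (`transcribe`, `translate`) and the toy sequence are made up for illustration. It uses the standard genetic code and treats the input as the coding strand, so transcription is just T -> U; it ignores control sequences, splicing, and everything else a real cell does.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Back-of-the-envelope numbers from the comment: 3 gigabases, ~1% coding.
GENOME_BASES = 3_000_000_000
CODING_BASES = GENOME_BASES // 100  # 30 Mbases


def transcribe(dna: str) -> str:
    """'Tape head' step: copy the coding strand into RNA (T becomes U).

    Simplification: a real polymerase reads the template strand and
    produces its complement; the result matches the coding strand.
    """
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """'Compiler' step: read triplets from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy example (hypothetical sequence): two junk bases, then ATG GCC TGA.
toy_dna = "CCATGGCCTGATT"
toy_protein = translate(transcribe(toy_dna))
```

The table has 64 entries mapping onto 20 amino acids plus stop, which is why the code is redundant: several codons spell the same amino acid.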


Nice explanation, but you seem to be saying that at most 3% (being generous) of DNA would be functional (if the rest is “random noise”). That is less than half of the lowest estimate (~8%) from even the most fervent critics of ENCODE’s number. I’d love to hear a brief breakdown of your reasons, if possible.


Eh, I graduated in 2008. ENCODE was still only a pilot then. We didn't know about lincRNAs. My estimate was based on multi-species sequence alignments, so it is pretty conservative. At the time it was not, however; it doubled the amount of sequence considered functional. To be honest, I am not sure what the current best estimates are based on (I switched fields). 8% seems absurdly high to me, but there is a ton more evidence now than I had.


Ah, I misunderstood your comment as saying that you’re currently working on the thesis.


As an added complication, a decent proportion of DNA in the genome is meaningful at least in how long it is, because it separates two functional pieces of DNA that interact in 3D space due to the way the DNA wraps around other stuff. If you change the content of this DNA, it doesn't cause disease, but if you change the length, it does.


Fascinating, thanks for breaking it down into data structures.



