
How DNA could store all the world’s data - lzlarryli
http://www.nature.com/news/how-dna-could-store-all-the-world-s-data-1.20496
======
08-15
The real problem isn't storing the data, it's accessing it. There is no way to
address DNA; you can only "shotgun sequence" it. In doing so, you get random
fragments of around 200 bases (400 bits). You can't get just one such fragment;
you get half a billion in one go, currently at a cost of around $5000. (Older,
much more expensive technology got up to 1000 bases... sometimes, and only
100 fragments per machine run.) So how are you going to access your archive?
By sequencing the whole thing and (temporarily) storing it on a hard drive?
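
For concreteness: published encodings work around this by prepending an index
to every fragment, so that after sequencing everything you can reorder the
reads in software; that is exactly the "sequence the whole thing" scenario
above. A minimal sketch in Python (the 2-bits-per-base mapping, the 16-base
index, and the fragment size are illustrative assumptions, not any real
scheme's format):

    import random

    # Illustrative parameters: 2 bits per base, a 16-base (32-bit) index
    # prepended to 184 payload bases, for 200-base fragments total.
    BASE = {0b00: 'A', 0b01: 'C', 0b10: 'G', 0b11: 'T'}
    BITS = {v: k for k, v in BASE.items()}
    INDEX_BASES = 16
    PAYLOAD_BYTES = 46                        # 184 payload bases per fragment

    def bytes_to_bases(data):
        return ''.join(BASE[(b >> s) & 0b11] for b in data for s in (6, 4, 2, 0))

    def bases_to_bytes(seq):
        out = bytearray()
        for i in range(0, len(seq), 4):
            b = 0
            for ch in seq[i:i + 4]:
                b = (b << 2) | BITS[ch]
            out.append(b)
        return bytes(out)

    def encode(data):
        frags = []
        for n, i in enumerate(range(0, len(data), PAYLOAD_BYTES)):
            index = n.to_bytes(4, 'big')      # 4 bytes -> 16 bases
            frags.append(bytes_to_bases(index + data[i:i + PAYLOAD_BYTES]))
        return frags

    def decode(reads):
        # Shotgun access: reads come back in random order, so the embedded
        # index is the only way to put them back in sequence.
        key = lambda f: int.from_bytes(bases_to_bytes(f[:INDEX_BASES]), 'big')
        return b''.join(bases_to_bytes(f[INDEX_BASES:]) for f in sorted(reads, key=key))

    message = b'the quick brown fox jumps over the lazy dog ' * 20
    pool = encode(message)
    random.shuffle(pool)          # the sequencer gives you everything, unordered
    assert decode(pool) == message

Note that this doesn't rescue random access: to read any one fragment you
still have to sequence, store, and sort all of them.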

The manufacturers of modern sequencers (both Illumina and ABI) have been
talking about this for at least 7 years (i.e. as long as they've been selling
high-throughput sequencers). They actually made a weaker claim: according to
them, it makes no sense to keep a sequenced genome, because just sequencing it
again would be cheaper than storing the data. In these 7 years, it hasn't
happened. Instead, ABI's SOLiD technology all but vanished. Actually storing
data in DNA is one step further; it's not going to happen for a long time.

(Source: My employer does a lot of sequencing. I talked to sales
representatives of both companies, and I work on data sequenced using
Illumina's machines. We store that data on spinning rust.)

~~~
patall
From what I gather from my own research, the talk about HGP-write, and a few
chats with Nick Goldman (who is a very funny guy) himself, the main problem is
neither storing nor accessing (which you can improve by probing, and which
matters less anyway since the primary application would be archives). It is
mostly synthesis, which still costs at minimum $1 per 10 bp.
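
To put that number in perspective, a back-of-envelope calculation (assuming
the $1 per 10 bp above and a 2-bits-per-base encoding, ignoring any
error-correction overhead):

    cost_per_base = 0.10        # dollars, from $1 per 10 bp
    bases_per_byte = 4          # 8 bits / 2 bits per base
    megabyte = 1_000_000        # bytes
    print(f"${megabyte * bases_per_byte * cost_per_base:,.0f} to write 1 MB")
    # -> $400,000 to write 1 MB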

And sequencing will become even cheaper once you do not do it from a library
prep but in a controlled buffer environment. It is just not getting cheaper
right now because there is no incentive for Illumina to make it so (similar to
Intel's position in CPUs). Let's hope that ONT, BGI, and whoever else still
hopes to get some market share (Ion Torrent, PacBio, ...) can force them to
evolve (Project Firefly, yeah).

~~~
toufka
Synthesis prices are dropping fast, and will drop even faster in the near
future. There are a couple of 'humps' in the demand for synthesis, with
plateaus in between. Synthesis between 0 and ~200 bp gets you all you need for
PCR (copy/paste). But if you can't do ~3000 bp, you can't make a full-sized
gene. So people have gotten used to PCRing everything, and there is simply no
real demand for anything larger.

But with a few new players on the block (Twist, Gen9, and a few other
smaller/newer startups), the goal is to hit economical ~2-3 kb, at which point
the race is back on and whole new markets will open up. The moment that
happens, expect competition to kick back in and everyone's prices to drop
again.

The size of a moderate plasmid (~5,000-7,000 bp) is another hurdle, and the
size of a small chromosome (~100,000 bp) is another.

Also, if you're ordering DNA in pools or in bulk (and have a good compression
algorithm), you can get the price per bp down even more.

------
blazespin
I've often thought that if we ever decide to send nano spaceships filled with
engineered DNA to populate other planets like spores, we should include human
knowledge in the DNA, so that when the spores turn into an advanced
civilization they can read the DNA and learn about their progenitors.

~~~
catbird
Sounds like the plot to a scifi novel: Scientists discover that so-called junk
DNA contains physics equations, along with what appears to be coordinates to a
distant star system with an Earthlike planet.

~~~
abecedarius
[https://www.fourmilab.ch/documents/sftriple/gpic.html](https://www.fourmilab.ch/documents/sftriple/gpic.html)

~~~
catbird
Oh well, there is nothing new under the sun.

------
goldenrules
> The researchers' biggest worry was that DNA synthesis and sequencing made
> mistakes as often as 1 in every 100 nucleotides. This would render large-
> scale data storage hopelessly unreliable — unless they could find a workable
> error-correction scheme. Could they encode bits into base pairs in a way
> that would allow them to detect and undo the mistakes? “Within the course of
> an evening,” says Goldman, “we knew that you could.”

How does this work? Are the mistakes consistent enough that we can design
encodings that account for them?

~~~
witty_username
FEC [0] basically adds extra information that can be used to detect and fix
errors. A simple scheme is to just duplicate all the information (like a
backup), but there are far more clever schemes that are much more efficient.

[0]
[https://en.wikipedia.org/wiki/Forward_error_correction](https://en.wikipedia.org/wiki/Forward_error_correction)
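
A toy version of the duplicate-everything end of the spectrum, using the
article's 1-in-100 substitution rate (the majority-vote decoder below is my
own illustration, not the scheme Goldman's group actually used):

    import random
    from collections import Counter

    def noisy_read(seq, error_rate=0.01):
        # Substitute ~1 in 100 bases, matching the error rate in the article.
        return ''.join(random.choice('ACGT'.replace(b, ''))
                       if random.random() < error_rate else b for b in seq)

    def decode(reads):
        # Majority vote per position across the redundant copies.
        return ''.join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

    original = ''.join(random.choice('ACGT') for _ in range(100_000))
    reads = [noisy_read(original) for _ in range(3)]   # store three copies
    recovered = decode(reads)
    print(sum(a != b for a, b in zip(original, recovered)), 'bases still wrong')

With three copies, a position can only come out wrong when at least two reads
err there, roughly 3 * 0.01^2 = 0.03% of positions instead of 1%.
Reed-Solomon-style codes get far better protection from far less redundancy,
which is what makes the approach practical.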

------
chris_va
Even if you can make it work, DNA stability is poor.

I don't see why you wouldn't use a higher-fidelity atomic storage solution.

~~~
jon_richards
DNA stability is quite high, to the point where there is actually a movement
to get scientists to stop freezing DNA for long-term storage, because doing so
uses large amounts of energy for no reason.

~~~
chris_va
Well, freezing/thawing creates shear forces that destroy the DNA, so I think
the reason is different.

------
pronoiac
Do we need to keep the DNA away from bacteria? Would it not be digested for
nutrients or food? Or is that just propaganda from the salesman pushing memory
carbon for my looongterm data storage needs?

~~~
gww
DNase contamination would be a big problem too. It would also be a relatively
easy way to "securely" erase your data.

~~~
agumonkey
And what about additional layers of redundancy?

------
kyloren
DNA is also compressed in a very spectacular way. I wonder whether a similar
compression could be applied to data.

~~~
haloboy777
[http://thenextweb.com/insider/2016/04/28/microsoft-turning-d...](http://thenextweb.com/insider/2016/04/28/microsoft-turning-dna-ultimate-storage-device/)

Microsoft has already started working toward this.

