
Harvard stores 700 terabytes of data into a single gram of DNA (2012) - ghosh
http://www.extremetech.com/extreme/134672-harvard-cracks-dna-storage-crams-700-terabytes-of-data-into-a-single-gram
======
jorjordandan
It's weird when you read about tech advances that are so profound, you have to
stop and consider the philosophical implications. I have a dime on my desk
that probably weighs 2 grams... something that small could contain more raw
data than a person could possibly read in their entire life... that could
contain 1.4 million hours of CD-quality audio. Insane!
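
A rough sanity check (Python; assumes the standard 1,411 kbps CD bit rate and the article's 700 TB per gram):

    # back-of-envelope: hours of CD-quality audio in ~2 g of DNA at 700 TB/g
    cd_bytes_per_sec = 44100 * 2 * 2       # 44.1 kHz, 16-bit, stereo
    capacity_bytes = 2 * 700e12            # two grams at 700 TB/gram
    hours = capacity_bytes / cd_bytes_per_sec / 3600
    print(hours)                           # ~2.2 million hours, so 1.4 million is conservative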

~~~
ikeboy
What does that have to do with philosophy?

~~~
valarauca1
Considering and quantifying the sum total of your actions in life is largely
philosophical in nature. Yes, it's not writing a paper on the applications of
Friedrich Nietzsche in 2015, but not everyone is an academic, either.

Much like figuring out how much to tip your waiter is a mathematical exercise,
just one that might not shake the foundations of academic mathematics.

Just because territory is well traveled doesn't mean everyone has walked there.

~~~
ikeboy
By that definition, everything is philosophy.

~~~
benkillin
[http://xkcd.com/903/](http://xkcd.com/903/)

"Wikipedia trivia: if you take any article, click on the first link in the
article text not in parentheses or italics, and then repeat, you will
eventually end up at 'Philosophy'."

------
skywhopper
My quibble is with the claim that 3TB (3.5") hard drives are the "densest
storage medium in use today". Yes, the article is two and a half years old,
but by then 32GB MicroSD cards had been around for a couple of years. 100 such
cards would exceed the capacity of a 3TB hard drive and weigh a tenth as much.

~~~
skosuri
Yeah, the article isn't great. We go through how we calculated densities; we
didn't use hard drive enclosures, but did include platter thickness when
calculating hard drive density. In general, magnetic tape mostly wins because
it's so thin.

------
matthewrhoden1
My first thought was, how long did it take to write and read this data? I
don't expect exceptional speed, but I do wonder about throughput and seeking.

~~~
boomzilla
FTA: While it took years for the original Human Genome Project to analyze a
single human genome (some 3 billion DNA base pairs), modern lab equipment with
microfluidic chips can do it in hours.

OK, so assuming 1 hour for 3 billion bases, it's 1,000 hours for 3 trillion
bases (~3 terabits at roughly one bit per base) or 1 million hours for ~3
petabits (~400 TB of data). Yeah, that's a long time, roughly 100 years :)
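
The same math as a quick Python sketch (assuming roughly one bit per base, as in the paper's encoding):

    # sequencing-time scaling: 3 billion bases/hour, ~1 bit per base
    bases_per_hour = 3e9                   # one genome's worth per hour
    target_bits = 3e15                     # ~3 petabits (~400 TB)
    hours = target_bits / bases_per_hour   # 1e6 hours
    print(hours / (24 * 365))              # ~114 years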

~~~
laxatives
I believe that also consumes the strands you're reading, so unless the source
is copied many times, it is probably even slower than that.

~~~
andrewstuart2
Good thing cells have been copying DNA since life began.

~~~
schrodinger
Great, now my hard drive is going to get cancer!

~~~
andrewstuart2
At least you can remove/destroy the cells, though. Aside from DNA storage &
replication, the tissue serves no other purpose.

[http://xkcd.com/1217/](http://xkcd.com/1217/)

------
alevskaya
DNA storage is pretty silly. The claim is 5.5 petabits/mm^3 with 100x
redundancy. The problem is that massive storage is only as good as your
ability to encode/decode into it.

Let's be clear: this work encoded just 5.27 megabits. It's stored in what's
basically a large molecular hash table where each piece of key-value data is
replicated a million times for redundancy. Each piece is then read 100x to
correct for the -abundant- errors in each piece. So they encoded less than a
megabyte.

The problem with encoding information into DNA is that writing serial polymers
accurately is difficult and slow. In this paper they're using an inkjet-printed
DNA array. It takes a day to make one, resulting in a write bandwidth of:

(5.27 Mbits) / (24x60x60 sec) ≈ 61 bits/sec

Reading is a little faster. The fastest system, the HiSeq 2500, reads 120 Gbits
raw in 27 hours. Factoring in the necessary 100x redundancy, one has a
-maximum- read rate of:

(120 Gbits / 100) / (27x60x60 sec) ≈ 12 kbits/sec

So at these rates, writing 5.5 petabits into a cubic mm would take on the order
of a million years, and reading it back would take roughly 14,000 years.

If we get a little sci-fi and assume we build programmable polymerases and get
nanopore (direct-read) sequencing, physics still limits you to something like
1,000 reads/writes per sec per pore/polymerase. Instrumenting these will
probably limit per-feature size to larger than 100 microns on a fabricated
chip, giving us an ultimate read/write limit of around:

(2cm/100um)^2 x 1000 bit/sec = 40 Mbit/sec

for a giant 2cm x 2cm chip. With the necessary error correction and redundancy,
it's probably going to cap out around 1 Mbit/sec at best.

Those 700 TB would take years to read or write even at those rates, and we'll
have much better solid-state storage technologies by the time we figure out how
to do all that with DNA.
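
The arithmetic above as a small Python sketch; the inputs are just the figures quoted in this comment, nothing more:

    # write: one inkjet array (5.27 Mbit) per day
    write_bps = 5.27e6 / 86400                  # ~61 bit/s
    # read: HiSeq 2500, 120 Gbit raw in 27 h, with 100x redundancy
    read_bps = (120e9 / 100) / (27 * 3600)      # ~12 kbit/s

    year = 365.25 * 24 * 3600
    target = 5.5e15                             # 5.5 petabits = one cubic mm
    print(target / write_bps / year)            # ~2.9 million years to write
    print(target / read_bps / year)             # ~14,000 years to read

    # hypothetical nanopore chip: 2 cm x 2 cm, 100 um features, 1 kbit/s each
    pores = (2e-2 / 100e-6) ** 2                # 40,000 sites
    print(pores * 1000 / 1e6)                   # 40 Mbit/s raw ceiling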

~~~
skosuri
Largely agreed, though I think information storage in sequenced polymers in
general is fairly interesting; we are a long way off, though. Also, there are
applications for DNA storage that become interesting when weight matters a
great deal (space travel) or biocompatibility does (barcoding food
ingredients).

------
slayed0
If they have four possible values (T, G, A, C), why not convert everything into
base 4? Wouldn't you get astronomical gains in storage?

~~~
elmin
Not astronomical gains. Ignoring the technical challenges of a biological
system like DNA, base 4 can encode the same data in exactly half as many
characters as base 2, so capacity would double.

~~~
slayed0
I don't think that's right. Assuming we had 5 characters to encode data,

with base2 we get 2^5 = 32 possible combinations

with base4 we get 4^5 = 1024 possible combinations

~~~
epistasis
But storage length is the log of the number of possible combinations, so
you're back to just double the amount of storage.

~~~
slayed0
Both functions (2^x and 4^x) have exponential growth, but their exponential
growth is not linearly related.

2*(2^x) != (4^x)

~~~
epistasis
We are talking about data, which is the log of the number of combinations,
like any measure of information. If this interests you, definitely look into
basic information theory, and then move on to coding theory.

Take, for example, 32-bit integers vs 64-bit integers (unsigned for
simplicity). Two 32-bit integers can represent exactly the same number of
combinations as a single 64-bit number. Sure, there are exponentially more
combinations in a 64-bit integer than in a 32-bit one, but the number of
combinations is not how storage capacity is measured.
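
A tiny illustration in Python (the payload size is just the 5.27 Mbit figure from this paper):

    from math import log2
    # information per symbol = log2(alphabet size)
    print(log2(2), log2(4))        # 1.0 bit per binary digit, 2.0 bits per base

    # so a fixed message needs exactly half as many base-4 symbols
    payload_bits = 5.27e6
    print(payload_bits / log2(2))  # 5,270,000 binary digits
    print(payload_bits / log2(4))  # 2,635,000 DNA bases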

------
mariusz79
The era of sexually transmitted data is upon us. :]

~~~
swalsh
What would be great is if, each time you copied that data, it mutated a bit...
and if the mutation was positive, you could copy it again.

~~~
spacemanmatt
I see what you did there, with that...

------
logicallee
Particularly impressive when you consider the human genome is only ~700 MB.[1]
So this is roughly a million times as much data as human DNA stores, in the
same DNA format.

[1] http://stackoverflow.com/questions/8954571/how-much-memory-would-be-required-to-store-human-dna
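
Checking that ratio (Python; ~700 MB per genome as in [1], 700 TB per gram from the article):

    genome_bytes = 700e6                    # ~700 MB per human genome, per [1]
    per_gram_bytes = 700e12                 # 700 TB per gram, per the article
    print(per_gram_bytes / genome_bytes)    # 1e6: a million genomes' worth per gram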

------
olso
What is the current state of this research (since this article is 2012
material)?

------
acadien
Doesn't DNA have a half-life of like 13 years or something? I wonder what the
error rate is on 700 TB of DNA just sitting there for a year. I guess you could
engineer a system with multiple redundancies and checksums; with that kind of
density, RAID 2 vs RAID 1000 doesn't make a difference.
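
A half-life implies exponential decay, so taking that 13-year figure purely as an assumption (real longevity depends heavily on storage conditions, as the reply below notes):

    # fraction of strands lost after t years, given an assumed half-life
    half_life_years = 13.0           # the figure quoted above; assumption only
    t = 1.0
    surviving = 0.5 ** (t / half_life_years)
    print(1 - surviving)             # ~5.2% lost in the first year under that assumption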

~~~
skosuri
Depends on how you store it, but if it's dry it lasts much, much longer. E.g.,
see here:
http://onlinelibrary.wiley.com/doi/10.1002/anie.201411378/abstract

~~~
acadien
Haha, that paper was published today! Are you the author?

This is pretty interesting. I'd love to read the paper, but it looks like I
can't download the article yet since it's not finished uploading to the site.

~~~
skosuri
I am an author on the 2012 paper.

------
skosuri
This is pretty old, and I'm not sure why it's here again today. That said, I
was an author on the paper, and I'm happy to answer questions during a useless
meeting I have to attend in 30 minutes.

~~~
acadien
The last author too, excellent!

What kind of I/O bandwidth can you currently get from DNA? Does reading from it
damage the DNA? What kind of hurdles need to be overcome to bring this to
market, and are they hurdles we can overcome in the near future?

Edit: Hey, looks like you answered most of these elsewhere, so thanks.

------
html5web
Soon, we will use our bodies as storage and our eyes as monitors :)

~~~
Terr_
We don't already?

------
JoeAltmaier
I look forward to a portable storage device that has all the books, and all
the movies, on a chain around my neck.

------
iMark
So, what's the latency?

------
cft
This article is 2 years old. Are there any new developments?

------
mschuster91
How robust is DNA against natural or artificial radiation?

~~~
kyzyl
A question that's somewhat akin to asking, 'How robust are humans to flying
projectiles?' ;-)

Of course, it's all about the energy. I'm not a biologist, so I won't speak to
the details of how resilient DNA is to specific types of exposure or why, but
I am a physicist and I can tell you that if you blast DNA with high energy
radiation, like gamma rays, it's gonna have a bad time. That said, if we want
to talk about robustness in terms of what it's likely to be exposed to, then
it's pretty darn robust. We are exposed to a wide band of EM radiation on a
constant basis, and can even withstand exposures on the higher end without
becoming 'corrupted'.

I'd also note that standard electronics not specially designed for it aren't
very resilient to radiation either. Not to mention, if you could reliably
read/write data using DNA, I'd expect that redundancy would become... easy.
Security, on the other hand...

~~~
Zancarius
> Security on the other hand...

Sneeze once and everyone has a copy? :)

------
aceperry
And gets sued by the patent holders of the DNA code!

------
v1p1n
The market for archival storage solutions is getting bigger and bigger... this
seems like a great step forward for biological storage.

------
ikeboy
Someone remind me why this isn't in our hard drives yet.

~~~
RIMR
Because this is a discovery, not an invention. If they could make high-
performance biological storage devices the size of hard drives, don't you
think they would be doing it already?

This probably required a machine significantly larger than most desktop
computers, with a price tag in the $100,000+ range. That, and it has an R/W
speed of ~30 Kb/s...
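
For scale, taking that ~30 Kb/s figure purely as a placeholder (Python, reading it as kilobits):

    # time to move 700 TB at ~30 kbit/s, if that rate held
    bits = 700e12 * 8
    rate_bps = 30e3                            # ~30 kbit/s, the rough figure above
    years = bits / rate_bps / (365.25 * 24 * 3600)
    print(years)                               # ~5,900 years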

It will be in our hard drives when the technology has been developed to make
that happen.

I mean, graphene is already looking to be a far more efficient and powerful
alternative to silicon. Why isn't that in our processors yet? It's because
creating a logic gate and creating an x86 CPU are entirely different things.

The same way that encoding a bunch of data into DNA and replicating it a few
billion times is entirely different from making an on-demand biological data
storage device.

~~~
ikeboy
Thank you. Although it doesn't make sense that they copied 700 TB if the speed
is 30 kb/s.

Any idea how long this will take to get to market, and whether it's being
worked on?

