
A cheap way of storing information in DNA - oedmarap
https://qz.com/1314803/storing-information-on-dna-is-now-cheap-enough-to-be-viable/
======
chuckledog
Article is a bit light on the “how” part. Found this over at
[https://www.wired.com/story/the-rise-of-dna-data-storage/](https://www.wired.com/story/the-rise-of-dna-data-storage/):

“What Catalog does, instead, is cheaply generate large quantities of just a
few different DNA molecules, none longer than 30 base pairs. Then it uses
billions of enzymatic reactions to encode information into the recombination
patterns of those prefab bits of DNA. Instead of mapping one bit to one base
pair, bits are arranged in multidimensional matrices, and sets of molecules
represent their locations in each matrix.”
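The quote is too vague to reconstruct Catalog's actual scheme. As a purely hypothetical illustration of combinatorial addressing, each bit position in a small multidimensional matrix can be identified by picking one prefab component per dimension. A 1 bit is written by assembling the molecule for that combination, and a 0 bit by leaving it out. The component names, matrix sizes and presence/absence convention are all invented for this sketch.

```python
from itertools import product

# Hypothetical library of prefab components, one pool per matrix dimension.
# Names and sizes are illustrative assumptions, not Catalog's actual design.
COMPONENTS = [
    ["A0", "A1", "A2", "A3"],  # dimension 0
    ["B0", "B1", "B2", "B3"],  # dimension 1
    ["C0", "C1"],              # dimension 2
]

# Every combination of one component per dimension addresses one bit position.
ADDRESSES = list(product(*COMPONENTS))


def encode(bits: str) -> set[tuple[str, ...]]:
    """Return the pool of assembled molecules: one per '1' bit."""
    if len(bits) > len(ADDRESSES):
        raise ValueError(f"at most {len(ADDRESSES)} bits fit in this matrix")
    return {ADDRESSES[i] for i, b in enumerate(bits) if b == "1"}


def decode(molecules: set[tuple[str, ...]], n_bits: int) -> str:
    """Recover the bits by checking which addresses are present in the pool."""
    return "".join("1" if ADDRESSES[i] in molecules else "0" for i in range(n_bits))


message = "1011001110100101"
pool = encode(message)  # an unordered set: molecule order carries no information
assert decode(pool, len(message)) == message
print(len(ADDRESSES), "addressable bits from", sum(map(len, COMPONENTS)), "components")
```

The appeal of a design like this is that the number of addressable positions grows multiplicatively (4 × 4 × 2 = 32 here) while only the 10 component types ever need to be synthesized.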

~~~
tiatia123
[http://pdf.sumobrain.com/US20180137418A1.pdf?AWSAccessKeyId=...](http://pdf.sumobrain.com/US20180137418A1.pdf?AWSAccessKeyId=AKIAIBOKHYOLP4MBMRGQ&Expires=1530404882&Signature=nx2ToGP5EIplnlpeZ51dTqzEA8w%3D#view=FitH)

------
bawana
As long as we're talking about unorthodox ways to store info, does anyone know
about the information density possible in laser-engraved glass? You know,
those blocks of glass with a design etched into the interior? If we can
produce an etching in the middle of a block of glass, why can't we encode
symbols as collections of 'pits' in the glass? If we can etch transistors in
silicon at 14 nm (OK, let's be generous and say 25 nm), then we can produce a
25 nm pit in a transparent hunk of silicon dioxide. Add 25 nm for the blank
space next to it needed for its definition and a bit is 50 nm. In a 1 mm cube
you could fit 8000 bits or 1000 characters. 16 megabits per cubic inch.
Compared to 1.2 nm per character in DNA it seems low, but remember that DNA is
linear. In a square inch (2.54 cm x 2.54 cm) DNA can store 5.38 terabits. Hard
disks are at 1 terabit per square inch. I probably do not have the technical
knowledge to actually know what the smallest laser-engraved dot in glass could
be. And then again, you only end up with a ROM.
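Taking the comment's own assumption of a 50 nm pitch per bit (a 25 nm pit plus 25 nm of spacing) along each of the three axes, the cube arithmetic can be checked directly. At that pitch a 1 mm edge holds 20,000 bit positions, so the idealized figure comes out far larger than 8000 bits per cubic millimetre. This sketch ignores how closely a laser can actually write and read separate voxels in glass.

```python
MM_IN_NM = 1_000_000   # 1 mm = 10^6 nm
INCH_IN_MM = 25.4

pitch_nm = 50          # assumption from the comment: 25 nm pit + 25 nm spacing

bits_per_mm = MM_IN_NM // pitch_nm             # bit positions along one 1 mm edge
bits_per_mm3 = bits_per_mm ** 3                # 1 mm cube, packed in all three dimensions
bits_per_in3 = bits_per_mm3 * INCH_IN_MM ** 3  # 1 cubic inch = 25.4^3 mm^3

print(f"{bits_per_mm:,} bits per mm along each axis")
print(f"{bits_per_mm3:.2e} bits per mm^3 ({bits_per_mm3 / 8 / 1e12:.1f} TB)")
print(f"{bits_per_in3:.2e} bits per in^3")
```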

~~~
antcas
There's this: [https://www.southampton.ac.uk/news/2016/02/5d-data-storage-u...](https://www.southampton.ac.uk/news/2016/02/5d-data-storage-update.page)

"Using nanostructured glass, scientists from the University’s Optoelectronics
Research Centre (ORC) have developed the recording and retrieval processes of
five dimensional (5D) digital data by femtosecond laser writing.

The storage allows unprecedented properties including 360 TB/disc data
capacity, thermal stability up to 1,000°C and virtually unlimited lifetime at
room temperature (13.8 billion years at 190°C ) opening a new era of eternal
data archiving. As a very stable and safe form of portable memory, the
technology could be highly useful for organisations with big archives, such as
national archives, museums and libraries, to preserve their information and
records."

------
TeMPOraL
Could someone shed some light on how useful this concept is? Aren't the data
densities we're working with in modern silicon already better than what DNA
storage could offer us (or at least in the same ballpark)?

~~~
xevb3k
From memory, the base spacing on double-stranded DNA is ~0.4 nm, so it’s
very dense. They’re talking about using 30 bases to encode a single
symbol, so the density is worse than that.

However, the basic density isn’t really the factor of interest. What’s more
interesting about DNA is that you don’t need to store it on a surface. With
semiconductors you can only really cover a 2D surface (and maybe stack a few
dies for density), whereas with DNA you can pack a lot more information into a
three-dimensional volume.
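A rough linear-density estimate, assuming at most 2 bits per base (four possible bases) and the ~0.34 nm rise per base pair usually quoted for B-form DNA, which is close to the ~0.4 nm recalled above. The 30-base symbol length comes from the comment; how many bits each such symbol carries in Catalog's scheme isn't stated, so the second line only shows the physical length a symbol costs.

```python
import math

RISE_NM = 0.34                 # approximate rise per base pair in B-form DNA
BITS_PER_BASE = math.log2(4)   # four bases -> at most 2 bits each

raw_bits_per_nm = BITS_PER_BASE / RISE_NM
symbol_length_nm = 30 * RISE_NM  # one 30-base component, per the comment

print(f"raw density: {raw_bits_per_nm:.1f} bits per nm of double strand")
print(f"one 30-base symbol spans about {symbol_length_nm:.1f} nm")
```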

However, I’m not particularly bullish on DNA storage. We don’t really have the
read systems we’d need to make this viable (it still costs ~1000 USD and more
than a day to read the ~1 GB human genome). Unless these guys have developed
something really novel, we don’t have the write systems either.

Both the read and write systems are also highly error-prone compared to digital
storage (error rates on the order of 1 in 100). So there are many issues...
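To see why a 1-in-100 per-base error rate matters, assume each base is hit independently (a simplification; real synthesis and sequencing errors depend on sequence context, as noted further down the thread). The chance that a strand comes through with no errors at all then falls off geometrically with its length.

```python
def p_error_free(length: int, per_base_error: float) -> float:
    """Probability that a strand of `length` bases has zero errors,
    assuming independent errors at `per_base_error` per base."""
    return (1.0 - per_base_error) ** length


for n in (30, 100, 1000):
    print(f"{n:>5} bases: {p_error_free(n, 0.01):.3f} chance of being error-free")
```

Under these assumptions even a 30-base fragment is error-free only about 74% of the time, and a 100-base strand about 37% of the time, which is why practical schemes lean on redundancy and error correction.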

~~~
moyix
> still costs ~1000USD to read the ~ 1Gigabyte human genome

This part is dropping fast, though; Dante Labs will currently do it (with 30X
coverage) for individuals for $500 [1]. And I believe the price for labs that
do a lot of sequencing is significantly lower per-genome.

Writing is still a much bigger challenge, though. Writing something the size
of the human genome is currently considered a "grand challenge" and is being
tackled by HGP-Write:

[https://en.wikipedia.org/wiki/Genome_Project-Write#Human_Gen...](https://en.wikipedia.org/wiki/Genome_Project-Write#Human_Genome_Project-Write)

[1] [https://us.dantelabs.com/products/whole-genome-sequencing-wg...](https://us.dantelabs.com/products/whole-genome-sequencing-wgs-full-dna-analysis)

~~~
xevb3k
That’s an impressive price. Do you know anyone who’s done it? I’d be
interested in knowing what the experience was like.

Illumina appear to be targeting 100 USD right now. I think their markup on
reagents is probably at least 10x...

But even that’s super expensive compared to reading data from a hard drive...
so you have to wonder where DNA as storage is viable, unless a vastly cheaper
read method becomes available.

~~~
moyix
I don't know anyone who's done it (it's still a bit too expensive for me to
give it a shot) but it appears to be a fairly standard saliva sample kit that
you mail in.

Yes, compared to an HDD it's way too expensive to read. But tape backups are
also cumbersome and time-consuming to use, and the fact that you can get
extremely cheap bulk storage from them means lots of big companies still use
them. So I could see it maybe being competitive there ( _eventually_ ).

------
newfocogi
Not a biology expert, but how do you prevent mutations from ruining the data?
If the DNA isn't being biologically replicated, why use DNA over any other 4
distinct chemical structures that can be added to chains or lattices of
molecules?

~~~
jaskcheng
> How do you prevent mutations from ruining the data?

In much the same way as is done when using existing data storage systems, as
well as in telecommunications: error detection and correction.

A recent approach
([https://www.biorxiv.org/content/early/2018/06/16/348987](https://www.biorxiv.org/content/early/2018/06/16/348987))
includes synchronization nucleotides. This constrains the sequence space,
aiding recovery of data from multiple strands, even if those strands contain
errors (provided that they do not all include the same error).
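The synchronization-nucleotide method in the linked preprint is more involved than this, but the basic idea of recovering data from several erroneous copies can be illustrated with a plain per-position majority vote. This sketch assumes the copies are already aligned and contain only substitutions (no insertions or deletions), which is a big simplification.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Per-position majority vote over equal-length, pre-aligned reads.
    Handles substitutions only; indels would need an alignment step first."""
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need one or more reads of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


original = "ACGTACGTAC"
reads = [
    "ACGTACGTAC",
    "ACCTACGTAC",  # substitution at position 2
    "ACGTACGAAC",  # substitution at position 7
    "TCGTACGTAC",  # substitution at position 0
    "ACGTTCGTAC",  # substitution at position 4
]
print(consensus(reads) == original)
```

As the comment says, this only works while the copies don't all share the same error: if most reads carry a mistake at the same position, the vote returns the mistake.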

Additionally, different types of errors are more likely in certain contexts,
depending on the synthesis and storage approach. This particular paper
utilizes enzymatic synthesis (which is the new hotness in DNA technology) and
the authors model these error probabilities.

Combined with a couple of other nifty tricks, they demonstrate ~30% error
tolerance across 10 strand variants (i.e. copies of the same data containing
errors), albeit for a very short sequence.

> If the DNA isn't being biologically replicated, why use DNA over any other 4
> distinct chemical structures?

In many cases, the DNA _is_ being biologically replicated -- not within
organisms, but biochemically via enzymatic reactions. One attractive feature of
DNA storage aside from its density is that it's cheap to make a ton of copies.
After all, Nature does this all the time as cells divide. And given the cost of
mutations (most are detrimental), DNA polymerases have evolved (ironically?) to
be high-fidelity. Something like 1 error in 10^8 base pairs.
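A back-of-the-envelope view of "cheap to make a ton of copies", assuming idealized amplification in which every strand is copied once per cycle, the ~1-in-10^8 per-base error rate quoted above, and the common approximation that expected errors per copy ≈ error rate × length × number of doublings. Real amplification is less than perfectly efficient and the error rate depends on the polymerase used, so treat the numbers as illustrative.

```python
def copies_after(cycles: int, start: int = 1) -> int:
    """Number of strands after `cycles` rounds of ideal doubling."""
    return start * 2 ** cycles


def expected_errors_per_copy(cycles: int, length: int, per_base_error: float) -> float:
    """Approximate expected errors in one final copy:
    per-base error rate x strand length x number of doublings."""
    return cycles * length * per_base_error


cycles, length, rate = 30, 1_000, 1e-8
print(f"{copies_after(cycles):,} copies after {cycles} cycles")
print(f"~{expected_errors_per_copy(cycles, length, rate):.0e} expected errors per 1,000-base copy")
```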

And that touches upon a broader point: DNA is an attractive substrate for
alternative storage relative to other polymers because it's so widely utilized
in nature and, as a direct result, we've developed a lot of technology and
infrastructure around it. However, much of this development has been around
sequencing (reading), alignment and assembly algorithms, and, though still
sorely lacking, biological interpretation. It's exciting to see growing
interest over the past few years in DNA encoding and synthesis. It remains to
be determined which approach(es) will become workable standards for the field.

------
zaroth
So you lose all your data if the temperature rises? If it’s not shelf stable
it seems a bit niche...?

~~~
tiatia123
My gut feeling is that it is for backup solutions and storage in the cloud,
not real time applications.

------
craftyguy
So how long before carefully crafted images, text, etc are 'encoded' into the
DNA of a highly lethal virus?

~~~
mirimir
It doesn't need to be lethal, just highly transmissible. And persistent. Such
as Herpes. And not necessarily even symptomatic.

Here's the requisite quote from _Echopraxia_ , by Peter Watts:

> Even DNA computers, custom-built for a specific task and then tramped
> carelessly into wild genotypes like muddy footprints on a pristine floor.
> Nowadays it seemed like half the technical data on the planet were being
> stored genetically. Try sequencing a lung fluke and it was even money
> whether the base pairs you read would code for protein or the technical
> specs on the Denver sewer system.

------
yummybear
If you can encode any random sequence of data, what's stopping someone from
writing a deadly virus?

~~~
gnode
My understanding is that in this approach, rather than producing a long
sequence of DNA, they're making tiny sequences of address and data snippets,
then assembling them with enzymes randomly. Containing addresses allows the
data to be assembled back into the correct order.
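A toy version of that addressing idea: split the data into short chunks, tag each chunk with its index, let the pool come back in random order, and use the addresses to put it back together. The chunk size and the tuple representation are arbitrary choices for illustration; in a real system the address would itself be encoded in bases.

```python
import random


def fragment(data: bytes, chunk: int = 4) -> list[tuple[int, bytes]]:
    """Split data into (address, payload) pairs."""
    return [(i, data[off:off + chunk]) for i, off in enumerate(range(0, len(data), chunk))]


def reassemble(pool: list[tuple[int, bytes]]) -> bytes:
    """Restore the original order using the address on each fragment."""
    return b"".join(payload for _, payload in sorted(pool))


pieces = fragment(b"store this in DNA")
random.shuffle(pieces)  # the fragments come back in no particular order
print(reassemble(pieces))
```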

With typically three base pairs per amino acid, encoding complex proteins in
30 base pairs isn't possible. That said there are other methods to assemble
longer sequences.

Nothing is technically stopping someone from writing a deadly virus, except
that you'd have to engineer its DNA (not a trivial task). Why do that when
nature has created plenty of deadly viruses to choose from? Furthermore, for
military purposes, you would want something controllable, so you can deploy it
without also killing yourself. Bacterial agents tend to be more suited to
this.

~~~
tiatia123
"engineer its DNA (not a trivial task)."

It is actually trivial. You can look up the DNA for most viruses. If it is
long you could just ligate a few oligos together. I wonder if oligo synthesis
companies have blacklists of specific sequences for when a random person
orders them.

Also, many viruses are RNA based. More tricky than DNA. But in that case I
would just order DNA, build a complementary DNA virus and then use RNA
polymerase to make the RNA.

" for military purposes, you would want something controllable"

I am not sure the people I am thinking of, who would be interested in such a
thing, would worry too much about that. I guess if you are willing to fly a
plane into a tower, then you would not be too concerned about dying in the
process of developing a virus.

Better than a virus may be the Mutagenic Chain Reaction. Maybe my genes are
better than yours and I want to spread mine? ;-)

[http://science.sciencemag.org/content/348/6233/442](http://science.sciencemag.org/content/348/6233/442)

"The threshold necessary for small groups to conduct global warfare has
finally been breached, and we are only starting to feel its effects. Over
time, in as little as perhaps twenty years and as the leverage of technology
increases, this threshold will finally reach its culmination -- with the
ability of one man to declare war on the world and win." (from _Brave New War_ ,
2006)

~~~
gnode
> It is actually trivial. You can look up the DNA for most viruses.

I had assumed here that "writing" meant engineering new code, not copying from
existing viruses.

Terrorists getting hold of viruses has always been possible. If someone were
so inclined, I'm sure they could get hold of an Ebola strain and pass it
around somewhat, using no special technology.

------
macca321
I had this neat idea that someone could write their name into their DNA, along
with a gene drive, using CRISPR, so that it would be copied forever to all of
their descendants, until eventually their name had spread throughout all of
humanity.

