Hacker News
DNA Data Storage: The Entirety of YouTube Could Fit on a Teaspoon (popularmechanics.com)
36 points by dpflan 9 months ago | 22 comments

The first write-only (to a first approximation) memory storage. Reading back petabytes of DNA is going to be a challenge.

Indeed, one can only imagine the cost of sequencing the DNA to retrieve the data (not to mention the current lack of random access). Illumina's highest-capacity sequencer will do 6 Tb (terabases). The machine costs about half a million dollars and each run is tens of thousands of dollars, not to mention the lab costs of preparing and storing the DNA. Additionally, the depth at which one would have to sequence to get _all_ of the data back reliably would be >1, meaning that every base would have to be sequenced more than once (to average out sequencing errors).
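To put rough numbers on that, here is a back-of-the-envelope sketch. It reads the 6 Tb figure as output per run and takes "tens of thousands of dollars" as $30,000; the 2 bits per base encoding and the 10x depth are also my assumptions, not figures from the article, and prep costs are ignored.

```python
import math

BITS_PER_BASE = 2          # assumption: ideal encoding, 4 bases -> 2 bits each
BASES_PER_RUN = 6e12       # 6 Tb (terabases), read here as a per-run figure
COST_PER_RUN_USD = 30_000  # assumption: one value within "tens of thousands"


def runs_needed(data_bytes: float, depth: float) -> int:
    """Sequencing runs required to read data_bytes at a given coverage depth."""
    bases_stored = data_bytes * 8 / BITS_PER_BASE
    bases_to_sequence = bases_stored * depth  # every base is read `depth` times
    return math.ceil(bases_to_sequence / BASES_PER_RUN)


def read_cost_usd(data_bytes: float, depth: float) -> int:
    """Consumables cost only; lab and prep costs are not modeled."""
    return runs_needed(data_bytes, depth) * COST_PER_RUN_USD


one_petabyte = 1e15
print(runs_needed(one_petabyte, depth=10))
print(read_cost_usd(one_petabyte, depth=10))
```

Even under these generous assumptions, a single petabyte takes thousands of runs, and each run occupies the machine for days.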

"each run is tens of thousands of dollars"

I'm not familiar with sequencing economics, would you mind explaining the cost of a "run"?

A sequencing run (at least in the context of Illumina's technology) requires a few very expensive consumables: the flowcell (a microscope slide that the DNA sticks to while being read by a laser), the reagents (containing enzymes with fluorophores and other reagents for amplifying and manipulating DNA), and the power consumed by the machinery. This does not factor in prep/lab costs (which can be kept to a minimum with automation, but that is also an endeavor with high startup costs). Each sequencing run can take ~1-3 days depending on the format.

Edit: this video may be able to explain a little better how this process works: https://www.youtube.com/watch?v=fCd6B5HRaZ8

Thanks for the explanation. I’ve seen research on graphene-based nanopore sequencing, but my knowledge and understanding are shallow.

Illumina has been stuck at $1000 per human genome for a while.

New methods are already here: long-read sequencing direct from the source with minimal preparation [0]. It doesn't say much about cost, but considering the reduced preparation and smaller equipment, it should be a fraction of Illumina's.

[0] https://www.nature.com/articles/nbt.4125

Nanopore sequencers may require less prep and may be cheaper, but their sequencing error rate is astronomical, so you'd end up doing a lot more sequencing of the same material before you reached a consensus sequence.
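A rough way to see how error rate drives the required depth is a toy majority-vote model. The error rates below (10% and 0.1%) are hypothetical placeholders, not measured figures for any sequencer, and the model assumes independent substitution errors only, ignoring the insertions and deletions that real reads also contain.

```python
from math import comb


def consensus_error(per_read_error: float, depth: int) -> float:
    """Chance that a majority vote over `depth` reads miscalls one base.

    Pessimistic simplification: the call counts as wrong whenever at least
    half of the reads are wrong, even if the wrong reads disagree.
    """
    p = per_read_error
    first_bad = (depth + 1) // 2  # smallest count of wrong reads that loses the vote
    return sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(first_bad, depth + 1)
    )


def depth_for_target(per_read_error: float, target: float, max_depth: int = 201) -> int:
    """Smallest odd depth whose consensus error is at or below `target`."""
    for depth in range(1, max_depth + 1, 2):  # odd depths avoid tied votes
        if consensus_error(per_read_error, depth) <= target:
            return depth
    raise ValueError("target not reachable within max_depth")


# Hypothetical comparison: a noisy reader versus a cleaner one.
print(depth_for_target(0.10, 1e-6))
print(depth_for_target(0.001, 1e-6))
```

The noisier reader needs several times the depth for the same consensus accuracy, which eats into whatever it saves on prep.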

I believe the nanopore sequencers would fare a lot better if the parasitic capacitance across the membrane could be minimized by decreasing its surface area, or alternatively by using fluorescent readout of the pore itself.

Do you know anything about the speed of, and techniques for, reads? What advances in sequencing will be required to support efficient and quick reads?

Rather than just talking about the density of storage, it would have been interesting to get some figures on the read and write speed. I figure that writing things to DNA and/or reading them back is incredibly complicated and slow. What's the use case for something like this, putting information into DNA and then freezing it somewhere?

I work for a commercial synthetic DNA producer in R&D. At the moment writing DNA is the slow part. There has been an almost Moore's-law-like reduction in price and increase in speed for sequencing, but so far synthesis has lagged. The state of the art is phosphoramidite synthesis, which has been around since the 1980s. We have optimized and streamlined the process, but a single base addition (writing one 'bit') is still in the range of hundreds of seconds. Also, the length of DNA produced by phosphoramidite synthesis is limited to a couple hundred bases in most cases (although it's possible to extend that out to hundreds of bases at increased cost/reduced efficiency). So long DNA has to be stitched together from shorter strands. The other issue with synthetic DNA is quality, so methods like microarrays are not necessarily the best (quality cannot be objectively measured on a per-strand basis).
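For a feel of what those numbers imply, here is a small sketch. The 100 seconds per base (low end of the range above), the 2 bits per base, and the 99.5% coupling efficiency are illustrative assumptions, not measured values; the yield line uses the usual stepwise model where each added base succeeds independently. Addressing and error-correction overhead are ignored.

```python
from math import ceil

SECONDS_PER_BASE = 100  # assumption: low end of "hundreds of seconds"
BITS_PER_BASE = 2       # assumption: ideal encoding with 4 bases


def strand_write_seconds(length_bases: int) -> int:
    """Time to synthesize one strand, one base addition at a time."""
    return length_bases * SECONDS_PER_BASE


def full_length_yield(length_bases: int, coupling_efficiency: float) -> float:
    """Fraction of strands reaching full length (one coupling per added base)."""
    return coupling_efficiency ** (length_bases - 1)


def payload_write_seconds(payload_bytes: int, strand_length: int,
                          parallel_strands: int) -> int:
    """Wall-clock time to write a payload with strands made in parallel batches."""
    bases_total = ceil(payload_bytes * 8 / BITS_PER_BASE)
    strands = ceil(bases_total / strand_length)
    batches = ceil(strands / parallel_strands)
    return batches * strand_write_seconds(strand_length)


print(strand_write_seconds(200) / 3600)               # hours for one 200-base strand
print(full_length_yield(200, 0.995))                  # hypothetical 99.5% per step
print(payload_write_seconds(1_000_000, 200, 1000))    # 1 MB, 1000 strands at once
```

Parallelism is what makes this workable at all, and the compounding yield loss is one reason strands stay short.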

There are some interesting developments out there, like using enzymatic methods to synthesize long strands, but a lot of that technology is still in its infancy.

I would be happy to discuss the field if anyone is working in it (or has interest)

I’d like to discuss! Email in my profile

Hi Dan, I've actually been meaning to reach out to you, your company is doing some really exciting things. I don't see your email in your profile, so I will send a quick email to your company info email. Otherwise feel free to send a quick message to sbearden at idtdna dot com

Seems like the ideal use case would be 1) massive amounts of data needing to be stored 2) data integrity is important but non-essential 3) data density is more important than data transfer.

Maybe sending data via rocket ship (since every ounce counts), when interception of signal is a concern (some reason to avoid using light)?

There isn't a use case (or an economic story) yet.

Yup. These ideas have been kicking around for 25 years.

In theory, you could use DNA-based reactions to solve problems. Since DNA is quite small and the reactions run in parallel (more or less), you should be able to brute-force some otherwise intractable problems. The classic example of this is Adleman (1994), which uses DNA to find a Hamiltonian path. (here: https://www2.cs.duke.edu/courses/cps296.4/spring04/papers/Ad...)

In practice, anything involving DNA is a bit fussy and error prone, and exponential growth means you'll eventually drown in DNA too.
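The generate-and-filter idea can be sketched in software. This is not Adleman's protocol (the DNA version builds random paths by ligation and filters them chemically); it enumerates vertex orderings instead, and the 5-vertex graph is a made-up example, not the graph from the paper.

```python
from itertools import permutations
from math import factorial


def hamiltonian_paths(edges, n, start, end):
    """Brute force: try every ordering of n vertices, keep the valid paths."""
    edge_set = set(edges)
    found = []
    for order in permutations(range(n)):
        # Filter 1: path must begin and end at the required vertices.
        if order[0] != start or order[-1] != end:
            continue
        # Filter 2: every consecutive pair must be a directed edge.
        if all((a, b) in edge_set for a, b in zip(order, order[1:])):
            found.append(order)
    return found


def candidate_orderings(n: int) -> int:
    """Number of orderings the brute force has to consider."""
    return factorial(n)


# Hypothetical directed graph on vertices 0..4.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 3), (2, 1), (3, 1)]

print(hamiltonian_paths(EDGES, 5, start=0, end=4))
for n in (7, 20, 50):
    print(n, candidate_orderings(n))
```

The candidate count grows factorially, which is the software analogue of drowning in DNA: the parallelism is real, but the amount of material needed grows just as fast.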

When Adleman gave a seminar, I was an undergrad at UCSC. We chatted with him afterwards, and it was fun to calculate that, using his design, scaling up to solve more interesting problems would take a bathtub's worth of solvent and very good mixing (better algorithms aren't as wasteful).

I don't see any DNA-based computing or storage systems as being capable of outdoing a general purpose computing system that already exists off the shelf, for the foreseeable future, for any reasonable use case.

Reminds me of ST:Voyager's bio gel packs

But ST:Voyager never had the DMCA.

Imagine the copyright strikes of storing (and replicating) all of YouTube in a teaspoon.

It's great until your hard drive gets the flu.
