I just ran a couple of flowcells last week; I've run 8 total so far. My impression is that it can be very inconsistent depending on the quality of the flowcell they send you and on your DNA prep. Great for smallish genomes (bacteria), not enough throughput/quality for large eukaryotic genomes (1Gb+) unless you've got money to burn.
It’s also expensive compared to other platforms (you can get a full human genome sequenced at high coverage for between 1000 and 3000 USD).
The error rate is stupidly high (somewhere between 10 and 20%) compared to Illumina or Ion Torrent, which give error rates well under 1%.
It can give very long reads, which are useful in some niche applications. But it’s been massively over-hyped (and over-capitalized).
The neat thing is that it’s very small. But that isn’t really compelling given the very low accuracy.
> you can get a full human genome sequenced at high coverage for between 1000 and 3000 USD
This is not a "full" human genome, but a collection of 150bp fragments that can be realigned to an existing human genome. You cannot take this and infer the whole diploid genome of the individual. There is a huge amount that will be missed, and all of our current knowledge is based on this gappy picture of what's going on in single genomes and human populations.
> It can give very long reads, which are useful in some niche applications. But it’s been massively over-hyped (and over-capitalized).
I think you're dismissing the technology out of hand because of biases derived from much more limited short-read technology that only allows us to reliably see small variants <50bp.
Without these long reads we can't see structural variation (SVs). There is an increasing amount of evidence that much of adaptive variation is driven by these kinds of variants. If you want recent evidence, see https://www.nature.com/articles/s41588-017-0010-y. There has long been evidence that there are huge copy number variations in humans, but these are still not evaluated reliably: http://science.sciencemag.org/content/330/6004/641.
We should be open to the possibility that our observational techniques are limiting our understanding of how genomes work. This has consistently occurred in the history of every observationally-driven science.
It's amusing to me that people assume that SVs are "niche" when even the limited surveys of genomes we've been able to do with short reads show that roughly as many base pairs in the human population vary due to small variants (SNPs and indels) as due to big ones (deletions, insertions, and large-scale copy number variation): http://science.sciencemag.org/content/330/6004/641
I think you are not sufficiently recognising how much structural variation can be resolved from short reads. There is certainly some that can't be, but a large proportion can be with the right tools.
I've participated in several large projects that worked to detect SVs from short reads in humans. The results, which remain best in class, are simply disappointing. A tiny fraction of the variants detected were actually resolvable to near-base-pair resolution. The vast majority were described in approximate terms, using estimates of breakpoints and allelic structure.
Most structural variation I've seen based on whole genome assemblies is not even classifiable into neat categories like "deletion" or "insertion". If you think that "most" things are detected with short reads then you are deluded by the dominant technology.
"No human genome has ever been completely sequenced" https://news.ycombinator.com/item?id=15534325 There are a lot of gaps where the amount of repetition makes it impossible to reassemble a complete picture from tiny fragments.
I’d agree with you that long reads would be useful if the error rate weren’t so shockingly bad.
There is likely value in long reads, but what non-niche research applications are there for highly errored reads that justify a valuation of several billion dollars?
Virtually all applications can benefit from long reads. There are already hybrid assemblers out there which take Illumina, PacBio and Nanopore reads. The long reads tie the short reads together, whereas the short reads improve the accuracy.
The area where DNA sequencing will first revolutionize clinical practice is in sequencing pathogens for the sake of identification. In these instances nanopore sequencing rules, because it can give answers in minutes.
Most clinical applications don’t need long reads. Pathogen identification from short reads is easy. Blood tests for cancer and NIPT (which will likely be the first big applications) both use fragmented DNA in the blood, so long reads are not useful. Depth (lots of sequencing) and quality are far more important.
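For a sense of what "identification from reads" involves: it is largely a k-mer matching exercise, the idea behind classifiers like Kraken, and it works on short or long reads alike. A minimal sketch, where the two-species "database", the sequences, and the k value are all invented for illustration (real tools use indexed databases of whole genomes):

```python
# Toy sketch of k-mer-based pathogen identification (the idea behind tools
# like Kraken). The reference sequences and k value here are made up for
# illustration; real classifiers index entire genomes with k around 21-31.

from collections import Counter

K = 8

def kmers(seq, k=K):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Hypothetical reference "database": species -> set of k-mers from its genome
reference_db = {
    "E. coli":   kmers("ATGGCTAGCTAGGATCCGATCGATCGGATCCTTAGGCTA"),
    "S. aureus": kmers("TTGACCGGTACCGTTAACGGTTACCGGATACCGTTAGGC"),
}

def classify(read):
    """Vote for the species sharing the most k-mers with the read."""
    votes = Counter()
    for species, ref_kmers in reference_db.items():
        votes[species] = len(kmers(read) & ref_kmers)
    species, hits = votes.most_common(1)[0]
    return species if hits > 0 else "unclassified"

if __name__ == "__main__":
    read = "GGCTAGCTAGGATCCGATCG"   # pretend this came off the sequencer
    print(classify(read))            # -> E. coli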
It's worth noting that those clinical applications were developed when technology didn't allow long reads, so "clinical applications don't need long reads" is at present a truism. There may be potential applications that require long reads that simply couldn't have been invented yet (albeit I haven't the slightest what those would be.)
Yes, but I would say quality is most important in almost all cases. Well, quality being defined as <1% error rate, which isn’t such a high bar.
The most compelling near-term applications (NIPT etc.) use fragmented DNA, and long reads will have no benefit here.
So, yes. Long reads are useful, but you need to have at least reasonable performance in other respects. The same thing has been seen with PacBio, who have not fared well in the market despite having a read-length advantage.
How long does it take to get the answer? Even if a big, expensive short-read sequencing machine is in the building, it still takes a day or two to get the necessary data.
The per-base error rate is bad. In the case of PacBio, this error process approximates white noise, and so you can deal with it by increasing read coverage. Things are somewhat more complicated with the nanopore tech described in this post, as errors may be correlated due to the way the basecalling is done, but in practice it's not nearly as big a problem as you think it is.
For anything approaching a read length in scale, the per-base error rate of a single read is simply irrelevant. In practice, with sufficient coverage (e.g. 20x) you simply don't care about the per-base error rate of the reads.
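A back-of-the-envelope sketch of that point, assuming errors are independent (the generous, white-noise case) and, pessimistically, that every error votes for the same wrong base:

```python
# Back-of-the-envelope: probability that a simple majority vote at one
# position is wrong, given an independent per-base error rate. Assumes
# uncorrelated errors (the white-noise best case) and that all errors
# agree on the same wrong base (which is pessimistic).

from math import comb

def consensus_error(per_base_error, coverage):
    """P(more than half the reads are wrong at a given position)."""
    p = per_base_error
    k_min = coverage // 2 + 1   # votes needed to outvote the true base
    return sum(comb(coverage, k) * p**k * (1 - p)**(coverage - k)
               for k in range(k_min, coverage + 1))

for cov in (5, 10, 20, 40):
    print(cov, f"{consensus_error(0.15, cov):.2e}")   # 15% raw error rate
```

Even with a 15% raw error rate, the consensus error at 20x is already orders of magnitude below the single-read error rate.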
>The error rate is stupidly high (somewhere between 10 and 20%)
The insertion/deletion error rate is 20-30%.
The point mutation error rate is something like 0.1-1% (higher than HiSeq but not crazy high).
This means that with a semi-decent reference genome you should be able to do re-sequencing fairly accurately. It also means that, in conjunction with HiSeq reads, you can do cheap genome assembly, using the HiSeq reads for coverage and the MinION reads for scaffolding.
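A toy illustration of the scaffolding idea (the contig and read sequences below are invented): the short-read contigs are accurate but disconnected, and a single noisy long read is enough to recover their order, because approximate matches suffice.

```python
# Toy illustration of scaffolding: a long (noisy) read spans several accurate
# short-read contigs, so exact placement isn't needed; approximate matches
# are enough to recover their order. All sequences here are invented.

contigs = {
    "ctg1": "ACGTACGTTT",
    "ctg2": "GGGCCCATAT",
    "ctg3": "TTAACCGGAA",
}

# One long nanopore read covering ctg2 ... ctg1 ... ctg3, with a few errors.
long_read = "xxGGGCCCTTATxxxxACGTACGTTTxxxxxTTAACCGGAA"

def best_hit(contig, read):
    """Crude alignment: position in the read with the most matching bases."""
    best_pos, best_score = None, -1
    for pos in range(len(read) - len(contig) + 1):
        score = sum(a == b for a, b in zip(contig, read[pos:pos + len(contig)]))
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score

hits = {name: best_hit(seq, long_read) for name, seq in contigs.items()}
order = sorted(hits, key=lambda name: hits[name][0])
print("scaffold order along the long read:", order)   # ctg2, ctg1, ctg3
```

Real hybrid assemblers do this over whole assembly graphs, but the principle is the same: the long read supplies order and orientation, the short reads supply accuracy.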
$1,000 for the starter pack - what kind of use can you get out of that, and how much are the ongoing costs for the flowcells and anything else that needs to be replenished?
Are you using it for fun? Professionally? Academically?
Flow cells are $1,000 each (unless you buy a lot of them); the sequencer is basically a USB dongle with a uC in it, and all the important stuff is in the flow cell. Flow cells are basically disposable (but you mail them back). You get one-ish use out of a flow cell, though they can be "washed" and reloaded.
Reagent kits run a few hundred dollars depending on what you're doing, and you get several uses out of them.
IIRC it's not certified for use as a medical diagnostics device so it's "research only".
It's a nice system with great (i.e. zero) upfront costs that has a lot of potential. But read quality and quantity per dollar aren't as good as what we get with our NextSeq (which by contrast costs something crazy like $300-400k).
No sequencer on its own will ever be certified. You always need a pipeline from sample to a clinically useful answer to be certified.
The problem here is that everything in that pipeline keeps improving at a fast pace. Also, the reference databases are updated all the time. Whereas sequencing will probably always give the right answer to the precise question it was given, I don't see these methods being certified any time soon.
NextSeq is expensive, but there are cheaper options (from $30k) and you can just send your samples to a sequencing service. I’ve seen costs for a whole human genome at high coverage of between 1,000 and 3,000 USD.
Be careful about defining "high coverage" as 30x. Many applications (especially cancer) really require 100x or more to overcome purity and ploidy issues, or to identify subclonal populations.
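A quick sanity check of why 30x falls short for subclones, using a simple binomial model; the 10% subclone fraction and the 5-supporting-read calling threshold are illustrative assumptions, not fixed standards:

```python
# Probability of seeing at least `min_alt` reads supporting a variant present
# in `fraction` of the sample's DNA, at a given coverage. The 10% fraction and
# 5-read threshold below are illustrative choices.

from math import comb

def detection_prob(coverage, fraction, min_alt=5):
    return sum(comb(coverage, k) * fraction**k * (1 - fraction)**(coverage - k)
               for k in range(min_alt, coverage + 1))

for cov in (30, 100, 300):
    print(cov, f"{detection_prob(cov, 0.10):.2f}")
# 30x catches such a subclone only a fraction of the time; 100x+ is far more reliable.
```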
I am using it professionally for a lab. The best bang for the buck right now is to buy their starter pack, which includes a library prep kit and 2 flow cells. This is limited to one per person. I got two members of the lab to buy two starter kits, effectively getting the flow cells for ~$500 each. I am sure a lot of people are doing this too.
I remember looking into this device before, but as I recall it was intended for small bacterial genomes. A quick Google turns up that a paper using the device to sequence a human genome has only just been published [0] hence the update to the website. I'm not at all familiar with the technology, but doesn't the fact that experts have only just now been able to sequence human DNA mean it would be pretty unlikely the device could be used as-is for "production" usage?
A few years ago, the sequencing field was really excited for this platform as an alternative/competitor to Illumina. Unfortunately, the MinION platform has never been able to generate enough high-quality data for most eukaryotic sequencing use cases. The MinION still has its niche... long-read sequencing can resolve problematic regions of the genome, and the MinION is the only option for rapid sequencing in the field. For production genome sequencing centers though, Illumina's still the dominant tech. Their HiSeq X platform can put out a human genome for about $1k.
> Their HiSeq X platform can put out a human genome for about $1k.
That's what Illumina will tell you. But that really depends what you mean by "a genome" and what you intend to use it for. Getting anywhere near the quality of the Human Genome Project is not possible.
> Getting anywhere near the quality of the Human Genome Project is not possible.
That is an unrealistic and inappropriate metric to use.
The HiSeq X isn't meant for making new human reference genomes. It's meant for re-sequencing at scale, and speaking from personal experience, it does an incredible job in quality, speed, and cost.
Even if we could reach reference assembly quality with a HiSeq run, we wouldn't want to use it that way. Having reference genomes and annotations (hg19 and hg38) means we can compare and contrast individuals from a common foundation. It would increase the cost and time of genome analysis 100-fold if we had to do a de novo assembly for each individual sequenced.
I think we should be clear that for $1k you get something that is entirely different than a human genome. It's just a set of short reads that can be mapped onto a reference genome. You can't de novo assemble it, so you won't be able to see a lot of variation. In fact we can't even estimate how much large scale variation there is because all the studies are based on the short reads.
While I've been out of the area for almost 3 years now (I sysadmin-supported the GS FLX from Roche/454, the MiSeq and HiSeq from Illumina, and the PGM from Ion Torrent), the thing to remember is that each sequencer has its strengths and weaknesses depending on what you want. Do you want to focus on read length, run time, reads per run, reagent cost, equipment cost, accuracy, etc.?
As for PacBio, if I remember correctly, their cost was low (high equipment cost though) and sequencing was relatively fast, but at a reduction in accuracy. Never dealt with them first-hand though.
Niche uses where long-read sequencing is required (de novo assembly, reference improvement, bacterial genomes, whole-isoform transcriptomes). Nanopore is starting to eat into that market, though.
PacBio has this fundamental problem that it's a single-molecule process (as is the MinION), and deconvoluting errors becomes a problem of unit parallelization.
For an interesting contrast, here's a look inside two DNA sequencers from the last decade, costing two orders of magnitude more (and also roughly two orders of magnitude larger in volume and weight):
They are orders of magnitude more precise, and that is the important part. A human genome has 3 billion base pairs; even if you have 99.99% accuracy, that is not enough.
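The arithmetic, spelled out (3 billion positions times the residual error rate):

```python
# Rough arithmetic: expected number of erroneous base calls across a
# 3-billion-bp genome at different consensus accuracies.

genome_size = 3_000_000_000
for accuracy in (0.99, 0.9999, 0.999999):
    print(f"{accuracy:.6f} -> ~{genome_size * (1 - accuracy):,.0f} wrong bases")
# Even 99.99% accuracy still leaves on the order of 300,000 errors.
```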
I'm pretty sure we are now just a couple of marketing cycles away from a version of this being re-packaged and sold to police departments around the world as a tool for identifying suspects in a cloud database of DNA sequences belonging to People of Interest.
(You don't need a complete sequence for that, just enough unique markers to ID somebody.)
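For the curious, matching against such a database really is that simple once you have marker calls; a sketch where the marker names, genotypes, and profiles are all invented for illustration (real forensic systems use standardized STR panels and likelihood-based statistics, not a raw match fraction):

```python
# Sketch of marker-based identification: compare a handful of genotyped
# markers against stored profiles and report the closest match.
# All marker names, genotypes, and profiles here are invented.

profiles = {
    "person_A": {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "GA"},
    "person_B": {"rs1": "AA", "rs2": "CT", "rs3": "TC", "rs4": "GG"},
}

sample = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "GG"}  # noisy query

def match_score(query, profile):
    shared = [m for m in query if m in profile]
    return sum(query[m] == profile[m] for m in shared) / len(shared)

best = max(profiles, key=lambda p: match_score(sample, profiles[p]))
print(best, match_score(sample, profiles[best]))   # person_A 0.75
```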
There was a cool visualization from data produced on the Nanopore on /r/dataisbeautiful recently [1]. It really shows the strengths of the device: a ~50kb region with 3kb imperfect repeats. This would be impossible to align properly with <1kb reads from Illumina. The paper is worth a read if you're interested in sequencing. They basically did a de novo assembly of a bacterial genome on the cheap, and in the assembly they had more than 1,000 reads longer than 100kb. Using this in combination with an Illumina sequencer is probably the best approach, though.
That’s great, but for a company that has raised on a valuation of >$2B USD, how are they going to compete with Illumina?
The read lengths might be long, but the error rate is a couple of orders of magnitude higher. I can’t see a market large enough for highly errored long reads to support their valuation.
Oxford Nanopore is burning through at least $1 million a week of their investors' money, with virtually no sales to support their market valuation. Will they be the next Theranos?
I'm somewhat confused by that article, one of its key pieces of data (saying that a full human genome sequence would cost $90,000 using this technology) comes from an expert comment made on Quora. The Quora post made the $90,000 estimate in early 2016, then updated it in July 2017 to $10,000. This article was written in December 2017, so why does it use the original figure and not the updated one?
The technology really works and is generating results at a rapid pace. This will absolutely not be the next Theranos, although you may be right that there are issues of burn rate and maybe even mismanagement.
To me the fact that they have a patent monopoly on "putting DNA through a protein pore with voltage sensing" is tragic. Who knows where we would be today if these patents had been granted to the public domain.
Let's assume the company statements are true and they are shipping on time; what is the break-even point for shipping $1,000 starter kits, flow cells, and sample prep items? How many devices are they shipping? What's their revenue?
Their sales data is not available. The MinION program does not seem to be designed to finance the company, though it might actually be profitable. The PromethION on the other hand, is selling faster than they can ship the devices.
Their valuation is not about short term sales. Their technology really is somewhat of a holy grail. There is no "simpler" way to read DNA in terms of technology. Everything else involves complicated biochemistry. Illumina beats Nanopore Technologies right now because of the scale of their machines. But they can't scale them down much further without losing performance. The most efficient Illumina machines will always be cupboards or bigger.
No competitor can come up with something better than ONT. Nothing smaller, for certain. The only way to beat Oxford Nanopore in the future will be to implement a very similar nanopore sequencing scheme.
The accounts of all UK companies are filed publicly. They had about 1M GBP of sales in 2016 from memory. I would guess with overheads, each unit is currently sold at a loss.
Illumina doesn’t beat Nanopore because of "scale"; it beats it because the technology produces fundamentally higher-accuracy data. The error rate on Nanopore reads is >10%. Even on first-generation Illumina machines the error rate was 1%, and it is now significantly lower.
Oxford Nanopore have been working on this for 15 years, this is the best they can do. It might be possible to create a Nanopore (maybe solid state, not protein) system with a lower error rate, but I think they have little hope of doing it.
I think they’d need to sell >10,000 units a month, that is, if they are to make any kind of profit. Unfortunately I don’t think there’s a market for anything like that number of flow cells.
I think they like to say they have a patent monopoly, but I don’t believe they do. Genia (which also has issues) are developing a protein nanopore platform.
Illumina (see the recent IP battle with Oxford Nanopore) also have key IP in this space (for MspA).
Maybe it's just the company's spin, but apparently they are not keeping up with the orders. Also, the technology is working, and it is working competitively well for certain applications.
One thing I'm excited for here is targeted sequencing using read-until. In this method, you monitor the current trace coming off of individual pores, and if you determine that the DNA in that pore is not part of your sequence of interest, you can reverse the voltage to remove the DNA and start sequencing another molecule. I think this will open up a lot of applications for human genomics.
If you're interested in Oxford Nanopore, you might also keep an eye out for Roswell Biotechnologies (https://www.genomeweb.com/sequencing/roswell-biotechnologies...). TL;DR: their sequencer involves immobilizing polymerases in circuits so you can measure the current changes that occur as the polymerase adds bases to a strand. This might be a good approach, since it doesn't involve optics (like Ion Torrent and the MinION, keeping costs down) and you get a current event per base (as opposed to per 5-6 bases as with the MinION), making basecalling easier and potentially more accurate. And it will be fast, since you're reading as fast as the polymerase can work. They want to get up to 10kb reads, and I imagine they could increase the consensus accuracy per read by looping around a single molecule of DNA several times, like with PacBio. Seems feasible too (after all, PacBio has successfully been able to integrate polymerases into very small features rather well, so I see no showstoppers there).
There is a group in UK (Matt Loose's group) that I think is working a lot on read-until. I haven't really kept up with it. I think the key for read-until to work is ultimately probably going to be better hardware as you need to be able to analyze the trace signal and compare it against a database fast enough to tell the device to kick it out. There are plenty of software optimizations that can be done to compress the signal or extract relevant regions, but the comparison to a database will be difficult to do fast.
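A conceptual sketch of the read-until decision loop being described; to be clear, the chunk hook, the basecall and eject functions, and the 200-base / 3-hit thresholds below are all stand-ins I've invented, not ONT's actual API:

```python
# Conceptual sketch of a read-until loop: basecall the first chunk of signal,
# check it against the target region, and eject the molecule if it doesn't
# match. The hook, basecall and eject functions are stand-ins, not a real API.

TARGET_KMERS = set()   # k-mers from the region of interest, built in advance

def kmers(seq, k=15):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def on_chunk(channel, signal_chunk, basecall, eject):
    """Called for each new chunk of raw current from a pore (hypothetical hook)."""
    seq = basecall(signal_chunk)          # first few hundred bases only
    if not seq or len(seq) < 200:
        return                            # too early to decide, keep sequencing
    hits = len(kmers(seq) & TARGET_KMERS)
    if hits < 3:                          # not our target: reverse the voltage
        eject(channel)                    # free the pore for another molecule

if __name__ == "__main__":
    TARGET_KMERS |= kmers("ACGT" * 100)                    # pretend target region
    on_chunk(channel=1,
             signal_chunk=b"raw-current-samples",
             basecall=lambda s: "TTGGCCAA" * 30,           # fake off-target basecall
             eject=lambda ch: print(f"ejecting channel {ch}"))
```

The hard part, as noted above, is doing that comparison fast enough against a large target or database while the molecule is still in the pore.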
Also as a biologist by trade and an audio dsp nut by hobby I loved that they were using algorithms initially developed for music information retrieval to extend the capabilities of a sequencing instrument.
The computational barriers are definitely an issue. I thought I had read rumors on twitter that ONT was trying to implement read-until on one of their larger-scale sequencers (one of the ones with compute integrated).
Barring that, the Cas9-mediated targeting method looks somewhat promising (https://www.youtube.com/watch?v=DGDH-FdoARM). No need for amplification! Though for applications with human DNA you might run into some issues with the copy number of a sequence of interest if you’re starting out with a normal amount of genomic DNA.
Read-until has had diminishing utility. Over the last year or so, the speed at which a read can pass through the pore has increased so that by the time you call the bases and determine what it is, the read is likely to have already passed through the pore.
Since 2013, Illumina's brought the price of a genome to about $1k with the HiSeq X, which essentially scales up their existing tech. No idea what the next few years hold, but I'm excited to see insurance companies beginning to cover genome sequencing as an affordable diagnostic tool.
The question is also what exactly you get. Illumina's technology gives you very short pieces of DNA; I'd call it confetti sequencing. PacBio and Nanopore both go for longer sequences. This is important when trying to assemble the pieces into a bigger picture.
I think I could help answer this. The cost of computation has gone down so much that it's just a fraction of the price of your whole genome sequencing. The cost comes from preparation, reagents, and storage. Illumina sells their own reagents, so the cost is really dependent on them. We've worked really hard to get the computation cost of genome sequencing as low as possible, and we just passed a major milestone last year.
Source: Software Engineer for a well-known non-profit sequencing lab/center.
I think we're already there. You can probably get to around a $50 bacterial genome by using robots and doing 384-well plates of samples at a time to get economies of scale.
A lot of the costs are outside of the sequencing itself. You have to extract the DNA from the sample and prepare that DNA for the sequencing platform you're using. These costs are both the reagents needed to do this, and also the associated labour costs, even when done at scale.
Perhaps you meant $100 human genome though? However I think roughly the same principles apply.
You're buying a service for the $500-800. Here you're buying a flow cell for $1,000 (the "sequencer" is more or less free). The technology is pretty different and a lot more bleeding edge. If what you want is just a genome aligned to a known reference and you can wait a few days, then this is not what you want.
We use ONT's MinION for strange niche stuff, where either the DNA prep method matters or the latency (the time from loading to the first data out of the machine) matters. Read quality is typically not as good as Illumina, so for applications where we need quality we stick with the NextSeq. There's no reason to think nanopore technology won't get better though, but for now, if you've got the money, Illumina is still the way to go most of the time.
A MiniSeq or MiSeq would be cheaper by the time you do the 3 flow cells/person (minimum -- they claim 10-20GB/flow cell but that's bullshit -- so at least ~$4k, probably closer to $8k of chemistry/person). You can get a used MiSeq for ~$50k, and then it's ~$1,500/person.
If you want to do absolutely everything yourself from DNA extraction to sequencing and data analysis, then yes, this is probably the cheapest way.
But you also need to consider how you prepare your DNA to go into the Nanopore. It's a lot more investment than you might think. Extracting your DNA in a clean enough way for it to work with the Nanopore will require equipment/facilities that most people will not have access to (phenol/chloroform is the optimal way to extract DNA, which will require a fume hood and toxic chemical disposal). Depending on how you want to prep your DNA, you might need more specialized equipment/reagents (Ampure beads, ligation kits, end-prep/TA-tailing, etc).
There is a good reply to you that is currently dead for some reason, so I’m reposting it here for people who don’t have dead visible:
>
You're buying a service for the $500-800. Here you're buying a flow cell for $1,000 (the "sequencer" is more or less free). The technology is pretty different and a lot more bleeding edge. If what you want is just a genome aligned to a known reference and you can wait a few days, then this is not what you want.
We use ONT's MinION for strange niche stuff, where either the DNA prep method matters or the latency (the time from loading to the first data out of the machine) matters. Read quality is typically not as good as Illumina, so for applications where we need quality we stick with the NextSeq. There's no reason to think nanopore technology won't get better though, but for now, if you've got the money, Illumina is still the way to go most of the time.
>
The MinION is a device you own. You don't need to ship the samples (which can be a problem; imagine Ebola samples). You wait a maximum of 2 days for most of the results.
A more interesting race for bio-entrepreneurs, rather than the $1K human-sized whole exome, may be the race to develop a $1 plasmid, BAC, or small microbial genome sequencer. At that cost, seed-level investment in single-purpose bio-factories becomes very attractive.
As someone who works in this industry, the most amazing thing is that the original sequencing technology (Sanger) is still going strong, and about as much Sanger sequencing is being done now as back in the early '00s [0].
Can we ever imagine DNA sequencing being done like a remote sensor? E.g. automated sampling for eDNA (environmental DNA), in-situ sequencing (single- and multi-species targeting), results transmitted back wirelessly. A biodiversity IoT system?
Yes, if you were willing to shell out 1,000 bucks for the test, and if the bacteria were in your bloodstream (unlikely). If you swabbed, cultured, and sequenced, it'd be a better test.
I've personally used the Oxford Nanopore to diagnose malaria subtypes from ~1ml blood draws, though it took a while to do so, well beyond clinically relevant time periods. Our best turnaround was about 2 days for sequence, but analysis takes much longer.
Culturing is the problem, though, right? Many types of bacteria (including some pathogens) really do not like to grow on most media. There's no doubt that sequencing is the future of infection diagnosis. It's just a matter of how long it takes to get there (cost and complexity both have to drop).
It's really comparing apples and oranges. If you want a 30x whole genome for genotyping, you'll use short read technologies and it'll be way less expensive.
The enormous reads offer a different window into genomes, by spanning repetitive regions where short reads can't be accurately placed. For lots of applications, the sweet spot is to use long reads for "scaffolding", then build up coverage with short reads.
That seems expensive. I got a personal sequence done (at 15x) four years ago for 750 usd. It was part of a group deal so maybe it isn't relevant to your search.
Software is only 1% of the story. We need reliable public data from many people linking genomes, diet, life history, and phenotype to do the other 49%.
The washing procedure is a simple 2-step process and the kit contains buffers to remove the previous library and equilibrate for reuse or storage. There are sufficient reagents supplied for 12 washes.
The flowcell is a consumable. You can run it for ~2 days on average. You can wash it out and run other samples during the 2 days if you want, but it will stop working after ~2 days of running.
And in my experience, the washing buffer doesn't do a great job. I read somewhere there is ~10% carryover.