That said, I love Nanopores, I use them in my business, and those error rates you can hack around if you know what’s going on under the hood.
There's a reason I went into automated biological robots.
Going on the same theme, what's an absolutely terrible example of technique?
Modern scientists move small amounts of biological materials using a tool called the pipette. Pipettes can work with very small amounts of liquid, down to the microliter. When you're running a delicate experiment, being able to deliver a precise amount of liquid is critical.
Pipettes need to be calibrated. How do you calibrate a device that works with volumes of liquid? Volumes of liquid are hard to measure. Fortunately, water at STP (standard temperature and pressure) has a known density, so you attempt to draw 1 mL and weigh it. 1 mL of water weighs 1 gram at STP (this is not a coincidence: it's by definition).
OK so you're weighing 1 gram of water and adjusting the pipette's calibrator knob so that 1mL on the pipette weighs 1 gram.
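The arithmetic behind that adjustment is simple; here's a minimal sketch (the density value and the example numbers are illustrative, and real procedures such as ISO 8655 add a temperature/pressure correction factor):

```python
WATER_DENSITY_G_PER_ML = 0.9982  # at ~20 C; water is 1.0000 g/mL only near 4 C

def delivered_volume_ml(measured_mass_g, density=WATER_DENSITY_G_PER_ML):
    """Convert the weighed mass of dispensed water back into a volume."""
    return measured_mass_g / density

def calibration_error_pct(nominal_ml, measured_mass_g):
    """Systematic error of the pipette as a percentage of the set volume."""
    actual = delivered_volume_ml(measured_mass_g)
    return (actual - nominal_ml) / nominal_ml * 100.0

# Pipette set to 1.000 mL delivers 0.9934 g of water at 20 C:
# delivered_volume_ml(0.9934) -> ~0.9952 mL, i.e. roughly -0.5% error
```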
I guess that means your weighing scale needs to be calibrated, too. Huh. These sorts of scales aren't just "weigh some flour for baking", either. They have to be accurate to the hundredth of a gram, and have walls to avoid fluctuations due to air currents (!!!) and minor temperature changes. The scales are calibrated using calibrated test weights.
Oh dear. Calibrated test weights? If you follow the turtles all the way down, you find that there is actually a traceability chain from your calibrated scale back to one of the defined weights held by NIST, or the NIST equivalents in France and Japan (they all share their weights). So you can actually calculate, using those weird rules of error propagation you forgot from high school, the error of your scale as a combination of the errors in that chain (often, knowing your error bars is more important than knowing the accurate answer).
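Those propagation rules boil down to this: independent relative errors add in quadrature. A toy sketch, with made-up numbers for each link in the chain:

```python
import math

# Hypothetical traceability chain: relative standard uncertainty
# contributed at each calibration step, from the primary standard
# down to your bench scale. All values are invented for illustration.
chain_rel_uncertainties = [
    2e-8,   # primary kilogram realization
    5e-7,   # national lab transfer standard
    5e-6,   # accredited calibration lab working weights
    5e-5,   # your bench scale calibration
]

def combined_rel_uncertainty(chain):
    """Independent errors combine as the root-sum-of-squares,
    the standard first-order error-propagation rule."""
    return math.sqrt(sum(u * u for u in chain))

u = combined_rel_uncertainty(chain_rel_uncertainties)
# The weakest (last) link dominates: u is barely larger than 5e-5.
```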
But that's not all. Those defined weights? They're obsolete. Le Grand K (the international prototype of the kilogram, still kept under lock and key) changed mass over time due to subtle metallurgical details.
The new definition of the standard is realized by an obscure machine at NIST, just like the time standards. The Kibble balance (https://en.wikipedia.org/wiki/Kibble_balance) is the tool used to do it, and it depends on the NIST time reference.
So, turtles all the way down until you get the rubidium fountain.
Disclaimer: Co-Founder of BugSeq
Sounds like Elon calling biology a “software problem”.
Not saying that you’re wrong, just saying that the computational folk tend to discount the challenges and skills required in the wet lab.
That being said, we see a future where someone without advanced molecular training can put a sample (whether that's a nasal swab, concerning white powder received in the mail or lab-grown meat) in a black box and get out a meaningful report.
It's time to bring in the industrial automation folks. They probably won't invent a fancy new algorithm to reduce the time to splice the pieces together, but they'll fine tune and automate your reader to the 9's.
I just realized industrial automation sounds really interesting. What would my chances be for someone who never got the chance to study math?
(Basically in 1998 it was illegal to change schools in Australia regardless of how much of an eyebrow-raising situation you might've been in. Had to homeschool, without any resources. Only realized ~20 years on just how much opportunity I'll never get back.)
(Heh, I'm pretty much expecting the only obvious possible answer at this point, I was just curious if the answer is "yeah no" or "it depends".)
Do a thousand readings, fix the parts that don't match across the board?
That said, that’s basically how a lot of NGS works in things like cancer sequencing on Illumina platforms.
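The "many readings, vote on each position" idea can be sketched in a few lines (assuming the reads are already aligned and equal length; real pipelines must handle indels, which is exactly where nanopore errors tend to live):

```python
from collections import Counter

def consensus(reads):
    """Majority vote at each position across many noisy reads of the
    same sequence -- the basic idea behind 'coverage' in sequencing."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )

reads = [
    "ACGTTACA",
    "ACGTAACA",  # one substitution error
    "ACCTTACA",  # another one
    "ACGTTACA",
]
# consensus(reads) -> "ACGTTACA" (both substitution errors voted out)
```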
Seems to me, that stuff is getting cheaper all the time.
We've actually started BugSeq to help labs get into nanopore sequencing - improving these open source tools and also writing our own. Orgs like FDA, USDA, big food co's, CDC, etc are now all adopting nanopore sequencing. Happy to see the industry taking off, this will be a step function improvement for public health in general.
(disclaimer: founder of BugSeq)
Nanopore sequencing very much has the potential to deliver this personalized treatment, without looking at any human genes or panels. If we could rapidly sequence bacteria in the bloodstream and predict their antimicrobial susceptibilities, we can make a difference.
What I'm saying is that nobody has delivered on any of the huge claims about the genome which genomicists made for the last 20 years, specifically in terms of actionable human health.
it's time to start calling the bluff.
The following have been revolutionized by the human genome project and subsequent technological innovation in sequencing:
- Non-invasive prenatal diagnostics
- Screening for cancer with cell-free DNA
- Rapid and accurate diagnostics for children with suspected genetic disorders
- Targeted cancer therapeutics
Many of these are already in routine clinical use in high income countries and result in significant improvement in human health.
I worked in genomics for 20 years. I have deep knowledge of biology and medicine. And the reality is, for the amount of money invested, the actionable medical returns have been relatively tiny and industry continues to not invest in sequencers for a good reason.
I agree with this, but I disagree with the following:
> most of the progress did NOT come from HGP data.
Without HGP (Human Genome Project), many biological discoveries in the past two decades would have become much more difficult.
> it's a huge waste of investment until we understand the multigenicity of diseases better
If you don't invest, you will never approach a solution. Applied science goes nowhere without a solid foundation in basic science.
NIPT uses low-coverage sequencing to identify aneuploidies for chromosomes 13, 18, and 21 and some larger microdeletion syndromes - this is not WGS.
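For the curious, the counting statistic behind NIPT-style aneuploidy calls is roughly this (a simplified sketch; the reference fractions and example numbers below are invented for illustration):

```python
import statistics

# Fraction of all reads mapping to chr21 in known-euploid reference
# samples (hypothetical values for illustration).
reference_chr21_fractions = [0.0130, 0.0131, 0.0129, 0.0132, 0.0130,
                             0.0131, 0.0129, 0.0130, 0.0131, 0.0130]

def chr21_zscore(sample_fraction, reference):
    """Compare a sample's chr21 read fraction, as a z-score, against
    the distribution seen in euploid reference samples."""
    mu = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return (sample_fraction - mu) / sd

# A trisomy-21 pregnancy shifts chr21 representation up slightly
# (scaled by the fetal fraction), landing far outside the reference
# distribution:
z = chr21_zscore(0.0137, reference_chr21_fractions)  # well above a z > 3 cutoff
```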
Cell free cancer screening is panel based and assays specific, known driver mutations.
Rare disease diagnostics can be WGS based (and some of the rapid 48h WGS studies of NICU babies are compelling from a technical standpoint) but most diagnoses identified via WGS can also be found via WES + chromosomal microarray.
Targeted cancer therapeutic target identification is panel based for most patients, as WGS doesn't identify too many targets for FDA-approved therapies that a panel + IHC + FISH + fusion testing won't.
I mean, sure, sequencing the human genome didn't solve our problems overnight, and you can't sequence a genome at a vending machine for a nickel to tell your future, but I think there has been an avalanche of medical data derived from the genome, and it is only going to get bigger.
Now that we are really starting to figure out the polygenic risks and the single deleterious variants and their links with phenotype, people will have a much better picture of what their future might hold (and how to prevent it).
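A polygenic risk score itself is just a weighted sum over variants; here's a minimal sketch with hypothetical variant IDs and weights:

```python
# Per-allele effect sizes from a GWAS (variant IDs and weights are
# invented for illustration).
effect_sizes = {
    "rs0000001": 0.12,
    "rs0000002": -0.05,
    "rs0000003": 0.30,
}

def polygenic_risk_score(genotype, weights):
    """genotype maps variant id -> risk allele count (0, 1, or 2).
    The score is the sum of (allele count * per-allele weight)."""
    return sum(weights[rsid] * genotype.get(rsid, 0) for rsid in weights)

score = polygenic_risk_score(
    {"rs0000001": 2, "rs0000002": 1, "rs0000003": 0}, effect_sizes
)
# 0.12*2 - 0.05*1 + 0.30*0 = 0.19
```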
I don't think it was ever a bluff. The problem just turned out harder than we thought it was going to be.
I had my genome sequenced a few years ago by Illumina. They had a big slick presentation, blah blah blah, ApoE1, etc. When the genetic counsellors came to my genome they said "huh. you don't have any risk factors". I checked and each of their risks was from an existing gene panel, so the WGS wasn't valuable (it's on PGP, if you want to work with it https://my.pgp-hms.org/profile/hu80855C).
I talked in more detail with the counsellors. Turns out, whenever they saw a novel variant that wasn't covered by a gene panel they were googling the variant and skimming the abstracts of papers.
It was at that point I realized the difference between research, PR, and actionable medical data.
I've done mine as well. Most of the "company" sites don't tell you much, which I think is a legal thing. They aren't cleared to release clinical predictions from genotypes, so they just... don't. I ended up running mine through Promethease (which mines SNPedia) and found quite a bit more than what was reported.
I work with some certified clinical geneticists, and yeah, they do take a much closer look, but at the end of the day it's all just sequencing and interpretation. I think it's mostly just safeguards to keep bad actors at bay.
PGP looks interesting. I see that you submitted phenotype data. I didn't know they had a questionnaire with that. That's actually really interesting. I need to see what kind of questions they ask.
But agreed, it is about time we start to understand regulatory regions better. But that will require gathering more WGS data, and indeed most data is Whole Exome or Panel.
Source: Am MD and practice laboratory medicine.
The other thing you have to realize is that because of the regulatory burden, it takes a while for these tools to make it into practice. Many of the successful genetic tests today were approved 20 years ago. Look up Oncotype Dx which is used in a huge % of breast cancer surgery, for example. WGS and WES will undoubtedly be far superior but it takes a while to get these things into practice.
HLA associations with autoimmune disorders are extraordinarily strong. The same applies to infectious diseases, vaccine efficacy and checkpoint inhibitor efficacy.
While you can type HLA with classical techniques, the only really reliable way is to use long reads.
Same applies to CYP enzyme superfamily, where variation is linked to some rare drug toxicity events for example.
We should all know our HLA and our CYP genotypes. Why 23andme does not even attempt to impute HLA is beyond my understanding.
I have consulted for National Marrow Donor Program/Be The Match off and on for several years. There are typing labs using long reads, but most reporting/matching/analysis is still performed at the nomenclature level.
I hope in the near future we'll be able to simply assemble the entire MHC for each sample, as messy as it might be; see, e.g., "A diploid assembly-based benchmark for variants in the major histocompatibility complex".
But we know fewer associations for them. The same applies to TCR genes. It's a chicken-and-egg problem: we need good, massive GWAS to find out.
My background is in CS, AI and statistics. But I've done lots of graduate research in genetics and epigenetics. I'm very interested in understanding the interactions between HLA and commensal / pathogen epitopes in health & disease. Also in vaccine design.
How about you? I can see from your posts you are with the Big Data Genomics team at UC Berkeley AMPLab.
We have the internet. Great. But look at the dark side. DNA is great for things like targeted medicine, but a totalitarian regime might abuse it.
We need some sort of awareness. Let's discuss how to deal with both sides, once you know there is a very dark side to it.
> Then Alice starts asking questions. I was not prepared for questions
> Then Alice started asking questions about my research
Wow, that is kinda creepy. You're seeking to pay them money for a product they sell, and they want to ask all these intrusive questions about what you do? Is there a reason for this? I mean imagine if DigiKey did this, what a nightmare that would be for electronics design.
Are they worried about you using it as part of a lab producing bioweapons? But frankly anybody with a shot at pulling that off would already know enough to lie very convincingly. I can't imagine this interview would be a hurdle to those folks.
The only other place I've come across this behavior is when seeking to buy wafers from semiconductor foundries. In that situation the foundries see themselves not as vendors to the chip designers, but rather as investors in the chip designers -- investing not money but rather production capacity, and earning not dividends but rather wafer purchases.
> I repeat something about bacterial colonies from my card but she isn’t buying it. I manage to get out that I understand this isn’t a spit-in-the-tube-and-done thing and that’s all I’ve got. She keeps pushing
I mean you told her pretty clearly that you didn't think it was like 23andMe.
Why would she keep pushing?
Also: truly remarkable phd thesis!
Error rate for MinIONs is still quite high (10-15%), so a human genome sequence would be quite inaccurate in some regions.
Sequencer is quite cheap, reagents and flow cells are a little bit more expensive.
My desired hobbyist use case is to key out plants, lichens and mushrooms that I find in the field. I have the bioinformatics know-how, just need the hardware. 3-4h seems like a long time for a genome that is <30k nucleotides long. Mushrooms on average seem to have almost as many genes as a coronavirus has nucleotides. I guess partial sequences (and thus reduced compute time?) might do the trick, but it's probably hard to target those partial reference sequences with a long-read method like Nanopore.
Some good info on next-gen sequencing techniques: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3841808/
On the other hand, when doing a genome assembly, the Nanopore reads are good for a draft sequence and then the Illumina reads can be used to polish the sequence.
I don't know the answers to all your questions, but I do know that the emphasis is on research, not consumer (or hobbyist) use. I believe the devices are ~free, but each run requires using a consumable part that has to be either disposed of or returned for refurbishment, and I believe these are hundreds of dollars each.
The big advancement is the size and cost of the devices: a lab can have one on every desk rather than a communal machine that you have to queue your samples up for, or a device you can transport in a field kit.
They do have cloud services that do much of the processing for you, but I suspect you'd want to be able to manipulate the data, so you'd need your own data processing tools locally. It's not going to give you a 23andMe-style report; it's more likely to say "yep, that's a human" vs "you're E. coli". I believe they do have training for how to do this data analysis, but I suspect this is targeted at customers on large contracts.
Also that's great you have some bioinformatics experience, but are you sure it will apply here? I'm very unknowledgeable about this so forgive me if I'm wrong, but I believe that traditional machines do short-reads, whereas Nanopore does long-reads, which I believe invalidates many techniques for reconstructing the data, the tools, etc. This might not be something you have to be concerned about depending on the level you're working at, I don't know (maybe this is all "solved" by the time you're doing analysis?).
Flowcells last for one sample. The machine should last indefinitely. You can sometimes add more of the same DNA to a flowcell after one use to get a bit more out of it, but the quality degrades quickly. 500-1000 dollars each for flowcells, depending on how much you order.
In my field-use experience, I was using Oxford Nanopore's software, which does processing remotely, and was able to run the platform on just a regular 2015-era laptop.
The flowcell gets contaminated with your sample after one run, so they are 'one time use'. The nanopore protein eventually stops working, too.
They are expensive because doing molecular biology is expensive. It requires expensive machines and expensive reagents at atomic scales to create. Thus money is required.
The expensive part is not the chemistry. Each flowcell has a very expensive piece of metal that senses the very small current variations that each kmer causes when going through each pore. They've actually come up with a device (horribly named "flongle") that has the same shape of a flowcell but no pores, and the mini flowcell it uses is ~90USD (against ~900USD for a full flowcell). Of course, yield is much lower.
But even on a site like Stackoverflow (hey I can trust Joel right?), and even after coming here and reading "hey yes we build / use those too" I am struggling to believe this.
What else don't I know about in biotech? How far ahead is the industry compared to where the average man on the Clapham omnibus thinks it is?
Please stop the world I want to get off.
Could you elaborate / give an example? Are the errors deterministic? Is it like ISI (Inter-Symbol Interference) in signal processing, where some symbols interfere with the reception of the next symbol(s)? Are there short range errors (one letter) or long continuous errors?
Here's a real example I ran a few months ago. How to read it is explained here: https://en.m.wikipedia.org/wiki/Pileup_format
Positions like 172 have errors more often than not because the basecaller is sometimes wrong (note: this is from a sequence-verified sample).
The errors come up more often in some sequences than they do in others. I’m not really sure about symbol processing, but if you have any beginner resources for that I’d appreciate them!
Visually, I think, you can see that it isn't THAT bad (low coverage at the ends is because of how I barcoded the sequences).
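Estimating the per-position error rate from a pileup line is mostly counting characters. A rough sketch (simplified: it ignores the +N/-N indel encoding and the ^/$ read start/end markup that a real parser has to strip first):

```python
def mismatch_rate(pileup_line):
    """Estimate the fraction of reads disagreeing with the reference
    at one position, from a single samtools-pileup-format line."""
    chrom, pos, ref, depth, bases, quals = pileup_line.split("\t")
    matches = sum(1 for c in bases if c in ".,")        # agree with ref
    mismatches = sum(1 for c in bases if c in "ACGTacgt")
    called = matches + mismatches
    return mismatches / called if called else 0.0

# 10 reads covering a G: 7 agree (./,), 3 call a T instead.
line = "ref\t172\tG\t10\t..,,.TT,t.\tIIIIIIIIII"
rate = mismatch_rate(line)  # -> 0.3
```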
I hate to be that guy, but have you actually used the technology? And if so, approximately what year? Unacceptable for what procedure? Do you have any raw reads that have been troubling you?
The errors nanopores get are gaps, not base pair substitutions. So with things like viral or bacterial sequencing you don't really have huge issues.
When you are doing large eukaryotic sequences with lower average coverage, you start picking up a lot of deletion artifacts. That isn't a huge deal if you have a very well annotated genome like human, but if you are doing pioneer genomics it can create some difficulties. If the genome isn't well annotated, it's often best to pair nanopore with short reads.
The actual base calling is on par with HiSeq in my experience. In software terms, you are missing chunks of code, but you aren't flipping bits.
This is important because in certain experiments, you care less about those gaps (scaffolding for example). So you can get a lot of cheap utility out of nanopore sequencing.
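One way to see the gap-vs-substitution distinction in practice is to tally alignment operations from CIGAR strings in a SAM/BAM file: deletions and insertions (D/I) are the gap errors described above, while substitutions hide inside M (or show up as X when the aligner emits =/X instead of M). A toy sketch:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def tally_ops(cigar):
    """Sum up the total length of each CIGAR operation type."""
    counts = {}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    return counts

# A nanopore-like alignment: long matches broken by small deletions.
ops = tally_ops("120M2D85M1D40M")
# ops == {"M": 245, "D": 3}
```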
The Bird Genoscape Project was also showcased in this excellent Nat Geo video.
Here are some press releases related to articles I published during my PhD:
Technological singularity is here! :)
Literally nobody did it for a couple years, so I ended up taking out the bitcoin to pay for more DNA synthesis a few years ago. I actually did delete the bitcoin private key though, so I had to pay for sequencing it back out...
Nice video here https://www.youtube.com/watch?v=1_mER5qmaVk
Found some previous discussion of HW: