While the description of the method (page 11 of this abstract book: http://predictioncenter.org/casp13/doc/CASP13_Abstracts.pdf) is pretty vague, they make clear that much of their scoring and structure refinement uses the scoring function from Rosetta. That's a tell that the neural-network part of the method probably isn't sufficient to pick out good structures on its own. The AI, in this case, is generating fragments (not exceptionally different from what Rosetta already does) and computing a beta-carbon distance score.
Basically, the machine-learning part generates protein fragments and quickly stack-ranks structures created by Monte Carlo search. Everything else is done by a much more complicated physical model that has little or nothing to do with AI.
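In rough code, the loop they describe would look something like this (a toy sketch; `learned_score` and `fragment_move` are stand-ins I made up, not anything from the abstract):

```python
import math
import random

def learned_score(structure):
    # Stand-in for the network's beta-carbon distance score;
    # a real system would run the structure through the net here.
    return random.random()

def fragment_move(structure):
    # Stand-in for a fragment-insertion move on the structure.
    return structure + 1

def monte_carlo_search(start, steps=1000, temperature=1.0):
    current, current_score = start, learned_score(start)
    for _ in range(steps):
        candidate = fragment_move(current)
        candidate_score = learned_score(candidate)
        # Metropolis criterion: always accept improvements, sometimes
        # accept worse structures so the search can escape local optima.
        accept = candidate_score > current_score or random.random() < math.exp(
            (candidate_score - current_score) / temperature
        )
        if accept:
            current, current_score = candidate, candidate_score
    return current, current_score

# The "stack ranking": run many searches, sort by the learned score.
runs = [monte_carlo_search(0) for _ in range(10)]
ranked = sorted(runs, key=lambda r: r[1], reverse=True)
```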
Forget clunky mechanical robots; Boston Dynamics can just engineer a fleshy, bulletproof, self-healing skin system. Think skinned dogs.
Imagine any natural input that life can read (light, heat, glucose levels, hormone levels, force, etc.) and any natural output that life can produce (temperature, color, fluorescence, electrical impulse, etc.). For many of those combinations, we can design a novel protein that links that input to that output.
However, our approach to the problem is very much not like AlphaFold's: we don't try to scan the 20^600 space by changing individual amino acids. Instead, we don't worry about folding or structure (too much) and play around with discrete functional modules that already exist in nature. Our approach is a bit more sociological than it is a simulation of physics/chemistry. But it works.
Optogenetics tools, CARs, SynNotches, BaseEditors are all curious examples, and there are many more coming online right now.
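A cartoon of that parts-based approach in code (every part name below is invented for illustration; this is not a real parts library or our actual tooling):

```python
from itertools import product

# Invented parts library: discrete functional modules that already
# exist in nature, grouped by the role they play in the fusion protein.
sensors   = ["LOV2_light_sensor", "CaM_calcium_sensor", "mechano_sensor"]
linkers   = ["rigid_EAAAK_helix", "flexible_GS_linker"]
actuators = ["GFP_fluorescence", "kinase_domain", "ion_channel_gate"]

# Enumerate candidate domain architectures instead of mutating
# individual residues in a 20^600 sequence space.
candidates = [
    {"sensor": s, "linker": l, "actuator": a}
    for s, l, a in product(sensors, linkers, actuators)
]
print(len(candidates), "architectures to screen")  # 18, not 20**600
```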
Less long than you think; more funding ;)
I know that's a bit cheeky, but the ability to understand and properly manipulate the chemistry of life (not just the D/RNA coding) is finally coming online. Efforts such as the OP's are along those lines.
Many still do not understand (myself included) how revolutionary CRISPR and other such techniques are. The world of fine-tuned, genetically based, auto-organizing chemistry is within sight. And man, it is going to get weird.
Academician Prokhor Zakharov was wrong; we will be able to put an elephant's nose on a giraffe.
1.0 summary: http://web.mit.edu/cortiz/www/3.052/3.052CourseReader/3_Engi...
There are, of course, 'uniform' polymers made from a single repeating subunit. However, natural non-uniform ones like proteins are also polymers, as are synthetic ones with two repeating units (EVA, for example).
> Polymers that contain only a single type of repeat unit are known as homopolymers, while polymers containing two or more types of repeat units are known as copolymers.
Trivially, there are protein homopolymers like polyglycine. They are definitely proteins.
In fact, simple structural proteins (like keratin, collagen, etc.) are much more like synthetic homopolymers than like complex nanomachines such as ATP synthase or the proteasome.
1) What is the architecture of the generative network and where exactly does it fit in the pipeline?
2) What is the interaction with the database? Is there an encoder being trained with real sequences further augmented with variations using the generative network?
3) What is the structure of the neural network that encodes the sequence? Is it a graph network, an LSTM, or a simple conv-net?
4) The gradient descent step is very vague. Is it a physically based differentiable model (not a neural network) whose parameters are being optimized with gradient descent using automatic differentiation? Or something else? In short, there's some detail on scoring but how are the proposals being generated?
Questions aside, the results speak for themselves and are head and shoulders above all other showings. I wonder what it feels like for someone who's been in the field for years.
Despite the high score, there's still a long way to go before results reach real world utility. It's also worth keeping in mind that from a systems biology perspective, protein folding is only a small part of what makes getting clinically useful results difficult.
I might have missed something, but I could not find any indication of an intention to publish further details. It would be disappointing if that were indeed the case.
I have a hunch on how they are doing this.
From the blog post, it appears that the network uses the angles and distances of amino acids in known structures to predict good starting points for regular molecular-dynamics-based structural optimization of a given sequence (what is called the "gradient descent step" in the post).
If I'm wrong then I'll just have to try this approach myself one day...
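For anyone else who wants to try it, a minimal sketch of that kind of descent: fit 3D coordinates to a matrix of predicted pairwise distances (fabricated here, so they generally won't be exactly realizable in 3D):

```python
import numpy as np

n = 10                                      # residues in the toy chain
rng = np.random.default_rng(0)
predicted = rng.uniform(3.0, 15.0, size=(n, n))
predicted = (predicted + predicted.T) / 2   # distances are symmetric

coords = rng.normal(size=(n, 3))            # random starting coordinates
lr = 1e-3
for _ in range(5000):
    diff = coords[:, None, :] - coords[None, :, :]     # (n, n, 3)
    dist = np.linalg.norm(diff, axis=-1) + np.eye(n)   # eye avoids /0
    err = dist - predicted
    np.fill_diagonal(err, 0.0)
    # Gradient of sum((dist - predicted)^2) w.r.t. the coordinates.
    grad = 4 * np.einsum("ij,ijk->ik", err / dist, diff)
    coords -= lr * grad
```

A real pipeline would presumably hand a starting point like this to a proper molecular-dynamics force field for refinement.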
The document also answers some of my questions. Some of the confusion comes from the fact that they are using multiple methods in parallel. The scoring networks are resnets. The database was used in training one of the networks.
The generative-network section is still unclear, but it looks like it was used in one of their methods, a fragment-assembly step, with a DRAW architecture (trained on the database). Network-generated fragments are inserted using simulated annealing (sketched below) and ranked either by their conv-net alone or by a combination of existing methods and the conv-net. The simulated-annealing hyperparameters were optimized with evolutionary search.
Their third approach is the most similar to what you describe. It looks like they combine lots of approaches, including features, conv-net scores, and predictions from other methods. A lot of detail still needs to be filled in, such as how they integrate memory and how gradient descent is actually done directly on protein chain structures.
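For the fragment-assembly method, the simulated-annealing step plausibly looks something like this (a sketch with invented details: fragments here are 3-residue runs of (phi, psi) torsions, and `score` is a placeholder for the conv-net):

```python
import math
import random

def score(angles):
    # Placeholder for the learned scoring network; lower is better.
    # This toy version just rewards helix-like (phi, psi) angles.
    return sum(abs(phi + 57) + abs(psi + 47) for phi, psi in angles)

def insert_fragment(angles, library):
    pos = random.randrange(len(angles) - 3)
    frag = random.choice(library)            # a 3-residue fragment
    return angles[:pos] + frag + angles[pos + 3:]

def anneal(angles, library, t_start=100.0, t_end=0.1, steps=5000):
    cool = (t_end / t_start) ** (1.0 / steps)  # geometric cooling schedule
    temp, current, best = t_start, angles, angles
    for _ in range(steps):
        candidate = insert_fragment(current, library)
        delta = score(candidate) - score(current)
        # Accept improvements always; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
            if score(current) < score(best):
                best = current
        temp *= cool
    return best

library = [[(random.uniform(-180, 180), random.uniform(-180, 180))
            for _ in range(3)] for _ in range(50)]
result = anneal([(0.0, 0.0)] * 30, library)
```

The knobs here (t_start, t_end, steps) would be exactly the hyperparameters that their evolutionary search tunes.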
It's possible their computational advantage made a large difference in being able to arrive at this result; existing researchers need not feel too bad.
For such a breakthrough, there is a good chance they are writing another article for Nature or Science, even aiming for another cover feature.
The ability to do this at scale could mean protein folding is effectively "solved" and we can move on to the next phases of computational design in systems biology. Namely: protein interaction prediction, design of antibodies, vaccines, novel assemblies, and on and on.
There are >100K molecules in the Protein Data Bank, with new ones added every day. But folding and interaction data are opaque. If immediate predictions could be made, that would in turn create a feedback loop informing design.
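For anyone who wants to poke at that data, entries can be pulled straight from RCSB by ID (1UBQ, ubiquitin, picked arbitrarily here):

```python
import urllib.request

# Download a single entry from the RCSB Protein Data Bank.
pdb_id = "1UBQ"   # ubiquitin; any valid 4-character PDB ID works
url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
with urllib.request.urlopen(url) as resp:
    pdb_text = resp.read().decode()

# ATOM records hold the experimentally determined coordinates;
# this is the "folding data" a prediction feedback loop would consume.
atoms = [line for line in pdb_text.splitlines() if line.startswith("ATOM")]
print(f"{pdb_id}: {len(atoms)} atom records")
```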
Actually pretty excited about the implications for Zero-G protein factories. Pharmaceutical companies could be a commercial driver for space-based protein crystal fabs.
The coming of age of de novo protein design
I have a feeling they're going after the biologics market with this. Predict structure directly from DNA sequences, simulate affinities, then make a batch and test in vitro. Throw in a loop to feed the data back into making a better DNA sequence. Definitely heading down the road to automated protein design.
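Schematically, that loop might be wired up like this (every function here is a hypothetical placeholder for a model or a wet-lab step, not a real API, and it's not meant to run end to end):

```python
# Placeholders only; the bodies are deliberately left empty.

def predict_structure(dna_sequence): ...          # sequence -> structure model
def simulate_affinity(structure, target): ...     # in-silico binding score
def synthesize_and_assay(dna_sequence): ...       # the wet-lab, in-vitro step
def propose_variants(dna_sequence, assay_data): ...  # feedback into design

def design_loop(seed_dna, target, rounds=5):
    candidates = [seed_dna]
    best = seed_dna
    for _ in range(rounds):
        # Rank candidates in silico so only the most promising
        # sequences are synthesized and tested.
        scored = [(simulate_affinity(predict_structure(d), target), d)
                  for d in candidates]
        best = max(scored)[1]
        assay_data = synthesize_and_assay(best)
        # Measured data feeds back to generate a better next batch.
        candidates = propose_variants(best, assay_data)
    return best
```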
The tools and technologies sometimes end up translating but that's a long-term process.
You might argue that you could then predict the structure of the best variants, but predicted structures are all but useless for drug discovery.
You now have a way of navigating sequence space far more effectively, so you can explore more of it. You could also potentially use the results to feed back into the system regarding function, so it could become smarter over time.
If that’s so, can you link to any supplementary material about it? Particularly with respect to how machine learning is being used, how the candidate selection process works etc. I’m curious about the subject.
> Except for predicting the desired function, which AlphaFold doesn't do either - as the structural genomics projects of the 2000s found out, having the protein structure doesn't magically tell you what it does in vivo.
Protein function prediction is a real thing, and it requires knowing the structure. Good structure prediction is a step towards this.
No problem, shame though!
It seems to me there must be scope for using AI to improve this process, given the results it achieves in other domains, and the AlphaFold result is very encouraging. Maybe that order-of-magnitude improvement will eventually be possible.
- chaperones. Not all proteins fold by themselves; quite a few bind to an additional protein that helps them fold into the desired shape. That means the final state is impossible to reach from the "initial" state by gradient descent.
- proteins don't necessarily exist in the minimum-potential-energy state. Moreover, sometimes the state flips on addition of a ligand (e.g. myosin's relationship with ATP), and that's crucial for the protein's function.
So static folding only gets you so far. Unfortunately, nature is hideously complicated and "entangled", so there is a tremendous gap between even perfect protein folding and real in vitro results.
I am much more excited about the application of AI to more complex problems like metabolic engineering/synthetic biology, literature mining, and genome-wide association studies. It's a shame the training data are such an incomplete mess, but that'll improve slowly.
We think science comes first and that from science we derive technology. But it is usually the other way around: technology comes first, and from technology we derive science and even mathematics.
A canonical example is the telescope: its invention led to Newtonian physics and calculus. Another is the microscope, which led to much of modern biology.
If anyone is interested in the history of technology and science, I recommend
"Science and Technology in World History: An Introduction" by McLellan and Dorn
Also, if you have Netflix, they have a great documentary called AlphaGo. I don't play Go, but I was able to appreciate the documentary. If you play chess like I do, there are lots of YouTube videos on AlphaZero's games against Stockfish.
If DeepMind's system is as general as claimed and as competent in other fields as it was in Go and Chess, I think it's fair to say that it could give us insights into current and new science/mathematics.
Yes. "Quantum" is used in so many wrong instances that the default reaction to a stranger using it probably should be eyerolling.
>Does quantum computing not offer our best shot at solving currently unsolvable problems?
Not in general; there are very few problems where Shor's algorithm or Grover's algorithm applies. There are many more intractable problems that are out of reach regardless of computing power. There are, though, a lot of 'unsolvable' problems today that are just a matter of more (non-quantum) hardware and software.
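For calibration, the textbook numbers (my addition, not from the parent): Grover only buys a quadratic speedup for unstructured search, and Shor beats the best known classical factoring algorithm (the number field sieve):

$$\text{Grover: } O(N) \rightarrow O(\sqrt{N}), \qquad \text{Shor: } \exp\!\big(O\big((\ln N)^{1/3}(\ln\ln N)^{2/3}\big)\big) \rightarrow \operatorname{poly}(\log N)$$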
If you want to rid yourself of quantum-computing delusions in particular, try reading Aaronson: https://www.scottaaronson.com/democritus/ and, for a primer on the sort of problem Shor's algorithm can help with, maybe https://www.scottaaronson.com/blog/?p=208
I think it's a better fit than AI.
I'm curious how exactly this competition works. It seems like a set of label predictions is submitted and some form of accuracy feedback is provided (a single accuracy score for the whole prediction set?), and that there is a certain number of allowed submissions...? How much of the ultimate strategy for playing this game at a high level ends up being about optimizing to receive as much leaked information from the test set as possible? Is the best guess at this point that this result is a good indicator of a true increase in prediction capability?
(skip to 01:30 for the real start)
Small Molecules: http://www.twoxar.com/ & http://atomwise.com/
Multi-Domain proteins: https://serotiny.bio (my company)
> But protein folding is far from a solved problem, fear not. XKCD’s take on this remains accurate! It’s going to be very interesting indeed to see the progress over the next few years in this area, but that progress is not going to be the discovery of some general solution. It’s going to be a mixture (as mentioned above) of better understanding of the physical processes involved, larger databases of reliable experimental data covering more structural classes, and faster/more efficient ways for searching through all these (both the possible structures and the real ones) and generalizing rules to tell us when we’re closing in on something accurate.
I thought this was only true if you already have a structure; otherwise you typically run into the [phase problem](https://en.wikipedia.org/wiki/Phase_problem), which is often a significant hurdle. But I haven't done much biochemistry in years, so there might be better approaches than MAD/MIR/etc. that make the phase problem a non-issue.
Computationally, this is statistical analysis, and I doubt that AI would be able to offer anything unique. Protein-folding prediction, on the other hand, is more a question of "where do you start to arrive at the answer most efficiently"; AI is well suited for this and would be much better than humans at prediction, using methodologies and correlations far outside human brain capability.
Now people just try to malign AI, genetic engineering, nuclear power, etc.
Maybe it balances out any irrational exuberance about the arrival of new technological frontiers. Or maybe some folks are just grumpy.
So, no collateral damage? Targeted assassinations? This is the CIA's wet dream.
Whether we are wise enough to handle our intelligence is still an open question.
I hope that they will be able to abstract some formulas or rules about protein folding from the mess of statistics. I imagine that having a rulebook would be much more efficient than using AI, because protein folding isn't so much a game of chance as it is an extremely convoluted puzzle.