I am a structural biologist studying archaeal viruses and CRISPR/Cas proteins. From my point of view, AlphaFold has basically just gotten better at multiple sequence alignments. That's not a bad thing, but it's unfortunately useless to me, because sequence divergence happens so quickly in the organisms I study that even the best results are still basically made up. It's nice that AlphaFold got better at generating sequence alignments, but it's not a magic bullet (à la folding figured out from first principles).
Interestingly enough, when I get experimental data for an archaeal virus protein, it almost always uses a conserved fold. There's just no evidence for it at the amino acid level.
I agree. AlphaFold's approach isn't what I was hoping for. Something ab initio would be ground-breaking, especially if you could apply it to chemistry more broadly than protein folding. AlphaFold's approach seems like a recipe for overfitting.
I would be honestly surprised if a true simulation is possible that can also run on a classical (non-quantum) computer, but I've been surprised before.
One crazy idea I have: run some very crude ab initio QM or DFT stuff starting from folded proteins, gradually raising the temperature until they unfold. Then amass a dataset of protein structures plus positional delta vectors. Then time-reverse the dataset (flip the sign on those vectors). Then train a 3D convolutional NN on the reverse-melting curve to obtain heuristic rules for folding in whatever universe the shitty physics engine represents.
Then it doesn't matter if the QM simulation is very crude and deeply flawed, so long as it gets to the right answer at the end.
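A minimal sketch of the data-preparation step in this proposal, assuming the crude QM/DFT heating run has already produced a trajectory stored as a NumPy array of atom positions with shape (frames, atoms, 3), ordered from folded to unfolded. The function name and array layout are hypothetical, and the 3D CNN training is not shown; the point is only that flipping the sign of each frame-to-frame delta turns an unfolding step into a refolding target.

```python
import numpy as np

def reverse_unfolding_dataset(trajectory):
    """Turn a heating (unfolding) trajectory into time-reversed training pairs.

    trajectory: array of shape (T, N, 3), T frames of N atom positions,
    ordered from folded (frame 0) to unfolded (frame T-1).
    Returns a list of (positions, delta) pairs where each delta points
    one step back toward the folded structure.
    """
    traj = np.asarray(trajectory, dtype=float)
    if traj.ndim != 3 or traj.shape[2] != 3:
        raise ValueError("expected shape (T, N, 3)")
    # Forward deltas: how each atom moved between consecutive frames while unfolding.
    forward = traj[1:] - traj[:-1]  # shape (T-1, N, 3)
    pairs = []
    # Walk backwards in time, starting from the most unfolded frame.
    for t in range(len(traj) - 1, 0, -1):
        # Flipping the sign of the forward delta gives the "reverse-melting"
        # step from frame t back to frame t-1.
        pairs.append((traj[t], -forward[t - 1]))
    return pairs
```

Each pair is (current positions, step toward the folded structure), the kind of input/target pair a network could be trained on.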
IIUC the folding and unfolding pathways of a protein are not time reversed wrt each other.
But you would still enjoy reading this:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC17732/ (the work in this paper led to Folding@home, because Vijay couldn't get enough computer time to run his simulations)
If you really had a lot of computer time to waste, you could imagine running simulations where you titrated guanidinium chloride in or out and inspected how disrupting H-bonds (versus hydrophobic collapse) contributes to unfolding. Chaotropes are better than temperature for probing unfolding.
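Explicit-solvent simulations with guanidinium are the expensive route. As a cheap point of comparison, the standard two-state linear extrapolation model (a textbook approximation, not something proposed in this comment) treats the unfolding free energy as falling linearly with denaturant concentration. A sketch, assuming two-state behavior; the parameter values in the tests are made up.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)

def fraction_unfolded(denaturant_molar, dg_water, m_value, temp_k=298.15):
    """Two-state linear extrapolation model of chemical denaturation.

    dg_water: unfolding free energy in water, kJ/mol (positive = folded is stable)
    m_value:  how fast dG drops per molar denaturant, kJ/(mol*M)
    """
    dg = dg_water - m_value * denaturant_molar  # dG_unfold at this [denaturant]
    k_unfold = math.exp(-dg / (R * temp_k))     # equilibrium constant [U]/[F]
    return k_unfold / (1.0 + k_unfold)

def midpoint(dg_water, m_value):
    """Denaturant concentration where folded and unfolded are equally populated (dG = 0)."""
    return dg_water / m_value
```

Comparing a simulated denaturation curve against this simple baseline is one way to see whether the simulation captures anything beyond two-state behavior.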
> IIUC the folding and unfolding pathways of a protein are not time reversed wrt each other.
That's more or less correct, but in my proposal, we are not modeling the folding pathway of a protein. One way to think of it: we would be modelling the reverse unfolding pathway of a protein in a universe with different (simpler) physics than ours, physics in which (hopefully) the time reversed unfolding pathway yields the correctly folded protein in OUR universe.
Off base. I'm using "universes" as a florid description of the fact that our QM models are crap, but they might be useful if the laws of physics were different.
Check out DESMOND and ANTON; that's basically what D. E. Shaw Research is doing. It does seem like, for static structure prediction, it's going to be hard for anybody to do even marginally better than AF2 at this point, but since static structure predictions are mostly useful for brainstorming, I think ANTON may end up being more useful in terms of applied science.
I wonder if a transformer with 1 recurrent layer (or a transformer Deep Equilibrium Model) could work well. Transformers are almost like a physics simulation in that they sum vector-valued interactions which depend on distance in some space, and then add the result to the state of each particle / element.
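A rough NumPy sketch of the "one recurrent layer" idea, under heavy simplifying assumptions: a single weight-tied layer in which each element sums vector messages from the others, weighted by a softmax over negative squared distance in a fixed coordinate space, applied repeatedly until the state stops changing. Real Deep Equilibrium Models use learned queries/keys, root-finding solvers, and implicit differentiation for training; this naive fixed-point loop only shows the forward pass, and all names are hypothetical.

```python
import numpy as np

def interaction_layer(h, coords, w_v, length_scale=1.0):
    """Each element sums vector messages (h @ w_v) from all elements,
    weighted by a softmax over negative squared distance between coords."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)  # (N, N)
    logits = -d2 / length_scale ** 2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # rows sum to 1
    return attn @ (h @ w_v)                      # (N, D)

def deq_fixed_point(x, coords, w_v, tol=1e-8, max_iter=500):
    """Apply the SAME layer repeatedly, h <- tanh(x + layer(h)), until h stops
    changing: a naive fixed-point solve of a Deep-Equilibrium-style model."""
    h = np.zeros_like(x)
    for _ in range(max_iter):
        h_next = np.tanh(x + interaction_layer(h, coords, w_v))
        if np.max(np.abs(h_next - h)) < tol:
            return h_next
        h = h_next
    return h  # may not have converged if max_iter was reached

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))           # 6 elements ("residues"), 8 features each
coords = rng.normal(size=(6, 3))      # fixed 3D positions
w_v = 0.05 * rng.normal(size=(8, 8))  # small weights keep the update a contraction
h_star = deq_fixed_point(x, coords, w_v)
```

The loop converges here only because the value weights are small enough for the update to be a contraction; with larger weights the plain iteration may not settle, which is one reason DEQ implementations use proper root-finders.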
Reasonably speaking, I would expect that within 2 years 10 groups will be as proficient as AF2 at predicting static structures. I don't think anybody who is trying to emulate physics simulations will be in that group, just folks who have learned enough tricks to quickly incorporate all the evidence during training and choose how to apply it during prediction.
I expect that "multiple feature embedding heads on top of 2 fully connected layers" (as used in modern ads training) will end up being the simplest architecture capable of folding proteins well.
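One reading of that architecture, sketched in NumPy: each categorical input feature gets its own embedding table, the looked-up vectors are concatenated, and two fully connected layers produce the output. Which features you would feed it for protein folding is left open; the two features below (residue type and a binned sequence position), the sizes, and all names are hypothetical, and the weights are random, untrained placeholders.

```python
import numpy as np

class EmbeddingMLP:
    """One embedding table ("head") per categorical feature; the looked-up
    vectors are concatenated and passed through two fully connected layers."""

    def __init__(self, vocab_sizes, emb_dim, hidden, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Random, untrained parameters: placeholders for learned weights.
        self.tables = [rng.normal(scale=0.1, size=(v, emb_dim)) for v in vocab_sizes]
        in_dim = emb_dim * len(vocab_sizes)
        self.w1 = rng.normal(scale=0.1, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.1, size=(hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def forward(self, ids):
        """ids: integer array of shape (batch, n_features), one column per feature."""
        embs = [table[ids[:, k]] for k, table in enumerate(self.tables)]
        x = np.concatenate(embs, axis=1)            # (batch, emb_dim * n_features)
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # fully connected layer 1 + ReLU
        return h @ self.w2 + self.b2                # fully connected layer 2 (linear)

# Hypothetical features: residue type (20 values) and a binned sequence position (64 bins).
model = EmbeddingMLP(vocab_sizes=[20, 64], emb_dim=4, hidden=16, out_dim=3)
ids = np.array([[0, 5], [19, 63], [7, 0]])
out = model.forward(ids)
```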
No, because 3D folds are conserved. That means, short of one of the unknowns containing a novel fold (which they haven't), there's always a possibility of overfitting.
One can dream about being able to calculate energies with the accuracy of DFT calculations... and do dynamic simulations on the time scale of ball-and-stick molecular modeling sims.
Would be amazing for homogeneous catalysis design.
How do you know the results are basically made up? Have you compared AlphaFold's predictions to your experimental data?
I think you are right that most of the predictive power derives from super-enhanced multiple sequence alignments, but I think you underestimate AlphaFold's ability to generalize to novel cases.
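One concrete way to do the comparison asked about here: superimpose predicted and experimental coordinates for matched residues (e.g. C-alpha atoms) and compute the RMSD. This sketch uses the Kabsch algorithm and assumes you have already extracted matched coordinates as (N, 3) arrays; parsing PDB/mmCIF files and matching residues is not shown. Global RMSD is only one metric, and fold-level scores such as TM-score or lDDT are often preferred.

```python
import numpy as np

def kabsch_rmsd(pred, ref):
    """RMSD between two (N, 3) coordinate sets after optimal rigid
    superposition (Kabsch algorithm)."""
    p = np.asarray(pred, dtype=float)
    q = np.asarray(ref, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q             # 3x3 covariance matrix
    u, s, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotation mapping p onto q
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

A rigidly rotated and translated copy of a structure should give an RMSD of essentially zero, which is a quick sanity check before comparing real models.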
I think parent meant conservation of the amino acid is weak and still the structure remains the same overall. So sequence similarity is not everything.
The reason might be that the overall protein fold is also guided by something other than detailed side-chain contacts.
BTW: Hydrogen bonds and salt bridges do not drive protein folding, at least not thermodynamically, because it does not matter whether polar/charged residues interact with each other or with water. Rather, proteins fold for the same reason oil and water do not mix: hydrophobic amino acids avoid water. This is an entropy-driven process in which electrostatic interactions do not matter. See also the "molten globule" model. Basically, it means a precursor state of the protein forms early on through collapse of the hydrophobic core. Tertiary structure is then refined by residue/residue interactions. In the end, it's the distribution of hydrophobic amino acids in the sequence that's most important for conserving a structure. Surface residues can vary quite a lot.
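The point that the hydrophobic pattern matters more than exact identity is easy to illustrate with the Kyte-Doolittle hydropathy scale (a standard published scale). In this sketch the window size and the "KD > 0 means hydrophobic" cutoff are arbitrary choices, and the two 7-residue sequences are made-up toys: they share no identical residues yet have the same hydrophobic/polar pattern.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=7):
    """Sliding-window mean Kyte-Doolittle hydropathy, one value per full window."""
    vals = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window > len(vals):
        return []
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]

def hp_pattern(seq):
    """Binary hydrophobic (H) / polar (P) pattern, treating KD > 0 as hydrophobic."""
    return "".join("H" if KYTE_DOOLITTLE[aa] > 0 else "P" for aa in seq.upper())

# Toy sequences with 0% identity but the same pattern: both give "HPPHHPP".
a, b = "LKELLKK", "IRDVIEQ"
same_pattern = hp_pattern(a) == hp_pattern(b)
```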
Is there nothing 'new' this cutting-edge technology can be used for that allows fast iterations, given the limitless folds or whatever is available? I.e., not Homo sapiens medicine.
Look for your keys where the light is shining, not in the dark.