More Protein Folding Progress – What’s It Mean?

COGlory · on July 26, 2021

I am a structural biologist studying Archaeal viruses and CRISPR/Cas proteins. From my point of view, AlphaFold has basically just gotten better at multiple sequence alignments. It's not a bad thing, but it's unfortunately useless to me because sequence divergence happens so quickly in the organisms I study that even the best results are still basically made up. It's nice that AlphaFold got better at generating sequence alignments, but it's not a magic bullet (a la folding figured out from first principles).

Interestingly enough, if I get experimental data on an archaeal virus protein, it almost always uses a conserved fold. There's just no evidence of it at the amino acid level.

the__alchemist · on July 26, 2021

I agree. AlphaFold's approach isn't what I was hoping for. Something ab initio would be ground-breaking, especially if you could apply it to chemistry more broadly than protein folding. AlphaFold's approach seems like a recipe for over-fitting.

Filligree · on July 26, 2021

I would be honestly surprised if a true simulation is possible that can also run on a classical (non-quantum) computer, but I've been surprised before.

It would indeed be ground-breaking.

dnautics · on July 26, 2021

One crazy idea I have is to run some very crude ab-initio QM or DFT stuff starting with folded proteins, and gradually running the temperature higher until it unfolds. Then amass a dataset of protein structures + positional delta vectors. Then time-reverse the dataset (flip the sign on those vectors). Then train a 3d convolutional NN on the reverse-melting curve to obtain heuristic rules for folding in whatever universe the shitty physics engine represents.

Then it doesn't matter if the QM simulation is very crude and deeply flawed, so long as it gets to the right answer at the end.
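A minimal sketch of the dataset construction described above, assuming the crude simulation yields an ordered list of snapshots with one 3D position per atom. The names `unfolding_deltas`, `time_reverse`, and the toy trajectory are hypothetical, and no actual QM/DFT engine or neural network is involved here:

```python
from typing import List, Tuple

Vec3 = Tuple[float, ...]   # (x, y, z) for one atom
Frame = List[Vec3]         # one snapshot: a position per atom


def unfolding_deltas(trajectory: List[Frame]) -> List[Tuple[Frame, Frame]]:
    """Pair each snapshot with the per-atom displacement to the next snapshot."""
    pairs = []
    for current, nxt in zip(trajectory, trajectory[1:]):
        delta = [tuple(b - a for a, b in zip(p, q)) for p, q in zip(current, nxt)]
        pairs.append((current, delta))
    return pairs


def time_reverse(trajectory: List[Frame]) -> List[Tuple[Frame, Frame]]:
    """Build 'refolding' samples: input is the later snapshot, target is the flipped delta."""
    reversed_pairs = []
    for (_, delta), later in zip(unfolding_deltas(trajectory), trajectory[1:]):
        flipped = [tuple(-c for c in d) for d in delta]
        reversed_pairs.append((later, flipped))
    return reversed_pairs


# Toy "melting" trajectory (made-up numbers): atom 2 drifts away from atom 1.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
training_set = time_reverse(toy_trajectory)
```

Each entry of `training_set` is a (structure, target vector) pair that a model could be trained on; adding the target to the structure steps one snapshot back toward the folded state of the simulated system.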

dekhn · on July 26, 2021

IIUC the folding and unfolding pathways of a protein are not time reversed wrt each other.

But you would still enjoy reading this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC17732/ The work in this paper led to folding@home because Vijay couldn't get enough computer time to run his simulations.

If you really had a lot of computer time to waste, you could imagine doing simulations where you titrated in or out some guanidinium chloride and inspected how disrupting h-bonds (versus hydrophobic collapse) contributes. Chaotropes are better than temperature for probing unfolding.

dnautics · on July 27, 2021

> IIUC the folding and unfolding pathways of a protein are not time reversed wrt each other.

That's more or less correct, but in my proposal, we are not modeling the folding pathway of a protein. One way to think of it: we would be modelling the reverse unfolding pathway of a protein in a universe with different (simpler) physics than ours, physics in which (hopefully) the time reversed unfolding pathway yields the correctly folded protein in OUR universe.

d110af5ccf · on July 27, 2021

Your description kind of reminds me of the 2018 World Models publication. Are you familiar with that work and/or am I off base? (https://worldmodels.github.io/) (https://arxiv.org/abs/1803.10122)

dnautics · on July 27, 2021

Off base. I'm using universes as a florid description of the fact that our QM models are crap but they might be useful if the laws of physics were different.

dekhn · on July 26, 2021

Check out DESMOND and ANTON, that's basically what DE Shaw Research is doing. It does seem like, to get static structure predictions, it's going to be hard for anybody to find anything that does more than marginally better than AF2 at this point, but since static structure predictions are mostly just useful for brainstorming, I think ANTON may end up being more useful, in terms of applied science.

eutectic · on July 26, 2021

I wonder if a transformer with 1 recurrent layer (or a transformer Deep Equilibrium Model) could work well. Transformers are almost like a physics simulation in that they sum vector-valued interactions which depend on distance in some space, and then add the result to the state of each particle / element.
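A toy sketch of that analogy, under loud assumptions: the score is the negative squared distance between states (not the learned query/key dot product of a real transformer), the "interaction" is the raw displacement toward the other element, and there are no learned weights. The names `attention_step` and `run_recurrent` are hypothetical:

```python
import math
from typing import List

Vector = List[float]


def attention_step(states: List[Vector]) -> List[Vector]:
    """One residual update: add a softmax-weighted sum of displacements to each state."""
    new_states = []
    for xi in states:
        # Score falls off with squared distance, so nearby elements interact more.
        scores = [-sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in states]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Sum the vector-valued interactions (displacement toward each element).
        update = [
            sum(w * (xj[d] - xi[d]) for w, xj in zip(weights, states)) / total
            for d in range(len(xi))
        ]
        # Residual connection: add the result to the element's own state.
        new_states.append([a + u for a, u in zip(xi, update)])
    return new_states


def run_recurrent(states: List[Vector], steps: int) -> List[Vector]:
    """Apply the same layer repeatedly, like a single recurrent layer."""
    for _ in range(steps):
        states = attention_step(states)
    return states


particles = [[0.0, 0.0], [1.0, 0.0]]
one_step = attention_step(particles)
settled = run_recurrent(particles, 20)
```

In this toy, repeated application pulls the two elements toward a shared fixed point, which is loosely the kind of state a Deep Equilibrium Model would solve for directly.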

dekhn · on July 26, 2021

Reasonably speaking, I would expect that within 2 years 10 groups will be as proficient as AF2 at predicting static structures. I don't think anybody who is trying to emulate physics simulations will be in that group, just folks who have learned enough tricks to quickly incorporate all the evidence during training and choose how to apply it during prediction.

I expect that "multiple feature embedding heads on top of 2 fully connected layers" (as used in modern ads training) will end up being the simplest architecture capable of folding proteins well.

FartyMcFarter · on July 27, 2021

> AlphaFold's approach seems like a recipe for over-fitting.

Doesn't the CASP competition use unknown structures, which would have defeated over-fitting?

COGlory · on July 27, 2021

No, because 3D folds are conserved. That means, short of one of the unknowns containing a novel fold (which they haven't), there's always a possibility of overfitting.

l33tman · on July 27, 2021

It does. I'm not sure what the poster above is referring to.

isoprophlex · on July 26, 2021

One can dream about being able to calculate energies with the accuracy of DFT calculations... and do dynamic simulations on the time scale of ball-and-stick molecular modeling sims.

Would be amazing for homogeneous catalysis design.

jostmey · on July 26, 2021

How do you know the results are basically made up? Have you compared AlphaFold's predictions to your experimental data?

I think you are right that most of the predictive power derives from super-enhanced multiple sequence alignments, but I think you underestimate AlphaFold's ability to generalize to novel cases.

ramraj07 · on July 26, 2021

Can you explain further what your second paragraph means?

G3rn0ti · on July 26, 2021

I think parent meant that conservation of the amino acid sequence is weak and yet the overall structure remains the same. So sequence similarity is not everything.

The reason might be that the overall protein fold is also guided by something other than detailed side chain contacts.

BTW: Hydrogen bonding and salt contacts do not drive protein folding, at least not thermodynamically, because it does not matter whether polar/charged residues interact with each other or with water. Rather, proteins fold for the same reason that oil and water do not mix: hydrophobic amino acids avoid water. This is an entropy-driven process in which electrostatic interactions do not matter. See also the "molten globule" model. Basically, it means a precursor state of the protein folds early on through a collapse of the hydrophobic core. Tertiary structure is then refined through residue/residue interactions. In the end, it’s the distribution of hydrophobic amino acids in the sequence that’s most important for the conservation of a structure. Surface residues can vary quite a lot.

dekhn · on July 26, 2021

Your "BTW" is still a huge area of argument in protein folding. The claim you are making is just one perspective and is not well-established.

strbean · on July 26, 2021

I never considered that there would be viruses that infect Archaea. That sounds incredibly cool!

Any fun tidbits about them you'd like to share?

kleton · on July 26, 2021

There's a theory that the eukaryotic nucleus originated as an archaeal virus https://en.wikipedia.org/wiki/Viral_eukaryogenesis

strbean · on July 26, 2021

That is super cool!

mrfusion · on July 26, 2021

Could people like you help to improve AlphaFold? Did they already train it on the proteins you work with?

joshtam · on July 26, 2021

AF2 certainly moves us forward, especially for proteins where no structure was previously available.

aaron695 · on July 27, 2021

Is there nothing 'new' that this cutting-edge technology can be used for that allows fast iterations, given the limitless folds or whatever is available? I.e., not Homo sapiens medicine.

Look for your keys where the light is shining, not in the dark.

gtmb · on July 27, 2021

Does anyone have a bibliography for getting started on this problem? I think a Nobel in chemistry would look nice on my resume :P