
End-to-end differentiable learning of protein structure - tepal
https://www.biorxiv.org/content/early/2018/02/14/265231
======
nabla9
It would be cool if machine learning researchers would start participating in
CASP and CAPRI. If you crack Go, you get fame, but if you crack protein
prediction, you get a Nobel Prize and completely revolutionize biochemistry and
medicine.

[http://predictioncenter.org/](http://predictioncenter.org/)

[http://www.ebi.ac.uk/msd-srv/capri/](http://www.ebi.ac.uk/msd-srv/capri/)

edit: Why is there no XPRIZE for protein folding?

~~~
cing
DeepMind and others are trying. "Hassabis said the company is now planning to
apply an algorithm based on AlphaGo Zero to other domains with real-world
applications, starting with protein folding."

[1]
[https://www.bloomberg.com/news/articles/2017-10-18/deepmind-...](https://www.bloomberg.com/news/articles/2017-10-18/deepmind-s-superpowerful-ai-sets-its-sights-on-drug-discovery)

~~~
RivieraKid
That doesn't make any sense unless I'm missing something. A0 is suited to a
completely different problem than protein folding...

~~~
rytill
The AlphaZero algorithm (Monte Carlo tree search with a value estimator trained
by reinforcement learning) works on any environment you can simulate during
play time, single player or not.

~~~
eutectic
Any environment with finite action and state-spaces.

~~~
rytill
No, the key requirement, which makes it difficult to use on real-world tasks, is
that you must be able to do a forward rollout of your environment in your
decision-making process.
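
To make the "forward rollout" requirement concrete, here is a minimal sketch. It is not AlphaZero: it uses flat Monte Carlo rollouts with random play instead of tree search with a learned value estimator. The `CountingEnv` toy environment and all function names are made up for illustration. The point is only that the planner needs a `step` function it can call on hypothetical states while deciding what to do.

```python
import random


class CountingEnv:
    """Hypothetical single-player toy: start at position 0 and try to
    reach `target` within `horizon` steps. State is (position, steps_taken)."""

    actions = (-1, +1)

    def __init__(self, target=3, horizon=6):
        self.target = target
        self.horizon = horizon

    def step(self, state, action):
        # The simulator: given a state and an action, return the next state.
        position, steps_taken = state
        return (position + action, steps_taken + 1)

    def is_terminal(self, state):
        position, steps_taken = state
        return position == self.target or steps_taken >= self.horizon

    def reward(self, state):
        return 1.0 if state[0] == self.target else 0.0


def rollout_value(env, state, rng, n_rollouts=200):
    """Estimate a state's value by simulating random play to the end."""
    total = 0.0
    for _ in range(n_rollouts):
        s = state
        while not env.is_terminal(s):
            s = env.step(s, rng.choice(env.actions))
        total += env.reward(s)
    return total / n_rollouts


def choose_action(env, state, rng):
    """Pick the action whose simulated successor has the best estimate."""
    return max(env.actions,
               key=lambda a: rollout_value(env, env.step(state, a), rng))


def play(env, rng):
    """Act greedily with respect to the rollout estimates until the end."""
    state = (0, 0)
    while not env.is_terminal(state):
        state = env.step(state, choose_action(env, state, rng))
    return state


if __name__ == "__main__":
    print(play(CountingEnv(), random.Random(0)))
```

Everything here hinges on `env.step` being cheap and accurate. For a board game that is trivial. For a physical system the simulator is the hard part.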

------
dr_coffee
I work on protein structure, albeit not from a computational standpoint, and
it struck me as odd that none of the work from the Baker group (University of
Washington), e.g. Rosetta
([https://www.rosettacommons.org/](https://www.rosettacommons.org/)), was
mentioned. Rosetta can be used to predict tertiary structure from amino acid
sequence. Does anyone familiar with the field know how the methods used by
software like Rosetta differ from those presented in this paper?

~~~
superfx
Hi! I’m the author of the paper. Not sure why you say Rosetta isn’t mentioned?
It’s extensively referenced throughout the paper, discussed in the discussion
section, and is one of the top 5 CASP servers compared to in the results
section.

Also as for how it’s different from what’s described in the paper, that’s the
topic of the introduction of the paper. Rosetta uses both fragment assembly
and co-evolution methods.

~~~
dr_coffee
Oops, I seem to have skimmed it a bit too quickly. Thank you for the kind
reply.

------
sungam
This is a very interesting approach. Clearly a lot more work to do but the
robust prediction of protein structure from sequence would be an absolute game
changer for biomedical science so I hope that this opens up new strategies.

------
RivieraKid
What are the real-world applications of protein folding (preferably, some
specific example)? I always hear that it's really important for drug design
and biotechnology but have a hard time imagining something concrete.

~~~
superfx
Re drug discovery, oftentimes in “rational” drug design, medicinal chemists
try to make small molecules that bind snugly into a binding pocket on the
protein. Having the structure of the protein aids greatly in that process.

~~~
xiao_haozi
Yes! And I'd also add that there are others that come into play...

* Elucidating function by identifying similarity to other known structures

* Finding novel signaling mechanisms (see work on PHinder)

* Modeling co-receptor/ligand dynamics

* Identifying function of orphan receptors

* Working with ancestral genes by identifying descendant structure

* Classifying and clustering proteins based on solved structure

* Learning new biochemical mechanisms through active vs inactive state structures

...

------
resiros
Cool method! Are you planning to participate in the next CASP? Do you plan to
open source the code?

~~~
superfx
Yes! Certainly on the source code, and hopefully on CASP13 too.

~~~
resiros
Thanks for the answer! I hope then to see you in CASP (and CAMEO too, it is a
great tool to test/refine your method). I was discussing the paper with a
co-worker of mine (we also work on PSP, on RBO Aleph). We had a hard time
pinpointing the thing that made your method finally work. You have mentioned
in your blog post that you have been working on it for years now, and I guess
a lot of other people had the idea of using deep learning for PSP. But what
was the insight that made it all work? Using LSTMs? Or was it many small
refinements and hacks?

~~~
superfx
I would say the biggest thing is obviously the architecture, coupling LSTMs
with the geometric units that spit out the actual 3D structure that can then
be directly optimized via the dRMSD loss function. That's the biggest point of
distinction from everything else out there (no contact map prediction, etc.)
So it really is about end-to-end differentiability IMO, which hasn't been done
before.
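
For readers unfamiliar with dRMSD, here is a minimal sketch. It assumes the usual definition: the root-mean-square difference between the two structures' pairwise distances. The function names and toy coordinates are illustrative and not taken from the paper's code. This plain-Python version is also not differentiable by itself; in a training setup the same arithmetic would be written in an autodiff framework so gradients can flow back to the predicted coordinates.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Distances between every unordered pair of 3D points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def drmsd(predicted, reference):
    """Distance-based RMSD between two structures with the same atom count.

    Compares internal distances only, so no superposition step is needed.
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same number of atoms")
    d_pred = pairwise_distances(predicted)
    d_ref = pairwise_distances(reference)
    if not d_pred:  # fewer than two atoms: no distances to compare
        return 0.0
    squared_error = sum((p - r) ** 2 for p, r in zip(d_pred, d_ref))
    return math.sqrt(squared_error / len(d_pred))


if __name__ == "__main__":
    # Toy three-atom "structures" (made-up coordinates).
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in reference]
    stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in reference]
    print(drmsd(shifted, reference))    # a rigid translation changes no distance
    print(drmsd(stretched, reference))  # stretching changes every distance
```

Because only internal distances are compared, the measure is unchanged by rotating or translating either structure. The flip side is that it cannot tell a structure from its mirror image.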

As for why it took so long, it is and it is not fine-tuning. Getting RGNs to
train _at all_ was a rather difficult process, and required a lot of finicking
around. But since I got them working, I haven't actually spent all that much
time fine-tuning them, and so I expect there to be a lot of low-hanging fruit
in terms of optimizing performance (starting from the baseline I found.)

