
How to determine a protein’s shape - Nuance
https://www.economist.com/news/science-and-technology/21716603-only-quarter-known-protein-structures-are-human-how-determine-proteins
======
drug_design_PhC
I was surprised to see the "breakthrough" in this article. The idea to use
correlated mutations to identify contacting residues has been around since the
'90s [1] (if not earlier), but it seems like it's just now getting
implemented/benchmarked in reliable prediction workflows. As someone finishing
a PhD in this field, my perspective is that there's a big lack of software
engineering talent here.
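
For anyone who hasn't seen the correlated-mutation idea before, here's a minimal sketch of it, assuming raw mutual information between alignment columns as the co-variation score. That is one of the simplest possible scores and not necessarily what the method in the article uses; real pipelines also have to correct for phylogeny and indirect couplings. The toy alignment and the function names are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    freq_i = Counter(seq[i] for seq in msa)
    freq_j = Counter(seq[j] for seq in msa)
    freq_pair = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in freq_pair.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(msa):
    """Score every column pair and sort from most to least co-varying."""
    length = len(msa[0])
    scores = [
        ((i, j), column_mutual_information(msa, i, j))
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 1 and 3 always change together
# (K with E, R with D), as do columns 0 and 2 (A with L, G with V).
TOY_MSA = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "GKVE",
    "GRVD",
]

if __name__ == "__main__":
    for (i, j), score in rank_column_pairs(TOY_MSA):
        print(f"columns {i},{j}: MI = {score:.3f} bits")
```

Column pairs that score high are candidates for residues in contact, which is the extra constraint a structure predictor can then use.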

There are ongoing efforts in the computational drug design field to make these
problems more accessible to AI/ML specialists. While some biochem knowledge is
required, packages like OpenMM, PyRosetta, and MDTraj automate away many of
the details of working with protein structures. Further, there are a number of
public contests to identify the best-performing approaches, with no
degree/publication requirements for entry. These challenges include CASP (run
every other year for 3D structure prediction -- Jinbo in the article did very
well in this), CAMEO (run weekly for structure prediction), D3R Grand
Challenges/CELPP (run at different frequencies for drug design), and probably
many others I'm not remembering.

Folks new to the field might start with FoldIt [2], a sort of protein-folding
video game that has a Lua interface, to get familiar with protein folding. I'd
be interested to know what sorts of resources we could make available to make
the field more accessible to AI/ML talent.

[1] [http://csbg.cnb.csic.es/rev_coevol_nrg/](http://csbg.cnb.csic.es/rev_coevol_nrg/)
[2] [https://fold.it/portal/](https://fold.it/portal/)

~~~
harveywi
> As someone finishing a PhD in this field, my perspective is that there's a
> big lack of software engineering talent here.

As someone with a PhD in computer science who did a postdoc in a
bioinformatics lab, I can vouch for that.

~~~
x0x0
I'm a data scientist, and I left a biolab to do adtech.

My salary tripled. That may have something to do with it...

~~~
AStellersSeaCow
Yep, I did software development in academia for a biochem lab and they paid
less than 1/3 what I make in industry. Not only that: I was lower on the totem
pole than a first semester PhD student, there was zero potential for career
growth of any sort (the prof I worked for laughed out loud when I asked about
it), and my job security was entirely governed by the grant approval/extension
whims of the NSF and NIH.

Foreknowledge of all that wasn't enough to keep me from working at the job for
a while. It was a super interesting experience, and I learned an enormous
amount about biochem, comp bio, synthetic bio, and several other fascinating
subjects.

What eventually caused me to leave was the continuous, losing battle for sane
software development practices. It wasn't just that lab: everyone I
encountered in the techy side of bio - save for the oddball comp bio or synth
bio prof/student with a CS background that included industry experience - was
completely averse to treating their software as anything other than a means
to an end. In the year and a half I lasted before taking a job in industry,
that one lab easily wasted hundreds of work hours navigating easily
preventable tech debt, writing the exact same code for the Nth time, fixing
the same deployment or revision control mistakes for the Nth time, etc, while
any attempt on my part to put in standards and practices to alleviate any of
it was dismissed out of hand as a waste of time.

In short, I agree that there's a shortage of software engineering knowledge
and skills in the field, but beyond the obvious financial, organizational, and
career development hurdles keeping talent away, there's a major attitude
adjustment required by the researchers themselves.

------
j_m_b
Protein folding researchers typically use Monte Carlo and gradient descent
methods. It makes sense to look to machine learning to expand the available
toolsets for determining protein structure.
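
To make the Monte Carlo part concrete, here's a rough sketch of a Metropolis search over a few torsion-like angles. The energy function is a hypothetical stand-in (a real force field or scoring function is far more involved), and all names and parameter values here are assumptions chosen for illustration.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy: each angle has a single minimum at 1.0 radian."""
    return sum(1.0 - math.cos(a - 1.0) for a in angles)


def metropolis_minimize(energy, start, steps=5000, step_size=0.3,
                        temperature=0.1, seed=0):
    """Metropolis Monte Carlo: perturb one angle, accept or reject the move."""
    rng = random.Random(seed)
    current = list(start)
    current_e = energy(current)
    best, best_e = list(current), current_e
    for _ in range(steps):
        trial = list(current)
        k = rng.randrange(len(trial))
        trial[k] += rng.uniform(-step_size, step_size)
        trial_e = energy(trial)
        delta = trial_e - current_e
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    best, best_e = metropolis_minimize(toy_energy, start)
    print(f"start energy {toy_energy(start):.3f}, best energy {best_e:.3f}")
```

The occasional uphill acceptance is what lets the search escape local minima, which plain gradient descent can't do on its own.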

------
fabian2k
There's quite a difference between modelling a structure and actually
determining it. Methods that predict the fold entirely, without using any
experimental data, will need a lot of testing before they're trusted at
all.

The failures of these models are probably more likely in the really
interesting cases. The boring ones that are very similar to existing
structures are likely much easier to predict.

