
AlphaFold: Using AI for scientific discovery - sytelus
https://deepmind.com/blog/alphafold/
======
sytelus
This is a fantastic achievement. About 8 years ago I had an interesting
conversation with a friend on how any material with arbitrary desired
properties can be constructed using proteins. You can build something as hard
as a turtle's shell or as soft as a jellyfish. You can build liquids that
dissolve plastics or you can build the most flexible fibers known in the
universe. One way to think about proteins is as generalized parametrized
materials. If you knew the inverse function of properties -> protein
structure, it would change the world far beyond the dollar value of the
invention. The DNA mechanism is evolution's best effort yet to build exactly
that. This was such a beautiful insight that I remember rushing to Amazon and
getting a few books to understand the basics of protein folding. The subject
has some extremely beautiful foundational simplicity that is easy to
understand, but it quickly gets complex enough that it would be hard to
navigate without interdisciplinary mind melds. With this new progress, the
hope is that protein engineering through AI will get a huge boost in community
attention and more accelerated progress!

~~~
ziont
How far are we from this technology and what can we do to get there?

Forget clunky mechanical robots, Boston Dynamics can just engineer a fleshy,
bulletproof, self-healing skin system. Think skinned dogs.

~~~
jfarlow
It is our job at Serotiny. We take plain-language requests for proteins with
novel functions (not just binds more tightly, or produces this enzymatic
product over that one), and we use the catalog of existing natural protein
components to build you a protein that has those desired capabilities.

Imagine any natural input that life can read (light, heat, glucose levels,
hormone levels, force, etc.) and any natural output that life can produce
(temp, colors, fluorescence, electrical impulse, etc.). For many of those
options, we can design a novel protein that achieves a linkage between that
I/O.

However, our approach to the problem is very much not like AlphaFold's - we
don't try to scan the 20^600 space by changing individual amino acids, but
rather we don't worry about folding or structure (too much) and instead play
around with discrete functional modules that already exist in nature. Our
approach is a bit more sociological than it is a simulation of
physics/chemistry. But it works.
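
As a rough sense of scale for the comparison above, the two search spaces can be computed directly. The sequence length of 600 comes from the comment; the catalog size and the number of module slots are made-up assumptions for illustration, not Serotiny's actual figures.

```python
AMINO_ACIDS = 20        # standard amino-acid alphabet
SEQUENCE_LENGTH = 600   # protein length implied by "20^600" above

# Number of distinct sequences when every residue can vary independently.
sequence_space = AMINO_ACIDS ** SEQUENCE_LENGTH
sequence_digits = len(str(sequence_space))

# Hypothetical module-based design: choose one module for each of a few slots.
# Both numbers are illustrative assumptions only.
MODULE_CATALOG_SIZE = 10_000
SLOTS = 4
module_space = MODULE_CATALOG_SIZE ** SLOTS
module_digits = len(str(module_space))

print(f"per-residue search space: about 10^{sequence_digits - 1}")
print(f"module search space:      about 10^{module_digits - 1}")
```

Under these assumptions the module-level space is small enough to enumerate, while the per-residue space is not.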

Optogenetics tools, CARs, SynNotches, BaseEditors are all curious examples,
and there are many more coming online right now.

------
Cybiote
The blog is light on details but it appears _Alpha_ is being used for branding
here. AlphaFold seems to depart in a significant way from the architecture of
the perfect-information game-playing agents. A few questions spring to mind:

1) What is the architecture of the generative network and where exactly does
it fit in the pipeline?

2) What is the interaction with the database? Is there an encoder being
trained with real sequences further augmented with variations using the
generative network?

3) What is the structure of the neural network that encodes the sequence? Is
it a graph network, LSTM or simple conv-net?

4) The gradient descent step is very vague. Is it a physically based
differentiable model (not a neural network) whose parameters are being
optimized with gradient descent using automatic differentiation? Or something
else? In short, there's some detail on scoring but how are the proposals being
generated?

Questions aside, the results speak for themselves and are head and shoulders
above all other showings. I wonder what it feels like for someone who's been
in the field for years.

Despite the high score, there's still a long way to go before results reach
real world utility. It's also worth keeping in mind that from a systems
biology perspective, protein folding is only a small part of what makes
getting clinically useful results difficult.

I might have missed something but I could not find anywhere indications of an
intention to publish further details. That would be disappointing if such were
indeed the case.

~~~
hmartiniano
I'm not really "in the field" but I did some computational work on protein
folding in the past.

I have a hunch on how they are doing this.

From the blog post it appears that the network is using angles and distances
of amino acids in a given sequence in known structures to predict good starting
point(s) for regular molecular dynamics-based structural optimization (what is
called the "gradient descent step" in the post).
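
One way to picture this hunch, as a minimal sketch and not DeepMind's actual method (the blog post gives no such detail): treat predicted pairwise distances as targets and nudge a rough starting structure toward them by plain gradient descent. The target distances below are computed from an invented five-point chain, and every function name is hypothetical.

```python
import math
import random

def distance_loss(coords, target):
    """Sum of squared differences between actual and target pair distances."""
    loss = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            loss += (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
    return loss

def loss_gradient(coords, target):
    """Analytic gradient of distance_loss with respect to every coordinate."""
    dim = len(coords[0])
    grad = [[0.0] * dim for _ in coords]
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            scale = 2.0 * (d - target[i][j]) / d
            for k in range(dim):
                g = scale * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
    return grad

def refine(coords, target, step=0.02, iterations=2000):
    """Plain gradient descent on the distance loss."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grad = loss_gradient(coords, target)
        for point, g in zip(coords, grad):
            for k in range(len(point)):
                point[k] -= step * g[k]
    return coords

# Invented "true" structure, used only to produce target distances.
reference = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0.5), (2, 2, 1)]
n = len(reference)
target = [[math.dist(reference[i], reference[j]) for j in range(n)] for i in range(n)]

# Noisy starting point standing in for a predicted initial structure.
rng = random.Random(0)
start = [[c + rng.uniform(-0.3, 0.3) for c in p] for p in reference]

loss_before = distance_loss(start, target)
loss_after = distance_loss(refine(start, target), target)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.6f}")
```

A real pipeline would use a physically meaningful energy function and far more atoms; the point here is only the shape of the "predict, then descend" idea.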

If I'm wrong then I'll just have to try this approach myself one day...

~~~
natechols
gradient descent != molecular dynamics; the latter is simulating forces on
atoms, not attempting to optimize a target function. You _can_ use molecular
dynamics for optimization too - it can be very powerful, just slow - but I
don't think that's what they're describing here.
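
A toy contrast between the two, assuming a one-dimensional harmonic potential U(x) = 0.5 * k * x^2 and hypothetical function names: gradient descent walks downhill and stops at the minimum, while a velocity Verlet integrator (a common molecular dynamics scheme) simulates the motion and keeps the total energy roughly constant.

```python
def force(x, k=1.0):
    """Force from the harmonic potential U(x) = 0.5 * k * x**2."""
    return -k * x

def gradient_descent(x, step=0.1, iterations=200):
    """Optimization: repeatedly move downhill on U until near the minimum."""
    for _ in range(iterations):
        x += step * force(x)  # force is -dU/dx, so this lowers U
    return x

def velocity_verlet(x, v=0.0, dt=0.1, iterations=200, mass=1.0):
    """Simulation: integrate Newton's equations of motion."""
    a = force(x) / mass
    for _ in range(iterations):
        x += v * dt + 0.5 * a * dt * dt
        a_new = force(x) / mass
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x, v

def total_energy(x, v, k=1.0, mass=1.0):
    """Potential plus kinetic energy."""
    return 0.5 * k * x * x + 0.5 * mass * v * v

x_min = gradient_descent(1.0)
x_md, v_md = velocity_verlet(1.0)
print(f"gradient descent ends at x = {x_min:.2e}")
print(f"MD energy: start {total_energy(1.0, 0.0):.3f}, end {total_energy(x_md, v_md):.3f}")
```

The simulated particle keeps oscillating instead of settling, which is why using MD for optimization needs something extra, such as gradually removing kinetic energy.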

~~~
electricslpnsld
I read it as they are using a molecular dynamics package to define a function
(potential energy of the protein structure) and are using gradient descent to
(slowly) reach stable extrema (static configurations of the protein).

~~~
natechols
That makes more sense, but the energy function isn't really a feature of
molecular dynamics; anything that does molecular optimization or modeling will
have this information built in.

------
pplonski86
The ranking of the methods is here:
[http://predictioncenter.org/casp13/zscores_final.cgi?formula...](http://predictioncenter.org/casp13/zscores_final.cgi?formula=assessors)
- the improvement is very similar to the ImageNet improvement when deep neural
networks were successfully used in image recognition, compared to
'traditional' methods.

------
jf-
> Our team focused specifically on the hard problem of modelling target shapes
> from scratch, without using previously solved proteins as templates.

I have a feeling they’re going after the biologics market with this. Predict
structure directly from DNA sequences, simulate affinities, then make a batch
and test in-vitro. Throw in a loop to feed back data to make a better DNA
sequence. Definitely heading down the road to automated protein design.

~~~
ramraj07
Predicting protein structure de novo is still a long way from actual drug
discovery. You can generate antibodies much faster if all you want is some
protein that binds your target anyway.

~~~
jf-
That’s just minimising the accomplishment, predicting structure need not be
the end of the process for it to be a significant advance.

~~~
timr
No, you're exaggerating. I work in the field. Ab initio structure prediction
(what this is) is an interesting technical challenge, but has little to no
direct impact on biologics (or any other kind of) drug discovery.

The tools and technologies sometimes end up translating but that's a long-term
process.

~~~
jf-
I think you’re suffering from a lack of imagination. It’ll be interesting to
see where closed loop directed evolution[0], which is what I was alluding to
above, ends up in five years with these kinds of advances.

[0][https://en.m.wikipedia.org/wiki/Directed_evolution](https://en.m.wikipedia.org/wiki/Directed_evolution)

~~~
timr
Ab initio structure prediction has nothing to do with directed evolution. It
doesn't enable it or make it better. Directed evolution is a laboratory
technique that depends on scale and cycle speed -- huge numbers of variants
are screened rapidly against an assay during each round. Ab initio structure
prediction adds nothing to the process.

You might argue that you could then predict the structure of the best
variants, but predicted structures are all but useless for drug discovery.

~~~
jf-
Think of this: You have a digital library of sequences that are fed into an AI
that predicts structure. The structures are fed into an AI that predicts the
desired function based on that structure. You use an optimisation algorithm to
identify candidates. You synthesise the most promising candidates and use
those as the basis for generating a physical library, and then screen as
usual.

You now have a way of navigating sequence space far more effectively, so you
can explore more of it. You could also potentially use the results to feed
back into the system regarding function, so it could become smarter over time.
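
The loop described above could be sketched roughly as follows. Everything here is a hypothetical stand-in: `predict_structure` and `predict_function` are toy placeholders (not AlphaFold or any real predictor), and the selection scheme is a simple keep-the-best-and-mutate search chosen only for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWY")

def predict_structure(sequence):
    """Toy placeholder for a structure predictor: hydrophobic fraction."""
    return sum(r in HYDROPHOBIC for r in sequence) / len(sequence)

def predict_function(structure):
    """Toy placeholder for a function predictor: best at a fraction of 0.5."""
    return 1.0 - abs(structure - 0.5)

def score(sequence):
    """Sequence -> predicted structure -> predicted function score."""
    return predict_function(predict_structure(sequence))

def mutate(sequence, rng):
    """Replace one randomly chosen residue."""
    pos = rng.randrange(len(sequence))
    return sequence[:pos] + rng.choice(AMINO_ACIDS) + sequence[pos + 1:]

def design_round(library, rng, keep=5, children=4):
    """Keep the top candidates and add mutated variants of each."""
    best = sorted(library, key=score, reverse=True)[:keep]
    # In the workflow described above, `best` would be synthesised and
    # screened here, and assay results could be fed back into the predictors.
    return best + [mutate(s, rng) for s in best for _ in range(children)]

rng = random.Random(0)
library = ["".join(rng.choice(AMINO_ACIDS) for _ in range(30)) for _ in range(20)]
best_before = max(score(s) for s in library)
for _ in range(10):
    library = design_round(library, rng)
best_after = max(score(s) for s in library)
print(f"best predicted score: {best_before:.3f} -> {best_after:.3f}")
```

Because the top candidates are always carried over, the best predicted score never gets worse between rounds; whether predicted scores track real function is the open question raised in the replies.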

~~~
natechols
You're describing what protein design researchers have been doing for years
already. Except for predicting the desired function, which AlphaFold
doesn't do either - as the structural genomics projects of the 2000s found
out, having the protein structure doesn't magically tell you what it does _in
vivo_.

~~~
jf-
> You're describing what protein design researchers have been doing for
> years already.

If that’s so, can you link to any supplementary material about it?
Particularly with respect to how machine learning is being used, how the
candidate selection process works etc. I’m curious about the subject.

> Except for predicting the desired function, which AlphaFold doesn't do
> either - as the structural genomics projects of the 2000s found out, having
> the protein structure doesn't magically tell you what it does in vivo.

Protein function prediction is a real thing, and it requires knowing the
structure. Good structure prediction is a step towards this.

~~~
natechols
Sorry, to be clear, there usually isn't any machine learning involved (at
least not in the examples I'm familiar with), but the rest of the process is
very similar. My point is just that it's not a new workflow and it's not
something that the existing tools can't do; better predictions can reduce the
search space and/or the number of iterations, but unless they're suddenly an
order of magnitude more accurate, it's still an incremental improvement. It's
difficult to guess from the CASP results how well this approach will reduce
the number of false positives, which as I understand it is a big bottleneck in
the design process - IMHO that's a much more interesting problem to solve than
ab initio prediction, although they're closely related.

~~~
jf-
> Sorry, to be clear, there usually isn't any machine learning involved (at
> least not in the examples I'm familiar with), but the rest of the process is
> very similar.

No problem, shame though!

It seems to me there must be scope for using AI to improve this process given
the results it achieves in other domains, and the AlphaFold result is very
encouraging. Maybe that order of magnitude improvement will eventually be
possible.

~~~
lambdadmitry
To add to the GP, there are at least two more fundamental problems with the
computational approach:

- chaperones. Not all proteins fold by themselves, quite a few bind to an
additional protein that helps them fold in the desired shape. It means that
the final state is impossible to achieve from the "initial" state by gradient
descent.

- proteins don't necessarily exist in the minimum potential energy state.
Moreover, sometimes the state flips on addition of a ligand (e.g. myosin's
relationship with ATP) and that's crucial for the protein function.

So static folding only gets you so far. Unfortunately, nature is hideously
complicated and "entangled", so there is a tremendous gap between even perfect
protein folding and real in vitro results.
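
A one-dimensional toy analogy for the gradient descent point, assuming an invented double-well energy U(x) = x^4 - 2x^2 + 0.5x (nothing protein-specific): where plain descent ends up depends entirely on where it starts, and it can settle in a well that is not the lowest one.

```python
def energy(x):
    """Invented double-well energy with a shallow and a deep minimum."""
    return x ** 4 - 2 * x ** 2 + 0.5 * x

def energy_gradient(x):
    """Derivative dU/dx of the energy above."""
    return 4 * x ** 3 - 4 * x + 0.5

def descend(x, step=0.01, iterations=5000):
    """Plain gradient descent; it cannot climb over the barrier between wells."""
    for _ in range(iterations):
        x -= step * energy_gradient(x)
    return x

right = descend(1.5)    # starts on the right-hand slope
left = descend(-1.5)    # starts on the left-hand slope
print(f"from +1.5: x = {right:.3f}, U = {energy(right):.3f}")
print(f"from -1.5: x = {left:.3f}, U = {energy(left):.3f}")
```

The run started on the right stays in the shallower right-hand well even though the left-hand well is lower in energy.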

------
cjohansson
It seems like this AI makes probabilistic guesses based on empirical data, so
you need to validate the guess by comparing it with reality. Sometimes reality
is not available and from my ignorant perspective, a much better way would be
to discover the mathematical principles behind protein folding and deductively
know the exact folding with certainty. Focusing resources on a deductive
solution to the problem is much better than creating a statistical guessing
machine. AI solutions often seem to shift focus from deductive solutions to
statistical ones and I don't think that is progress in the long term.

~~~
jf-
It’s the only progress we’re likely to get on mathematically intractable
problems, until maybe a few generations into quantum computing.

~~~
opless
The hopes pinned on quantum computing seem to be like hoping holistic medicine
cures cancer.

~~~
jf-
Correct me if I’m wrong, but that reads to me as thoughtless eye rolling
cynicism. Is there a basis to it? Does quantum computing not offer our best
shot at solving currently unsolvable problems?

~~~
Jach
> Is there a basis to it?

Yes. "Quantum" is used in so many wrong instances that the default reaction to
a stranger using it probably should be eyerolling.

> Does quantum computing not offer our best shot at solving currently
> unsolvable problems?

Not in general; there are very few problems where Shor's algorithm or Grover's
algorithm apply. There are many more intractable problems that are out of
reach regardless of computing power. There are though a lot of 'unsolvable'
problems today that are just a matter of more (non-quantum) hardware and
software.
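
To put a number on the limits of a quadratic speedup: Grover's algorithm needs on the order of (pi / 4) * sqrt(N) oracle queries to search N unstructured items. The sketch below applies that to a brute-force search over 20^600 items, a size borrowed from elsewhere in this thread purely as an example; nobody is proposing to fold proteins this way.

```python
import math

def log10_classical_queries(n_items):
    """Brute-force search examines on the order of N items."""
    return math.log10(n_items)

def log10_grover_queries(n_items):
    """Grover search needs about (pi / 4) * sqrt(N) oracle queries."""
    return 0.5 * math.log10(n_items) + math.log10(math.pi / 4)

N = 20 ** 600  # example search-space size, an assumption for illustration
classical_exponent = log10_classical_queries(N)
grover_exponent = log10_grover_queries(N)
print(f"classical: about 10^{classical_exponent:.0f} queries")
print(f"Grover:    about 10^{grover_exponent:.0f} queries")
```

Halving the exponent still leaves a number of queries far beyond anything physically achievable.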

If you want to rid yourself of quantum computing delusions in particular, try
reading Aaronson.
[https://www.scottaaronson.com/democritus/](https://www.scottaaronson.com/democritus/)
and maybe for a primer on the sort of problem Shor's algorithm can help with:
[https://www.scottaaronson.com/blog/?p=208](https://www.scottaaronson.com/blog/?p=208)

~~~
opless
I can't really add anything to this reply, it hits the nail on the head with
some references I've not seen before.

Thanks Jach.

------
dekhn
A long-running joke at CASP was that eventually, a neural network would be
trained that could predict structures as well as Alexey Murzin. Alexey is the
guy who would routinely "win" CASP by explaining how he memorized the sequence
6 months before from a preliminary structure poster at some random
conference (he also helped create SCOP).

------
melling
I imagine there are hundreds, if not thousands, of scientific problems that
DeepMind could help with. It's just a matter of getting people working on the
problems.

~~~
wrinkl3
I wonder if they're gonna collab with Neuralink. Among all of Musk's startups
those two are poised to synergize the most.

~~~
forgotmyhnacc
DeepMind is owned by Alphabet, not Elon Musk.

~~~
wrinkl3
Mixed up DeepMind and OpenAI again, my bad.

------
breatheoften
Reading the description of the submission process is pretty interesting:
[http://predictioncenter.org/casp13/doc/CASP13_Abstracts.pdf](http://predictioncenter.org/casp13/doc/CASP13_Abstracts.pdf).
There were multiple submissions and manual synthesis of different model
results in some cases.

I’m curious how this competition works exactly — it seems like a set of label
predictions are submitted and some form of accuracy result feedback is
provided (a single accuracy score for the whole prediction set?). And that
there are a certain number of allowed submissions ...? How much of the
ultimate strategy for playing this game at a high-level ends up being around
optimizing for receiving as much leaked information from the test set as
possible — is best guess at this point that this result is likely to be a good
indicator of a true increase in prediction capability ...?

~~~
kxs
The targets are completely unknown. They were experimentally solved, but not
submitted to the Protein Data Bank. You basically get a target every day
(meaning a sequence of amino acids) and then, depending on the category, you
have three days to predict it (I think up to two weeks for "human" servers).
In the modeling category they participated in, you can submit 5 models. A
model is a fully predicted 3D structure of the protein. The targets are mostly
independent. Some targets were split over a few days. But in principle, a
target on day X has no connection to a target on day Y. The targets have
varying difficulty. There are two categories: TBM (template-based modeling)
and FM (free modeling). You don't know which protein corresponds to which
category; you can just guess by looking at the available template data. They
focused on FM targets, meaning there are no homologous structures available.
It's hard to say how good of an indicator the results are. Looking at the
contact prediction results, many methods are getting very good at constructing
MSAs (gathering similar sequences). We already saw this at CASP12 - I think
the FM targets are getting "easier" in that sense. There is basically zero
feedback throughout the whole competition. Some targets are released after the
deadline (because of publications), but in general, you don't know anything
until the CASP meeting, which is currently taking place. The competition ended
in August.

------
derangedHorse
This was a pretty well written article. It was slightly technical, but still
accessible to the average lay-person.

------
mikhailfranco
Tangential, but great interview with Demis Hassabis here:

[https://www.bbc.co.uk/sounds/play/p06qvj98](https://www.bbc.co.uk/sounds/play/p06qvj98)

(skip to 01:30 for real start)

------
lawrenceyan
It's great that computational protein design is finally reaching a point where
our computational and algorithmic capabilities have made it possible to
actually start implementing legitimate real potential products. This is what
people like D.E. Shaw Research, David Baker's Lab, Vijay Pande's Lab, and
countless others have spent the better part of their lives on, and I'm
incredibly excited for what we can and will achieve with this technology.

------
stdplaceholder
I wonder if team “zhang” is angry. They’re way ahead of the rest of the
entrants and would have had an impressive result, if they hadn’t been blown
out of the water by A7D.

------
nie
I have written an intro to the problem for the layperson, to the best of my
ability. I hope this will be helpful for some of you[1].

[1] [https://blog.nilbot.net/2018/12/pipeline-protein-
structure-p...](https://blog.nilbot.net/2018/12/pipeline-protein-structure-
prediction/)

------
syntaxing
Is there something equivalent but for medicine? Essentially AI assisted
medicine generation. If not, does anyone know what the field name is (I
thought it was molecular biology but I might be wrong)?

~~~
klmr
You might be interested in computer-aided drug design [1], in particular
computational target prediction. It’s a fairly big field (both in academic
research and in commercial application), and not exactly new. This _is_
technically a subfield of molecular biology but that term is extremely broad.

[1] [https://en.wikipedia.org/wiki/Drug_design#Computer-
aided_dru...](https://en.wikipedia.org/wiki/Drug_design#Computer-
aided_drug_design)

------
atomical
[http://blogs.sciencemag.org/pipeline/archives/2018/12/03/the...](http://blogs.sciencemag.org/pipeline/archives/2018/12/03/the-
latest-on-protein-folding)

> But protein folding is far from a solved problem, fear not. XKCD’s take on
> this remains accurate! It’s going to be very interesting indeed to see the
> progress over the next few years in this area, but that progress is not
> going to be the discovery of some general solution. It’s going to be a
> mixture (as mentioned above) of better understanding of the physical
> processes involved, larger databases of reliable experimental data covering
> more structural classes, and faster/more efficient ways for searching
> through all these (both the possible structures and the real ones) and
> generalizing rules to tell us when we’re closing in on something accurate.

------
xvilka
Is this able to solve all pending Folding@Home tasks?

~~~
comepradz
Possibly not as in new FahCores. They might be useful in conjunction with
each other: AlphaFold is useful for protein structure prediction, and
Folding@home can confirm the predicted structure through simulation.

~~~
TomMarius
Is the confirmation significantly faster or is the same amount of work needed?

------
YetAnotherNick
How does it compare with the top Foldit entries? I thought the semi-guided way
was by far the best way to solve this.

------
kinchen
What's the path from protein folding to 'new potential within drug discovery?'

~~~
cing
The path is most likely through reliable structure prediction of drug targets.
That would open up rational drug design projects that may have previously been
impossible. The only problem is that experimental structure determination is
so good in pharma, that it's hard to compete. For example, on a structure-
enabled project, it may be possible to experimentally solve multiple high-
resolution 3D models per week with an order of magnitude higher accuracy than
predicted models. Once you can routinely get structures, there's still the
rest of the drug discovery pipeline left to go.

~~~
burning_hamster
> experimentally solve multiple high-resolution 3D models per week

I thought this was only true if you already have a structure; otherwise you
typically run into the [phase
problem]([https://en.wikipedia.org/wiki/Phase_problem](https://en.wikipedia.org/wiki/Phase_problem)),
which is often a significant hurdle. But I haven't done much biochemistry in
years, so there might be better approaches than MAD/MIR/etc. that make the
phase problem a non-issue.

~~~
kinchen
If you already have the structures, what is there left to solve?

------
buboard
How about they scale this up to predict human phenotypes from DNA sequences.

~~~
assblaster
Deriving phenotype from genomics is partly a problem of data. To give a
computer the best chance to definitively correlate phenotype with sequence,
you need millions to tens of millions to hundreds of millions of DNA sequences
from unrelated individuals from every possible genetic lineage, and you'd also
need objective analysis of phenotypes without cultural influence, i.e. someone
who is boisterous in one culture would be normal in another culture,
independent of the actual genetics.

Computationally, this is statistical analysis, and I doubt that AI would be
able to offer anything unique. Protein folding prediction, on the other hand,
is more of a question of "where do you start to arrive at the answer most
efficiently", and AI is well suited for this, and it would be much better than
humans at prediction using methodologies and correlations far outside of human
brain capability.

~~~
buboard
OK, but how about just faces? I wonder if it is possible to find enough data
for that.

~~~
assblaster
It is hard enough finding the similarities between children and parents. The
assumption of face phenotyping is that you would be able to roughly guess what
someone will look like based on their grandparents, which seems unlikely for
an AI to accomplish, and humans are designed evolutionarily to assess
subtleties in faces to identify friends and family from foe.

------
grondilu
Is there any chance this could be used for materials science instead of drug
discovery? I personally doubt the world needs more drugs.

------
ghani
Awesome!

------
John_KZ
Scary. I never thought I'd hate to see protein folding being solved.

~~~
chii
why is it scary?

~~~
alexgmcm
It's weird that people seem incapable of being optimistic about scientific and
technological progress unlike say in the 50's where it seemed many genuinely
believed we would explore space and have robots etc.

Now people just try to malign AI, genetic engineering, nuclear power etc.

~~~
jf-
It’s not unreasonable to be cautious. Nuclear accidents did happen, which is
what led people to fear nuclear power, even if it is safer than the public
realises. Not barging full speed ahead into unknown technologies with large
downside potential is only sensible.

~~~
ogrisel
The public is also afraid of nuclear technology because of nuclear weapons
used to destroy the cities Hiroshima and Nagasaki with a single bomb each
time.

------
brootstrap
Is this science, or machine learning disguised as science to try and appear
more useful?

------
MakeDLgr8again
Any data available to create an open source version?

------
Invictus0
Their methods don't sound particularly groundbreaking; the achievement seems
to be in the implementation. I hope people don't jump to the conclusion that
AI can do science, because there isn't actually any experimentation here, just
a statistical analysis of a few key amino acid properties.

I hope that they will be able to abstract some formulas or rules about protein
folding from the mess of statistics. I imagine that having a rulebook would be
much more efficient than using AI, because protein folding isn't so much a
game of chance as it is an extremely convoluted puzzle.

~~~
natechols
The rulebook approach is basically what groups like David Baker's have been
doing for years. However, actually applying the rulebook is a difficult
problem; I expect AI would actually prove much more efficient in the end,
especially with optimized hardware.

