

Atomwise (YC W15) Discovers Drugs for Diseases That Don’t Even Exist Yet - mudil
http://techcrunch.com/2015/03/06/y-combinator-backed-atomwise-discovers-drugs-for-diseases-that-dont-even-exist-yet/

======
aheifets
I’m the cofounder and CEO of Atomwise and, since this is Hacker News, I
thought I’d cover the technical details a bit more: We run deep neural
networks on one of the biggest supercomputers (#74 in the world,
[http://www.top500.org/site/50424](http://www.top500.org/site/50424)) to
predict whether a molecule will stick to a disease target (its “binding
affinity”). Understanding the binding affinity of a molecule is one of the
essential questions in finding new medicines; it comes up over and over in the
drug discovery pipeline, including in hit discovery, toxicity prediction, and
personalized medicine.

Our goal is to bring to medicine discovery the same kind of incredible
efficiency gains that computation gave us in aerospace and mechanical
engineering design. Today, people have to physically synthesize and physically
test molecules to figure out how they’re going to behave. That’s incredibly
laborious, expensive, and time-consuming. We’re able to get the same results,
but in days instead of months or years.

Given all of the new and re-emerging diseases we’re encountering (such as
Ebola, measles, malaria, and drug-resistant infections, to name a few that
we’ve worked on), I think our species needs all of the help we can get in
finding new medicines. I’m happy to answer questions about what we’re doing,
or the challenges we encounter when we take deep learning algorithms out
beyond image classification.

~~~
somesaba
Thanks for answering questions! I'd be curious to know where you got your
(presumably massive) data from to train a NN to spit out what seems to be
binding affinity between two candidates (drug and target). Do you guys use a
NN for each target? I know you may not be able to answer these questions :)

I hope your team succeeds, keep up the hard work!

~~~
aheifets
Thank you for the kind wishes!

Over the past few years, there's been a huge increase in the amount of data
available for this kind of machine learning. We curate our data from a number
of private and public sources. For example, as part of my doctoral work
([http://en.wikipedia.org/wiki/SCRIPDB](http://en.wikipedia.org/wiki/SCRIPDB)),
I learned how to parse chemical information out of U.S. Patent data, which is
public domain. That said, if you're interested in working on something like
this and need a quick million data points, I'd point you to PubChem as a first
step: [https://pubchem.ncbi.nlm.nih.gov/](https://pubchem.ncbi.nlm.nih.gov/)
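
For anyone who wants to try pulling records programmatically, here is a minimal sketch using PubChem's public PUG REST interface. It only illustrates one way to fetch a couple of computed properties for a compound by name; the helper names (`compound_property_url`, `fetch_properties`) are made up for this example, and it is not a description of how we curate our own data.

```python
import json
import urllib.parse
import urllib.request

# Base of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name, properties):
    """Build a PUG REST URL asking for `properties` of the compound `name`.

    Hypothetical helper for illustration only.
    """
    quoted = urllib.parse.quote(name, safe="")
    return f"{PUG_REST}/compound/name/{quoted}/property/{','.join(properties)}/JSON"


def fetch_properties(name, properties):
    """Download and decode the property table (requires network access)."""
    url = compound_property_url(name, properties)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["PropertyTable"]["Properties"]


if __name__ == "__main__":
    # Example: look up formula and weight for aspirin.
    print(fetch_properties("aspirin", ["MolecularFormula", "MolecularWeight"]))
```

For bulk work, PubChem's own usage guidance and bulk downloads are a better fit than looping over single requests like this.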

------
et2o
MD/PhD student in computational biology:

What kind of information actually goes into your neural net to predict binding
affinity? How does your model actually work?

I've some experience with MD simulations; even these, which have had 1000s of
man-hours of parameterization, are often not very accurate. I'm curious what
you are using to evaluate your model's predictions.

How are you deciding what your targets actually are? How do you relate a
particular drug target to a disease in the case of complex phenotypes? This
seems to me to be by far the most challenging part of pharmacology.

I'm definitely enthusiastic about applying computation to biology, but as
someone also on the frontlines, it's definitely not a clean analogy to
computation in engineering. Biological systems are far more complex,
interconnected, and nonlinear, which is something that is sometimes
underestimated.

~~~
dluan
Sometimes instead of serializing the man-hours, it's more effective to
parallelize them (see FoldIt, GalaxyZoo, etc).

I got to work in the Baker Lab that developed this:
[http://www.ncbi.nlm.nih.gov/pubmed/22267011](http://www.ncbi.nlm.nih.gov/pubmed/22267011)

~~~
et2o
I'm definitely familiar with FoldIt. I TA'd a class where we gave extra credit
to students for playing around with it. It's very cool! Thanks for your work.

------
adamio
Any hard examples of actual cures found (or existing medicines independently
derived)?

~~~
aheifets
The typical timeline to get an actual cure all the way through the drug
discovery pipeline is about 14 years. While we haven't been around long enough
for that, we have had our algorithmic predictions validated by follow-up
physical experiments. This was even for very different diseases, e.g. multiple
sclerosis and drug-resistant bacterial infections.

Also, as I described in my above answer to et2o, we do large retrospective
tests to evaluate our predictive accuracy.

------
88e282102ae2e5b
Do you have any way to predict whether the molecule will be toxic to humans,
or to determine if it will get metabolized in some way that reduces its
effectiveness?

~~~
aheifets
Today, those tests are done physically. But, you're right: if you have a good
system to tell if a molecule will stick to a given protein, there's no reason
to constrain your tests to the protein you _want_ to hit. You can also predict
whether the molecule will go around sticking to necessary proteins in the
heart (e.g., hERG channel), liver (e.g., cytochrome P450), kidney, brain, etc.
Internally, we have a panel of a couple of hundred proteins against which we
can predict these kinds of off-target toxicities.
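
The panel idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `predict_affinity` is a hypothetical stand-in for whatever model scores a molecule-protein pair, and the toy scores, the protein list, and the 0.5 threshold are made up for the example rather than taken from our actual system.

```python
def flag_off_targets(molecule, panel, predict_affinity, threshold=0.5):
    """Score `molecule` against every protein in `panel` and return the
    proteins whose predicted binding score is at or above `threshold`.

    `predict_affinity(molecule, protein) -> float` is a hypothetical
    callable; any binding predictor could be plugged in here.
    """
    hits = {}
    for protein in panel:
        score = predict_affinity(molecule, protein)
        if score >= threshold:
            hits[protein] = score
    return hits


def toy_predict(molecule, protein):
    """Toy predictor with invented scores, for demonstration only."""
    made_up = {"hERG": 0.82, "CYP3A4": 0.31, "CYP2D6": 0.57}
    return made_up.get(protein, 0.0)


if __name__ == "__main__":
    # "CCO" is the SMILES string for ethanol, used as a placeholder molecule.
    panel = ["hERG", "CYP3A4", "CYP2D6"]
    print(flag_off_targets("CCO", panel, toy_predict))
```

The point of the design is that the same predictor used for the intended target is simply reused across the safety panel; only the list of proteins changes.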

