
Predicting Properties of Molecules with Machine Learning - happy-go-lucky
https://research.googleblog.com/2017/04/predicting-properties-of-molecules-with.html
======
dhruvp
If you're interested in this field, check out this talk by Bharath Ramsundar
from Prof. Pande's lab at Stanford:
[https://youtube.com/watch?v=sntikyFI8s8](https://youtube.com/watch?v=sntikyFI8s8).
He's also the author of [http://deepchem.io](http://deepchem.io), a deep
learning library for drug discovery.

------
dnautics
Why I am NOT bullish on this solving molecular medicine discovery problems:

medchem is notoriously NOT generalizable. A crude example: the reason they
developed heroin is that, in the early days of medchem, the reasoning was
that acetyl-salicylic acid is awesomer than salicylic acid, so therefore
acetyl-morphine must be awesomer than morphine. Actually, in many ways it is
awesomer (and that's why it's a bad drug).

Consider Gleevec. Even if you knew the structure of Gleevec's target (BCR/ABL)
you would not be able to predict Gleevec, because it works by displacing an
entire segment of the protein into a conformation which happens to be
thermodynamically more stable (but kinetically disfavored). Gleevec is a
medchem drug (discovered through combinatorial synthesis), but sadly the
insight into this mechanism is only generalizable in the conceptual sense: if
you take that molecular fragment and graft it onto another molecule intended
for a different target, it probably won't work.

Deep learning depends strongly on generalizable knowledge, and medchem is
notoriously not easily generalizable for well-understood reasons.

Some aspects of medchem - like optimizing bulk synthesis reactions, picking
synthetic routes, guessing at bioavailability, stability in formulations,
might be amenable to ML, but I am not bullish on discovery. Let's hope I'm
wrong.

~~~
refurb
I used to work in medicinal chemistry and agree 100%.

That's not to say ML has no value, but predicting molecular behavior, even in
the simplest system, is really damn hard.

When you only understand 10% of the factors influencing behavior, ML doesn't
get you far.

~~~
visarga
That means they need 90% more data, right? Like, how other molecules behave in
context, not stand alone. Some of them have similar properties.

~~~
refurb
That's true, but we don't even know what to look for to find that remaining
90% of data!

I remember working with some computational scientists: "just put a methyl
group on this nitrogen and we should increase binding by 100x!".

So we make the molecule, give it to the biologists and find out binding is
actually 1000x worse!

------
sgt101
Amazing - worth looking at Muggleton's publications for work using logical
models rather than deep models

[http://www.doc.ic.ac.uk/~shm/mypubs.html](http://www.doc.ic.ac.uk/~shm/mypubs.html)

------
aliakhtar
> One reason molecular data is so interesting from a machine learning
> standpoint is that one natural representation of a molecule is as a graph
> with atoms as nodes and bonds as edges. Models that can leverage inherent
> symmetries in data will tend to generalize better — part of the success of
> convolutional neural networks on images is due to their ability to
> incorporate our prior knowledge about the invariances of image data (e.g. a
> picture of a dog shifted to the left is still a picture of a dog).
> Invariance to graph symmetries is a particularly desirable property for
> machine learning models that operate on graph data, and there has been a lot
> of interesting research in this area as well
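
As a rough illustration of the invariance described in the quote above, here is a toy sketch in plain Python. It is not the model from the blog post: the functions `message_pass`, `readout`, and `relabel` are hypothetical names made up for this example, the atom features are just atomic numbers, and the three-atom "molecule" is only the C-C-O heavy-atom skeleton of ethanol. The point is that if every step aggregates over neighbours and atoms with an order-independent operation (a sum here), renumbering the atoms cannot change the output.

```python
from typing import List, Tuple

Bond = Tuple[int, int]  # a bond is a pair of atom indices (an undirected edge)


def message_pass(features: List[float], bonds: List[Bond]) -> List[float]:
    """One round of sum aggregation: each atom adds its neighbours' features."""
    updated = list(features)
    for a, b in bonds:
        updated[a] += features[b]
        updated[b] += features[a]
    return updated


def readout(features: List[float], bonds: List[Bond], rounds: int = 2) -> float:
    """Run a few rounds of message passing, then pool over all atoms.

    The pooling is a sum of squares, which does not depend on atom order.
    """
    hidden = list(features)
    for _ in range(rounds):
        hidden = message_pass(hidden, bonds)
    return sum(h * h for h in hidden)


def relabel(features: List[float], bonds: List[Bond], perm: List[int]):
    """Renumber atoms: old index i becomes perm[i]. Same molecule, new labels."""
    new_features = [0.0] * len(features)
    for old, new in enumerate(perm):
        new_features[new] = features[old]
    new_bonds = [(perm[a], perm[b]) for a, b in bonds]
    return new_features, new_bonds


# Toy graph: C-C-O skeleton, atomic numbers as the only feature (an assumption).
atoms = [6.0, 6.0, 8.0]
bonds = [(0, 1), (1, 2)]

original = readout(atoms, bonds)
shuffled = readout(*relabel(atoms, bonds, perm=[2, 0, 1]))
print(original == shuffled)
```

A model that instead flattened the adjacency matrix into a vector and fed it to a dense layer would generally give different answers for the two numberings, which is the kind of wasted capacity the quoted passage is warning about.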

------
itchyjunk
Google has a lot of ML experts. A lot of fields can potentially benefit from
ML, and also contribute data and concepts to ML, but these fields don't have
ML experts. I was thinking about this only a few days ago. I am so glad Google
is looking to personally contribute to every field possible.

Though property prediction is a hard problem, I think there is low-hanging
fruit in other fields. For example, anthropology, where only partial skeletons
are found but we know there is symmetry there. Software regeneration is slow
and expensive and doesn't exploit the symmetry much.

A joint project between Google and CERN also sounds really cool to me. Or
maybe Google can set up a system where researchers with large datasets can
approach Google and see if a symbiotic relationship can be formed.

~~~
deepnotderp
Yes, CERN, which does work using advanced mathematics every day, doesn't know
how to use multivariate calculus. /s

You do know that CERN regularly publishes papers in machine learning, right?

~~~
laingc
A personal anecdote to support your point: The first company I worked for in
Europe was a well-established ML company that had been doing predictive
analytics long (10+ years) before the current fad.

Founded by a former professor from CERN, and staffed about 90% from CERN
postdocs. I was the only member of my team who was not a co-author on the
Higgs boson discovery paper.

So yeah, people at CERN are pretty well aware of what can be done with ML.

------
DrNuke
Materials science is buzzing in this respect, from the nanoscale to the
microstructural level, but ML cannot predict the existence and, if they exist,
the stability of the novel items you find. It is more a targeted, extrapolated
attempt (because of some exact properties we wish to improve for targeted
applications) than a magic 8 ball.

~~~
selimthegrim
No, but you can use ML to improve the accuracy of density functionals, even
those targeted towards prediction of specific properties.

~~~
marcosdumay
Can you? People have already tested a large number of sub-exponential
algorithms without any luck. If it were something easy to interpolate, one'd
expect somebody to have had some amount of success already.

------
nkjoep
I'd like to know what Prof. Vijay S. Pande, father of the Folding at Home
project, thinks about that.

~~~
ktamiola
Probably rolling with laughter. You don't need ML to "predict" properties of
molecules. Not a single physicist will buy ML-predicted molecule properties.

------
eutectic
I wonder how long it will be before state-of-the-art SAT solvers start
incorporating similar networks.

~~~
letitgo12345
I doubt anytime soon (at least as part of current SAT solvers). A big barrier
to applying ML to the process of SAT solving is that a lot of the time it's
just faster to do a search with a simple heuristic for variable selection than
to try a much more time-consuming ML method (and neural networks will be quite
time-consuming relative to the kinds of heuristics usually used) to do
variable selection better.

Quantum is a special case really as DFT computation is already very time
consuming.
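
To make "a search with a simple heuristic for variable selection" concrete, here is a minimal sketch of a backtracking SAT search in Python. It is a textbook-style toy, not how state-of-the-art solvers are built (those add clause learning, watched literals, restarts, and so on), and the names `simplify`, `pick_variable`, and `solve` are made up for this example. The heuristic in `pick_variable` is just a frequency count, which costs one pass over the clauses; that is the kind of cheap step a neural network would have to beat on wall-clock time, not only on branching quality.

```python
from collections import Counter
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def simplify(clauses: List[Clause], literal: int) -> Optional[List[Clause]]:
    """Assume `literal` is true: drop satisfied clauses, shrink the others.

    Returns None if some clause becomes empty (a conflict).
    """
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause already satisfied
        reduced = [lit for lit in clause if lit != -literal]
        if not reduced:
            return None
        result.append(reduced)
    return result


def pick_variable(clauses: List[Clause]) -> int:
    """Cheap heuristic: branch on the variable that occurs most often."""
    counts = Counter(abs(lit) for clause in clauses for lit in clause)
    return counts.most_common(1)[0][0]


def solve(clauses: List[Clause],
          assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Return a (possibly partial) satisfying assignment, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    if any(not clause for clause in clauses):
        return None  # an empty clause can never be satisfied
    if not clauses:
        return assignment  # every clause satisfied
    var = pick_variable(clauses)
    for literal in (var, -var):  # try True first, then False
        reduced = simplify(clauses, literal)
        if reduced is not None:
            found = solve(reduced, {**assignment, var: literal > 0})
            if found is not None:
                return found
    return None


# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
model = solve([[1, 2], [-1, 3], [-2, -3]])
print(model)
```

Swapping `pick_variable` for a learned model is straightforward in code; the open question raised above is whether the better choices pay for the much slower call at every branch.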

~~~
deepnotderp
IIRC, graph ConvNets have been successfully applied to some Boolean SAT
problems.

~~~
letitgo12345
The only ones I've seen try to predict satisfiability directly. But usually
in SAT, you're interested in either a solution or a proof of infeasibility.
Prediction can't do the latter and, afaik, existing non-ML-based SAT solvers
are far better at doing the former.

------
cody8295
I'm taking an AI class and decided to choose computer-aided drug discovery as
my research topic. So it's pretty cool to see stuff like this come out.

------
ktamiola
Yet another arXiv "paper"...

------
mtgx
Can't wait until we can combine quantum computers with machine learning. It
should lead to a revolution in materials science and medicine.

