
Applying deep adversarial autoencoders for new molecule development in oncology - nafizh
http://www.impactjournals.com/oncotarget/index.php?journal=oncotarget&page=article&op=view&path%5B0%5D=14073&path%5B1%5D=44886
======
41321121
It seems to me a fundamental problem with this paper is that they're training
an autoencoder on a relatively small set of drugs that have been tested on
cancer. They then approximately index a much larger dataset of chemicals and
claim the retrieved compounds are novel candidate cancer treatments, on the
grounds that some of them have previously been considered as cancer drugs.
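
For concreteness, here's roughly the shape of the model (a minimal sketch of
an adversarial autoencoder on binary fingerprint vectors; the layer sizes,
fingerprint length, and Gaussian prior are my assumptions, not the paper's
exact architecture):

```python
import torch
import torch.nn as nn

FP_DIM, LATENT_DIM = 166, 8  # e.g. MACCS-length fingerprints; sizes assumed

encoder = nn.Sequential(nn.Linear(FP_DIM, 64), nn.ReLU(),
                        nn.Linear(64, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                        nn.Linear(64, FP_DIM), nn.Sigmoid())
discrim = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                        nn.Linear(64, 1), nn.Sigmoid())

opt_ae = torch.optim.Adam(list(encoder.parameters()) +
                          list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discrim.parameters(), lr=1e-3)
bce = nn.BCELoss()

def train_step(x):  # x: (batch, FP_DIM) float tensor of 0/1 fingerprints
    # 1) discriminator: tell prior samples apart from encoder outputs
    z = encoder(x)
    z_prior = torch.randn_like(z)
    d_loss = (bce(discrim(z_prior), torch.ones(len(x), 1)) +
              bce(discrim(z.detach()), torch.zeros(len(x), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) autoencoder: reconstruct fingerprints while fooling the discriminator,
    #    which pushes the latent code toward the Gaussian prior
    ae_loss = (bce(decoder(z), x) +
               bce(discrim(z), torch.ones(len(x), 1)))
    opt_ae.zero_grad(); ae_loss.backward(); opt_ae.step()
```

The point is that everything the model "knows" about cancer activity has to
come from those few thousand reconstruction targets.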

It's not clear that this is doing much more than finding drugs similar to the
training drugs in the new dataset. Given that a large part of
pharmaceutical/chemical development is based on slightly modifying existing
compounds, it is not obvious that this adds much to the discovery pipeline, or
that it is robust to false positives (drugs that have similar "fingerprints"
to those seen in the training set but belong to a large class of chemicals
that mostly don't fight cancer and are therefore unlikely to appear in the
small training set of possible cancer-fighting compounds).
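
To make the null hypothesis concrete: a plain Tanimoto similarity search
against the training actives, with no learning at all, would produce a very
similar candidate list. A sketch using RDKit (the SMILES strings are
placeholders):

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

train_smiles = ["CC(=O)Oc1ccccc1C(=O)O"]            # known actives (placeholder)
library_smiles = ["CC(=O)Oc1ccccc1C(=O)OC", "CCO"]  # screening library (placeholder)

def fp(smiles):
    # 2048-bit Morgan (ECFP4-like) fingerprint
    return AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), 2, nBits=2048)

train_fps = [fp(s) for s in train_smiles]

def best_sim(smiles):
    # similarity of a library compound to its nearest training drug
    q = fp(smiles)
    return max(DataStructs.TanimotoSimilarity(q, t) for t in train_fps)

# Rank the library by closeness to the training set: the top of this list
# is full of near-duplicates of drugs we already knew about.
ranked = sorted(library_smiles, key=best_sim, reverse=True)
print(ranked[:10])
```

If the AAE's hit list looks much like this one, the model hasn't added
anything beyond similarity search.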

Fundamentally, machine learning works better with more data. It is hard to
believe that training on 6000 chemicals, given the overall diversity of
chemical space (~72 million chemicals in the dataset they're indexing), is
likely to lead to a real "understanding" of what constitutes a cancer-fighting
drug, as opposed to parroting the existing drugs they're trained on.

The effort from Ryan P. Adams' group
(https://arxiv.org/pdf/1610.02415v2.pdf) is, I think, more reasonable: it
trains on larger sets of chemicals, and it is truly generative in the sense of
being able to create new chemicals, as opposed to signatures "similar to"
existing ones. Though I should note that they also had trouble generating
plausible chemical structures despite the larger training set.
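
That validity problem is easy to quantify: sample latent points from the
prior, decode them, and count how many parse as real molecules. A sketch,
where `decoder` stands in for a hypothetical trained latent-to-SMILES model
(this is not their code):

```python
import numpy as np
from rdkit import Chem

def validity_rate(decoder, latent_dim, n_samples=1000):
    valid = 0
    for _ in range(n_samples):
        z = np.random.randn(latent_dim)   # sample from the Gaussian prior
        smiles = decoder(z)               # hypothetical: latent vector -> SMILES
        if Chem.MolFromSmiles(smiles) is not None:  # None == chemically invalid
            valid += 1
    return valid / n_samples              # often well below 1.0 for SMILES decoders
```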

------
deepnotderp
Be careful with this one: the premise is sound (coming from a DL background),
but I skimmed the paper and didn't see them compare it to existing simple
compounds, and they don't show whether their compounds are actually any
better.

~~~
kobeya
That's typically something that you need to do very expensive lab work to be
able to claim.

~~~
stenl
Expensive or not, they didn't do it, so we don't know if those compounds
actually work.

Am I missing something, or did they also not use a held-out validation dataset
to assess performance? It seems to me they ran their autoencoder, got some
suggested compounds, listed anecdotal evidence about those compounds, and
called it a day.
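
Even a cheap retrieval-style check would have helped: hold out some known
actives, train on the rest, and see whether the held-out actives rank near
the top of the big library. A sketch of that missing evaluation (the
`train_and_score` function is hypothetical; compounds are SMILES strings):

```python
import random

def holdout_recall(actives, library, train_and_score, holdout_frac=0.2, seed=0):
    rng = random.Random(seed)
    actives = actives[:]                       # copy before shuffling
    rng.shuffle(actives)
    n_held = int(len(actives) * holdout_frac)
    held_out, train = actives[:n_held], actives[n_held:]
    score = train_and_score(train)             # returns a compound -> score function
    ranked = sorted(library + held_out, key=score, reverse=True)
    top = set(ranked[:n_held * 10])            # a generous top slice of the ranking
    return len(top & set(held_out)) / n_held   # fraction of held-out actives recovered
```

A model that's just memorizing would still do fine here if the held-out
actives resemble the training ones, but at least it's a number.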

------
akhilcacharya
This looks really interesting! Could this sort of approach be used for
immunotherapy as well?

