
I met a PhD chemist who spent 30 years developing drugs. Over that period he developed 20,000 APIs (active pharmaceutical ingredients). Four were approved. One made it to market.

Sounds like a very fruitful career then.

A recent read of mine was "The Drug Hunters" [1] which talks about the "highly improbable quest" for a drug that actually makes it to market, and the history of those that have.

The author states that it is not uncommon for a researcher to go an entire 20-30 year career without a single API they discovered making it to market.

1 - https://www.amazon.com/Drug-Hunters-Improbable-Discover-Medi...

I wonder how many times the same tests were run by competing companies, how much waste and inefficiency came from data not being shared, and how much of that duplication still goes on.

My project in university (and now my company) annoyed a lot of companies when we first published DrugBank in 2006. We basically opened up the data on potential APIs and their targets as a downloadable and usable data set. I remember going to conferences and being both lauded by academics and maligned by pharma folks. This was before Wikipedia or things like PubChem and ChEMBL were really a thing.

Why would a drug company care if you share public information? Because you make it easier to search?

Good question. They had built a lot of internal tools and datasets that we rebuilt and released. It was a competitive advantage before.

Weren't these things already public knowledge? Did the pharma folks just dislike that you made it easily accessible?

How do you make money off of DrugBank now?

Much of the information about existing drugs was organized, but within textbooks (Merck manual, etc).

It had never been systematically structured and organized online (though it likely had been internally within pharma). The data was (and is) manually curated, and included off-targets and potentially new targets, along with a suite of deep chemistry features and spectra that linked small molecules to their targets. In addition, it was really the first place to organize biologic drugs with their sequences (largely extracted manually from patents).

DrugBank was part of a larger goal, which was to decipher the human metabolome. However, it turned out to be more successful than that (http://www.hmdb.ca is the current version of the human metabolome database, something I was also intimately involved in).

In terms of how we make money, we sell access to additional manually curated datasets (with the help of a bunch of NLP stuff for initial extraction and for QA). These datasets are structured for ML applications and integration into pharma pipelines or medical software. Additionally we sell access to an API that provides advanced queries useful for drug discovery, repurposing, and generally looking up drug information in a more uniform way. We focus on developer happiness, good documentation, and speed. Even just getting a drug product list from various jurisdictions, and keeping it up to date, is a surprisingly hard problem that the API solves.

However, keeping the data open and available for academic / student research, as well as publishing and updating drugs through the website is something we love. It's been nice to find a balance where we can get out of the cycle of grant funding but still offer something to the community and general public.

Probably lots. It's a huge issue that's driving the reproducibility crisis in science and academia, so I imagine it would also be a problem for industry.

That's more than most PhDs produce in their careers (in terms of going to market).

Sure, in large part because "going to market" is not the primary objective of many PhDs.

It's still very fruitful compared to most PhDs in the pharma industry.
