Hacker News
How NLP can help cure cancer? [pdf] (mit.edu)
98 points by polm23 23 days ago | 19 comments



So, I worked at a startup in the early 00s trying to do exactly this — detect disease trends from medical chart data. We had an advantage in that all the data came from a nationalized health care system, so it was relatively consistent.

We were using slightly different NLP techniques (and were actually a leader in the NLP space), but ultimately NLP wasn’t really our restriction: we learned that we couldn’t trust the doctors to enter the data consistently. The data came in like 8 different versions of HL7 per hospital, and we had to fix our parsers every time some department decided to upgrade their records systems. The notes contained so much jargon (that was different in every department and often using the same acronyms with different meanings in context) that we ended up pivoting the tech to another industry.

Medical chart data sucks. HL7 sucks. Pretty much everything about doing deep learning on medical data sucks. Everything about working and selling in the hospital EMR space sucks. There might be something there, but have fun building a profitable company around it.


I did that data curation and analysis in the mid 2000s, before it was all hip and well paid, and the CS group employed data curators because you need an experienced human being to analyze the data in a peer-reviewed paper and compare it to the authors’ conclusion. Half the time the data doesn’t support it.

Finding reproducible or reproduced experiments happens at an even lower rate.

Then, context matters, and we know very little about each of our cells, even less about how each one behaves in an ensemble.

So no amount of NLP will cure cancer, currently. It’s a simple garbage-in 101 case. It will hopefully change...

However, there is a good chance that traditional approaches to science will have worked well enough for most cancers, and NLP will just be a tool to validate after the fact, build comprehensive databases and assist with incremental updates to a treatment.


The main problem with this effort is statistical. Medical databases are not accurate samples of the population; they are biased in hundreds of ways (doctor X is running a clinic for disease Y, but doesn't see many poor people because of Medicaid issues, etc.). People get excited about these because there is "so much data", but since it isn't collected prospectively, even if we could bulk extract all the data (which we can't, it's a fucking mess) any conclusions we drew would be invalid.
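
To make the sampling-bias point concrete, here is a minimal sketch in Python. Every number in it is made up for illustration: the two income groups, their disease rates, and the share of each group that reaches the clinic's database are all assumptions, not data from any real system. It also assumes sick and healthy people within a group are captured at the same rate, which is generous to the database.

```python
# Hypothetical population: two income groups with different disease rates.
# All figures are invented for illustration only.
population = {
    "low_income": {"people": 50_000, "cases": 5_000},   # 10% prevalence
    "high_income": {"people": 50_000, "cases": 2_000},  # 4% prevalence
}

# Hypothetical share of each group that ends up in the clinic database
# (e.g. the clinic sees few low-income patients).
capture_rate = {"low_income": 0.05, "high_income": 0.40}


def prevalence(groups):
    """Overall prevalence = total cases / total people."""
    people = sum(g["people"] for g in groups.values())
    cases = sum(g["cases"] for g in groups.values())
    return cases / people


def observed_database(groups, rates):
    """Scale each group by the share of it that reaches the database."""
    return {
        name: {
            "people": g["people"] * rates[name],
            "cases": g["cases"] * rates[name],
        }
        for name, g in groups.items()
    }


true_rate = prevalence(population)
db_rate = prevalence(observed_database(population, capture_rate))

print(f"true prevalence:     {true_rate:.3f}")
print(f"database prevalence: {db_rate:.3f}")
```

Under these assumed numbers the database understates prevalence because the higher-risk group is barely represented, and collecting more records from the same clinic would not fix that.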


Great work. At Lawrence Berkeley National Lab in the biosciences (formerly life sciences) division, headed by Mina Bissell[1], one of the leading researchers of breast cancer, we developed NLP-based vector space modeling (embedding) in 2002 to augment genomic research in breast cancer, the extracellular matrix and aging. That was about 2 years after the first human genome was sequenced. The Lab patented[2] some of our methods and it helped pave the way for approaches like word2vec and a few others. There's a lot of great work yet to be done in this area and we continue to make slow and steady progress defined by generating new insights, hypotheses and sometimes discoveries using approaches in NLP/NLU. We've recently been applying vector space modeling in NLU to LET radiation research (DNA damage repair) associated with space biosciences[3], which is exciting for extending and protecting human lifespan for space travel. Interdisciplinary aspects are key.

[1] https://www2.lbl.gov/LBL-Programs/lifesciences/BissellLab/ma...

[2] http://www.google.com/patents/US7987191

[3] https://www.nasa.gov/ames/research/space-biosciences/for-res...


This looks very interesting, I wonder if anyone has a link to the video version of these talk slides.

I couldn't find it, but here is the same professor, Regina Barzilay, on the same topic (but a different talk).

https://www.youtube.com/watch?v=424zoKmpZvg


Slides 12/13 talk about how these areas are "devoid of Computer Scientists". In traditional areas like Biology and Chemistry, there is an acceptance that Computer Science is "not a science". They still view CS as IT (service / software). There are amazing broad applications of advanced computational techniques like Natural Language Processing, Probabilistic Learning Approach and Machine Learning within the space but if there's no acceptance of it, we will hinder progress.


For what it's worth, I've been working for a few years in CS + bio at the NIH, and while demand for CS as IT exists, there's also now a pretty strong and broad demand for computer scientists for scientific research. Between all of our institutes there must be a couple dozen labs with projects in the areas you mention. One of the biggest things holding us back is just finding enough people with a CS background and interest in bio problems! From what I gather at conferences, there's a lot of work being done by many groups in the CS/bio/biostatistics space, and likely more to come.


This is changing.


I heard the author’s keynote speech at NAACL a few years ago, basically the material in this paper. I found it inspirational, especially when mixed with her own story of fighting cancer and her disappointment at how much potentially useful data is “siloed.” An AI system is proposed that can read and understand all research, lab notes, etc. This may only be achieved in the distant future, but I believe we will get there.


I would like to see the meaning of the acronym in the title. Natural language processing? Nonlinear programming? Neuro-linguistic programming?


If there were a Neuro-Linguistic Programming Language, its definitive textbook would be called "The Structure and Interpretation of Magic".

https://www.goodreads.com/book/show/964154.The_Structure_of_...

https://www.goodreads.com/book/show/43713.Structure_and_Inte...


I thought "that's silly, but I'm sure it's in the article" - I couldn't find it. I flipped (quickly) through every slide and couldn't see what they meant by NLP. We can infer, but I couldn't find a place where we are explicitly told, and for some reason that's maddening to me.


The title made me afraid it was about Neuro-Linguistic Programming (a New Age "psycho-religion") instead of Natural Language Processing:

https://en.wikipedia.org/wiki/Neuro-linguistic_programming#A...

Alternative medicine

NLP has been promoted with claims it can be used to treat a variety of diseases including Parkinson's disease, HIV/AIDS and cancer.[62] Such claims have no supporting medical evidence.[62] People who use NLP as a form of treatment risk serious adverse health consequences as it can delay the provision of effective medical care.[62]

[62] Russell J; Rovere A, eds. (2009). "Neuro-linguistic programming". American Cancer Society Complete Guide to Complementary and Alternative Cancer Therapies (2nd ed.). American Cancer Society. pp. 120–122. ISBN 9780944235713.


That's sad. NLP started off as a more scientific take on hypnosis, as California required certification for hypnosis, but none for other, identical processes, hence the name change. However the 'inventors' left it in the hands of people who made it a sketchy MLM pyramid of trainers' trainings. There IS some good insight, but most of it is just rehashed lessons from Milton H Erickson retold by Richard Bandler. Then came the pickup artist trend, also latching on to similar techniques. That said, most 'Neuro Linguistic Programming' published in the last 20 years is crap.


Sad they were eventually exposed and debunked? Nope. I'm just sad it took so long.

The inventors, Richard Bandler and John Grinder, were greedy bickering charlatans, both just trying to make a buck from the very start. It was pure crap from day one.

https://en.wikipedia.org/wiki/Neuro-linguistic_programming#I...

>On this matter Stollznow (2010)[18] comments, "[i]ronically, Bandler and Grinder feuded in the 1980s over trademark and theory disputes. Tellingly, none of their myriad of NLP models, pillars, and principles helped these founders to resolve their personal and professional conflicts."

>[...] In 2009, a British television presenter was able to register his pet cat as a member of the British Board of Neuro Linguistic Programming (BBNLP), which subsequently claimed that it existed only to provide benefits to its members and not to certify credentials.

http://news.bbc.co.uk/2/hi/8303126.stm

https://neurobollocks.wordpress.com/2013/03/29/neuro-linguis...

>Despite being nearly 40 years old, and a ridiculous, facile hodge-podge of concepts from psychology, philosophy, linguistics and new-age twaddle with absolutely no support from any reputable sources, amazingly, NLP is still very much alive and kicking. Bandler has kept on developing (and ruthlessly trademarking) a load of new techniques including ‘Design Human Engineering™’, or ‘Charisma Enhancement™’. A lot of his recent work also appears to include hypnosis. His website is essentially one big advertisement for his books, CDs and speaking gigs; and there are literally thousands of individuals, businesses, and ‘institutes’ offering NLP training for a bewildering variety of purposes and people. Bandler has even latterly jumped on the ‘Brain training’ trend with a new company called ‘QDreams‘ (‘Quantum brain training!’; ‘Success at the speed of thought!’ FFS…). Searching on Twitter turns up many, many people earnestly tweeting away about the benefits of NLP. Why is it so persistent? Partly this is because of Bandler’s clear talent for slick marketing, re-invention and dedication to innovative bull-shittery, and partly because NLP was never really clearly defined in the first place, which makes it highly malleable and adaptable to any pseudo-scientific new-age trends that come along. Despite a hiccup in the mid-90s (when Bandler tried to sue Grinder for ninety million dollars) it seems to be as popular as ever, and to be attracting new adherents all the time.

>In my opinion the real stroke of genius in NLP, and perhaps the reason why it’s been so successful, is simply the name. These days we’re used to putting the ‘neuro-‘ prefix in front of everything, but back in the ’70s, this was way ahead of its time. Obviously there’s nothing remotely ‘neuro’ about it, though. Plus the ‘programming’ bit has a deliciously Orwellian appeal; promising the potential to effect change in oneself or others, if you just know the right techniques.

http://skepdic.com/neurolin.html

>"For example, I believe it was very useful that neither one of us were qualified in the field we first went after - psychology and in particular, its therapeutic application; this being one of the conditions which Kuhn identified in his historical study of paradigm shifts. Who knows what Bandler was thinking?" -John Grinder

>postscript: On a more cheerful note, Bandler has sued Grinder for millions of dollars. Apparently, the two great communicators and paradigm innovators couldn't follow their own advice or perhaps they are modeling their behavior after so many other great Americans who have found that the most lucrative way to communicate is by suing someone with deep pockets. NLP is big on metaphors and I doubt whether this nasty lawsuit is the kind of metaphor they want to be remembered by. Is Bandler's action of putting a trademark on half a dozen expressions a sign of a man who is simply protecting the integrity of NLP or is it a sign of a greedy megalomaniac?


I second this, had a 100%-off code for udemy courses and picked NLP as it was featured and "bio-hacking" seemed like an interesting subject. Thought the post was going to be a satirical take =P

The first 10 hours of the course were literally "when you've finished this course; your life will be perfect and you can spread this information to your friends and be a hero! Here's a success-story about a totally real person that finished the course and rated it 5 stars" - on repeat.


What could also help cure cancer is to have anonymized medical records on Kaggle. However, I'm pessimistic that this will ever happen.


I've wondered - what if Facebook for example found a typing pattern that predicted something like Alzheimers in 95% of cases, but couldn't tell any users because of GDPR, or GDPR prevented these typing patterns from being analyzed for this purpose in the first place?


I would personally expect "how NLP can help detect cancer" to be the more rewarding question - and cynically speaking, about that Obama quote in the presentation on America being the country to cure cancer: hell, America, it seems, can't even cure measles. Maybe I'm just having a bad day though.



