
IBM Watson Overpromised and Underdelivered on AI Health Care - SiempreViernes
https://spectrum.ieee.org/biomedical/diagnostics/how-ibm-watson-overpromised-and-underdelivered-on-ai-health-care
======
1e-9
To me, the main mistake was the series of commercials giving the strong
impression that IBM already had this incredible Artificial General
Intelligence that was indistinguishable from a highly intelligent human and
was solving a myriad of difficult practical problems better than any expert. I
suspect that most who were well-versed in AI felt the ads were disingenuous
from the start. I know I did. I think the marketing campaign would have better
served IBM had it laid out their commitment to achieve these things without
sounding like they were already there.

~~~
hopler
IBM have been doing this since Deep Blue, if not earlier. It's their core
business model: build cool tech demos and then sell consulting deployments
that use nonexistent features of the tech.

~~~
ethbro
> sell consulting deployments that use nonexistent features of the tech

Sell consulting deployments that _hope to fund development_ of promised
features of the tech.

Consulting is a bit like used car selling. If you're not playing close to the
line, then someone else is.

Which, okay. Is what it is. But every customer should be aware this is what
IBM et al.'s business model is, unless you're simply buying bodies.

------
tosser1234
I worked on this project and there were a lot of issues. Two of the biggest
were:

* Whatever the quality of the technology (which I personally never saw as that compelling), it was wrapped up in terribly written research code, making it practically impossible to set up and use.

* The Jeopardy demo was made possible by the existence of a marked-up source of general knowledge (Wikipedia), a ready-made bank of questions and answers from past shows (j-archive.org), and the fact that practically anyone had the ability to curate more Q&A pairs. This is almost totally different than the medical use case where the knowledge is wrapped up in proprietary textbooks and papers and the only people able to curate training data are medical professionals.

~~~
telchar
That's not really true. The UMLS has a large graph of marked up medical domain
knowledge that can be used. It's not as specific as one might want for
developing an AI autodoc, but it's quite a bit better than what is available
in most fields. It's actually quite similar to what one can derive from
Wikipedia.

~~~
nickpsecurity
Quora has a nice set of answers about why machine learning isn't used so much
in medical:

[https://www.quora.com/Why-isn%E2%80%99t-machine-learning-
mor...](https://www.quora.com/Why-isn%E2%80%99t-machine-learning-more-widely-
used-for-medical-diagnoses)

Skip to Jae Won Joe's answer to see a case study with a patient showing the
interactive process. Then, especially look at ending questions about missing
or deceptive data. Seems like their good diagnoses come from a combination of
domain data and _expertly reading people in front of them_. Machines suck at
the part I emphasized. The data sets might not reflect a lot of stuff like
that, too. Who knows.

~~~
tosser1234
This is a great run-down of the realizations that a lot of the engineering
team came to and what I was trying to touch on in my reply above. There is
just so much information & context within a doctor's head and so much that is
going on during an examination that looks simple to our eyes. Even in the
idealized case of having accurate, structured patient data and focusing only
on a simple disease, it's a challenge to diagnose and prescribe a correct
treatment. Even a system that can accept the text of a patient's medical
record and successfully pull out relevant details would be a challenge.

~~~
treis
I think this article addresses the half of the medical landscape where the
problems are hard and computers are worse than humans at solving them. The
other half is where the problems are "easy": practices where doctors spend ~5
minutes with each patient doing physicals and treating minor ailments. In that
half doctors can already easily handle the problems, so there's no real
value-add for a computer.

It would take a paradigm shift before AI became useful. Going to watson.com
and getting a prescription for antibiotics would be useful and technically
feasible. So would augmenting lower-trained people to be able to deliver more
care. Neither is possible legally, and it's not exactly a field where you can
get away with disregarding the law.

------
sumoboy
The best reference about Watson last year was on reddit: "Watson is a brand,
not a 'thing'. It covers any technologies that IBM sells in the cognitive
space. They are further broken up into the business areas."

All hype and marketing built on open source tools, all while AWS, Google, and
Azure build out bigger cloud offerings. As OP mentioned, they used to be a
great company.

~~~
altmind
I once heard: "Watson is a brand name for IBM consultancy services".
These people formulate what Watson is for your specific business.

~~~
SCAQTony
Watson should rebrand then. You can't have a colossal failure such as the one
reported by Stat News: “multiple examples of unsafe and incorrect treatment
recommendations.” — [Memorial Sloan Kettering Cancer Center] and expect to
move on from it as if they were growing pains.

~~~
ethbro
I call this the Netflix ML effect.

If your answer is mission critical, probabilistic ML isn't advanced,
explainable, or reliable enough for your problem.

If your answer is of the great-to-be-right, meh-to-be-wrong sort (e.g. ranking
movie recommendations), then you can and should go nuts with ML.

And if someone really wants to do an ML project on the former, do everything
you can to transform it into the latter.

~~~
djakjxnanjak
If you can tune the sensitivity-specificity curve, you should be able to handle
both cases. You have to be willing to refuse to provide an answer when you
have low confidence.
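
To make the "refuse to answer" idea concrete, here is a minimal Python sketch.
The function name, the labels, and the 0.9 cutoff are all made up for
illustration; nothing here comes from Watson or any real system, and it
assumes the model's probabilities are reasonably calibrated.

```python
from typing import Optional, Sequence


def predict_or_abstain(
    probs: Sequence[float],
    labels: Sequence[str],
    min_confidence: float = 0.9,  # hypothetical cutoff, tuned per use case
) -> Optional[str]:
    """Return the most probable label, or None when confidence is too low."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")
    # Index of the highest-probability class.
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Abstain (return None) instead of guessing below the cutoff.
    return labels[best] if probs[best] >= min_confidence else None


# Confident case: an answer is returned.
print(predict_or_abstain([0.95, 0.05], ["benign", "malignant"]))
# Uncertain case: the function abstains and a human has to decide.
print(predict_or_abstain([0.60, 0.40], ["benign", "malignant"]))
```

Raising `min_confidence` trades coverage (how often you answer) for accuracy
on the cases you do answer.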

~~~
AlotOfReading
The existence of adversarial attacks with high confidence on virtually all
production ML systems should indicate that confidence numbers are not enough
to rely on.

~~~
djakjxnanjak
Do these attacks require fine control over the input? E.g., if you are scanning
a patient’s body, does it matter if the model can be fooled by editing the
values of individual pixels? This implies that you have a threat model where
the data coming from the sensor is being manipulated, in which case all bets
are off (the image could be entirely replaced). It doesn’t seem much different
from a statistical model that you can blow up by feeding in values designed to
cause a divide by zero (values that wouldn’t appear in real-world data).

It seems like a problem when classifying user-provided images (e.g. identifying
obscene images on a social network) but not so relevant when you own the
sensors.

~~~
govg
You do have a point - all the adversarial attacks on ML models rely on full
adversarial control on the inputs, which probably isn't the case with medical
records. If there was unconstrained access to a patient's MRI scans by an
adversary, then I don't think adversarial attacks on the ML diagnostic models
are the biggest problems you'll face.

------
jacquesm
Anything labelled 'Watson' should be seen as marketing, not product or
consulting.

~~~
mark_l_watson
I was going to agree with you, but then my last name is Watson :-)

Seriously, I used IBM Watson on a consulting project a little over two years
ago and I was disappointed. To be fair I should take another look. IBM has
bought some good companies whose work is exposed in BlueBiz and other web
services and the permanent free tier levels they give away rival the free tier
levels from GCP.

I think AutoML and auto data science web hosted offerings will be a huge
market with right now Google, Amazon, and Microsoft leading the way. If I
worked at IBM in a position of power, I would work hard to create a great
developer experience for simplifying the use of machine learning, NLP, etc.

~~~
jacquesm
IBM bit off more than they could chew with the AI/NLP project and their brand
suffered as a consequence. Some problems are simply hard in the general case,
though there are some spin-outs and spin-offs of that project for more limited
domains that are successful.

~~~
mindslight
Isn't arbitraging off developed brands just a common contemporary playbook?

------
bitcharmer
Don't want to sound too mean but this is not the first time IBM overpromised
and underdelivered.

Anyone surprised by this?

~~~
PaulHoule
To be fair, IBM has succeeded at big jobs too.

For instance, they developed software for the Apollo mission and the Space
Shuttle. The IBM/360. The IBM PC. AS/400.

~~~
oldgradstudent
I don't think there's anyone who disagrees that IBM was a great company once.

Do you have a more recent example? Something that happened several years after
Louis Gerstner first assumed leadership.

~~~
Barrin92
They're still solid in the supercomputer space, no? BlueGene, and Summit and
Sierra more recently are IBM projects.

Admittedly though given their pretty large size I can't name much else.

~~~
trimbo
> Admittedly though given their pretty large size I can't name much else.

Exactly. Even if 20,000 people worked on those supercomputers, mainframes and
Watson, what do the other 350,000 employees work on? Consulting, and it's been
that way since Gerstner.

~~~
mjfl
and what is "consulting"?

~~~
kikoreis
Customer-specific projects similar to what other integrators like Ericsson,
Nokia (telco) and Accenture, ATOS, etc. do elsewhere.

------
Randypea
I was recently diagnosed with neuroendocrine cancer, which my PCP had
misdiagnosed as IBS for 10 years. This is more the norm than the exception for
people with this type of cancer, Steve Jobs included. It's a perfect example
of where AI can likely diagnose what my PCP could not. AI tools need to find
cancer problems to solve which are more suited to their capabilities. e.g.
does anybody know of a company working on "AI for cancer screening"? This is
desperately needed and would have helped me.

~~~
1e-9
Unfortunately, there are still many cancers detected far too late for
effective treatment. It sounds like you were indeed fortunate to have a less
aggressive form. AI for cancer screening generally falls under the category of
"Computer-Aided Detection" or CAD. The commercial and academic CAD efforts
tend to be organized by the primary anatomical site of cancer and the
detection method (e.g. X-Ray, CT-Scan, PET, ultrasound, blood test). Was your
primary the pancreas or intestine? Are you wanting to contribute to an imaging
detection method or something else? I might be able to help you identify
someone working in the area depending on your goals.

~~~
Randypea
My cancer was finally found in my intestine. This is another opportunity. My
primary oncologist and local surgeon were telling me the primary tumor, which
has metastasized to a very large liver tumor, could not be found. I did my own
research and found they had ordered the wrong type of imaging scan. Only after
I pushed to have the correct scan (Gallium 68 PET/CT scan) was the primary
tumor found. This was a "lack of information" for my local oncologist.
Computer-aided diagnosis would have helped him. An additional new symptom
(flushing) appeared and my PCP recognized a specific cheap blood test was
needed that led to the cancer being found. I am happy to contribute to an
imaging study. But I want to work on cancer screening. What kind of
automation/screening would be needed to prevent 10 years of misdiagnosis by my
PCP? ... not only for this type of cancer but for all of the top 15-20 types
of cancer. People are not being screened. How can we make screening
affordable? And how can we raise awareness of possible misdiagnosis and/or
affordable screening?

~~~
1e-9
There currently is no good candidate for an imaging modality that can be used
for a general screening program to find the top 15 to 20 cancers and I am
unaware of anything on the near horizon. Such a scan would have to examine the
neck through the groin area to cover even just 10 out of the top 15 or so cancer
types. Since screening involves patients with no symptoms, most patients won't
actually have any disease and thus the imaging must be inexpensive, must have
high sensitivity, must have a reasonable false positive rate, must involve
little to no radiation, and must not require injection of contrast agents or
radioactive tracers. That eliminates all of the imaging modalities I can think
of that can examine large areas of the body for cancer. The best we have today
are compromises on these criteria for patients that are at relatively high
risk, such as a smoker or a cancer survivor, or for highly focused screening
programs such as what we have for breast cancer.
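
To see why the false positive rate matters so much when most screened patients
are healthy, here is a rough Python sketch of the positive predictive value
calculation from Bayes' rule. The prevalence, sensitivity, and specificity
below are hypothetical numbers picked only to illustrate the effect; they do
not describe any real test.

```python
def positive_predictive_value(
    prevalence: float, sensitivity: float, specificity: float
) -> float:
    """Probability that a patient with a positive result actually has the disease."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical screening scenario: 0.5% of screened patients have the
# disease, and the test is 90% sensitive and 95% specific.
ppv = positive_predictive_value(prevalence=0.005, sensitivity=0.90, specificity=0.95)
print(f"PPV: {ppv:.1%}")  # about 8%: most positives are false alarms
```

Even with a test that looks good on paper, the large healthy population
generates far more false positives than there are true cases, and each one
means follow-up imaging or biopsies.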

------
enthd
I think quite a few companies (and maybe even a good majority of them) will end
up overpromising and underdelivering when it comes to AI products.

~~~
mojuba
That will likely include Google with their effort to use "AI" in data analysis
and search.

------
dre85
I always find it kind of silly when AI is just thrown into a field with the
notion that they'll just deal with the messy, subjective and unstructured data
(like hand written medical notes for example) as is. For me it makes much more
sense to try to clean up and structure the data from the start instead. Maybe
come up with some data acquisition compromise that is both UX friendly and
gives rise to more structure and consistency.

~~~
jackfoxy
My company does machine learning checked by physical models. Our single
biggest problem (and management is finally waking up to it) is curating the
incoming data. And this is in a mature industry (oil & gas).

~~~
ethbro
My biggest surprise has been how little everyone is aware of their data
quality.

The only explanation I've been able to come up with is that when it's all
human processed, Joe 2nd-link-in-the-chain just deals with all the
inconsistencies as best he can to get his job done, and never reports issues
up.

~~~
mr_toad
Without those data inconsistencies to fix up every week, Joe would probably be
out of a job.

~~~
ethbro
But Joe typically hates dealing with the inconsistencies, and he can tell you
exactly how they could be fixed.

It generally seems like (a) the suggestions for fixes are impractical to
implement (overly detrimental effect on counterparty), (b) Joe isn't empowered
organizationally to suggest fixes that will be implemented, or (c) Joe doesn't
have access to the IT tools to implement fixes himself.

~~~
mr_toad
From my experience the answer is usually (d) all of the above.

------
rocgf
At one point, there was an API available via the IBM Cloud that you could use
in order to ask Watson questions. I've rarely been that underwhelmed.

After using that, I realized that Watson was mostly a gimmick.

~~~
zeropnc
It’s called Bluemix - it’s hilariously underwhelming

~~~
barbecue_sauce
The greatest thing about BlueMix was its promotional budget. Saw The Force
Awakens on IBM's dime, and got about $50 worth of movie theater gift cards.
All I had to do was sit through a 20 minute presentation about BlueMix case
study integrations.

~~~
chaoticmass
I saw Force Awakens via IBM as well, and I got a cool light saber toy!

[https://twitter.com/chaoticmass/status/677708888083439616](https://twitter.com/chaoticmass/status/677708888083439616)

I did look at the BlueMix platform, and it seemed like a big mix of various
APIs I could tap into. Some of them looked neat, but I never had a reason to
really use any of them.

~~~
barbecue_sauce
Apparently they decided BlueMix was a stupid name and just renamed it IBM
Cloud.

------
taylodl
I figure we’re going to be reading articles similar to this about blockchain
in a couple of years.

~~~
seibelj
The fact that Bitcoin is worth $90 billion and has survived a decade of endless
criticism means blockchain is successful beyond anyone's wildest dreams a
decade ago. I would say Bitcoin and Ethereum have been way more successful
than Watson has.

~~~
freehunter
People kept investing with Bernie Madoff for decades too, because his
investments kept giving returns. People will keep shoveling money into Bitcoin
as long as its value keeps swinging wildly. That's hardly a measure of
success.

~~~
seibelj
How much will Bitcoin have to be worth, and for how long, to make it
successful in your eyes? Or is the entire idea of a digital asset with
intrinsic value impossible to you?

~~~
SiempreViernes
Slot machines pull in a lot of money too, I guess you would label them a
success?

~~~
seibelj
USDC ([https://www.circle.com/en/usdc](https://www.circle.com/en/usdc)) is a
dollar-backed cryptocurrency operating on Ethereum. Over $260 million has been
converted into USDC, and it is used all over the globe to move small or very
large amounts of money securely within seconds
([https://etherscan.io/token/0xa0b86991c6218b36c1d19d4a2e9eb0c...](https://etherscan.io/token/0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48)).
This is a novel application of blockchain technology that is beyond a slot
machine.

------
0898
IBM sold Watson as a black box with a magic genie inside.

You could point it at your data and it would tell you the answer before you'd
even thought of a question.

No wonder we're disappointed.

------
randartie
> “But Watson won’t change its conclusions based on just four patients. To
solve this problem, the Sloan Kettering experts created “synthetic cases” that
Watson could learn from, essentially make-believe patients with certain
demographic profiles and cancer characteristics.”

Is this standard practice in machine learning? This sounds more like regular
programming to get exactly the outcome you want.

~~~
scottlocklin
It's standard practice for idiots. And apparently "Sloan Kettering experts."

There are semi-supervised techniques to do stuff like this in a more
systematic/automated way, but you still don't get anything for free: the
outcome depends on the priors used to do the semi-supervised voodoo. In a
generous moment I might assume this is what they meant, but it's still dumb.

------
imglorp
Not sure about Watson, but there was just this week a story about using
pattern matching to improve breast cancer screening imaging. It seems the tech
is decent but the problem is, it's unclear whom to sue if needed, so there's
an obstacle to using it.

~~~
kochikame
If they'd marketed it this way from the beginning it would have been fine. AI
is great at that kind of pattern-matching.

The problem is that Watson was sold as a kind of "AI Doctor", which is pretty
much ludicrous

------
fybe
During my time at IBM I walked by the "main" Watson server every day on my way
into work. I was always impressed by the package, and by whatever they said it
could do.

Finally got my hands on it and was incredibly disappointed. Lacking features,
archaic interface, and not delivering on what it was promised. Talked to some
guys working for the Watson team and they told me that half of the AI stuff is
just them doing it manually under the guise of "integration and
configuration".

I did have fun "talking" to Watson whenever I was bored at work, trying to
make him swear

~~~
colejohnson66
So Watson is just IBM’s Mechanical Turk?

------
newen
The main problem is that Watson is like a million different things all with the
same name. So no one actually knows what they are selling when they talk to you
about Watson. Literally I had one IBM guy ask me to check out their Watson and it
took 5 minutes of them showing me the demo before I figured out what it was
(cloud based Nvidia Digits alternative). If that particular Watson had an
actual name, I probably would've heard about it before and I'd actually be
able to recommend it to someone.

------
Cactus2018
[from wikipedia] Thomson Reuters sold Thomson Healthcare to Veritas Capital
for US$1.25 billion on June 6, 2012. The new company, Truven Health Analytics,
became an independent organization solely focused on healthcare. Truven is a
portmanteau of the words "trusted" and "proven". IBM Corporation acquired
Truven on February 18, 2016, and merged it with IBM's Watson Health unit.

------
maxander
A bunch of the projects described, and the (technical) difficulties
encountered, make me wonder if GPT-2-style systems would have better odds. Is
anyone looking into applying that to medical/scientific text NLP problems?

I mean, GPT-2 still often produces nonsense more similar to dream imagery than
useful reasoning, but I gather most medical residency students are half-asleep
most of the time anyway, so... :)

~~~
return0
> still often produces nonsense more similar to dream imagery than useful
> reasoning,

Is there any example of gpt-2 in the wild that is not nonsense?

------
tus87
AI over-promising and under-delivering? Never.

------
himaraya
> IBM

> Overpromised and Underdelivered

Sounds kind of catchy, to be honest.

~~~
mlthoughts2018
If you could work in a clever bit about ageism and poor severance packages,
you’d have a strong contender for IBM corporate mission statement.

------
qwerty456127
What actually is so hard about AI in health care? Why not just take a set of
diagnostic indicators for inputs, map to conditions/treatments as outputs and
train a neural net?

~~~
Ensorceled
The problem is that the numerous easy cases do not result in a useful network
... any 1st-year intern will get the easy results already.

The hard cases, which would be useful to a doctor, occur very rarely. My
father's unusual reaction to a post-bypass drug regimen was something like the
3rd time that happened in Canada. How do you "train" that into a neural
network?

~~~
draugadrotten
Sounds like anomaly detection would be quite useful for monitoring the
expected reactions and comparing with actual outcomes. It may not be the
answer to "what is the right drug?" but it may be enough to say "what he got
ain't right for him"
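
A very rough Python sketch of what I mean, using a plain z-score check. The
function name, the 3-sigma threshold, and the sample readings are all
hypothetical; a real monitoring system would need a much better model of what
an "expected" reaction looks like.

```python
import statistics
from typing import Sequence


def is_anomalous(
    expected: Sequence[float], observed: float, z_threshold: float = 3.0
) -> bool:
    """Flag an observation that is far from the values seen in typical patients."""
    mean = statistics.mean(expected)
    stdev = statistics.stdev(expected)  # sample standard deviation, needs >= 2 values
    if stdev == 0:
        # No spread in the reference data: anything different is unexpected.
        return observed != mean
    z_score = abs(observed - mean) / stdev
    return z_score > z_threshold


# Made-up readings from patients who reacted normally to a drug regimen.
typical_readings = [10, 11, 9, 10, 12, 10, 11]
print(is_anomalous(typical_readings, 10.5))  # within the usual range
print(is_anomalous(typical_readings, 25))    # far outside it, worth a second look
```

This doesn't say what the right drug is, only that the observed response is
unlike what was expected.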

------
densone
IBM makes great commercials. Those commercials are just as gimmicky as most of
the products.

