

IBM wants to use big data to predict heart disease long before it strikes - smanuel
http://venturebeat.com/2013/10/09/ibm-wants-to-use-big-data-to-predict-heart-disease-long-before-it-strikes/

======
waterside81
I work at a company that does applied machine learning, and health care is the
#1 source of prospective business. The hardest part isn't the analysis - it's
getting good, valid data. Health care data is sparsely collected and poorly
structured (if structured at all), and the privacy rules surrounding access
are very strict (perhaps rightfully so).

The key, as IBM is doing, is working with a large HMO or health care network
that has hopefully switched to a sensible EMR and built up a good amount of
historical data on patients.

I'd add that one last key to getting this right isn't the breadth of the data
alone, but the depth. Knowing some superficial aspects of a person (age,
weight, habits) is too naive. You need family history, and you need
psychosocial aspects (nightmares, trouble at work, marital problems, etc.). If
you can get _that_ , then you're cooking.

~~~
drcode
I'm also in this field and couldn't agree more. I hate to be a cynic, but the
reason IBM is going down this road is almost purely because it allows them to
capture a lot of poorly allocated health care resources (in essence, this is
all about government rent seeking.)

~~~
lambersley
Being a former IBM'er, I can tell you this: there is nothing altruistic in
their motives. They invest billions into research with plans to reap the
rewards. Health informatics in Canada is brutal, to say the least. See the
eHealth disaster. IBM employs some really intelligent people who intend to
capitalize on this situation.

~~~
drcode
To clarify my remarks, I don't really have a problem with a profit motive...
my argument is more that you could make money from (1) coming up with better
treatments and selling them or (2) extracting poorly allocated resources from
the government. We need fewer health care projects that are the latter.

------
drcode
We need to start using all the big data muscle on basic research into cell
chemistry, and less on analysis of patient data.

I could probably fit all well-structured data elements in existence for all
medical patients in the US for a year on a single DVD. (and even the
unstructured non-image data is pretty sparse.)

This is not really a problem where "big data" techniques are the right tool;
there's too little data. However, I believe we should be experimenting on cell
cultures in laboratories at large scales. As far as I can tell, none of this
is happening... no laboratory is running hundreds of thousands of cultures in
parallel with carefully manipulated chemical environments and generating data
from them.

A data set generated from such cell culture analysis could be petabytes in
size. With that type of data set I believe it would be possible, using big
data techniques, to get a lot of rigorous and causative details of chemical
cell pathways relevant to disease formation that we currently lack. This is
the direction I think we need to be going.

I certainly would love to be wrong and would love to get news that IBM has
cracked heart disease with this new project. However, my guess is that this is
a very inefficient way to apply big data techniques to cure diseases.

~~~
bigwaff
"I could probably fit all well-structured data elements in existence for all
medical patients in the US for a year on a single DVD."

As someone who builds and supports a massive "big data" healthcare platform, I
have to say you are horribly mistaken. Non-image, well-structured patient data
across all of healthcare - with potentially multiple EMRs and data collection
systems per healthcare organization - is "big data" even when compressed and
even when limited to the last year's worth, and it can benefit from "big data"
techniques. Like all "big data" endeavors across any number of problem
domains, the health care application of "big data" is in its infancy.

Will the use of "big data" techniques over patient data discover disease
cures? I don't know. But there is gold in them there hills... what kind? We
will have to wait to find out.

~~~
drcode
I agree I might be wrong, but let me try to do the math (so you can improve it
with your knowledge.)

Let's say 10 million hospitalized patients a year, 100 data points entered by
a nurse, another 300 from lab tests during the visit, 50 entered by a doctor,
50 entered by a pharmacist.

Let's postulate 8 bytes per data element (most are numeric or ordinal.) This
leads to:

10,000,000 x 500 x 8 = 4E10 bytes = 40 GB

Compressed, this would fit on a DVD (8.5 GB)
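A quick sketch of that arithmetic (the inputs are the estimates from this comment, not measured figures):

```python
# Back-of-envelope check of the data-volume estimate above.
# All numbers are assumptions taken from the comment, not real data.
patients = 10_000_000                      # hospitalized patients per year
points_per_patient = 100 + 300 + 50 + 50   # nurse + lab + doctor + pharmacist
bytes_per_point = 8                        # most elements numeric or ordinal

total_bytes = patients * points_per_patient * bytes_per_point
total_gb = total_bytes / 1e9
print(total_gb)  # 40.0, matching the 4E10-byte figure

# Fitting 40 GB on an 8.5 GB dual-layer DVD implies a compression
# ratio of roughly 5:1, which is plausible for repetitive structured data.
ratio = total_gb / 8.5
print(round(ratio, 1))
```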

(I agree the numbers are somewhat larger if unstructured text is added to the
mix, but it still seems to fall short of the terabytes of data I would expect
for a "big data" approach.)

I had thought the "big data" moniker was usually applied to web sites where
every minuscule site interaction is logged for every user across 10mil+ users,
leading to much larger data sets, or to crawling the entire web, where many
petabytes of data become available.

If you have better knowledge of the numbers involved and can correct me,
please do.

------
pasbesoin
For whose benefit? Speaking generally [1], as a patient in the U.S., I no
longer believe it is for my (i.e. the patient's) benefit.

This is the result of both reporting on the health care landscape that I've
digested as well as repeated personal experience.

The U.S. system is, at scale, profit-driven. Profit in general serves a
purpose; in the U.S., however, it has superseded the purpose of providing
effective health care.

In other words, in the U.S., it has become solely about short-term, private
profit. Public good and longer term, societal benefit have been relegated to
imagery.

\--

1\. meaning not as someone having the specific condition under consideration

------
madaxe
This, of course, has nothing to do with improving quality of life or
preventing heart disease, but will have plenty to do with denying health
insurance or increasing premiums.

