Google to Collect Data to Define Healthy Human (wsj.com)
85 points by ismavis on July 25, 2014 | 33 comments

There are a few flaws in the thinking here: One, that we can define a "healthy human" or that any humans can be found that are operating at full health. Two, that there is a single set of traits that defines a healthy human, or that it could be distilled via sampling and synthesizing data about many humans. Three, that any such indicators or traits they discover will not change over time.

There is undoubtedly the need for tremendous amounts of research into human physiology and health, but there's a very naive hubris that runs through all these grand analysis projects that promise to solve all human health woes. Our bodies are the most complex systems that we know of in the entire universe. The idea that we are anywhere close to understanding how they should optimally work is just completely daft.

Just look back at the promises made by the Human Genome Project in the early 90s. Yes, we've learned a lot since then, but mostly we've learned that the relationship between our bodies and our genes is far far more complicated than we imagined 25 years ago.

The limiting factor right now isn't a lack of clever algorithms or computational power, but a lack of data that covers enough people over enough time with enough depth and quality. And more of the former can't make up for a lack of the latter, even if it's sexier, especially for a company like Google.

They should consider partnering with these guys: http://www.nature.com/news/medicine-gets-up-close-and-person....

An alternative limiting factor could be the lack of great, low-cost measurement tools. But there are some signs we're getting there, like this[1] chip-based platform that can test for 170,000 proteins in a single test, cheaply.

Maybe the thinking behind this Google move is that it will become possible, within a time frame feasible for Google's projects, to commercialize such testing platforms, and they want in on the opportunities that will result?


If they can truly protect the anonymity of donor data, I wouldn't mind being part of it.

Heck, when I donate blood I am already giving a lot of information to the blood center: they know my blood type and can tell whether or not I have certain diseases. When I volunteer for a bone marrow match test, I don't know how the firm handles my data. I suppose they can protect it.

I am not worried about Google knowing more about me or humans. I am more worried about how safe my records are with my clinics and hospitals. See [1]

Lastly, personal question: how do you get into Google X?

[1]: http://www.scmagazine.com/rhode-island-hospital-to-pay-150k-...

I would be more worried that the results of these studies are not made freely available to everybody. I prefer having research funded by public money without seeking a profit.

What do you mean by funded by public money? I thought Google was paying a clinic to get samples and paying researchers to work on the project, though the researchers have to get approval from the review boards of their respective institutions. The review board is funded by public money (more or less) I guess, but other than that, what else?

As I understand it, the researcher works for Google, so I suppose that his results belong to Google. Is this not the case?

> I prefer having research funded by public money without seeking a profit.

Now that I've read what you wrote twice, I may have misinterpreted your original sentence. I thought you were implying that the work they are doing with Google is using public money ("I prefer research funded by public money [done by these researchers to be non-profit].")

You meant you prefer to have public institutions carry out the research.

Well, in that case, it's only temporary. There is both public and private AIDS research, but in the end, the public research is used to seek profit. Say Google, IBM, and Microsoft donate 10B to make this non-profit research; the end product (the algorithms, etc.) will be used by private companies anyway. So I don't see how public vs. private really matters if the data and results are going to be used by private companies for profit later on.

But if you are going for "pure knowledge", sure, public research would be ideal :D

I don't know how you get in, but you can apply to Google X by searching their jobs site for "Special Projects" or the specific project (for example "Makani", "Loon", etc)

Have there been any reports of anything bad happening to any of those 12,000 people because of the leaked data?

see the page linking to the release (http://www.mass.gov/ago/news-and-updates/press-releases/2014...). But the tapes are missing so who knows :(

I was just googling hospital data breaches; they're not rare.

This reminds me of something Larry Page said at TED this year:

I say, wouldn't it be amazing if everyone's medical records were available anonymously to research doctors? And when someone accesses your medical record, a research doctor, they could see, you could see which doctor accessed it and why, and you could maybe learn about what conditions you have. I think if we just did that, we'd save 100,000 lives this year.

So I guess I'm just very worried that with Internet privacy, we're doing the same thing we're doing with medical records, is we're throwing out the baby with the bathwater, and we're not really thinking about the tremendous good that can come from people sharing information with the right people in the right ways.

Source: https://www.youtube.com/watch?v=mArrNRWQEso&t=839 at 14:00.

I agree with Larry, but at the same time Google is one of the last companies on the planet that I would trust with that data.

Much of this internet privacy backlash is aimed specifically at Google and Facebook (and of course the NSA, but let's assume that they don't have any problems getting to your medical data anyway).

Why is that? The data are anonymized.

But then even if they weren't anonymized, I think Google would be one of the companies I would trust the most with that. They know what they're doing in terms of security, and of all the companies that know a lot about me, they're the one that has consistently delivered great value from the data they have. I'm thinking of things like search, Google Now and the maps interface for my location history. Plus I'm happy that the monetization of my data serves to fund great projects like this one. It didn't have to be so...

Even anonymized data that seems very generic and nonspecific, when correlated with other data points, paints a very precise and specific picture of an individual, including enough to accurately identify them. This is Google's entire business model.

Health data is very far from this idealized anonymization-friendly case. It is some of the most specific data available about you, and it can easily be abused to disadvantage you in ways you can do nothing about (insurance, job prospects, social standing, even as far as getting your life threatened by people who disagree with things like sexuality and abortion).

I'm sure Google appreciates your faith in them. But you should carefully consider the consequences. Once the information is out there, there is no getting it back.

You're making a good point with the "there is no getting back". I kind of trust Google based on their track record but you can never be sure about the future[1]. But all in all what matters is not whether I can trust them perfectly, it's that to me the expected benefits are concrete and vastly outweigh the theoretical risks. I guess I'm in a privileged position such that I wouldn't even mind having most of these data made public to help with this kind of research[2]. To me it is more about progressivism vs. conservatism than about faith in a particular company (though I do worry that one company might concentrate too much power).

[1] Although Google's direction is arguably more stable than is usual for publicly traded companies, given the particular share structure designed to keep control in the hands of the founders.

[2] Mostly, being in generally good health and living in a country where basic health insurance is guaranteed and good enough, with premiums that don't depend on your health.

Anonymizing data is a lot harder than it seems at first glance. So far any large enough dataset that was deemed to be properly anonymized has been used to prove the opposite by identifying individuals.

The only way you can properly anonymize such data is by reducing it to aggregates but then most of the utility is lost.

Taking the combination of Google search data and medical data, it would not be at all hard to tie some of the searches (and thus identified users) to medical records, especially if the diseases or symptoms are rare.
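To make the re-identification argument concrete, here is a minimal sketch of a linkage attack on toy data. Everything in it is hypothetical: the field names (`zip`, `birth_year`), the rows, and the helper functions `k_anonymity` and `reidentify` are invented for illustration and do not describe any real Google or hospital dataset. It only shows the general idea that a record whose combination of quasi-identifiers is unique can be joined to a second, identified dataset.

```python
from collections import Counter

# Hypothetical "anonymized" medical rows: names removed, quasi-identifiers kept.
medical = [
    {"zip": "02139", "birth_year": 1980, "condition": "asthma"},
    {"zip": "02139", "birth_year": 1980, "condition": "diabetes"},
    {"zip": "02139", "birth_year": 1975, "condition": "rare_disorder_x"},
    {"zip": "94043", "birth_year": 1990, "condition": "asthma"},
]

# Hypothetical identified dataset (for example account profiles) with the same fields.
profiles = [
    {"user": "alice", "zip": "02139", "birth_year": 1980},
    {"user": "bob", "zip": "02139", "birth_year": 1975},
    {"user": "carol", "zip": "94043", "birth_year": 1990},
    {"user": "dave", "zip": "02139", "birth_year": 1980},
]

QUASI_IDENTIFIERS = ("zip", "birth_year")


def key(row):
    """Tuple of quasi-identifier values used to link the two datasets."""
    return tuple(row[f] for f in QUASI_IDENTIFIERS)


def k_anonymity(rows):
    """Size of the smallest group sharing one quasi-identifier combination."""
    return min(Counter(key(r) for r in rows).values())


def reidentify(medical_rows, profile_rows):
    """Map user -> condition wherever the key is unique in both datasets."""
    med_counts = Counter(key(r) for r in medical_rows)
    prof_counts = Counter(key(r) for r in profile_rows)
    user_by_key = {key(p): p["user"] for p in profile_rows}
    return {
        user_by_key[key(m)]: m["condition"]
        for m in medical_rows
        if med_counts[key(m)] == 1 and prof_counts.get(key(m)) == 1
    }


print(k_anonymity(medical))           # smallest group size in the toy data
print(reidentify(medical, profiles))  # users whose records link unambiguously
```

In this toy data the two people who share a zip code and birth year stay ambiguous, while the ones with a unique combination are linked directly, which is the "rare disease" effect described above.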

Others have pointed out the issues and difficulty of anonymizing the data, but I'd also be concerned about Google having that kind of data because of their existing business relationships. Google's current customer base is large and small corporate advertisers the world over. If they have a new set of data available internally, I find it difficult to believe that they would not try to sell some kind of service based on that data to their existing customer base. Without extremely strong corporate governance the temptation would be just too high for someone to say "I know - let's make some of our anonymized health records available to this existing large customer of ours."

The question about medical records is a red herring, since the problem is mostly solved already. You treat the records as private, unlocked only to other doctors who have a recorded, valid reason for accessing them. This is how many journal systems operate today at hospitals and police stations, and except for the occasional lapse in monitoring, it seems to work quite well.

Researchers should be required to follow ethical guidelines in order to prevent data mining that tries to unearth facts written between the lines, like sexual orientation. Patients should also be asked whether their journal can be shared with researchers, in the same way that patients are asked whether they will accept having an intern in the room. Last, there needs to be some liability if journals get leaked. It is very important to create economic incentives for functional security, which experts are paid to regularly verify and test.

This of course has nothing to do with keeping large databases about people and selling them as a service or product. It has nothing at all to do with Google's business model.
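As a rough illustration of the access model described above (a recorded reason for every access, plus patient consent before research use), here is a minimal sketch. The names `PatientRecord`, `AccessDenied`, and `read_record`, and the `role` strings, are all hypothetical and are not taken from any real hospital journal system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class AccessDenied(Exception):
    """Raised when a read request does not satisfy the access policy."""


@dataclass
class PatientRecord:
    patient_id: str
    notes: str
    research_consent: bool = False          # patient opted in to research use
    access_log: list = field(default_factory=list)


def read_record(record, accessor, role, reason):
    """Return the notes only if a reason is given; log who read them and why."""
    if not reason or not reason.strip():
        raise AccessDenied("a recorded reason is required")
    if role == "researcher" and not record.research_consent:
        raise AccessDenied("patient has not consented to research use")
    record.access_log.append({
        "who": accessor,
        "role": role,
        "reason": reason.strip(),
        "when": datetime.now(timezone.utc),
    })
    return record.notes


record = PatientRecord(patient_id="p-001", notes="routine check-up")
read_record(record, "dr_jones", "doctor", "follow-up appointment")
print(len(record.access_log))  # one logged access so far
```

The log is what lets a patient (or an auditor) later see which doctor opened the record and why; the policy itself would of course need real authentication and review behind it.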

This is BS on several levels. Then again, that a Goog head honcho is not a friend of “Internet privacy” isn't really a surprise. And him misrepresenting the reasons and consequences of that privacy is to be expected. Never let the truth stand in the way of profit or power.

As for medical records, no institution hoarding them has any interest in sharing them. Patients are not consulted on this, so their privacy is only a tiny part of the reason. As far as patient privacy goes, “anonymising” them is not so easy either, as another commenter already explained.

Yeah, 'cause Google and Facebook, with all of their extensive online surveillance and profiling, have done such a bangup job giving me ads I care about. Fuck that.

EDIT: Seriously, if the yardstick is "but but but think of all the great things we could do, just like we have with the internet" I'm going to request proof that we've in fact accomplished that tremendous good in terms of advertising and whatnot.

If they can't even show that for their main line of business, any claims about medicine seem shaky, no?

I'm excited to know which diets, exercises and lifestyle define a healthy human, which I'm hoping this information will give me (assuming the information will be released).

What if there is no such universal thing? What if our DNAs, tRNAs, upbringing etc. generate a very wide set of health requirements that aren't in sync between all humans but instead compete with each other? Some people eat bacon and trans fat every day and live up to 100 years. Some people die in their 30s on the same diet etc.

Are we going to end up with an average that doesn't fit anyone?
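The "average that doesn't fit anyone" worry can be shown with made-up numbers. In this sketch the two subgroups and their "optimal" values are entirely hypothetical; the point is only that when a population splits into distinct groups, the population mean can sit far from every individual.

```python
from statistics import mean

# Hypothetical "optimal" daily intake of some nutrient for two subgroups.
group_a = [20, 22, 21, 19, 18]
group_b = [80, 79, 82, 81, 78]
population = group_a + group_b

average = mean(population)


def closest_distance(value, people):
    """Distance from `value` to the nearest individual in `people`."""
    return min(abs(value - p) for p in people)


print(average)                                # population-wide "healthy" target
print(closest_distance(average, population))  # gap to the nearest real person
```

Here the pooled average lands in the gap between the two groups, so a recommendation built on it would be wrong for everyone it was derived from.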

A sort of gmail interface redesign problem, but people's lives are on the line.

Google has shown itself to be driven by metrics almost to the exclusion of anything else, and metrics often don't tell you what you think they do.

I work in medical imaging and if there is something I've learnt over the last 20 years it's that we understand 'normal' less than any other disease state.

"I'm excited to know which diets, exercises and lifestyle define a healthy human"

The thrust of this approach (as mentioned in the article) is that they're searching for the magic of optimal biomolecular processes, not necessarily a universal regimen.

That's exactly what I'm hoping to find out. Optimal (and achievable) biomolecular processes should be the empirical drive behind any regimen.

This is very exciting at one level. We need a project like this to accomplish a "Species" level map of our biology.

Knowing a little bit about how data moves through lab and clinic systems, however, and about the widespread tentacles of bad actors like the NSA, this could be too juicy a prize to ignore. What is to prevent the Feds (for instance) from knocking on their doors with an NSL for the data based on some trumped-up link?

Anyone remember Gattaca? Well, we could end up with a data-based classification version of that here!

A more mundane concern is insurance and financial (dis)incentives for behavior change, http://www.salon.com/2014/05/30/in_the_future_insurance_comp...

"23andMe Is Terrifying, But Not for the Reasons the FDA Thinks", https://news.ycombinator.com/item?id=6811167

"23andme has suspended health-related genetic tests", https://news.ycombinator.com/item?id=6859732

"A problem with much statistical analysis is ignoring the fact that humans, umm, react to things around them. (Social science jargon for this is reflexivity). I know this seems so simple, but it’s amazing how much predictive analytics don’t factor this in.", https://medium.com/message/learning-from-natesilver538s-omg-...

That last link on the shortcomings of most predictions based on statistical analysis was a great read. Thanks for sharing.

Are humans deterministic machines, or do they impose their own free will onto the environment? Predictions based on statistical analysis probably can't account for free will.

Statistics can account for free will just as well as they account for other complex factors, which is poorly, with no ability to say what will happen in any particular case. There's no need to get metaphysical.
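A small simulation can illustrate the gap between aggregate and individual prediction. This is a sketch under stated assumptions: each person's outcome is modeled as an independent coin flip with a hypothetical probability `p`, which is a simplification chosen for the example and not a claim about how people actually behave.

```python
import random


def simulate(n=10_000, p=0.3, seed=0):
    """Compare aggregate estimation with per-individual prediction."""
    rng = random.Random(seed)
    outcomes = [rng.random() < p for _ in range(n)]

    # Aggregate question: what fraction of people experience the event?
    observed_rate = sum(outcomes) / n

    # Individual question: with no other information and p < 0.5, the best
    # single guess for each person is "no event".
    predictions = [False] * n
    accuracy = sum(pred == out for pred, out in zip(predictions, outcomes)) / n
    return observed_rate, accuracy


rate, accuracy = simulate()
print(f"observed rate: {rate:.3f}")
print(f"per-person accuracy of the best constant guess: {accuracy:.3f}")
```

The population rate is recovered closely, yet the best available guess for any single person is still wrong for roughly the fraction `p` of them, whatever the underlying cause of the variation is.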
