Ask HN: What medical datasets do you need?
We recently announced YC AI (https://blog.ycombinator.com/yc-ai/). This is only the first step. Our long-term goal is to democratize AI development. We want to make it easier for startups to compete with the big companies.

One thing large companies have is data. We're experimenting with ways to allow startups to get similar assets, and we're starting with medical data.

If you're working on AI and need medical data, please help us by filling out this form: https://goo.gl/Dr9FzB.

Structured longitudinal patient data (diagnosis and procedure codes, lab data, step data from Fitbits, etc.), but with AI, unstructured data may become more useful as well. This is probably an opportunity in itself.

Any tagged data sets like CCDs with SNOMED/LOINC encoding. Basically anything that is serialized in HL7/FHIR, longitudinally, for a large enough population. What matters is the time-oriented set of population data for a region, such as a major health center's records over a period of five to ten years or more.

I'm not sure such a thing exists: "large companies with lots of medical data". Medical data is often confidential and belongs to hospitals.

I think health insurers know every little medical detail about you, from every interaction you have with the health care system that they process a claim for.

The dataset spanning all of them is likely to be in the tens or hundreds of TB range, if not PB.

In the US, hospital systems are increasingly huge companies.

How about pricing data?

Tagged medical notes from medical records, the same way there is ImageNet for images.

