
How the Chan Zuckerberg Science Initiative plans to solve disease by 2100 - nature24
https://www.nature.com/articles/d41586-017-08966-z
======
neuromantik8086
Is anyone on here involved with the HCA by any chance? I have a question about
your efforts.

When I was working in neuroimaging, one of the greatest hindrances to
effective open data sharing was the lack of a standard for how data should be
organized. In practice, people from different labs would just throw their data
on hard drives and ship them as-is to whoever requested it, with a variety of
organizational schemes (or no scheme at all: raw data straight from a scanner,
with a lot of somewhat important metadata missing). There was an effort to
change that ([http://bids.neuroimaging.io/](http://bids.neuroimaging.io/)),
although I'm not sure how well it has caught on. I've had discussions with
friends in other fields (bioinformatics) where apparently there really isn't
any standard at all, and data organization is basically done according to the
researcher's (or postdoc/grad student's) own idiosyncratic methods.

I've always felt like this sort of "here's the dataset as-is, take it or leave
it" distribution severely limits the discoverability of open data, since it
makes the data substantially harder to index and to search by relevant
criteria (in the case of neuroimaging, it'd be nice to be able to narrow down
the brain images out there by scanning modality or scan sequence). It also
hurts the re-usability of datasets, since data distributed without key
metadata fields properly populated can be effectively useless in many cases.
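
To make the indexing point concrete, here is a minimal Python sketch. It is not
taken from BIDS or the HCA; the `Dataset` class, the `REQUIRED_FIELDS` tuple,
and the lab names are all hypothetical, and a real standard would define its
own required fields.

```python
from dataclasses import dataclass, field

# Hypothetical required fields, chosen only for illustration.
REQUIRED_FIELDS = ("modality", "sequence")


@dataclass
class Dataset:
    name: str
    metadata: dict = field(default_factory=dict)


def missing_fields(ds):
    """Return the required fields that are absent or empty for this dataset."""
    return [f for f in REQUIRED_FIELDS if not ds.metadata.get(f)]


def search(datasets, **criteria):
    """Return names of datasets whose metadata matches every criterion."""
    return [
        ds.name
        for ds in datasets
        if all(ds.metadata.get(k) == v for k, v in criteria.items())
    ]


datasets = [
    Dataset("lab_a", {"modality": "fMRI", "sequence": "EPI"}),
    Dataset("lab_b", {"modality": "DWI", "sequence": "EPI"}),
    Dataset("lab_c", {}),  # shipped as-is, with no metadata at all
]

print(search(datasets, modality="fMRI"))
print(missing_fields(datasets[2]))
```

A dataset like the hypothetical `lab_c` never shows up in any search, no matter
how good the underlying scans are, which is the discoverability problem in a
nutshell.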

With this context, I was wondering what efforts the HCA folks are making
towards dataset standardization. Maybe this is all still in the works / a
little too early to ask, but how is the data ingestion phase described on the
HCA website [1] going to standardize incoming data and ensure that it is
usable?

[1] [https://www.humancellatlas.org/data-sharing](https://www.humancellatlas.org/data-sharing)

~~~
neuromantik8086
Never mind, I found the whitepaper:

[https://www.humancellatlas.org/files/HCA_WhitePaper_18Oct201...](https://www.humancellatlas.org/files/HCA_WhitePaper_18Oct2017.pdf)

Although if anyone from HCA does happen to find this post and has any further
remarks, I'd still be interested to hear them!

------
tritium
_tl;dr_ One idea is to re-apply lessons learned from mapping biological
functions to their genetic origins in C. elegans: use computational biology
and an inventory of all human cell types to accurately trace and debug human
physiology at a molecular level. Through a culture of open creativity, other
ideas are also welcome.

