

Ask HN: Is there a GitHub-like service that can host scientific data sets? - niels_olson

As a biomed researcher just starting to collect my first data, I would like to make it public, but I don't want to be responsible for the maintenance. I can make the code available. Is GitHub the right answer for the datasets too?
======
dalke
You need to figure out a few things.

Who is going to use your data? Do they know the format? Do they know how it
was collected or generated? Can they reproduce it? Can they understand why
it's useful? Is there a paper or other document they can use to understand
things?

If someone has questions about the data, who do they contact? (I mention this
because that's part of the broader sense of 'maintenance'.)

How big is the data? If it's 1K then you can store it just about anywhere. If
it's 1TB then you are much more limited.

How long do you want the data to last? How important is it that it survives
past your winning the $70 million jackpot tomorrow and deciding to retire on
an internet-free atoll in the South Pacific?

In most cases I deal with, the answers are: 1) almost no one cares about the
data until it's been published, 2) it's maybe a few MB, and 3) 5-10 years is
fine.

If so, then GitHub, Bitbucket, etc. all work fine.
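
If you are not sure which side of the size question you fall on, totalling the bytes in your data directory is a quick check. A minimal sketch, assuming the data sits in a local directory; the function names and the `data` path are hypothetical, not from any particular tool:

```python
import os


def total_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that fits, or at TiB as the largest unit.
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

Something like `human_readable(total_size_bytes("data"))` then tells you whether you are in "a few MB" territory or closer to the 1TB end.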

------
webmaven
A project you should look into and keep an eye on is Dat, a "Git for Data":
[http://dat-data.com/](http://dat-data.com/)

------
bendmorris
Figshare: [http://figshare.com/](http://figshare.com/)

DataDryad is another, but it's specifically for data associated with journal
publications: [http://datadryad.org/](http://datadryad.org/)

------
fundamental
You might want to be more specific about the exact type of data that you're
dealing with as there might be something relevant to the particular subfield.
In general, though, I do not know of an all-encompassing
throw-your-huge-data-here-for-free website.

------
skram
Check out
[http://opendata.stackexchange.com/](http://opendata.stackexchange.com/) - a
very similar question has been asked there.

------
frewsxcv
How is the data formatted?

