
Large Public Datasets - amazedsaint
http://www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-public?share=1
======
robgolding
I'm the tech lead on the team that runs [http://police.uk](http://police.uk),
the UK Police crime mapping website, and our data is available to download and
through an API at [http://data.police.uk](http://data.police.uk). The dataset
isn't nearly as big as some of the ones in this list (~40 million rows), but it's
nice that the Home Office are open with their data.

People have built some cool apps with it, which are showcased[1] on the site.
There's even a Pebble watch app[2], which is due to be added to that list
shortly.

We also have an open-source Python client[3] for the API, which I'm planning
to post here once the documentation is finished.
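
For anyone who wants to poke at the API before trying the client, here is a minimal standard-library sketch. It is not the official client. The `crimes-street/all-crime` path, the `lat`/`lng`/`date` parameters and the `category` field are assumptions based on the public API documentation as commonly described, so check them against data.police.uk before relying on them. The function names are made up for illustration.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL and endpoint path; verify against the API docs.
BASE_URL = "https://data.police.uk/api"


def street_crimes_url(lat, lng, month):
    """Build the URL for street-level crimes near a point for a 'YYYY-MM' month."""
    query = urlencode({"lat": lat, "lng": lng, "date": month})
    return f"{BASE_URL}/crimes-street/all-crime?{query}"


def count_by_category(crimes):
    """Count crime records by their 'category' field (assumed field name)."""
    return Counter(c.get("category", "unknown") for c in crimes)


def fetch_crimes(lat, lng, month):
    """Download and decode the JSON list of crimes (needs network access)."""
    with urlopen(street_crimes_url(lat, lng, month), timeout=30) as resp:
        return json.load(resp)
```

Usage would look something like `count_by_category(fetch_crimes(52.629729, -1.131592, "2013-10")).most_common(5)`, assuming the endpoint behaves as described above.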

[1]: [http://www.police.uk/apps/](http://www.police.uk/apps/)

[2]: [https://git.bengcooper.co.uk/bengcooper/pebble-crimewatch/wikis/home](https://git.bengcooper.co.uk/bengcooper/pebble-crimewatch/wikis/home)

[3]: [https://github.com/rkhleics/police-api-client-python/](https://github.com/rkhleics/police-api-client-python/)

------
chestnut-tree
The Open Data Index ranks countries by how much data they make publicly
available, based on 10 areas (transport, budget, spending, etc). It isn't a
full global picture and it doesn't cover all datasets in every country.
However, it's still useful as a comparison tool and a way of seeing what data
is (and isn't) available.

They have rankings for 70 countries. The top 10 (from October 2013) are:

      1. UK
      2. US
      3. Denmark
      4. Norway
      5. Netherlands
      6. Finland
      7. Sweden
      8. New Zealand
      9. Australia
      10. Canada

[https://index.okfn.org/country](https://index.okfn.org/country)

------
andrewguenther
Tangentially related: my master's thesis applies predictive algorithms to
web traffic for scaling purposes, and I cannot believe that there isn't more
server trace data available. The best I've found is some data from the mid-90s
and Wikipedia in 2007.

So if any of you wonderful people feel so inclined as to donate some
requests/sec metrics, I would be deeply appreciative.

------
icpmacdo
Does anyone have a dataset of every movie released between 1950 and 2013? I
need cast, release year and title.

~~~
siddboots
[http://www.imdb.com/interfaces](http://www.imdb.com/interfaces)

~~~
deanclatworthy
The problem with the data they provide is that it is not relational and is
missing huge numbers of movies. It leaves you with no way to distinguish
between movies other than by title, which gives you mismatches when movies
have the same name in the same year.
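
To make the mismatch concrete, here is a small sketch of what happens when title plus year is the only available key. The records, field names and the `find_ambiguous` helper are invented for illustration and are not taken from the IMDb files.

```python
from collections import defaultdict

# Hypothetical records; titles and names are invented for illustration.
records = [
    {"title": "The Visitor", "year": 1979, "cast": ["Actor A"]},
    {"title": "The Visitor", "year": 1979, "cast": ["Actor B"]},
    {"title": "Harbour Lights", "year": 1963, "cast": ["Actor C"]},
]


def find_ambiguous(records):
    """Group records by (normalised title, year) and keep keys shared by 2+ records."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["title"].strip().lower(), rec["year"])
        groups[key].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


ambiguous = find_ambiguous(records)
for (title, year), recs in ambiguous.items():
    print(f"{title} ({year}): {len(recs)} records share this key")
```

Any join against another source on that key would attach both casts to both films, which is why a stable per-movie identifier matters.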

~~~
mcphilip
Yes, IMDb is a pretty poor source of data. From my research, a combination of
the well-organized Freebase film database[1] (~19 million facts) with details
filled in from IMDb is a better approach. However, processing data from
Freebase is not trivial and requires a decent time investment to grok.

[1]: [https://www.freebase.com/film](https://www.freebase.com/film)

------
warrenmar
Amazon hosts public datasets at
[https://aws.amazon.com/publicdatasets/](https://aws.amazon.com/publicdatasets/).
Good if you want to quickly spin up an instance, copy data over from S3 and
process it.

------
veb
The New Zealand Government makes a lot of its datasets accessible. You can
also request data:

[https://data.govt.nz](https://data.govt.nz)

------
ceolol
Any good free datasets for business and POI addresses worldwide?
Preferably with geocoding...

~~~
pella
Check the OpenStreetMap database (ODbL license):

[http://stackoverflow.com/questions/1875255/open-source-poi-database](http://stackoverflow.com/questions/1875255/open-source-poi-database)

_"Planet.osm is the OpenStreetMap data in one file: all the nodes, ways and
relations that make up our map. A new version is released every week. It's a
big file (XML variant over 400GB uncompressed, 34GB compressed)."_

[http://planet.openstreetmap.org/](http://planet.openstreetmap.org/)
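
Since the XML variant is far too large to load at once, POIs have to be pulled out with a streaming parser. Below is a rough sketch using only the standard library. The `iter_poi_nodes` name, the choice of the `amenity` key and the tiny sample document are assumptions for illustration. For the real planet file a dedicated OSM tool would likely be much faster.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written sample in OSM XML layout (not real map data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="51.5" lon="-0.12"><tag k="amenity" v="cafe"/><tag k="name" v="Example Cafe"/></node>
  <node id="2" lat="51.6" lon="-0.13"/>
  <way id="3"><nd ref="1"/><nd ref="2"/><tag k="highway" v="residential"/></way>
</osm>"""


def iter_poi_nodes(source, key="amenity"):
    """Yield (id, lat, lon, tags) for each <node> that carries a tag named `key`."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the <osm> root
    for event, elem in context:
        if event != "end" or elem.tag not in ("node", "way", "relation"):
            continue
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            if key in tags:
                yield elem.get("id"), float(elem.get("lat")), float(elem.get("lon")), tags
        # Drop finished top-level elements so memory use stays flat.
        root.clear()


for node_id, lat, lon, tags in iter_poi_nodes(io.BytesIO(SAMPLE)):
    print(node_id, lat, lon, tags.get("name"))
```

`ET.iterparse` also accepts a file object, so a compressed download could in principle be read through `bz2.open(path, "rb")` without unpacking it first.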

------
valevk
Thanks for this! I'm currently writing my master's thesis, and I'm in
desperate need of such datasets. Especially free datasets.

------
saganus
Non-blurred:

[https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public?share=1](https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public?share=1)

Someone else shared this tip before. Note the "?share=1"

~~~
dang
We added it to the url.

------
eksith
I'm not signing in to Facebook to read that.

~~~
mindcrime
It's a discussion of "good places to find large datasets open to the public".
There are 132 or so answers, so I wouldn't even try to copy them all here
(copyright issues aside anyway), but here's the current contents of the answer
wiki. Note that some of these got partially truncated by the copy and paste
from Quora, so if you want the full URL, you'll have to log in or get somebody
else to fix it. I'm too lazy for all that right now. :-)

Here are many of the links mentioned so far:

Cross-disciplinary data repositories, data collections and data search
engines:

[http://usgovxml.com](http://usgovxml.com)

[http://aws.amazon.com/datasets](http://aws.amazon.com/datasets)

[http://databib.org](http://databib.org)

[http://datacite.org](http://datacite.org)

[http://figshare.com](http://figshare.com)

[http://linkeddata.org](http://linkeddata.org)

[http://reddit.com/r/datasets](http://reddit.com/r/datasets)

[http://thedatahub.org](http://thedatahub.org) alias
[http://ckan.net](http://ckan.net)

[http://quandl.com](http://quandl.com)

Social Network Analysis Interactive Dataset Library (Social Network Datasets)

Datasets for Data Mining

[http://enigma.io](http://enigma.io)

Single datasets and data repositories

[http://archive.ics.uci.edu/ml/](http://archive.ics.uci.edu/ml/)

[http://crawdad.org/](http://crawdad.org/)

[http://data.austintexas.gov](http://data.austintexas.gov)

[http://data.cityofchicago.org](http://data.cityofchicago.org)

[http://data.govloop.com](http://data.govloop.com)

[http://data.gov.uk/](http://data.gov.uk/)

[http://data.medicare.gov](http://data.medicare.gov)

[http://data.seattle.gov](http://data.seattle.gov)

[http://data.sfgov.org](http://data.sfgov.org)

[http://data.sunlightlabs.com](http://data.sunlightlabs.com)

[https://datamarket.azure.com/](https://datamarket.azure.com/)

[http://developer.yahoo.com/geo/g..](http://developer.yahoo.com/geo/g..).

[http://econ.worldbank.org/datasets](http://econ.worldbank.org/datasets)

[http://en.wikipedia.org/wiki/Wik..](http://en.wikipedia.org/wiki/Wik..).

[http://factfinder.census.gov/ser..](http://factfinder.census.gov/ser..).

[http://ftp.ncbi.nih.gov/](http://ftp.ncbi.nih.gov/)

[http://gettingpastgo.socrata.com](http://gettingpastgo.socrata.com)

[http://googleresearch.blogspot.com](http://googleresearch.blogspot.com)

[http://books.google.com/ngrams/](http://books.google.com/ngrams/)

[http://medihal.archives-ouvertes.fr](http://medihal.archives-ouvertes.fr)

[http://public.resource.org/](http://public.resource.org/)

[http://rechercheisidore.fr](http://rechercheisidore.fr)

[http://snap.stanford.edu/data/in..](http://snap.stanford.edu/data/in..).

[http://timetric.com/public-data/](http://timetric.com/public-data/)

[https://wist.echo.nasa.gov/~wist..](https://wist.echo.nasa.gov/~wist..).

[http://www2.jpl.nasa.gov/srtm](http://www2.jpl.nasa.gov/srtm)

[http://www.archives.gov/research..](http://www.archives.gov/research..).

[http://www.bls.gov/](http://www.bls.gov/)

[http://www.crunchbase.com/](http://www.crunchbase.com/)

[http://www.dartmouthatlas.org/](http://www.dartmouthatlas.org/)

[http://www.data.gov/](http://www.data.gov/)

[http://www.datakc.org](http://www.datakc.org)

[http://dbpedia.org](http://dbpedia.org)

[http://www.delicious.com/jbaldwi..](http://www.delicious.com/jbaldwi..).

[http://www.faa.gov/data_research/](http://www.faa.gov/data_research/)

[http://www.factual.com/](http://www.factual.com/)

[http://research.stlouisfed.org/f..](http://research.stlouisfed.org/f..).

[http://www.freebase.com/](http://www.freebase.com/)

[http://www.google.com/publicdata..](http://www.google.com/publicdata..).

[http://www.guardian.co.uk/news/d..](http://www.guardian.co.uk/news/d..).

[http://www.infochimps.com](http://www.infochimps.com)

[http://www.kaggle.com/](http://www.kaggle.com/)

[http://build.kiva.org/](http://build.kiva.org/)

[http://www.nationalarchives.gov...](http://www.nationalarchives.gov...).

[http://www.nyc.gov/html/datamine..](http://www.nyc.gov/html/datamine..).

[http://www.ordnancesurvey.co.uk/..](http://www.ordnancesurvey.co.uk/..).

[http://www.philwhln.com/how-to-g..](http://www.philwhln.com/how-to-g..).

[http://www.imdb.com/interfaces](http://www.imdb.com/interfaces)

[http://imat-relpred.yandex.ru/en..](http://imat-relpred.yandex.ru/en..).

[http://www.dados.gov.pt/pt/catal..](http://www.dados.gov.pt/pt/catal..).

[http://knoema.com](http://knoema.com)

[http://daten.berlin.de/](http://daten.berlin.de/)

[http://www.qunb.com](http://www.qunb.com)

[http://data.reegle.info/](http://data.reegle.info/)

[http://data.wien.gv.at/](http://data.wien.gv.at/)

[http://data.gov.bc.ca](http://data.gov.bc.ca)

[https://pslcdatashop.web.cmu.edu/](https://pslcdatashop.web.cmu.edu/)
(interaction data in learning environments)

[http://www.icpsr.umich.edu/icpsrweb/CPES/](http://www.icpsr.umich.edu/icpsrweb/CPES/)
\- Collaborative Psychiatric Epidemiology Surveys (a collection of three
national surveys focused on each of the major ethnic groups to study
psychiatric illnesses and health services use)

~~~
eksith
Thanks! That's actually a very helpful list. It's ironic that it's posted on a
large closed dataset like Quora ;)

