My Favorite Public Data Sources (jenunderwood.com)
81 points by bpolania on Jan 17, 2016 | 5 comments


Advertisement submarine for Power BI. Shallow on content, mostly a pretty random list.


Here's a long list of (awesome) public datasets:

https://github.com/caesar0301/awesome-public-datasets


I was looking for these types of sources this week to populate a document database. One I ended up using for a demonstration was the "startup company information" hosted at http://jsonstudio.com/resources/ (apparently an extract from CrunchBase, mentioned in Jen's blog post).

I naively thought I could just grab a pile of tweets or something, but most public APIs require registration as a developer.

One quick tip, if you're dealing with JSON dumps as a series of objects (e.g., {} {} {}) that you want to wrap in an array (e.g., [{}, {}, {}]), is to "slurp" them into jq (https://stedolan.github.io/jq/):

    $ jq -s '.' companies.json > companies-array.json
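
If jq isn't installed, roughly the same "slurp" can be sketched in Python with the standard library's `json.JSONDecoder.raw_decode`, which parses one value at a time and reports where it stopped. The function name `slurp_json_objects` and the sample data are made up for illustration, and this version reads the whole input into memory, so treat it as a rough equivalent rather than a drop-in replacement:

```python
import json


def slurp_json_objects(text):
    """Parse a string of back-to-back JSON values ({} {} {}) into a list."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the parsed value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


if __name__ == "__main__":
    # Hypothetical sample input standing in for a file like companies.json.
    sample = '{"name": "a"} {"name": "b"}\n{"name": "c"}'
    print(json.dumps(slurp_json_objects(sample)))
```

For a real dump you would read the file contents into `text` first and write the result back out with `json.dump`.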


I just started looking at her top-listed data source, the GDELT Project[1]. Kind of mind-blowing.

[1] http://www.gdeltproject.org/


Yes. I used the GDELT data set in a geo-intelligence hackathon and it's very powerful. Just keep in mind that if you use Google BigQuery (actually the easiest way to use the data set), it will cost you money.



