
My Favorite Public Data Sources - bpolania
http://www.jenunderwood.com/2016/01/14/my-favorite-public-data-sources/
======
mynewtb
Advertisement submarine for Power BI. Shallow on content, mostly a pretty
random list.

------
phillc73
Here's a long list of (awesome) public datasets:

[https://github.com/caesar0301/awesome-public-datasets](https://github.com/caesar0301/awesome-public-datasets)

------
SloopJon
I was looking for these types of sources this week to populate a document
database. One I ended up using for a demonstration was the "startup company
information" hosted at
[http://jsonstudio.com/resources/](http://jsonstudio.com/resources/)
(apparently an extract from CrunchBase, mentioned in Jen's blog post).

I naively thought I could just grab a pile of tweets or something, but most
public APIs require registration as a developer.

One quick tip, if you're dealing with JSON dumps as a series of objects (e.g.,
{} {} {}) that you want to wrap in an array (e.g., [{}, {}, {}]), is to
"slurp" them into jq
([https://stedolan.github.io/jq/](https://stedolan.github.io/jq/)):

    $ jq -s '.' companies.json > companies-array.json
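
If jq isn't available, roughly the same "slurp" can be sketched in Python with the standard library's `json.JSONDecoder.raw_decode`, which parses one value at a time and reports where it stopped. This is only a minimal sketch: the function name `slurp_json_objects` is made up for illustration, and it assumes the whole dump fits in memory.

```python
import json


def slurp_json_objects(text):
    """Parse back-to-back JSON values (e.g. '{} {} {}') into one list.

    Hypothetical helper, not part of jq or any library.
    """
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the parsed value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


if __name__ == "__main__":
    sample = '{"name": "a"} {"name": "b"}\n{"name": "c"}'
    print(json.dumps(slurp_json_objects(sample)))
```

To mirror the command above, read `companies.json` into a string, pass it to the function, and write the result out with `json.dump`.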

------
geomark
I just started looking at her top listed data source, the GDELT Project[1].
Kind of mind blowing.

[1] [http://www.gdeltproject.org/](http://www.gdeltproject.org/)

~~~
bpolania
Yes. I used the GDELT data set in a geo-intelligence hackathon and it's very
powerful. Just keep in mind that if you use Google BigQuery (actually the
easiest way to use the data set), it will cost you money.

