

US Census API now available - dangoldin
http://www.census.gov/newsroom/releases/archives/miscellaneous/cb12-135.html

======
grantjgordon
Anyone know if there's an easy way to get the 2010 census data all at once the
way they've made the 2000 census data available?
(<http://www2.census.gov/census_2000/datasets>)

For ease of exploring new ideas it would be super handy to be able to push it
all into a Hadoop cluster on AWS or some such nonsense.

EDIT: Found it! Hopefully...
<http://www2.census.gov/census_2010/04-Summary_File_1/>

~~~
sp332
Amazon has 2000 census data freely available
<http://aws.amazon.com/datasets/Economics/2290> as part of their Public Data
Sets. I suppose they will have 2010 data up soon.

~~~
grantjgordon
Thanks!

------
yummyfajitas
Something I don't understand - what does an API actually offer?

Right now, I download a zip file containing a bunch of csv files. Then:

    
    
        $ unzip filename.zip
    
        $ ipython -pylab
        import pandas
        sum_data = pandas.read_csv("atussum_2011.dat")
    
        # keep only respondents aged 18-65
        adult_indices = where(logical_and(sum_data['TEAGE'] >= 18, sum_data['TEAGE'] <= 65))[0]
        ...
    

How does an API benefit me? Having done plenty of work with json APIs, I can't
see how it could possibly be easier than this.

~~~
azernik
That is easy to do for one-off processing (I did it recently), but in my
opinion it's a lot more of a pain if you want to run queries on demand from
your web server or mobile app in response to user input. Maybe less
importantly, an API means the Census Bureau has written some handy search
tools (filter by state, zip code, etc.) so that you don't have to - that was
never a really big deal, but still annoying.
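For a sense of what the on-demand side looks like, here's a rough sketch of calling the new API over HTTP and parsing what comes back. The endpoint and the `P0010001` (total population) variable code are the ones from the 2010 SF1 announcement docs, but treat the details as illustrative - you need a free API key, and the exact variable codes come from their documentation. The response is a JSON array of arrays whose first row is the header:

```python
import json
import urllib.parse

BASE = "http://api.census.gov/data/2010/sf1"

def census_url(variables, geo, key):
    """Build a query URL of the shape the new API accepts."""
    params = {"get": ",".join(variables), "for": geo, "key": key}
    return BASE + "?" + urllib.parse.urlencode(params)

def parse_response(text):
    """Turn the header-row-first JSON array-of-arrays into a list of dicts."""
    rows = json.loads(text)
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]

# A response shaped like what the API sends back for total population of CA:
sample = '[["P0010001","NAME","state"],["37253956","California","06"]]'
records = parse_response(sample)
# records == [{"P0010001": "37253956", "NAME": "California", "state": "06"}]
```

Note that values come back as strings, so you still cast to int yourself.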

~~~
yummyfajitas
Queries on demand, with handy search tools:

    
    
        $ scp census_data.csv admin@my_postgres_server:/tmp/census_data.csv
        $ ssh admin@my_postgres_server
        $ psql
        # CREATE TABLE census_data ... ;
        # COPY census_data FROM '/tmp/census_data.csv' WITH (FORMAT csv);
        # CREATE INDEX idx_census_zipcode ON census_data(zipcode);
    

To do the actual search:

    
    
        import psycopg2
    
        conn = psycopg2.connect("dbname=census_data user=postgres")
        cur = conn.cursor()
        # psycopg2 quotes the parameter itself, so no quotes around %s
        cur.execute("SELECT count(id) FROM census_data WHERE zipcode = %s;", (zipcode,))
        return cur.fetchone()
    

What advantage does http/json have over this?

(Yes, I realize I'm missing GROUP BY's.)

[edit: I don't mean to be negative about government transparency AT ALL. I'm
only criticizing the particular technical choice here - for _small structured
data sets_ , a bunch of csv's in a zip file is the clear winner.
Pandas/excel/etc >> json over http for ad-hoc work, and postgres >> json over
http for interactive queries (or ad-hoc work).]
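The GROUP BY case I mentioned is one more line of SQL anyway. A sketch, using sqlite3 as a stand-in for postgres so it's self-contained - the table schema and rows are made up for illustration:

```python
import sqlite3

# In-memory sqlite stand-in for the postgres setup above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE census_data (id INTEGER, zipcode TEXT, age INTEGER)")
cur.executemany("INSERT INTO census_data VALUES (?, ?, ?)",
                [(1, "10001", 34), (2, "10001", 61), (3, "94107", 25)])

# The GROUP BY: counts per zipcode in one query
cur.execute("SELECT zipcode, count(id) FROM census_data "
            "GROUP BY zipcode ORDER BY zipcode")
counts = cur.fetchall()
# counts == [("10001", 2), ("94107", 1)]
```

The same statement runs unchanged against the postgres table built above.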

~~~
sholanozie
The retrieval and processing of the data is done externally (i.e. not on your
server).

~~~
yummyfajitas
CA is 973 MB as zipped csv. CA is a bit over 10% of the US population, so the
whole data set will be about 10 GB. You can fit that on one of the cheaper
linodes pretty easily.

With a few indices and maybe even a materialized view (or even pruning data
you don't need), you can answer most queries so fast that latency between
linode and the census > query time.

I know munging csv's and writing sql isn't as sexy as JSON APIs or mongodb,
but sometimes the simple solutions are the right ones.
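By "materialized view" I just mean precomputing the aggregates into their own indexed table, since that's a couple of statements. Again a sqlite3 sketch with made-up data, but the SQL is the same idea in postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE census_data (id INTEGER, zipcode TEXT)")
cur.executemany("INSERT INTO census_data VALUES (?, ?)",
                [(1, "10001"), (2, "10001"), (3, "94107")])

# Hand-rolled "materialized view": precompute per-zipcode counts once...
cur.execute("CREATE TABLE zip_counts AS "
            "SELECT zipcode, count(id) AS n FROM census_data GROUP BY zipcode")
# ...and index it, so each on-demand lookup is a single indexed point read
cur.execute("CREATE INDEX idx_zip_counts ON zip_counts(zipcode)")

cur.execute("SELECT n FROM zip_counts WHERE zipcode = ?", ("10001",))
n = cur.fetchone()[0]
# n == 2
```

Refresh the table whenever the source data changes - which for census data is roughly once a decade.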

~~~
Zev
Why should I have to install Postgres to be able to play around with data?
This is one less step to think about.

~~~
yummyfajitas
As I pointed out in my first comment, you can also just load it with
pandas/excel/etc. I brought up postgres for the specific case of "queries on-
demand on your web server or mobile app".

------
knowtheory
It's good that they're trying to update census access, because by all
accounts, trying to work with census data has been hell on earth.

A bunch of news apps guys under the aegis of Investigative Reporters & Editors
put together an alternative API that's much easier to work with:

<http://census.ire.org/>

<http://census.ire.org/docs/javascript-library.html>

~~~
slurgfest
If you think that working with US government data is hell on earth, what kind
of publicly-provided data are you normally working with?

~~~
knowtheory
To be clear, it's not the data that they've collected that's the problem, it's
the data access that the census has thus far provided. That's why it's so
important that they've made an attempt to update their portal and API. That
said... their dev forum still requires devs to wait for manual moderation of
their sign-ups, for instance.

------
maclaren
Another resource that has existed for a while is the "Integrated Public Use
Microdata Series". Though I don't think they have an API, they do provide a
vast amount of data and have tools for online analysis.

<http://usa.ipums.org/usa/> <http://usa.ipums.org/usa/doc.shtml>

~~~
guan
IPUMS is often used by researchers (myself included) who download the entire
dataset and run regressions on it, which might be one reason why they haven’t
bothered to create an API.

------
sinzone
The US government is pushing down IT development costs by moving to an
API-centric model.

------
jsmcallister
I've been hammering code on this all weekend. If anyone is interested in
discussing, feel free to contact me. I'm mainly working on queries for county
information regarding age, race, sex, occupation, and income.

------
bherms
Excited to see what kind of cool projects people come up with using this data.
I'm trying to think something up right now myself.

------
dangoldin
I'm impressed that the data is being returned as JSON rather than XML.

------
chickenhead
I made this extension based on the API:
<https://chrome.google.com/webstore/detail/ljbaebikafaieiebljlmgdgfgkfambni>

