The City of Chicago is on Github (thechangelog.com)
156 points by adamstac 1543 days ago

This is a very nice initiative!

A question: as a non-American, and given America's current levels of paranoia, does this really have much potential beyond, say, bike racks and street lines?

I'd love to know, for example, which areas are served by fibre, which have a high number of wireless communication towers, and which are serviced by new (as opposed to ancient) utilities. I can imagine some bureaucrat deeming many game-changing datasets "security risks".

If that's true, then what's left to publish?

I'd also like to know, what commitment is there to keeping datasets updated? My guess: GitHub makes this much easier. For example, how long before hundreds of privately owned bike racks get added? How long before pathways get crowd-sourced into the data?

We'll see.

Sean Gorman did a lot of work around critical infrastructure and national security at GMU.[1] He ruffled a few feathers at the time. His research at GMU later turned into GeoIQ.[2] I am not sure whether you are interested in actual fiber mapping or in the level of paranoia. If it's the former, there are some publications listed in the research section specifically about fiber mapping.[3]

On this note, I see a great application for small businesses: knowing where more people have access to non-car parking, so to speak. This data lets a would-be business owner see in advance where bike access is and provide services to those areas.

The City of Philadelphia has a GitHub organization too - https://github.com/cityofphiladelphia.

We use the org to share some of our official city apps. These are usually simple web apps built with tools like Bootstrap and jQuery. We'll be open sourcing more of these going forward.

Not only are we interested in sharing the code for these apps, we're actively encouraging people to fork, improve and send pull requests.

Thanks for sharing. We're gonna work on a list post sharing links to all the large states and cities in the US that we know are on GitHub and embracing open source the way Chicago and Philly are.

There really isn't any data there though.

Pittsburgh has a lot of good data too, it's just not on github: http://pittsburghpa.gov/dcp/gis/

Thanks for sharing; we'll be compiling a list.

Merging user contributions comes with a number of problems. It'll be interesting to see how they manage it. Or they won't, and this is entirely a token gesture. For example, if the city tenders for engineering works, it will likely be required to supply the consultant with source data like road centrelines and road reserves, as-built water networks, etc. That has to be the official, verified, accurate data. User-supplied data just won't cut it.

I'm also dubious that GitHub is the right way to release data. There are a huge number of people interested in civic data, and GitHub is probably one of the least accessible ways for 99.9% of people to get it.

For example, a number of city and regional councils in New Zealand publish their data via Koordinates.com. Wellington City Council alone publishes over 50 key datasets:


(disclosure: I work for Koordinates)

See my response below [0], but one of the main reasons they did this with GitHub was to try and make it easier to take user contributions. They know their data is wrong and they want help fixing it from the community.

As an OpenStreetMap contributor who spent tons of time using their building and street centerline datasets to improve OSM, I noticed that their data was wrong in tons of places. I approached the city to ask about better collaboration with OSM and about fixing their license to let OSM use their data. After that conversation they committed to releasing data and soliciting feedback from the community.

From your github profile, it appears that you work for koordinates. https://github.com/hamishcampbell

Why not disclose this in your comment?

Apologies, I should. Note that it's also in my HN profile :). I'm not expecting to drum up business via this thread, but we do know a bit about the challenges of open data delivery.

You seem to be missing the point they are trying to make. It's not so much about releasing (delivering) the data as it is about receiving user-contributed updates in a more or less controlled way, which may allow them to curate those updates and fold them into their official data. Koordinates seems great for publishing the data, but from briefly browsing its features, it does not seem to let users contribute to the updating process, except perhaps in a centralized way, which seems to be what they are trying to avoid.

Github may or may not be the best tool they could have used to accomplish their objective; there are probably better tools out there and Github is barely "good enough"; however, "good enough" usually cuts it. On the other hand, since their goal is to allow user contributions, Koordinates is definitely not the right tool, at least not in an immediately obvious way.

Actually, our primary objective is to release data under an open source license so it can be used by businesses, non-profits, or open source projects like OpenStreetMap. Since we are releasing it on GitHub, we're going to experiment with the idea of merging changes made by users. This is an experiment for both the community and the government; we don't know how many pull requests will be made or what the quality of those changes will be. We're excited to see whether putting data on GitHub can improve data quality.

Chicago has released over 400 datasets using our data portal, which is located at http://data.cityofchicago.org. The portal will remain the primary way we release data to the public, since it provides a great interface, an easy way to download data, and the ability to make maps and graphs. The datasets posted on GitHub are released under the MIT License, and we hope they will be widely used by open source projects, businesses, and non-profits. GitHub also allows ongoing collaboration on editing and improving data, unlike typical portal technology. Because it's an open source license, the data can be hosted on other services, and we'd also like to see applications that make it easier for non-technical users to edit geographic data.

For the vast majority of non-technical users (or users who are highly technical in some area other than software development), the Chicago Metropolitan Agency for Planning has a site called MetroPulse, which offers a large selection of open data browsable through a (sometimes) friendly GUI. It's more demographically and statistically focused (though there is a map view), but it's a much more generally accessible platform than GitHub.

That's good to hear, but it raises the question: why doesn't the City use GitHub to host code and tools to extract data from MetroPulse? Now you've got a situation where there are two versions of the data. Which is the most up to date or authoritative?

It is good to see public entities picking GitHub as a collaboration tool though.

Why is any website less accessible than any other website? Further, being realistic, anyone who doesn't know how to navigate Github (or can't figure it out) is very unlikely to ever care about this data. The people who do care are the intermediaries who put it in a more consumable form (e.g. mobile apps).

That isn't true. As another commenter noted, I work for Koordinates. Our top data users, by far, are engineering and planning consultancies, architects and universities.

The first step to releasing data is releasing the data. 3rd-party applications are nice to have, but they second-guess the use case. A developer can grab some data and use it to build an app, regardless of the source. But GitHub is a blocker for the vast majority of the audience interested in ad hoc analysis, whatever their reason.

OTOH, GitHub does have an API and automate-able mechanisms for submitting pull requests. An app could definitely be built simply for displaying relevant (say, nearby location of mobile device) information and allowing the user to correct any mistakes they see and pushing that as a request for fixing it.
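A minimal sketch of the two GitHub REST calls such an app might make, assuming the user already has a fork of the city's repository and a branch on that fork. It only builds the requests and sends nothing over the network. The owner, repository, branch, file name and SHA values are hypothetical placeholders, not anything the city has published.

```python
import base64

API = "https://api.github.com"

def build_fix_requests(upstream_owner, repo, fork_owner, branch, path,
                       new_text, file_sha, base="master"):
    """Describe the GitHub REST calls an app could make to propose a fix.

    Assumes (hypothetically) that `fork_owner` already forked the upstream
    repo and that `branch` exists on that fork. Returns a list of
    (method, url, json_body) tuples; nothing is sent over the network.
    """
    commit = (
        "PUT",  # create/update file contents on the fork's branch
        f"{API}/repos/{fork_owner}/{repo}/contents/{path}",
        {
            "message": f"Correct {path}",
            # The contents endpoint expects base64-encoded file content.
            "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
            "sha": file_sha,  # blob SHA of the version being replaced
            "branch": branch,
        },
    )
    pull_request = (
        "POST",  # open a pull request against the upstream repo
        f"{API}/repos/{upstream_owner}/{repo}/pulls",
        {
            "title": f"Correct {path}",
            "head": f"{fork_owner}:{branch}",  # cross-repo PRs use owner:branch
            "base": base,
            "body": "Correction submitted from a mobile field app.",
        },
    )
    return [commit, pull_request]

# Hypothetical example values, for illustration only.
for method, url, body in build_fix_requests(
        "example-city", "bike-racks", "some-user", "fix-rack-location",
        "bike_racks.csv", "id,lat,lon\n1,41.88,-87.63\n", "abc123"):
    print(method, url)
```

A real app would still need an authenticated HTTP client and the current file SHA (from a GET on the same contents URL) before sending these.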

The first step to releasing data is releasing the data.

And they released the data. I still don't understand what the problem is with GitHub. You don't HAVE to code to grab the data.

I agree. GitHub gives developer access, but at the same time the source can be browsed, linked to, or downloaded as a zip. Too easy.

I am very surprised that you think that "anyone who doesn't know how to navigate Github(or can't figure it out)" (interesting use of brackets there) wouldn't want this data.

Why would knowledge of development, and git, have anything at all to do with if someone wanted this kind of data? I can't imagine why you would draw a connection.

It is raw data. It isn't a brochure or a printable map or anything else directly consumable. I am making an eminently obvious statement, so why is this surprising to you?

Secondly, what is so funny about the parentheses? What is your point?

I sense that you're trying hard to be snotty in some way, as if you've done your part against elitism, but the end result is rather bizarre.

You are right, I was just irritable, and was projecting onto you an "anyone who knows anything knows git" mentality, which in retrospect you didn't deserve. Sorry.

I still think that I know quite a few people who love raw data, love excel, and still wouldn't want to learn git or github, but I was reading that into your message perhaps unfairly.

Github is ridiculously unsuited for one-time publication of blobs. Seriously.

As the person who convinced them to use GitHub, I suggested it because they were looking for an extremely easy way to bring changes from the community back into the city's data. I suggested GitHub and GeoJSON because I envisioned them taking pull requests from citizens interested in adding more detail to their data or correcting existing data.

You're right, though: GitHub is horrible for large blobs of data like this. At the time I didn't know how big the data releases would end up being.

Tom and I have plans to talk more about future data releases and how they might be made with a more appropriate tool.

I have a hard time imagining a more appropriate tool.

The best tool would offer

  * Discoverability
  * Updatability
  * Transparency (who, specifically, is behind it)
  * Traceability
  * Time stamping
  * Linkability, with über-stable URIs
  * Public issue tracking
  * Documentation, including the possible crowd-sourcing thereof
How is this not GitHub?

Edit: In this case we have identifiable and passionate individuals behind the initiative. This is far from faceless and cursory, as most data-dumps are. What's not to love here?

The only issue is that with huge data dumps (the buildings dataset here as GeoJSON is ~2GB uncompressed and ~1GB as shapefile) it becomes difficult to make direct pull requests against the data. Indeed they zipped the JSON file up before uploading it so it's impossible to make pull requests (I originally suggested GeoJSON because a pull request could be read by a human as opposed to a shapefile diff which could not be read).
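To illustrate why a text format like GeoJSON can yield readable pull-request diffs, here is a small sketch using Python's `difflib`. It assumes a hypothetical layout with one Feature per line and sorted keys (not necessarily how the city's files are laid out), and the two bike-rack features are made-up sample data. Changing one feature produces a diff touching exactly one line.

```python
import difflib
import json

def features_to_lines(features):
    """Serialize each GeoJSON Feature on its own line with sorted keys, so a
    line-based diff (like the one GitHub shows for a pull request) isolates
    the single feature that changed."""
    return [json.dumps(f, sort_keys=True) for f in features]

# Made-up sample features, for illustration only.
old = [
    {"type": "Feature", "properties": {"id": 1, "name": "Rack A"},
     "geometry": {"type": "Point", "coordinates": [-87.63, 41.88]}},
    {"type": "Feature", "properties": {"id": 2, "name": "Rack B"},
     "geometry": {"type": "Point", "coordinates": [-87.62, 41.89]}},
]
new = json.loads(json.dumps(old))  # deep copy via a JSON round trip
new[1]["properties"]["name"] = "Rack B (moved)"
new[1]["geometry"]["coordinates"] = [-87.621, 41.891]

diff = list(difflib.unified_diff(
    features_to_lines(old), features_to_lines(new),
    "old.geojson", "new.geojson", lineterm=""))
print("\n".join(diff))
```

A binary shapefile offers no such line structure, which is the contrast drawn above; once the GeoJSON is zipped, the same advantage is lost.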

At AmigoCloud, we are building a platform that can also be used to disseminate/crowdsource geodata. We have delta exports and a dashboard to review edits. Users can edit the geodata with our Android/iOS apps or through our APIs (GeoJSON). The system can sync the deltas back in 50+ different formats, including ArcGIS Server. Full disclosure: I am the founder of AmigoCloud http://www.amigocloud.com

Your website doesn't work:

Error 310 (net::ERR_TOO_MANY_REDIRECTS): There were too many redirects.

Thanks for letting me know yellowbkpk. I was switching servers and you probably accessed it during the 10 minute downtime we had scheduled.

I think it is excellent, and the haters should fork the data and put it where they want it.

Can you keep us in the loop with how this progresses so we can cover it?

There are far better solutions, including things like http://ckan.org. Sounds like very little research was done before this decision was reached.

Thanks for your constructive feedback.

When I was searching for suggestions I didn't see any way for people to submit changes to the released data back to the organization releasing the data. Getting community feedback is what they cared about the most so I suggested GitHub.

Cool initiative. Quick import of the bike rack data into Fusion Tables and then into map view - https://www.google.com/fusiontables/DataSource?docid=1KXdOsA...

I don't believe it is possible to have Fusion Tables refer to the raw Github CSV. Using ImportData in Google Spreadsheets first and then importing into Fusion Tables isn't working either.

Glad to see my city in the news for a good reason!

I'm going to be visiting it for the first time this week (on business), and I'm really excited. Except for the cold :)

Yeah for some reason it's still snowing and below 30 most days. About the topic though: I'm really excited to see what people come up with... I might even have a few ideas

Chicago and most cities have had their GIS data available to the public for years, so this seems like a nice progression

Two key differences here:

1. The data is now available under the MIT license. This is important because (a) it is a predictable, well-known license that allows businesses to use the data without fear of an unknown license and (b) it does not have the "you must remove our data if we ask you to" clause that their data portal has [0].

2. They're actively seeking contributions from the community. None of the existing data portal tools have a built-in way of doing this, so they went with GitHub because it's a step in the direction of taking feedback. Is the data wrong or lacking something? Add it yourself and submit a pull request.

[0] http://www.cityofchicago.org/city/en/narr/foia/data_disclaim...

Absolutely wonderful birthday present for the greatest city on planet earth.

