A question: as a non-American, in view of America's current levels of paranoia, does this really have much potential beyond, say, bike racks and street lines?
I'd love to know, for example, which areas are served by fibre, which have a high number of wireless communication towers, and which are serviced by new (as opposed to ancient) utilities. I can imagine some bureaucrat deeming many game-changing datasets "security risks".
If that's true, then what's left to publish?
I'd also like to know, what commitment is there to keeping datasets updated? My guess: GitHub makes this much easier. For example, how long before hundreds of privately owned bike racks get added? How long before pathways get crowd-sourced into the data?
We use the org to share some of our official city apps. These are usually simple web apps built with tools like Bootstrap and jQuery. We'll be open sourcing more of these going forward.
Not only are we interested in sharing the code for these apps, we're actively encouraging people to fork, improve and send pull requests.
Pittsburgh has a lot of good data too, it's just not on github: http://pittsburghpa.gov/dcp/gis/
I'm also dubious that GitHub is the right way to release data. There are a huge number of people interested in civic data, and GitHub is probably one of the least accessible ways for 99.9% of people to get it.
For example, a number of city and regional councils in New Zealand publish their data via Koordinates.com. Wellington City Council alone publishes over 50 key datasets:
(disclosure: I work for Koordinates)
As a contributor to OpenStreetMap who spent a lot of time using their building and street centerline datasets to improve OSM, I noticed that their data was wrong in many places. I approached the city to ask about better collaboration with OSM and about fixing their license to let OSM use their data. After that conversation they committed to releasing data and soliciting feedback from the community.
Why not disclose this in your comment?
GitHub may or may not be the best tool they could have used to accomplish their objective; there are probably better tools out there, and GitHub is barely "good enough", but "good enough" usually cuts it. On the other hand, since their goal is to allow user contribution, Koordinates is definitely not the right tool, at least not in any immediately obvious way.
It is good to see public entities picking GitHub as a collaboration tool though.
The first step to releasing data is releasing the data. Third-party applications are nice to have, but they second-guess the use case. A developer can grab some data and use it to build an app, regardless of the source. But GitHub is a blocker for the vast majority of the audience that is interested in ad hoc analysis, for whatever reason.
And they released the data. I still don't understand what the problem is with GitHub. You don't HAVE to know how to code to grab the data.
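To make that concrete: every file in a public GitHub repo is served over plain HTTPS via GitHub's raw-file host, so a browser or curl is all you need. A minimal sketch, where the org name, repo name, and file path below are all invented placeholders, not the city's actual layout:

```shell
# Hypothetical names throughout -- substitute the real repo and file path.
owner="cityofchicago"
repo="bike-racks"
path="data/bike-racks.csv"

# GitHub serves repo files over plain HTTPS; no git client, no account.
url="https://raw.githubusercontent.com/${owner}/${repo}/master/${path}"
echo "${url}"
# curl -L -o bike-racks.csv "${url}"   # uncomment to actually download
```

From there the CSV opens directly in Excel or any spreadsheet tool, so no knowledge of git is required.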
Why would knowledge of development, or of git, have anything at all to do with whether someone wants this kind of data? I can't imagine why you would draw a connection.
It is raw data. It isn't a brochure or a printable map or anything else directly consumable. I am making an eminently obvious statement, so why is this surprising to you?
Secondly, what is so funny about the parenthesis? What is your point?
I sense that you're trying hard to be snotty in some way, as if you've done your part against elitism, but the end result is rather bizarre.
I still know quite a few people who love raw data and love Excel but wouldn't want to learn git or GitHub; perhaps I was unfairly reading that into your message, though.
You're right, though: GitHub is horrible for large blobs of data like this. At the time I didn't know how big the data releases would end up being.
Tom and I have plans to talk more about future data releases and how they might be made with a more appropriate tool.
The best tool would offer
* Transparency (who, specifically, is behind it)
* Time stamping
* Linkability, with über-stable URIs
* Public issue tracking
* Documentation, including the possible crowd-sourcing thereof
Edit: In this case we have identifiable and passionate individuals behind the initiative. This is far from faceless and cursory, as most data-dumps are. What's not to love here?
Error 310 (net::ERR_TOO_MANY_REDIRECTS): There were too many redirects.
When I was searching for suggestions, I didn't see any way for people to submit changes to the released data back to the organization releasing it. Getting community feedback is what they cared about most, so I suggested GitHub.
I don't believe it is possible to have Fusion Tables refer to the raw GitHub CSV directly. Using IMPORTDATA in Google Spreadsheets first and then importing into Fusion Tables isn't working either.
1. The data is now available under the MIT license. This is important because (a) it is a predictable, well-known license that allows businesses to interact with the data without fear of licensing surprises, and (b) it does not have the "you must remove our data if we ask you to" clause that their data portal has.
2. They're actively seeking contributions from the community. None of the existing data portal tools have a built-in way of doing this, so they went with GitHub because it's a step in the direction of taking feedback. Is the data wrong or lacking something? Add it yourself and submit a pull request.
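That "add it yourself" loop can be sketched with plain git commands. This is an offline toy, assuming only a local git install: the repo name, file, and coordinate values are invented, and in practice you would fork the city's repo on GitHub and clone your fork instead of running `git init`:

```shell
# Stand in for the city's repo with a throwaway local one (placeholder data).
git init -q demo-data
cd demo-data
printf 'id,lat,lon\n1042,41.88,-87.63\n' > bike-racks.csv
git add bike-racks.csv
git -c user.email=demo@example.com -c user.name=demo commit -qm "Import city data"

# A contributor spots a bad coordinate and fixes it on a branch...
git checkout -qb fix-rack-1042
sed -i 's/41\.88,/41.8789,/' bike-racks.csv
git -c user.email=demo@example.com -c user.name=demo commit -qam "Correct coords for rack 1042"

# ...then pushes the branch to their fork and opens a pull request on GitHub:
# git push origin fix-rack-1042
```

The city then reviews the diff line by line before merging, which is exactly the feedback channel the existing data portals lack.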