Merging user contributions comes with a number of problems. It'll be interesting to see how they manage it. Or they won't, and this is purely a token gesture. For example, if the city tenders for engineering works, they will likely be required to supply the consultant with source data like road centrelines and road reserves, as-built water networks, etc. That has to be the official, verified, accurate data. User-supplied data just won't cut it.

I'm also dubious that GitHub is the right way to release data. There are a huge number of people interested in civic data, and GitHub is probably one of the least accessible ways for 99.9% of people to get it.

For example, a number of city and regional councils in New Zealand publish their data via Koordinates.com. Wellington City Council alone publishes over 50 key datasets:


(disclosure: I work for Koordinates)

See my response below [0], but one of the main reasons they did this with GitHub was to try and make it easier to take user contributions. They know their data is wrong and they want help fixing it from the community.

As a contributor to OpenStreetMap who spent tons of time using their building and street centerline datasets to improve OSM, I noticed that their data was wrong in tons of places. I approached the city to ask about better collaboration with OSM and about fixing their license to let OSM use their data. After that conversation they committed to releasing data and soliciting feedback from the community.

[0] http://news.ycombinator.com/item?id=5316115

From your github profile, it appears that you work for koordinates. https://github.com/hamishcampbell

Why not disclose this in your comment?

Apologies, I should have. Note that it's also in my HN profile :). I'm not expecting to drum up business via this thread, but we do know a bit about the challenges of open data delivery.

You seem to be missing the point they are trying to make. It's not so much about releasing (delivering) the data as it is about receiving user-contributed updated data in a more or less controlled way, that may allow them to curate it and update their official data. Koordinates seems great for publishing the data, but after briefly browsing its features, it does not allow users to contribute to the updating process, except perhaps via a centralized way, which seems to be what they are trying to avoid.

Github may or may not be the best tool they could have used to accomplish their objective; there are probably better tools out there and Github is barely "good enough"; however, "good enough" usually cuts it. On the other hand, since their goal is to allow user contribution, Koordinates is definitely not the right tool, at least not in an immediately obvious way.

Actually, our primary objective is to release data under an open source license so it can be used by businesses, non-profits, or open source projects like OpenStreetMap. Since we are releasing it on GitHub, we're going to experiment with the idea of merging changes made by users. This is an experiment for the community and government; we don't know how many pull requests will be made or what the quality of those changes will be. We're excited to see what happens with data on GitHub and whether we can improve data quality.

Chicago has released over 400 datasets using our data portal, which is located at http://data.cityofchicago.org. The portal will remain the primary way we release data to the public since it provides a great interface, an easy way to download data, and the ability to make maps and graphs. The datasets posted on GitHub are under an MIT License, and we hope they will be widely used by open source projects, businesses, and non-profits. GitHub also allows ongoing collaboration on editing and improving data, unlike the typical portal technology. Because it's an open source license, data can be hosted on other services, and we'd also like to see applications that could facilitate easier editing of geographic data by non-technical users.

For the vast majority of non-technical (or even highly technical in some area other than software development) users, the Chicago Metropolitan Agency for Planning has a site called MetroPulse, which offers a large selection of open data browsable through a (sometimes) friendly GUI. It's more demographically and statistically focused (though there is a map view), but it's a much more generally accessible platform than GitHub.

That's good to hear, but it raises the question: why doesn't the City use GitHub to host code and tools to extract data from MetroPulse? Now you've got a situation where there are two versions of the data. Which is the most up to date or authoritative?

It is good to see public entities picking GitHub as a collaboration tool though.

Why is any website less accessible than any other website? Further, being realistic, anyone who doesn't know how to navigate Github (or can't figure it out) is very unlikely to ever care about this data. The people who do care are the intermediaries who put it in a more consumable form (e.g. mobile apps, etc.).

That isn't true. As another commenter noted, I work for Koordinates. Our top data users, by far, are engineering and planning consultancies, architects, and universities.

The first step to releasing data is releasing the data. 3rd party applications are nice to have, but they second-guess the use case. A developer can grab some data and use it to build an app, regardless of the source. But github is a blocker for the vast majority of the audience that is interested in ad hoc analysis, for whatever reason.

OTOH, GitHub does have an API and automatable mechanisms for submitting pull requests. An app could definitely be built simply for displaying relevant information (say, data near the mobile device's location), letting the user correct any mistakes they see, and pushing those corrections as a pull request.
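A rough sketch of the pull-request step, assuming GitHub's REST endpoint for creating pull requests (POST /repos/{owner}/{repo}/pulls). The owner, repository, branch names, and token below are hypothetical placeholders, not anything from the city's actual repositories, and the function only builds the request rather than sending it:

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_pull_request(owner, repo, head, base, title, body, token):
    """Build (but do not send) an HTTP request that opens a pull request.

    All argument values are supplied by the caller; nothing here is
    specific to any real dataset repository.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/pulls"
    payload = {"title": title, "head": head, "base": base, "body": body}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical usage: a correction already pushed to a fork's branch.
request = build_pull_request(
    owner="example-city",            # assumed placeholder
    repo="example-dataset",          # assumed placeholder
    head="some-user:fix-street-name",
    base="master",
    title="Fix misspelled street name",
    body="Correction submitted from a mobile app.",
    token="YOUR_TOKEN",
)
# urllib.request.urlopen(request) would actually submit it.
```

The app would still have to commit the edited data to a branch first; this only covers the final "ask the maintainers to merge it" step.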

The first step to releasing data is releasing the data.

And they released the data. I still am not understanding what the problem is with github. You don't HAVE to code to grab the data.

I agree. GitHub gives developer access, but at the same time the source can be browsed, linked to, or downloaded as a zip. Too easy.

I am very surprised that you think that "anyone who doesn't know how to navigate Github (or can't figure it out)" (interesting use of brackets there) wouldn't want this data.

Why would knowledge of development, and git, have anything at all to do with if someone wanted this kind of data? I can't imagine why you would draw a connection.

Why would knowledge of development, and git, have anything at all to do with if someone wanted this kind of data? I can't imagine why you would draw a connection.

It is raw data. It isn't a brochure or a printable map or anything else directly consumable. I am making an eminently obvious statement, so why is this surprising to you?

Secondly, what is so funny about the parentheses? What is your point?

I sense that you're trying hard to be snotty in some way, as if you've done your part against elitism, but the end result is rather bizarre.

You are right, I was just irritable, and was projecting onto you an "anyone who knows anything knows git" mentality, which in retrospect you didn't deserve. Sorry.

I still think that I know quite a few people who love raw data, love excel, and still wouldn't want to learn git or github, but I was reading that into your message perhaps unfairly.
