Yeah looks like there were a few bad geocodes that I didn't catch. paulitex's idea further down about a standard format for these things would be really nice to see b/c parsing out the locations in some cases was kind of a mess.
I recently did a project with just geocoding in Ontario and it took a while to fix all the various cases for the 600+ volunteer-entered locations. Bonus points: only some (and different) geographical information was available for each address. Postal codes/zip would be ideal for this.
Anecdotally, New York feels like a vibrant, budding startup scene. It's cool to see the data corroborate that. New York had as many job postings as Palo Alto and Mountain View combined.
Comparing the city of New York with two small towns doesn't make a lot of sense. Just from a geographical point of view, comparing it against the entire San Francisco Bay Area would be a far more apt comparison.
I suppose comparing just Manhattan to San Francisco proper would also work.
It would be useful to aggregate some of this within, say, a 25-mile radius. I think separating Cambridge from Boston, for example, obscures some of the data here.
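For what it's worth, radius-based aggregation could be sketched roughly like this. It is only a sketch under assumptions: the function names, the greedy "largest city becomes the hub" rule, and the posting counts in the sample are all made up for illustration and are not from the actual data set. Distances use the standard haversine formula.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def aggregate_by_radius(postings, radius_miles=25.0):
    """Greedily merge cities into hubs.

    postings: list of (name, (lat, lon), count).
    Cities are visited from most to fewest postings; each one joins the
    first existing hub within radius_miles, otherwise it becomes a new hub.
    Returns {hub_name: total_count}.
    """
    hubs = []  # each entry is [name, coords, running_total]
    for name, coords, count in sorted(postings, key=lambda p: -p[2]):
        for hub in hubs:
            if haversine_miles(hub[1], coords) <= radius_miles:
                hub[2] += count
                break
        else:
            hubs.append([name, coords, count])
    return {name: total for name, _, total in hubs}


# Placeholder counts, purely illustrative.
sample = [
    ("Boston", (42.3601, -71.0589), 10),
    ("Cambridge", (42.3736, -71.1097), 5),
    ("New York", (40.7128, -74.0060), 20),
]
totals = aggregate_by_radius(sample)
```

With that sample, Cambridge gets folded into Boston while New York stays separate. The greedy rule is order-dependent, so a proper clustering method might be a better fit for dense regions.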
Maybe we should design a little machine-readable snippet that can optionally be added to the end of posts. I'd suggest JSON as it's easy to read/write and is popular, e.g.
location could be an object as I have it above, or a simple [long, lat] pair, or a URL to Google Maps such as (http://maps.google.ca/maps?q=Vancouver,+British+Columbia&...). [long, lat] is probably the simplest and most precise, but it's difficult to author/discover your own location.
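A rough sketch of how a scraper might read such a snippet, assuming it sits on the last non-empty line of the post as a single JSON object with a "location" key. The key name, the placement rule, the "kind"/"value" return shape, and the function name parse_location are all hypothetical; nothing here is an agreed format.

```python
import json
from urllib.parse import urlparse, parse_qs


def parse_location(post_text):
    """Return a normalized location dict from a trailing JSON snippet, or None."""
    lines = [ln for ln in post_text.strip().splitlines() if ln.strip()]
    if not lines:
        return None
    try:
        snippet = json.loads(lines[-1])
    except json.JSONDecodeError:
        return None  # last line is ordinary text, not a snippet
    if not isinstance(snippet, dict):
        return None

    loc = snippet.get("location")
    # Form 1: an object with address-like fields, passed through unchanged.
    if isinstance(loc, dict):
        return {"kind": "address", "value": loc}
    # Form 2: a [long, lat] pair, in that order.
    if (isinstance(loc, list) and len(loc) == 2
            and all(isinstance(x, (int, float)) for x in loc)):
        lon, lat = loc
        return {"kind": "coords", "value": {"lon": lon, "lat": lat}}
    # Form 3: a maps URL; keep only its "q" query parameter.
    if isinstance(loc, str):
        query = parse_qs(urlparse(loc).query).get("q")
        if query:
            return {"kind": "query", "value": query[0]}
    return None


post = 'We are hiring in Vancouver.\n{"location": [-123.12, 49.28]}'
result = parse_location(post)
```

Posts without a snippet simply return None, so the format stays optional. The "address" and "query" forms would still need geocoding; only the [long, lat] form is usable directly.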
Beautiful data - thanks for this. I've been coaxed into making a presentation to some other students here, and I'm collecting graphical data to make it more interesting. Appreciated!
Geocoding problem?