Microsoft Releases 125M Building Footprints in the US as Open Data (bing.com)
405 points by seshagiric 7 months ago | 72 comments

Microsoft has allowed the OSM community to trace from Bing aerial imagery for almost 10 years now. That's been a massive benefit to OSM, way more than this dump of buildings. They should be congratulated and thanked for that contribution.

Now this is the sort of way of handling competition I can deal with. Find the data they have locked up and that you don't compete on, but that you need to produce and collect anyway and then release it for anyone to use.

Why do you assume anyone using this data is not thankful?

Because consumers of the data may not know.

I'm sure consumers don't know about my input there either, but I don't care much. I mean... what do you expect? A banner? Money?

Those who entered the data from their browser and directly used the Bing data do know and I'm sure they appreciate it.

There is no need for that assumption. This is just positive reinforcement. Maybe we see more of a good thing.

I don't understand why this is needed. Here in Finland the government just gave this info out for free to anyone (CC 4.0). They have all the information anyway because they have need for it due to planning/building permits, defense, etc.

The data is very accurate (for example I can zoom into my parents' cottage and see the outhouse on it). Later on they used laser scanning from planes to scan the whole country for a very accurate topographic map too.

The moment this information was released for free, both OSM and Google Maps quality jumped a notch or two. Before this Google Maps only had roads, but now it has small footpaths etc. too.


There are Finland-sized entities within the USA that also do this. For example Los Angeles County, twice as large as Finland.

When you are pondering the question of why all of the USA is not as integrated as Finland, ask yourself also why Finland is not tightly integrated with Croatia. It is the same reason.

> For example Los Angeles County, twice as large as Finland.

Uhh. You're off by two orders of magnitude. Los Angeles county is 4751 sq mi [1]. Finland is 130,666 sq mi [2].

1: https://en.m.wikipedia.org/wiki/Los_Angeles_County,_Californ...

2: https://en.m.wikipedia.org/wiki/Finland
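For anyone who wants to check the arithmetic, here is a quick sketch using only the two area figures cited above (the variable names are mine, nothing official). It compares the claimed ratio of "twice as large" with the ratio the cited areas give, and expresses the gap in orders of magnitude.

```python
import math

# Areas in square miles, exactly as cited in the comment above.
la_county_sq_mi = 4751
finland_sq_mi = 130_666

# The claim under discussion: LA County is "twice as large" as Finland.
claimed_ratio = 2.0
actual_ratio = la_county_sq_mi / finland_sq_mi

# How far off the claim is, as a factor and in orders of magnitude (log10).
error_factor = claimed_ratio / actual_ratio
orders_of_magnitude = math.log10(error_factor)

print(f"Finland is about {finland_sq_mi / la_county_sq_mi:.1f}x the area of LA County")
print(f"The claim is off by a factor of about {error_factor:.0f}")
print(f"That is roughly {orders_of_magnitude:.1f} orders of magnitude")
```

By land area, then, "two orders of magnitude" is a rounded-up figure, but the direction of the correction holds.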

He or she meant by population I would assume. Approximately 5 vs. 10 million.

Population isn't very relevant when we're speaking specifically of things that relate to land area like topographic maps.

But we aren't speaking of land area, we're speaking of buildings, which relate to population. I'd wager that LA County has more buildings than Finland.

But we are talking about building footprints which DO directly correlate to population, not overall area of an entity. So it's the most relevant factor.

Not really

Arizona and Croatia have the same population, but look at the aerial maps...

Right. If one interpretation is wildly wrong and there is another blindingly obvious interpretation that is correct and relevant... Why would we ever give someone the benefit of the doubt?

Or refer to the community guideline:

Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

Which can be found here: https://news.ycombinator.com/newsguidelines.html

Exactly. I would assume that the higher the population, the more buildings present. Just because you have a lot of land doesn't mean you have a lot of buildings.

Most of the buildings are not residential.

The USA is about 30x Finland's size by land area and building data is primarily stored at the county or equivalent level of government (3000 of those). Imagine munging data from 3000 different counties, each with their own standards and systems of recording.

Where does this size argument come from? It comes up consistently in USA vs. elsewhere arguments. 95th percentile densities are no higher than Europe's. The population of the USA is more than 30x Finland's, and we are talking about laser scanning (by EPA or equivalent).

> laser scanning (by EPA or equivalent)

Why would the EPA laser scan buildings?

The EPA handles clean air and water.

The only entities that have any business laser scanning any building are the county tax assessor and the insurance companies.

Does the US federal government not have a department that handles land survey?

United States Geological Survey (USGS), Department of the Interior. https://en.wikipedia.org/wiki/United_States_Geological_Surve...

> Does the US federal government not have a department that handles land survey?

Land surveying, yes - The USGS.

Building surveying, no. It's the "private" part of "private property."

Unless someone's committed a crime being investigated by a federal agency, the federal government has no right to know what's on my property. And even then there are limits. The feds can look down from a satellite, but so can anyone else. But the feds can't come onto my property and laser measure my house, as was the scenario proposed by the parent commenter.

As I stated earlier, external-only building surveys can be done by the local tax collection agency for the purpose of assessing taxes in places where taxes are assessed based on improvements to the land.

And your insurance company may also want to make similar measurements, but those are usually done only very rarely these days. They were more common in the past when structure fires were more of a problem. (Think Sanborn Maps and The Great Chicago Fire.)


The United States is comprised of literally thousands of sovereign, independent governments. There are even independent limits on what rules these governments can impose upon each other. It is not a "subordinate" type of situation.

That's why the size argument comes into play so often. It's not really a landmass or even population argument as much as it is a complexity argument.

People like to say that, but in practice it's not true and hasn't been for over a century.

> People like to say that, but in practice it's not true and hasn't been for over a century.

It depends on the topic. When it comes to things like interstate commerce, you are correct -- the states don't each have treaties with each other.

But when it comes to other things, like education, that's very much a local function. The feds set minimum best practices (enforced by the threat of removing funding), but it's up to the tens of thousands of local school districts to decide what to teach their children.

Some of those local school districts are huge, on the order of tens of thousands of students. Others can be as small as 15 or 20 students, or even one individual school.

When it comes to the topic at hand -- GIS information -- It is pretty much a county situation. There are numerous competing GIS standards and products, and each county or municipality chooses the software that works for its needs, and budget.

So, yes, there is creeping federalism in the United States. But claiming "it's not true and hasn't been for over a century" shows a lack of understanding of local civics.

Assuming that to be true, I would expect there to be localities within the US where laws differ so vastly that they resemble different countries when juxtaposed. Yet, I’ve traveled enough of the US that this would surprise me. (It would be very interesting to experience those two, though!)

I think the US shows itself to be pretty homogeneous, with some differences between the rural, suburban, and urban areas, but not much.

Go to Las Vegas and then a dry county in Kentucky and note what you're allowed to buy and what you're allowed to do on the streets.

I can smoke weed legally for recreational purposes where I am here. Where I grew up I'd be thrown in jail.

Texas has no zoning laws of any kind. You can build any kind of building for any purpose anywhere (I'm sure there are still restrictions, but there are no zones per se).

States have wildly different speed limits on highways. Different levels of allowed alcohol.

This is just off the top of my head without googling; I'm sure there are probably even bigger examples as well.

Take a look at the US Census Bureau's methodology page for the Building Permits Survey.[1] They statistically impute the numbers from a subset of 20,100 building permit issuing authorities, because they cannot collect all of the distinct authorities in the country with different data collection and storage policies.

Federal databases of postal addresses don't really have any reason to maintain data on the structures.

Some fire fighting authorities have building layouts for recently built buildings. They may all have their own methods for storing the data, and the coverage is confusing enough that the bodies and budget authorities responsible for fighting any long lasting wild fire frequently change.

Property lines are usually recorded and maintained at the county level, and enforced by a county court and sheriff at their direction, but this too is not the case everywhere.

It is not outlandish to think that there are multiple federal databases that include all of the data on buildings in the US, whether at the Department of Defense, or at multiple agencies within the Department of Homeland Security. However, it seems common that only data collected as a side effect of the regular course of doing government business are released to the public, but data sets created as part of some form of security-related goals are not released to the public.

[1] https://www.census.gov/construction/bps/how_the_data_are_col...

I sadly agree that federalism is de facto dead as a framework of policy and ideology. But as a legal and bureaucratic infrastructure, it has quite refused to die, for better or for worse. City, county, and state governments still do their thing, and anybody dealing with property, construction, zoning, etc in any capacity will still need to negotiate through each of those layers separately.

I have to disagree. For sure, there is a lot of stuff that is centralized at the federal level, but there is still a great deal of data managed at the county and municipal level, and there is a wide variety in how this data is managed (or not managed) and what format it is in. In the town where I live, they looked up my birth record (in order to provide me with an official birth certificate) in a large bound book. Most permits for construction are managed (at least in part) by the town; who knows where that data is stored. Hopefully most of it is submitted up to the state level.

Yep. As an example, higher level governments will attach stipulations to grants or other types of funding in order to “encourage” lower governments to do as they’d like them to. Our tax system is such that this almost always works.

> it's not true and hasn't been for over a century

We just had a Supreme Court ruling affirming states’ anti-commandeering rights [1].

[1] http://www.scotusblog.com/2018/05/opinion-analysis-justices-...

I know building permits do seem to work like this in the US.

There is probably a separate db (if it's even a db) for each county. It is very much a data cleaning nightmare from my limited experience.

Way more granular than at the county level. Building permits in my village are at the village level. Dozens of villages in my county.

It should be trivial. It should be standardized.

It is absolutely bizarre that we need to rely upon commercial companies to map things like streets and buildings when this is very accurately tracked by government. In an ideal world OSM data grids could be delegated to the appropriate governments, where it could have perfect precision with changes, street closures, one ways, redirects, etc.

> It should be trivial. It should be standardized.

Your use of the word "trivial" indicates you've never worked on complex GIS systems.

Whether it should be standardized is a matter of debate. Currently, each county or state or municipality uses the software and standard that suits its needs, and more importantly -- its budget.

The GIS needs of New York City are not the same as the GIS needs of Saint George, Utah. Saint George doesn't need skyscraper functions, and New York doesn't want to measure airports on top of mesas.

The real world is messy, and so is pretty much every single GIS deployment. The real world doesn't digitize well.

Likewise, New York probably doesn't have the nuclear contamination that St. George, Utah does[0]!

[0] https://en.wikipedia.org/wiki/St._George%2C_Utah#Nuclear_con...

"Whether it should be standardized is a matter of debate."

In the same way that vaccines are `debatable'. To export an accurate, up to date model of roads (and road data) and optionally building layouts should absolutely be the norm.

Nor does a standardized export format necessitate using a single uniform GIS solution, of which I've had to inter-operate with a number.

The real world digitizes spectacularly well, a lot of people just make poor choices and make excuses (or worse, buy nonsensical excuses) for not helping in getting there.

> The real world digitizes spectacularly well

No, it doesn't. If it did, then the surveying industry would be out of business. Yet every time a building is permitted in the United States, surveys are done.

Here's an example for a recent project surveyed in Chicago just this year:

West Roosevelt Road; South Clark Street; a line beginning at a point 116 feet north of vacated West 16th Street as measured along the west line of South Clark Street that is westerly 135.20 feet along the arc of a circle having a radius of 375.00 feet concave northerly and whose chord bears north 79 degrees 49 minutes 52 seconds west a distance of 135.20 feet; a line north 69 degrees 46 minutes 04 seconds west a distance of 101.85 feet; a line north 69 degrees 49 minutes 57 seconds west a distance of 26.00 feet; a line along the arc of a circle having a radius of 407.80 feet concave southerly and whose chord bears north 75 degrees 52 minutes 04 seconds west a distance of 85.51 feet a distance of westerly 85.67 feet; a line north 83 degrees 47 minutes 05 seconds west a distance of 164.45 feet; a line north 69 degrees 43 minutes 24 seconds west a distance of 25.16 feet; a line north 43 degrees 07 minutes 24 seconds west a distance of 31.91 feet to a point on the easterly dock line of the former South Branch of the Chicago River; a line south 46 degrees 47 minutes 47 seconds west along the easterly dock line of the former South Branch of the Chicago River a distance of 73.33 feet; a line south 89 degrees 54 minutes 55 seconds west a distance of 32.69 feet; a line south 49 degrees 36 minutes 35 seconds a distance of 46.38 feet; a line north 89 degrees 54 minutes 55 seconds east a distance of 296.25 feet; a line easterly along the arc of a circle having a radius of 375.00 feet concave southerly and whose chord bears south 78 degrees 32 minutes 39 seconds east a distance of 109.97 feet for a distance of 110.36 feet; a line south 69 degrees 46 minutes 04 seconds east a distance of 136.90 feet; a line easterly along the arc of a circle having a radius of 391.00 feet concave northerly and whose chord bears south 79 degrees 33 minutes 50 seconds east a distance of 135.64 feet for a distance of 136.33 feet; South Clark Street; vacated West 16th Street; a line 155.40 feet west of and parallel to South Clark Street; the north line of vacated West 16th Street; and the South Branch of the Chicago River

Digitize that.

Source: https://www.chicagoarchitecture.org/2018/05/29/everything-th...
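To make the "digitize that" challenge concrete, here is a minimal sketch of how the straight-line calls in a description like this might be turned into coordinates. It is an illustration under loud assumptions, not how any surveying or GIS package works: the function names (`parse_calls`, `call_to_offset`, `traverse`) are hypothetical, the ground is treated as a flat plane with north as +y and east as +x, the start point is an arbitrary origin, and only calls written exactly as "north/south D degrees M minutes S seconds east/west a distance of X feet" are recognized. Arcs, street references, dock lines, and the call in the text that is missing its east/west direction are not handled, which is a fair picture of why real descriptions are harder than they look.

```python
import math
import re

# Matches calls such as
# "north 69 degrees 46 minutes 04 seconds west a distance of 101.85 feet".
CALL_RE = re.compile(
    r"(north|south)\s+(\d+)\s+degrees\s+(\d+)\s+minutes\s+(\d+)\s+seconds\s+"
    r"(east|west)\s+a\s+distance\s+of\s+([\d.]+)\s+feet",
    re.IGNORECASE,
)


def parse_calls(text):
    """Extract (ns, angle_in_degrees, ew, distance_in_feet) tuples from text."""
    calls = []
    for ns, deg, minutes, seconds, ew, dist in CALL_RE.findall(text):
        angle = int(deg) + int(minutes) / 60 + int(seconds) / 3600
        calls.append((ns.lower(), angle, ew.lower(), float(dist)))
    return calls


def call_to_offset(ns, angle_deg, ew, dist):
    """Convert a quadrant bearing and distance into a planar (dx, dy) offset.

    The angle is measured from the north or south axis toward east or west.
    """
    theta = math.radians(angle_deg)
    dx = dist * math.sin(theta)
    dy = dist * math.cos(theta)
    if ew == "west":
        dx = -dx
    if ns == "south":
        dy = -dy
    return dx, dy


def traverse(calls, start=(0.0, 0.0)):
    """Walk the calls in order and return the list of visited points."""
    x, y = start
    points = [(x, y)]
    for call in calls:
        dx, dy = call_to_offset(*call)
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


# Two consecutive straight-line calls copied from the description above.
SAMPLE = (
    "a line north 69 degrees 46 minutes 04 seconds west a distance of 101.85 feet; "
    "a line north 69 degrees 49 minutes 57 seconds west a distance of 26.00 feet"
)

if __name__ == "__main__":
    for px, py in traverse(parse_calls(SAMPLE)):
        print(f"{px:10.2f} {py:10.2f}")
```

The straight segments are the easy part. Getting from these relative offsets to real-world coordinates still needs a known starting point, a projection, and handling for every call this sketch skips.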

You have this somewhat backwards.

Cities have loads of data that define property lines. They could keep it on paper, but more and more they keep it in computers now. Surveyors take that digital data and convert it into its precise real world marks. This has literally nothing to do with whether the real world digitizes well, and if really stretched only proves that it certainly does because that's the entire basis of property grants.

Further, pointing out a massive civic project in the middle of a large city as some sort of counterpoint, when it is in reality still not that complex at all, doesn't make your case.

There's a huge difference between tracking for building permits and opening an API to the world.

As an example -- in Seattle, sewer lines are still denoted with "sewer cards", digital images of pencil drawings on hundred-year-old paper. It is sufficient for sewer records, but utterly insufficient for OSM.

In Washington State, nearly all of the counties provide daily or weekly data exports for lot lines, streets and more. OSM regularly imports this data, but if you go much outside Seattle and are using Google Maps, prepare for a rubbish experience, as Google hasn't bothered to source the data directly from the counties (thus there are many missing and incorrectly drawn roads in the San Juans) and they're relying on a 3rd party vendor that is running a few years behind on pulling county data.

To put this in perspective, in Kitsap County alone they've renumbered nearly 300k lots over the past few years. Google Maps doesn't get those new addresses or road names until it's trickled down through USPS (who they're required to notify) then eventually to their data vendor, leaving you with a spotty patchwork of places that Google users have updated to the correct addresses.

Google had a good strategy with sourcing user contributions, but pulling the public domain data that is offered freely for download and is literally the canonical source isn't hard. A small team can manage it for OSM, so why is Google paying a 3rd party vendor that gives them trash instead of the latest data?

This explains why I’ve had such a miserable experience navigating the peninsula with Google turn by turn directions. Thank you for the context.

This is sometimes accurately tracked by some government, but there are thousands of governments in this country. Getting thousands of organizations to a standard is pretty hard when they don't get a concrete benefit from it. At the same time, the federal government has no business knowing the shape of my buildings, thank you very much.

> It should be standardized

No thank you. New York City doesn’t need a federal bureaucracy to be checked into every time a building goes up or a pipeline gets laid. And the level of detail which is necessary for New York would be overkill for rural counties; they have better things to spend their tax dollars on.

A building going up or a pipeline getting laid necessitates an enormous amount of government activity. The idea that having a simple standardized output would be just too onerous is absolute nonsense.

Further, saying that the detail necessary for NYC is overkill for a rural county is just bunk. Yes, of course NYC would have more detail. Do you think a rural county is going to be overloaded plotting their dozen streets?

The whole "better things with tax dollars" bit is really the cherry on the top.

> The idea that having a simple standardized output would be just too onerous is absolute nonsense

This complicated thing is already so complicated, it couldn’t possibly hurt to add even more complexity!

Turning OSM into, essentially, a view of local governments would remove the wiki aspect, where anyone can edit it.

Additionally, there are enough problems getting the global OSM community to come up with one tagging scheme. If every country made their own stuff, it would not be one map.

Of course it would remove the wiki aspect where a better, canonical source is available. For the same reason that my local library doesn't use a wiki library catalogue. That doesn't prevent the wiki aspect for non-canonical data. I hardly see tagging standardization as a big road block.

> primarily stored at the county or equivalent level

That's assuming there are any records at all. In states like Idaho and Montana, once you get out of the bigger cities I very much doubt there are permits on file for most dwellings. Partly because few people care, and partly because the land owners resent any sort of "big government" telling them what they can and cannot do.

The US government doesn't necessarily hold data with this level of detail. If they did it would be (in theory) in the public domain anyway. Remember the difference in size between the US and European countries.

As for why other governments even have to "release" such information, it's historical and surprisingly hard to change. The UK government holds possibly the most detailed map in the world covering every inch of Britain. But after 2000, when GIS-based products started to take off, people discovered that none of this data was available to use. OSM was created specifically to give the UK some open geographical data because there simply wasn't any and campaigns to the government were taking too long (the government data still isn't fully open).

When the Danish government did it some years back, they used the data to recreate the whole of Denmark in Minecraft [0].


Local government rules much of the US which makes national data sets difficult to assemble. There are excellent topographic resources though:


Take the size and complexity and federation into account... In Switzerland we have map.geo.admin.ch with a highly accurate representation of the levels and buildings. I love it.

The maps are an electronic version of the one we used in the military. They look gorgeous, retro and with a clever amount of details.

Having done a lot of research into this for a project in the US, the issue is not that they aren't available but that each county distributes its GIS data in a different format. Some have online portals with the GIS data, some have only paper records, and some counties are missing information altogether. There are companies here that work to consolidate that information, but they charge exorbitant fees to access the data.

There are numerous "open data" licences, many of which are totally incompatible with OSM. CC-BY 4.0 in this case isn't straightforwardly importable into OSM (you need to get them to sign a waiver). So that's one reason to generate our own data, rather than using "open data" from a government.

That's great for Finland! That is a real asset to your country.

The rest of the world doesn't necessarily work the same way Finland does, unfortunately.

> The CNTK toolkit developed by Microsoft is open source and available on GitHub as well. The ResNet3 model is open source and available on GitHub. Our Bing Maps computer vision team will be presenting this work at the annual International State of the Map conference in Milan, Italy


I wonder if I ever heard about Microsoft Maps before. What is the state of it? Who uses it?

So I tried www.bing.com/maps. The first thing that surprises me is that it tries to set 5 cookies from google.com and one from www.google.com

What is the reason for Microsoft to allow Google to track everybody who uses their maps?

> What is the state of it? Who uses it?

In the Netherlands the businesses (shops, restaurants, and general businesses) are more complete than OpenStreetMap, but not as complete as Google because everyone sees Google as the de facto standard and they push owners to add their info in the de facto Google search.

Bing Maps is fast and responsive (unlike Google Maps on anything other than their own, proprietary platforms such as Google Chrome, the Google Maps app, or Google Earth), and has aerial imagery and traffic info (unlike OSM).

Bonus game: turn off labels on satellite view, zoom out to world view, and try to find your house or other places. Surprisingly hard!

I visited its website, but didn't notice any Google trackers. I'm using Privacy Badger btw. The Maps app is also automatically installed on every Windows 10 machine. I think they partnered heavily with HERE Maps in the past.

When I read the part about downloading the dataset from Github, my first reaction was "Can't Microsoft afford to host downloads themselves?", and then I realized they own Github now...

Seriously, the two aren’t related. Microsoft has had public projects on Github for a couple of years now with more important projects being made public regularly.

Downvoted because of the cynicism and not paying attention.

Out of curiosity, I wonder if this data dump includes "sensitive" buildings? eg things that would generally be blurred/missing/etc on OSM for various reasons.

If it is visible in the imagery they analyzed it wouldn't be intentionally left out of OSM without some superseding information that it didn't exist anymore.

Makes sense, thanks. :)

If anyone is interested in that type of data, our startup (https://tensorflight.com) extracts information like footprints and 10 other property factors for insurance (including Bing imagery). More info: https://tensorflight.com/catalogue#Objects-supported-by-Tens... .

Hey! I'll look into it with a lot of interest. Your catalog link is not very obvious in your navigation (it only appears in the body text). Could you make it more visible in the orange bottom navbar? Cheers

This is pretty great! This data is usually locked up in individual scraper-unfriendly county websites, sometimes behind a paywall, sometimes available only in person, or even only by mail. And the data is often limited to parcel geometry, not building footprints.

The other option is to use 3rd party services to purchase or lease the data. This data can cost hundreds of thousands of dollars for the entire US dataset.
