The article is a bit of a long read, but I've been looking at this topic for some time and things are definitely improving.
We build a map-based productivity app for workers. We map out the workplace and use e.g. asset tracking to visualize where things are, so people can find them and navigate around. There's a lot more to this of course, but we typically geo-reference whatever building maps we can get our hands on and overlay them on OpenStreetMap. This allows people to zoom in and out and switch between indoor and outdoor.
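For the curious, geo-referencing here boils down to fitting a transform from floor-plan pixel coordinates to geographic coordinates using a few matched control points. A minimal sketch of that idea in Python (the control points are made up for illustration; a general affine fit also absorbs the flipped y axis of image coordinates):

```python
import numpy as np

def fit_affine(pixel_pts, geo_pts):
    """Least-squares affine transform mapping floor-plan pixels to lon/lat.

    pixel_pts / geo_pts: matched (x, y) and (lon, lat) control points,
    at least three of them, not all on one line.
    """
    px = np.asarray(pixel_pts, dtype=float)
    geo = np.asarray(geo_pts, dtype=float)
    # Design matrix [x, y, 1], so geo = [x, y, 1] @ coeffs
    A = np.hstack([px, np.ones((len(px), 1))])
    coeffs, *_ = np.linalg.lstsq(A, geo, rcond=None)
    return coeffs  # 3x2 matrix

def pixel_to_geo(coeffs, x, y):
    return np.array([x, y, 1.0]) @ coeffs

# Example: three corners of a scanned plan matched to known coordinates
coeffs = fit_affine(
    pixel_pts=[(0, 0), (2000, 0), (0, 1500)],
    geo_pts=[(8.6821, 50.1109), (8.6830, 50.1109), (8.6821, 50.1103)],
)
print(pixel_to_geo(coeffs, 1000, 750))  # a point in the middle of the plan
```

With the fitted transform, every pixel of the building map gets a geographic position, which is what lets the indoor layer sit correctly on the OpenStreetMap base layer.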
The hard part for us: sourcing decent building maps. There's usually some building map information available in the form of CAD drawings, fire escape plans, etc., but it isn't really well suited for use as a user-friendly map, and getting vector graphics for it is typically hard. In short, we usually have to spend quite a bit of effort on designing or sourcing maps. And of course these maps aren't static: people extend buildings, move equipment and machines around, and re-purpose the spaces they have. A map of a typical factory is an empty rectangle. You can see where the walls, windows, and doors are, and any supporting columns. All the interesting stuff happens in the negative space (the blank space between the walls).
Mapping all this is a manual process because it requires people to interpret raw spatial data in context. We build our own world model. A great analogy is text-based adventure games, where the only map you had was the one you built in your head from querying the game. It's a surprisingly hard problem. We're used to decent-quality public maps outdoors, but indoors there isn't much. Making outdoor maps is quite expensive, but lucrative enough that companies have been investing in it for years. OpenStreetMap has also tapped into a huge community of people who manually edit things and/or integrate third-party data sets (a lot of stuff is imported as well).
Recently, Google's nano banana model made creating building maps a lot easier. It has some notion of proportions and dimensions. I was able to take a smartphone photo of the fire escape plan mounted to the wall and let nano banana clean it up and transform it, without hallucinating new walls, doors, or windows, or changing the dimensions of rooms. We've also been experimenting with turning bitmaps into vector graphics, which can work with promising results but still needs work. But even just a cleaned-up fire escape plan, minus all the escape routes and other map clutter, is already a massive improvement for us. Fire escape plans are everywhere and are kind of the baseline map we can get for pretty much any building, provided they are to scale. At least in Germany they are (the standards for this are pretty strict).
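For anyone who wants to try this, the cleanup step is a single image-editing call. A minimal sketch using the google-genai Python SDK (the model identifier and the prompt are illustrative; we iterated on the prompt quite a bit, and the exact model name depends on the current release):

```python
from io import BytesIO
from google import genai
from PIL import Image

client = genai.Client()  # reads the API key from the environment

plan_photo = Image.open("fire_escape_plan.jpg")  # hypothetical input photo

prompt = (
    "Clean up this photo of a fire escape plan into a plain top-down "
    "floor plan. Keep all walls, doors, windows, and room proportions "
    "exactly as they are. Remove escape-route arrows, legends, and "
    "other clutter."
)

response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # nano banana; name may vary by release
    contents=[prompt, plan_photo],
)

# Save the returned image part as the cleaned-up floor plan
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("floor_plan.png")
```

The key property for us is the one described above: the edit has to preserve geometry, not just produce something that looks like a floor plan.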
AI-based map content creation from photos, reference CAD diagrams, textual descriptions, etc. is what we want to target next. Given some basic CAD map and a photo taken in the building, can we deduce the vantage point from which the photo was taken, identify things in the photo, and put them on the map in the correct position? People are actually able to do this with enough context; it's what OpenStreetMap editors do when they add detail to the map. AI models so far don't quite do all of this yet. Essentially this is about creating an accurate world model and using that to populate maps with content. It's not just about things like lidar and stereo vision, but about understanding what is what in a photo.
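The vantage-point part at least has a classical building block: perspective-n-point. Given a handful of correspondences between points on the CAD map (with assumed heights) and pixels in the photo, the camera pose falls out. A minimal sketch with OpenCV, where all coordinates and intrinsics are invented for illustration; the part the AI models would actually need to solve is producing these correspondences from the photo automatically:

```python
import numpy as np
import cv2

# 3D points in the building's coordinate frame (meters), e.g. door and
# window corners read off the CAD map, with assumed heights.
object_points = np.array([
    [0.0, 0.0, 0.0],
    [2.5, 0.0, 0.0],
    [2.5, 0.0, 2.1],
    [0.0, 0.0, 2.1],
    [5.0, 1.2, 1.0],
    [6.0, 3.0, 0.0],
], dtype=np.float64)

# Where those same points appear in the photo (pixels).
image_points = np.array([
    [410.0, 820.0],
    [980.0, 835.0],
    [985.0, 310.0],
    [405.0, 300.0],
    [1500.0, 560.0],
    [1750.0, 900.0],
], dtype=np.float64)

# Rough pinhole intrinsics for an uncalibrated smartphone camera (assumed).
w, h = 1920, 1080
f = 0.9 * w
K = np.array([[f, 0, w / 2],
              [0, f, h / 2],
              [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
assert ok

# Camera position in building coordinates: C = -R^T t
R, _ = cv2.Rodrigues(rvec)
camera_position = (-R.T @ tvec).ravel()
print("vantage point (x, y, z in meters):", camera_position)
```

Once the pose is known, anything detected in the photo can be projected back onto the map, which is exactly the "put it in the correct position" step.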
In any case, that's just one example of where I see a lot of potential for smarter models. Nano banana was the first model to not make a mess of our maps.