Besides the neat visualization, one thing I find remarkable is that this is one of the few computer vision tasks where the state of the art is still an old hand-designed descriptor, specifically HOG. Attempts to redo this work with deep learning haven't worked as well.
The closest I know of is https://arxiv.org/abs/1506.06343 , which I've never even seen applied to cities. The problem is actually that deep nets have too much invariance: they can classify whether a facade is in Paris or not, but they offer no easy way to separate out the different kinds of facades.
If anyone has questions about this work, post below and I'll try to answer.