
Extracting city blocks from OpenStreetMap data - p8donald
https://peteris.rocks/blog/openstreetmap-city-blocks-as-geojson-polygons/
======
jofer
Always good to see geospatial analysis, especially with the toolchain
available in Python, these days!

One of the downsides of forming polygon from line segments in this way is that
it implicitly assumes all lines have a vertex at the intersection of another
line.

For example, something like this would fail ("o" represents a vertex):

    
    
        o      o  o
         \      \/
          \     /\
           \   /  \   o
            \ /    \ /
             /      /
            / \    / \
           o   \  /   \
                \/     \
                /\      o
               /  \
              o    \
                    o
    

You can handle this in a couple of ways:

1) Intersect the lines before-hand, ensuring that you wind up with a vertex at
every intersection. (In shapely's case, you'd call
`shapely.ops.cascaded_union(lines)` before applying `polygonize`.)

2) Cut a bounding polygon with the lines, rather than forming polygons from
individual line segments.

The second method has the advantage of ensuring that everything inside of the
bounding polygon is split up into smaller, block-level polygons. Usually this
is what you want, though for something like city blocks, you'd need to filter
out non-urban areas (e.g. water).

~~~
brianshaler
What about bridges and tunnels? It seems like an ideal (or idealistic)
solution is to clean up the source data to include intersections when there
are intersections so they don't have to be inferred. Is that impractical or
would it explode the size of the dataset?

~~~
jofer
Bridges and tunnels are still represented as linear features in most road
datasets. In good ones, there's an attribute to mark it as a bridge or tunnel,
and you can choose to use them or not.

Including intersections wouldn't significantly change the size of the dataset,
but that type of clean-up is best done as a pre-analysis step. There's no good
reason to do it in the "raw" data.

Furthermore, it's dependent on the projection/etc that you do it in. The
intersection of two lines in lat/long space is not at the same point as the
intersection of two lines in another lat/long space with a different datum,
and it's not the same as the intersection in a projected coordinate system.

Ideally, you'd only include the intersection if you've actually measured it at
that location. It's best to leave the observations as observations and try not
to alter them too much.

At any rate, by the time you're analyzing the data, you've already made the
type of assumptions about cartesian vs. spherical vs. real space that I'm
referring to above. Therefore, it's usually a good idea to defer cleanup
that's specific to a particular operation (this one is) until you're doing
that operation.

------
muxxa
There's also the tag place=city_block [1] which may be useful depending on its
coverage but which has the advantage of being explicitly assigned by a human.

I've always thought of city blocks as an anti-pattern in urban layout e.g.
exploring New York on foot for the first time I tried 'using my nose' to find
interesting places but had the problem that I got zero incremental information
until the next intersection at which point the large block size meant it was
too far to turn back.

I prefer to live in cities for which the concept of a 'block' has less
relevance, and I'd caution against trying to 'blockify' cities worldwide.

[1]
[http://wiki.openstreetmap.org/wiki/Key:place](http://wiki.openstreetmap.org/wiki/Key:place)

------
jokoon
I have to keep trying to open those mapbox vector tiles. Zipped protocol
buffers in a sqlite file. If you want to handle a lot of geo data, you can't
keep using text. Although I wonder if protocol buffer+sqlite is really a good
idea.

~~~
vetinari
The sqlite is only storage server-side. Otherwise you would have bazillion of
small files in your filesystem and they mostly do not handle that well.
(Speaking of experience of having 800k tiles in a single NTFS directory - that
was not pretty).

When serving the tiles over http, the only tiles themselves are served, not
the entire sqlite database.

This was actually pioneered in the era of raster tiles - pngs/jpgs in the
sqlite db. Vector tiles just "inherited" that.

~~~
jokoon
Do you know a vector format that is more adequate?

~~~
vetinari
No; however there are not many vector tile formats publicly defined. MVT is
good at what it does.

The generic vector files used in GIS (shapefiles, DGN, KML, GML etc) are not a
good fit for tiles - they describe a specific classes of features and do not
optimize for tiling (i.e. you must always parse the entire files without
regards what extents you are really interested in) - which is fine for generic
GIS usage, but not for end-user vector maps.

------
nxzero
Looking at the visualizations provide I don't see a visual proof that the code
is able to extract city blocks; by city block, I mean as defined by Wikipedia
here:
[https://en.m.wikipedia.org/wiki/City_block](https://en.m.wikipedia.org/wiki/City_block)

Proof might be an interactive map that shows the city block of a selected area
in NYC.

Is there a visual proof in the walkthrough I'm not seeing?

~~~
p8donald
Data from [https://mapzen.com/data/metro-extracts/metro/brooklyn_new-
yo...](https://mapzen.com/data/metro-extracts/metro/brooklyn_new-york/)

I followed my instructions in the post (except I had to use
encoding='iso-8859-1' instead of utf-8)

I cropped the map a little.

Generated file
[https://gist.github.com/pdonald/223e42b491eb42ba99b8e10ef1ae...](https://gist.github.com/pdonald/223e42b491eb42ba99b8e10ef1ae2072)

Interactive and clickable
[http://bl.ocks.org/anonymous/raw/d62bfd592234ed71d24bfd6f264...](http://bl.ocks.org/anonymous/raw/d62bfd592234ed71d24bfd6f264cc22b/)

You can also go to [http://geojson.io](http://geojson.io) and load the
generated file for more interactivity (select, delete, add your own polygons,
etc.) or open it in QGIS.

In the Results section I had a screenshot of the generated polygons in both
QGIS and geojson.io (polygons are separated by thick black lines and their
area is colored gray).

~~~
nxzero
Reluctant to respond, since it's clear that bit of more effort went into your
comment than mine.

Basically for me when I click the interactive map I get a response, which is a
pinned blank text-box with a x to close it. Thing is that I'm unable see that
a city block is highlighted via a click; the text box simple appears where I
click, unless it's outside the targeted area; for example, in the river.

Looking at the raw JSON it does appear the block are extracted.

So, putting aside "seeing" it, appears you're confirming it extracts single
blocks as objects and isn't doing something else like if two blocks are
attached, they're not sharing the path that's between them.

------
marcodena
Nice, could you try it with an irregular city? Like Rome?

thx

~~~
p8donald
Data taken from [https://mapzen.com/data/metro-
extracts/metro/rome_italy/](https://mapzen.com/data/metro-
extracts/metro/rome_italy/)

Generated file (cropped)
[https://gist.github.com/pdonald/dfeee5213c8c5379f7b421f9ad5f...](https://gist.github.com/pdonald/dfeee5213c8c5379f7b421f9ad5f8b3c)

~~~
marcodena
here you have the problem. You see? They are not correctly detected
[http://imgur.com/iGLT5Ea](http://imgur.com/iGLT5Ea)

~~~
p8donald
Yes, to detect them I would also have to include pedestrian paths, service
roads and other smaller roads. The solution is not perfect but good enough for
the most part.

