

Map Vectorizer – Map polygon and feature extractor - linux_devil
https://github.com/NYPL/map-vectorizer

======
apaprocki
I've always wanted the equivalent of Maps with a time dimension. The data is
constantly changing, but imagine if you could push the slider back hundreds of
years and see how a city evolved. There is a whole "dark" dimension of data
out there that is only captured in print. (e.g. NYC business directories from
the 19th/20th century -- being able to see what a particular address _used_ to
be) Adding historical map data is the base layer for capturing this data from
print.

~~~
riordan
I'm part of the team at NYPL Labs that's been working on these historical
geospatial projects for the past few years (The Vectorizer is the work of our
own @MGA), and that's EXACTLY what we've been working toward. For almost 4
years, we've had staff and volunteers going over scans of geo-rectified
(stitching and stretching a raster image so it aligns with geospatial
coordinates) historical insurance maps of NYC, meticulously extracting the
information on them. Namely, we go after the outlines of buildings to capture
the amazingly detailed datapoints these maps had about every building in the
city all the way back to the first half of the 19th century.

These are nice to have for researchers, but the real purpose of collecting
this is, just as you note, to unlock the hidden historical geospatial data in
textual materials. Once we've got all those names of places, their addresses,
their lat/lon coordinates, and their timeframes of existence, we can start to
search through texts to find linkages. Old city directories (they're basically
books of ghosts) start to show you who lived and worked where [1] (and in the
process start to get you more names you can associate with these places),
address matches in historical newspapers start to show you what happened in
these places, and the maps start to become this geospatial backbone to
traverse across tons of different datasets.

The Vectorizer is so freaking cool for so many reasons, but mostly because
it's going to let us actually get through these insurance atlases to collect
this data before we all die (one of our favorites is the 1854 William Perris
Atlas [2][3], but it took nearly 3 years to get through the 64,000+ buildings
in Manhattan south of 42nd St) so we can start doing this kind of
querying with it. The real geniuses behind all this, our Geospatial Librarian
Matt Knutzen and the team at Topomancy, have been working on an experimental
gazetteer [4] so that we'll finally have this as a public web service for
people to hack on all these places as we collect and conflate them. Give us a
few months...

In the meantime, sign up for the Open Historical Maps project listserv that
some of the OSM crew is working on (including the geniuses at Topomancy).

Also, this came out of a historical geospatial hack day [5] we threw a few
months back, which you should check out if you want to play around with some
of our data sources for this kind of work or for building something else out
of historical NYC's geospatial footprint.

[1]: [http://andrewxhill.github.io/cartodb-examples/scroll-story/basic/index.html](http://andrewxhill.github.io/cartodb-examples/scroll-story/basic/index.html)

[2]: [http://maps.nypl.org/warper/layers/861](http://maps.nypl.org/warper/layers/861) (Tileserver, please forgive me for linking to you)

[3]: [http://aaronland.info/nypl-perris/](http://aaronland.info/nypl-perris/) YEAH SHAPEFILES!

[4]: [http://vimeopro.com/openstreetmapus/state-of-the-map-us-2013/video/68099833](http://vimeopro.com/openstreetmapus/state-of-the-map-us-2013/video/68099833) Schuyler Earle's presentation at State of The Map US 2013 on the historical gazetteer they're building for the Library of Congress

[5]: [http://www.nypl.org/blog/2013/07/12/maphack-hacking-nycs-past-nypl-labs-friends](http://www.nypl.org/blog/2013/07/12/maphack-hacking-nycs-past-nypl-labs-friends)

~~~
apaprocki
This sounds awesome. I've spent a lot of time digging around in the records at
30 Chambers and it is really hard to comprehend how much historical stuff is
lying around in decaying pages.

Even just something as simple as showing an OSM with all the historical
election districts / assembly districts over time for each census/election as
map layers would visually convey to someone looking at an address what would
otherwise take a decent amount of time to look up.

The NYC Dept. of Records has all the tax lot photos from the 1940 and 1980
canvasses, so you could even build up a historical "street view" for the 5
boroughs. I've always found it annoying that they keep this data locked up and
charge a decent sum for each photo. It is something that a decent microfilm
scanner could make quick work of, but I don't know if they have plans to
liberate all of that image data as part of the open data efforts.

------
andrewljohnson
NYPL does a lot of great open source mapping work. They also funded an open
source raster map warper that I've forked.

[http://maps.nypl.org/warper/](http://maps.nypl.org/warper/)

~~~
chippy
I literally love that map warper! You can upload your own images at
mapwarper.net as well.

~~~
riordan
You LITERALLY built that map warper! ^^ this guy ^^ is one of the partners at
Topomancy, the crew who envisaged and built our whole historical geospatial
stack with our Geospatial Librarian. And the Map Warper/Digitizer is
absolutely amazing.

------
aidos
What a cool project. Kudos to the NYPL for releasing this.

I've been working on something similar recently using OpenCV (though haven't
done much on it yet). My use case is to find paths through electrical
schematics. I figured my problem was close to map vectorization so I searched
around for an existing library in that space but didn't have any luck.

Will be interesting to see how they've approached it and what sort of results
I could get using their library.

~~~
mga
OpenCV is used here only for the "has dot/cross" aspect of feature detection
(and it is very primitive still). The polygons themselves are more a work of R
and GDAL.

------
twelvechairs
This is a great idea, but I question the process in the example: shouldn't
they capture the lines first and then divide the result into areas, rather
than trying to draw the areas directly (the detected areas sit inside the
lines, with random-width gaps between them)? Also, a good process for
capturing the information will probably vary significantly from map to map.

~~~
mga
It does vary from map to map although it is optimized for insurance maps (or
any map made mostly with clearly delineated polygons). We've included a config
file that you would modify to suit the maps you are working with. The current
process takes advantage of preexisting features in mapping tools and adds
feature detection (e.g. polygon is/isn't a building) and concave hulls.

We do welcome input and merge requests to improve the tool!
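To illustrate the concave-hull step mentioned above, here's a hedged sketch using shapely's `concave_hull` (shapely ≥ 2.0). The project leans on R and GDAL for its geometry work, so this shows the idea, not its implementation; the L-shaped point cloud and the `ratio` value are invented for the example.

```python
# Hedged sketch of the concave-hull idea, using shapely >= 2.0's
# concave_hull (not necessarily what the project itself uses).
import shapely
from shapely import MultiPoint

# Detected boundary points of an L-shaped building footprint.
points = MultiPoint(
    [(x, y) for x in range(5) for y in range(5) if x < 2 or y < 2]
)

convex = points.convex_hull                        # bridges over the notch
concave = shapely.concave_hull(points, ratio=0.2)  # hugs the L shape

print(convex.area, concave.area)  # concave hull area is no larger
```

The point is that a convex hull would fill in the notch of an L-shaped building, while a concave hull follows the actual outline, which matters a lot for dense urban lots.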

------
616c
This is pretty cool. Back when I was with some OSM nerds in DC, a lot of OSM
work was manual; we faced the problem of manual imports, and a tool like this
would have been very, very useful.

Thank you NYPL!

~~~
riordan
OSM's been pretty hardcore about their no-import policy because of accuracy
and validation issues. They want to map these places themselves and don't
want to trust others for accuracy, to say nothing of the legal issues that
come with importing data (ugh).

We, however, are far more lenient (mostly because we can't afford to build a
time machine to map the past ourselves).

On the other hand, Mike Migurski's Green Means Go [1] project is fantastic for
figuring out where batch imports into OSM will be greeted with confetti and
parades for filling out parts of the US without enough coverage to warrant
anti-import protectionism.

[1]: [http://mike.teczno.com/notes/green-means-go.html](http://mike.teczno.com/notes/green-means-go.html)

~~~
ZeroGravitas
The OSM opinion is not quite as black-and-white as that. Though they did
react badly to the TIGER import in the US (with good reason), other areas
have imported other types of data on quite a large scale, e.g. Danish
addresses from government data.

[http://wiki.openstreetmap.org/wiki/Import](http://wiki.openstreetmap.org/wiki/Import)

But most of that's importing already digitised info. Relevant to this topic,
vectorising from images, there seems to be renewed work on autotracing from
satellite imagery coming to the iD editor:

[http://www.mapbox.com/blog/user-friendly-guided-feature-extraction/](http://www.mapbox.com/blog/user-friendly-guided-feature-extraction/)

edit: just noticed you linked to that elsewhere in this thread.

------
bradleysmith
Very very cool. Anybody know of similar projects?

I think I'll give it a whirl on some old '60s fenceline maps I have around.

~~~
riordan
Mapbox is working on a really slick tool for guided feature extraction from
satellite maps [1] that'll be part of the iD editor for OpenStreetMap. I'm
hoping this will provide a HUGE help in adding more buildings into OSM.

And while not mapping, John Resig's Ukiyo-e [2] project is one of the coolest
projects I've seen in the digital cultural heritage space in some time. He's
been applying image recognition to these incredible Japanese woodblock prints
from museums, galleries, dealers, universities and libraries all around the
world. Because these things are prints, there could be hundreds of prints from
the same block master all around the world, but because the expertise in the
field is so divergent, the cataloging practices are really inconsistent.
Different institutions might call artists by totally different names (or think
a print is by totally different artists). So he built a search-by-image
engine of hundreds of thousands of Ukiyo-e prints that finds and reunifies them
totally independently of their metadata. Which is cool when you find 5-10 of
the same print in places around the world. But it's cooler when it matches 2
prints with totally different artists and publishers and dates because at some
point after the first print was made, someone bought the block master, cut out
the face and replaced it with another, then did the same for the signature
[3][4].

Actually, The Vectorizer owes a big debt of gratitude to John and his brother
Mike. Mike Resig is a geographer and was the first to show us a process for
how this kind of automated identification is possible.

[1]: [http://www.mapbox.com/blog/user-friendly-guided-feature-extraction](http://www.mapbox.com/blog/user-friendly-guided-feature-extraction)

[2]: [http://ukiyo-e.org](http://ukiyo-e.org)

[3]: [http://ukiyo-e.org/image/met/DP134583](http://ukiyo-e.org/image/met/DP134583)

[4]: [http://ukiyo-e.org/image/mfa/sc214530](http://ukiyo-e.org/image/mfa/sc214530)

