

Rendering the world - dmarinoc
http://mapbox.com/blog/rendering-the-world/

======
rbranson
I have built one of these map rendering systems on EC2. We went with 100%
on-the-fly rendering plus two caching layers (an HTTP cache on top and an
application-specific cache in between [TileCache]). There were dozens of
layers (now there are thousands), so pre-rendering all of them was not
feasible. Since I left that gig, the current team has added some processes to
"warm" the caches before new data goes live. That only takes a few minutes
though, and covers nowhere near 2% of the tilespace.
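
Roughly, the request flow looked like this (a simplified sketch with made-up
names and plain dicts standing in for the real HTTP cache and TileCache, not
the actual code):

    # Hypothetical sketch of the layering described above: an HTTP cache in
    # front, an application-level tile cache behind it, and on-the-fly
    # rendering as the fallback.
    def get_tile(z, x, y, http_cache, tile_cache, render):
        key = (z, x, y)
        tile = http_cache.get(key)           # edge layer: cached responses
        if tile is None:
            tile = tile_cache.get(key)       # app layer: TileCache-style store
            if tile is None:
                tile = render(z, x, y)       # miss on both layers: render now
                tile_cache[key] = tile
            http_cache[key] = tile
        return tile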

From what I remember, the biggest performance trick was figuring out how to
properly pack worldwide street data (geometry, categorization, and labels) +
indexes into RAM on a single machine without using a custom file format. It
involved stripping out every little unnecessary byte and sorting all of the
streets geographically to improve access locality. I believe this got down to
~15GB of shapefiles.
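
One way to do that kind of geographic sort (a guess at the technique, not the
actual code) is to order records by an interleaved Morton/Z-order key, so that
streets that are close on the map end up close together in the file:

    # Illustrative sketch: sort street records by a Morton (Z-order) key so
    # nearby geometry lands in nearby bytes and access locality improves.
    def morton_key(lon, lat, bits=16):
        # Normalize lon/lat onto an integer grid.
        xi = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
        yi = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
        key = 0
        for i in range(bits):                # interleave the bits of x and y
            key |= ((xi >> i) & 1) << (2 * i)
            key |= ((yi >> i) & 1) << (2 * i + 1)
        return key

    # streets: records with a representative point per feature, e.g.
    # streets.sort(key=lambda s: morton_key(s["lon"], s["lat"]))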

------
pkh80
This comment will probably get buried and no-one cares, but...

OSM, although a great force for good in the mapping world, is not in its
current state a viable competitor to Google Maps and other paid datasets.

You need to start driving around, using LIDAR, and create a massive parcel /
address database to really do it right.

You need photos, from the street and from the air. For OSM to really fly we
need to see open source hardware combining photography, GPS, LIDAR, that folks
can strap to their cars or fly from RC planes or balloons or kites.

Geocoding needs to actually work, worldwide. That's incredibly hard. It's so
much more than the visual maps.

Just pointing this all out, since everyone seems to gloss over the fact that
Google Maps has a massive monopoly in this area right now.

~~~
kiwidrew
So in other words, OSM has done a pretty good job of mapping the _streets_ of
the world, and now it's time to shift the focus to recording the individual
_address points_ along those streets?

AFAIK, the tags to use for individual address points have been agreed upon,
and in some parts of the world (e.g. Germany, where the maps are essentially
finished) this address point data is already useful. Is this, in fact, the
case?

~~~
ZeroGravitas
As with everything in OSM, it depends on which country you are talking about.
For example, I believe in (at least some areas of) France the outline of every
building has been imported from a high-quality government source, complete
with address data, yet the roads still have to be drawn in from satellite
photos or GPS traces.

The US is the home base of Google (amongst other big-name tech companies), but
its OSM data is amongst the worst. This is ironic, as much of the open data
used to map the rest of the world was provided by the US government e.g. NASA
radar topography and Landsat photos.

But for the lower level data the best source (often the definitive source e.g.
for administrative boundaries that aren't physically present on the ground) is
government data, so the quantity and quality of data varies as you cross state
(and sometimes county) lines.

------
siavosh
Interesting stuff. This is essentially a coding/compression problem. My
advisor at UCLA helped pioneer the computer vision analogue of this. His work
tackled encoding textures (or areas of little information) with Markov random
fields and areas of high information (edges, features) with graphs.

<http://www.stat.ucla.edu/~sczhu/publication.html>

------
maxprogram
Not sure if I understand why real-time tile rendering on servers doesn't work.

Google clearly does not pre-render tiles and it looks like it works fine for
them. Request is made, data collected, relevant tiles rendered, returned to
client-side. Yes, I know, Google has $billions in computing resources, but
does it really take that much server-power to render tiles? (even for 1,000s
of requests/second?)

Is it a matter of data transfer or processing capacity? A screen sized
2048x1536 would need to load 48 tiles at a time. Google's tiles for a city avg
about 14KB/tile, so 672KB. 5,000 of these a second is 3.4GB. (I'm a front-end
guy so this is a little out of my league.)
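
Checking those numbers (assuming 256x256 tiles and the 14KB average quoted
above):

    # Back-of-envelope bandwidth check using the figures above.
    tiles_x = 2048 // 256                         # 8 tiles across
    tiles_y = 1536 // 256                         # 6 tiles down
    tiles_per_screen = tiles_x * tiles_y          # 48 tiles
    kb_per_screen = tiles_per_screen * 14         # ~672 KB
    gb_per_second = kb_per_screen * 5000 / 1e6    # ~3.4 GB/s at 5,000 screens/s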

~~~
awj
The problem, generally, is that rendering the tiles is computationally
expensive. You have to wade through each feature (thing that _could_ be
represented in the tile) and decide if it intersects the tile, then decide
if/how it will be rendered, then render and composite it with every other
visible feature.

Doing all of that work quickly is possible; it just isn't simple. Also, for
most people the data involved changes relatively infrequently. Why have the
server render the same tiles repeatedly when you could just cache the result
and reduce it to a simple file-hosting problem?
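
A bare-bones sketch of that per-tile loop (hypothetical structure, with
styling and compositing stubbed out):

    # Sketch of the work a renderer does per tile: walk every candidate
    # feature, test it against the tile's bounding box, "draw" only the ones
    # that intersect, then cache the result.
    def bbox_intersects(a, b):
        # a, b = (min_x, min_y, max_x, max_y)
        return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]

    def render_tile(tile_bbox, features, cache, key):
        drawn = []
        for feature in features:              # every feature is a candidate
            if bbox_intersects(feature["bbox"], tile_bbox):
                drawn.append(feature)         # real code would rasterize here
        tile = ("tile", key, len(drawn))      # stand-in for the rendered image
        cache[key] = tile                     # never redo this work
        return tile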

Also, your 2048x1536 screen is likely loading more than 48 tiles. It's common
to request tiles around the current viewport and (less commonly) above/below
the current zoom level to ensure they're present before they're needed. To see
this in action, see how fast/far you have to scroll a Google map before you
can "catch it" with tiles that aren't loaded.

------
magicalist
Hmm, I'm curious about how much this is overstating the effectiveness of the
optimizations in order to teach about them. With this approach, it seems like
you would still have to render the highest zoom level first, which already
takes 3/4 of the render time anyway. There are lots of other optimizations you
can do (and they probably are doing) there, but they aren't related to the
tree/reference-based ones mentioned here.

The presentation also seems to overstate the redundancy found in the land
tiles. You would get savings from water tiles at all zoom levels, which would
be enormous, but (looking at <http://mapbox.com/maps>) even if humans only
cover 1% of the land, our infrastructure is well enough distributed that it
and inland water and other details they've included would preclude redundancy
at all but the highest zoom levels (although, in this case, the highest zoom
level taking up 3/4 of the tiles _saves_ the most).
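
(For reference, the 3/4 figure falls straight out of the quadtree arithmetic:
level z has 4^z tiles, so the deepest level is always just under 3/4 of the
total.)

    # Quadtree tile counts: level z has 4**z tiles, so the deepest level
    # dominates the total (1 + 4 + 16 + ... is a geometric series).
    max_zoom = 15
    per_level = [4 ** z for z in range(max_zoom + 1)]
    print(per_level[-1] / sum(per_level))   # ~0.75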

With that in mind, I'm wondering about the claimed rendering time of 4 days.
That fits nicely with the story told, but with the 32 render servers mentioned
at the end, that would seem to be 128 CPU days (though I'm not sure about the
previous infrastructure they were comparing it to), which is actually close to
the count mentioned early on with a super-optimized query and render process.
This is all just supposition, so I don't want to sound too sure of myself, but
the storage savings seems to be the big win here (60% from water + redundancy
at highest zoom levels), while I would guess that you would save considerably
less in processing (15% from water + minor redundancy on land (absent other
optimizations e.g. run-length-based ones)).

~~~
Retric
Try zooming in on Russia, Alaska, Brazil, or Egypt and you'll find a lot of
empty tiles. Still, if you can render the whole thing in 4 days, I suspect
caching what people actually look for would be good enough, as people probably
hit max zoom on Manhattan 1000x as much as they do on some random village in
the Amazon. The advantage is that you can just invalidate tiles as you get new
information, vs. trying to re-render the whole thing every time you get new
street data for Ohio.
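
A rough sketch of that kind of region-based invalidation (using the standard
slippy-map tile formula as an assumption, not anything from the talk): when
new data arrives for a bounding box, evict every cached tile that overlaps it.

    import math

    def lonlat_to_tile(lon, lat, z):
        # Standard Web Mercator lon/lat -> tile coordinate conversion.
        n = 2 ** z
        x = int((lon + 180.0) / 360.0 * n)
        lat_r = math.radians(lat)
        y = int((1.0 - math.log(math.tan(lat_r) + 1.0 / math.cos(lat_r))
                 / math.pi) / 2.0 * n)
        return x, y

    def invalidate(cache, bbox, zooms):
        # bbox = (west, south, east, north) in degrees, e.g. around Ohio.
        west, south, east, north = bbox
        for z in zooms:
            x0, y0 = lonlat_to_tile(west, north, z)   # tile y grows southward
            x1, y1 = lonlat_to_tile(east, south, z)
            for x in range(x0, x1 + 1):
                for y in range(y0, y1 + 1):
                    cache.pop((z, x, y), None)        # drop stale tiles only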

~~~
magicalist
True, and countries that are empty except for random river squiggles can save
a good bit of storage (at the highest zooms, where those river squiggles are
isolated), but, again, if you have to start with the lowest level first, the
described approach isn't saving a whole lot in processing time, even with big
empty spaces.

One thing I forgot to say in my top post, though, is that the presentation
mentions compositing layers together on the fly, and that is one of the key
tools they use, but then it drops that thread. I'm curious whether it
originally had more on that front and what they do with it.

------
mahyarm
Would it be cheaper to just transmit the 'vector' map data for a region and
render it client-side, rather than rendering tiles on the server, past a
certain zoom level?

~~~
dclowd9901
My guess is that, even client-side, vector rendering (I assume you're talking
about SVG or something similar) is rather CPU-intensive, and definitely
doesn't do well on mobile devices. In a crowded area like NYC, you'd slow to a
crawl; mobile would likely be unusable. So maybe it's a user-experience
trade-off.

~~~
ldng
I was under the impression that Google Maps for Android has been
vector-rendered since Honeycomb. Am I wrong, and/or are they cheating in some
way?

~~~
mseebach
They do. And it's slow. Although, to be fair, I don't know if it's for that
reason.

------
aresant
I find myself in awe of complex technical information distilled into the kind
of beautiful, simple style that is evident in this presentation.

That right-brain / left-brain combo strikes me as just about the fundamental
quality to look for in a startup founder or team.

Nice work.

------
kelvin0
The ideas presented are very interesting; it would be nice if a talk or
article about this were available.

Good job, guys.

------
mcmire
A straightforward, understandable presentation to solve a really interesting
problem. I like it.

------
joshuaheard
The author writes, "I found myself wishing Word had a simple, built-in button
for 'cut it out and never again do that thing you just did'". Then I look up
at the Clippy image and it has a check box with "Don't show me this tip
again". Hmmmm.

------
ricardobeat
That makes Satellite View in Google Maps look like a much more amazing feat.

Isn't that algorithm going to erase any islands small enough to not show up in
zoom level 5?

------
ww520
Very cool presentation. I wonder where they get the source data to render the
map tiles.

~~~
llimllib
From OpenStreetMap: <http://www.openstreetmap.org/>

------
ljegou
Interesting. Still nothing about the ridiculous Mercator projection.

