That right brain / left brain combo strikes me as just about the fundamental quality to look for in a startup founder or team.
From what I remember, the biggest performance trick was figuring out how to properly pack worldwide street data (geometry, categorization, and labels) plus indexes into RAM on a single machine using a custom file format. It involved stripping out every little unnecessary byte and sorting all of the streets geographically to improve access locality. I believe this got the data, originally ~15GB of shapefiles, down to a size that fit in memory.
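I don't know how they actually packed it, but the classic trick for that kind of geographic access locality is to sort features along a space-filling curve before writing them out, so streets that are close on the map end up close in memory. A minimal sketch using a Morton (Z-order) key; the 16-bit quantization and the function names are my own choices, not theirs:

    # Sort features by a Z-order (Morton) key so that objects that are
    # close on the map end up close together in the packed file / RAM.
    def interleave16(v):
        """Spread the low 16 bits of v with a zero bit between each."""
        v &= 0xFFFF
        v = (v | (v << 8)) & 0x00FF00FF
        v = (v | (v << 4)) & 0x0F0F0F0F
        v = (v | (v << 2)) & 0x33333333
        v = (v | (v << 1)) & 0x55555555
        return v

    def morton_key(lon, lat):
        """Quantize lon/lat to 16 bits each, interleave into one 32-bit key."""
        x = int((lon + 180.0) / 360.0 * 0xFFFF)
        y = int((lat + 90.0) / 180.0 * 0xFFFF)
        return (interleave16(y) << 1) | interleave16(x)

    # streets: (lon, lat, record) tuples; sort once before packing
    streets = [(-0.1276, 51.5072, "A201"), (2.3522, 48.8566, "D906")]
    streets.sort(key=lambda s: morton_key(s[0], s[1]))

A query for a small map area then touches a small, mostly contiguous range of the sorted data, which is exactly what you want for RAM (and CPU cache) locality.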
OSM, although a great force for good in the mapping world, is not in its current state a viable competitor to Google Maps and other paid datasets.
You need to start driving around with LIDAR and build a massive parcel / address database to really do it right.
You need photos, from the street and from the air. For OSM to really fly, we need to see open source hardware combining photography, GPS, and LIDAR that folks can strap to their cars or fly from RC planes, balloons, or kites.
Geocoding needs to actually work, worldwide. That's incredibly hard. It's so much more than the visual maps.
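To be fair, forward geocoding against OSM data does exist via Nominatim; the hard part is the worldwide coverage and quality of the underlying address data, not the API. A minimal sketch (the User-Agent string is mine; Nominatim's usage policy requires one that identifies your app):

    # Forward-geocode a free-form query against OSM's Nominatim service.
    import requests

    def geocode(query):
        resp = requests.get(
            "https://nominatim.openstreetmap.org/search",
            params={"q": query, "format": "json", "limit": 1},
            headers={"User-Agent": "osm-geocoding-demo"},  # identify yourself
            timeout=10,
        )
        resp.raise_for_status()
        results = resp.json()
        if not results:
            return None  # the failure mode the parent is talking about
        top = results[0]
        return float(top["lat"]), float(top["lon"]), top["display_name"]

    print(geocode("1600 Pennsylvania Ave NW, Washington, DC"))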
Just pointing this all out, since everyone seems to gloss over the fact that Google Maps has a massive monopoly in this area right now.
AFAIK, the tags to use for individual address points have been agreed upon (the addr:housenumber / addr:street scheme), and in some parts of the world (e.g. Germany, where the maps are essentially finished) this address point data is already useful. Is this, in fact, the case?
The US is the home base of Google (amongst other big-name tech companies), but its OSM data is amongst the worst. This is ironic, as much of the open data used to map the rest of the world was provided by the US government, e.g. NASA radar topography and Landsat photos.
But for the lower-level data, the best source (often the definitive source, e.g. for administrative boundaries that aren't physically present on the ground) is government data, so the quantity and quality of the data varies as you cross state (and sometimes county) lines.
Google clearly does not pre-render tiles, and it looks like it works fine for them. A request is made, data is collected, the relevant tiles are rendered and returned client-side. Yes, I know, Google has $billions in computing resources, but does it really take that much server power to render tiles? (Even for 1,000s of requests/second?)
Is it a matter of data transfer or processing capacity? A 2048x1536 screen would need to load 48 tiles (256px each) at a time. Google's tiles for a city average about 14KB/tile, so 672KB per screenful. 5,000 of these a second is ~3.4GB/s. (I'm a front-end guy, so this is a little out of my league.)
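For what it's worth, the arithmetic holds up, assuming standard 256px tiles (my assumption; the numbers above don't state it):

    # Back-of-the-envelope bandwidth check for the numbers above.
    TILE_PX = 256
    tiles_per_screen = (2048 // TILE_PX) * (1536 // TILE_PX)  # 8 * 6 = 48
    bytes_per_screen = tiles_per_screen * 14 * 1024           # ~14KB/tile -> ~672KB
    gb_per_second = bytes_per_screen * 5000 / 1e9             # 5,000 screenfuls/s
    print(tiles_per_screen, gb_per_second)                    # 48, ~3.44

So at that load, the bandwidth alone is already substantial, independent of the render cost.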
Doing all of that work quickly is possible; it just isn't simple. Also, for most people the data involved changes relatively infrequently. Why have the server render the same tiles repeatedly when you could just cache the result and reduce it to a simple file-hosting problem?
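That's essentially what tile servers like mod_tile do: render on first request, then serve from cache. A minimal sketch, where render_tile is a hypothetical stand-in for the actual renderer (Mapnik or similar):

    # Cache-on-first-request tile serving.
    import os

    CACHE_DIR = "/var/cache/tiles"

    def render_tile(z, x, y):
        """Hypothetical stand-in for the real renderer."""
        raise NotImplementedError

    def get_tile(z, x, y):
        path = os.path.join(CACHE_DIR, str(z), str(x), "%d.png" % y)
        if os.path.exists(path):              # cache hit: plain file hosting
            with open(path, "rb") as f:
                return f.read()
        png = render_tile(z, x, y)            # cache miss: render exactly once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(png)
        return png

After the first visitor to an area, everyone else is effectively hitting static files.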
Also, your 2048x1536 screen is likely loading more than 48 tiles. It's common to request tiles around the current viewport, and (less commonly) above/below the current zoom, to ensure they're present before they're needed. To see this in action, see how fast/far you have to scroll a Google map before you can "catch it" with tiles that aren't loaded yet.
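You can see where the extra requests come from by writing out the tile math. A sketch using the standard slippy-map tile scheme (the function names and the one-tile margin are my choices):

    # Which tiles does a viewport need, including a prefetch ring?
    import math

    def lonlat_to_tile(lon, lat, z):
        n = 2 ** z
        x = int((lon + 180.0) / 360.0 * n)
        y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
        return x, y

    def tiles_for_viewport(min_lon, min_lat, max_lon, max_lat, z, margin=1):
        x0, y_hi = lonlat_to_tile(min_lon, min_lat, z)  # tile y grows southward
        x1, y_lo = lonlat_to_tile(max_lon, max_lat, z)
        return [(z, x, y)
                for x in range(x0 - margin, x1 + margin + 1)
                for y in range(y_lo - margin, y_hi + margin + 1)]

    # An exactly tile-aligned 2048x1536 viewport covers 8x6 tiles; with a
    # one-tile ring that becomes (8+2) * (6+2) = 80 requests, not 48.

And a misaligned viewport spans 9x7 tiles before any prefetching, so 48 is really the floor.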
The presentation also seems to overstate the redundancy found in the land tiles. You would get enormous savings from water tiles at all zoom levels, but (looking at http://mapbox.com/maps) even if humans only cover 1% of the land, our infrastructure is well enough distributed that it, plus inland water and the other details they've included, would preclude redundancy at all but the highest zoom levels (although, since the highest zoom level takes up 3/4 of the tiles, that is also where the savings matter most).
With that in mind, I'm wondering about the claimed rendering time of 4 days. That fits nicely with the story told, but with the 32 render servers mentioned at the end it works out to 128 CPU-days (though I'm not sure about the previous infrastructure they were comparing it to), which is actually close to the count mentioned early on for a super-optimized query and render process. This is all supposition, so I don't want to sound too sure of myself, but the storage savings seem to be the big win here (60% from water plus redundancy at the highest zoom levels), while I would guess you save considerably less in processing (15% from water plus minor redundancy on land, absent other optimizations, e.g. run-length-based ones).
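The 3/4 figure falls out of the quadtree geometry, since each zoom level has 4x the tiles of the one above it (a quick check; the max zoom of 15 is just my assumption and barely affects the ratio):

    # Share of all tiles that sit at the deepest zoom level.
    max_zoom = 15
    tiles_per_zoom = [4 ** z for z in range(max_zoom + 1)]
    print(tiles_per_zoom[-1] / sum(tiles_per_zoom))  # -> ~0.75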
One thing I forgot to say in my top post, though: the presentation mentions compositing layers together on the fly, and that is one of the key tools they use, but then it drops that thread. I'm curious whether it originally had more on that front and what they do with it.
For a real-world implementation of rendering in <canvas>, see Kothic.js: http://kothic.org/js/
My version of Chrome on a high-end MacBook Pro renders those tiles in ~350ms each.
Good job guys
Isn't that algorithm going to erase any islands small enough not to show up at zoom level 5?
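If so, the usual fix is to decide "all water" bottom-up instead of top-down: a tile only counts as pure water if all of its children do, so a speck of land at max zoom keeps its ancestors alive. A sketch (tile_is_water is a hypothetical pixel-level check, and a real pipeline would aggregate precomputed leaf masks level by level rather than recurse):

    # Bottom-up "all water" detection over the tile quadtree.
    from functools import lru_cache

    MAX_ZOOM = 17  # assumed deepest level of the tile set

    def tile_is_water(z, x, y):
        """Hypothetical check against the source raster/vector data."""
        raise NotImplementedError

    @lru_cache(maxsize=None)
    def is_all_water(z, x, y):
        if z == MAX_ZOOM:
            return tile_is_water(z, x, y)
        # the four children of (z, x, y) in the standard quadtree layout
        return all(is_all_water(z + 1, 2 * x + dx, 2 * y + dy)
                   for dx in (0, 1) for dy in (0, 1))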