Once I got the data into PostGIS and aggregated it into hexes (slightly harder than it sounds), it wasn't that big a deal to visualise using TileMill and MapBox.
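For reference, the hex-binning step looks roughly like this (a sketch in Python rather than the PostGIS SQL I actually used, with pointy-top axial coordinates and cube rounding; the details and constants are illustrative):

```python
from collections import Counter
from math import sqrt

def hex_bin(x, y, size):
    """Map a planar (x, y) point to the axial (q, r) coordinates of the
    pointy-top hexagon of circumradius `size` that contains it."""
    # fractional axial coordinates
    q = (sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # cube-round to the nearest hex centre: round all three cube coords,
    # then fix the one with the largest rounding error so they sum to 0
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

# aggregate points into hex-bin counts
points = [(0.1, 0.2), (0.15, 0.18), (5.0, 5.0)]
counts = Counter(hex_bin(x, y, size=1.0) for x, y in points)
```

The cube-rounding fix-up is the "slightly harder than it sounds" part: naive rounding of (q, r) alone assigns points near hex edges to the wrong bin.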
I haven't read the nanocubes paper yet, but it sounds much, much faster than my approach using PostGIS. It took me many hours to import and process the tens of millions of points in my visualisation. It'd be great to be able to turn something like this around in under 10 minutes.
Would this nanocubes approach work with hexes? (they look cooler IMO)
Off-topic: after all this, all I can think about is making a Settlers of Catan MMO. ;-)
There's a related capability called "interpolation", which can be linear, bicubic, etc. It looks nice too; try it.
This is really not about interpolation or mipmapping; it's about creating a data structure that lets you quickly answer the queries with which you build visualizations. For example, you might want to see a heatmap of all tweets that happened in 2013, but you might want only the count for a particular week, or a month of tweets from Instagram. You might also want to see any of these over the entire world, or over a single metro region. What nanocubes enables is answering these queries in time essentially proportional to the size of the output (we take a polylog hit, but that's kind of to be expected). We don't even touch all of the input data, which, if you ask me, is kind of nice.
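A toy sketch of the output-proportional idea (just a quadtree of counts over the unit square — nothing like the real nanocube, which also nests time and category dimensions): a tile query walks only non-empty nodes, never the raw points.

```python
class CountNode:
    """One cell of a quadtree of counts; children refine the cell."""
    def __init__(self):
        self.count = 0
        self.children = {}   # (0|1, 0|1) quadrant -> CountNode

def insert(root, x, y, depth):
    """Add one point in [0,1)x[0,1), bumping counts along its path."""
    node = root
    node.count += 1
    lo_x, lo_y, size = 0.0, 0.0, 1.0
    for _ in range(depth):
        size /= 2
        cx = 1 if x >= lo_x + size else 0
        cy = 1 if y >= lo_y + size else 0
        lo_x += cx * size
        lo_y += cy * size
        node = node.children.setdefault((cx, cy), CountNode())
        node.count += 1

def tile_counts(root, level):
    """Counts for every non-empty tile at `level`: work is proportional
    to the number of non-empty nodes visited, not to the point count."""
    out = {}
    def walk(node, qx, qy, d):
        if d == level:
            out[(qx, qy)] = node.count
            return
        for (cx, cy), child in node.children.items():
            walk(child, qx * 2 + cx, qy * 2 + cy, d + 1)
    walk(root, 0, 0, 0)
    return out

root = CountNode()
for x, y in [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9)]:
    insert(root, x, y, depth=3)
```

`tile_counts(root, 1)` now answers a coarse "heatmap" query from the stored counts alone, which is the property being claimed (modulo the extra dimensions and the polylog factor).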
I wish I had bigger public datasets to show you, but the experience stays this fast with datasets in the billions of points.
About your interpolation point: yes, we could (and sometimes do) use interpolation and smoothing. We chose not to do it here so that it looks the same as the WebGL-less mode: when you run it on an iPhone or iPad, there isn't a big visual difference.
Hm, I see, so this can output the virtual miplevels quickly? Also, would it work with a "full" dataset (i.e. no empty cells)?
> About your interpolation point: yes, we could (and do sometimes) use interpolation and smoothing.
Also consider miplevel interpolation, makes the transitions between levels much easier on the eye (doesn't work well with nearest sampling).
Yes: when you see the tiles on (say) http://www.nanocubes.net/view.html#twitter, what you're getting is not a precomputed tile: the server is actually visiting the data structure every time (and I hope I can convince you that at that speed, it is not touching 210 million points :). The reason you can't precompute all choices is simply that there are too many of them and you'd run out of bits in the universe (the paper, http://www.nanocubes.net/assets/pdf/nanocubes_paper.pdf, has details).
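A back-of-the-envelope count of how fast the precomputed-answer space blows up (all numbers here are assumed for illustration, not taken from the paper):

```python
# Hypothetical schema: ~4 years of weekly time bins, a quadtree of map
# tiles down to zoom 17, and 5 boolean category attributes.
time_bins = 52 * 4
time_ranges = time_bins * (time_bins + 1) // 2   # contiguous [a, b] ranges
spatial_tiles = sum(4 ** z for z in range(18))   # quadtree tiles, zoom 0..17
categories = 2 ** 5                              # any subset of 5 categories

combinations = time_ranges * spatial_tiles * categories
print(f"{combinations:.3e} distinct precomputed answers")
```

Even with these modest assumed dimensions you're past 10^16 answers, so precomputing all of them is hopeless; one shared structure that answers any of them on demand is the alternative.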
> Also, would it work with a "full" dataset (i.e. no empty cells)?
Not exactly. We get away with an in-memory data cube exactly because, in many practical cases, the data is sparse over the address space. With dense data you'll take a much more significant memory hit (but it's one you'll have to take with any aggregation scheme; the paper, again, has details).
> Also consider miplevel interpolation, makes the transitions between levels much easier on the eye (doesn't work well with nearest sampling).
That's fair, although again we're hoping to keep the WebGL version feature-compatible with the leaflet.js version (which is canvas-only and has no zoom transition at all). The smooth version does look nicer, as you can see in the following video (a bit on the PR-heavy side, sorry): https://www.youtube.com/watch?v=8P9QA6TJwys#t=69
I also think the capability of taking slices of the data (just Saturday, just evening) is pretty slick from a UI standpoint.
What's the limitation? (I admit that I haven't attempted to read the paper yet.)
EDIT: To reword it, this sentence gives the impression that there could be more factors than just plain quantity, and I was wondering whether that is the case and what those factors could be.
The version with online demos uses latitude/longitude for the spatial dimension, but a new version we're working on (under the master branch on github) allows arbitrary x,y addresses. With that, we've encoded IP addresses as locations on space-filling curves to play around with source IP/destination IP datasets. It's particularly nice because the hierarchical nature of the spatial addresses ends up mapping to larger and larger subnets of the IPv4 space.
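A minimal sketch of one such encoding (my illustration of the idea, not necessarily what the code on github does): use the 32-bit address itself as a Morton/Z-order code, so each quadtree level consumes two bits of the address and a depth-d cell is exactly a /2d subnet.

```python
import ipaddress

def ip_to_xy(ip):
    """De-interleave the 32 bits of an IPv4 address into a 16-bit
    (x, y) pair, i.e. treat the address as a Morton/Z-order code.
    Odd-position bits go to x, even-position bits to y, so the
    quadtree path through (x, y) follows the address prefix."""
    v = int(ipaddress.IPv4Address(ip))
    x = y = 0
    for i in range(16):
        x |= ((v >> (2 * i + 1)) & 1) << i
        y |= ((v >> (2 * i)) & 1) << i
    return x, y
```

With this layout, all addresses in a /8 land in the same depth-4 quadtree cell, a /16 in the same depth-8 cell, and so on, which is the "spatial hierarchy maps to subnets" property.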
At the same time, I certainly believe that some sort of hierarchical, low-footprint data cube would be a great addition to postgis.