Nanocubes: Fast Visualization of Large Spatiotemporal Datasets (nanocubes.net)
71 points by chuhnk on Jan 14, 2014 | 14 comments



Kind of related, I recently made a similar style of visualisation for 99designs.com - http://a.tiles.mapbox.com/v3/dhotson.d2013/page.html

Once I got the data into PostGIS and aggregated it into hexes (slightly harder than it sounds), it wasn't that big a deal to visualise using TileMill and MapBox.
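
For anyone curious about the "slightly harder than it sounds" part: the fiddly core of hex binning is snapping each point to the center of its containing hexagon, usually done with axial/cube-coordinate rounding. A minimal Python sketch of just that step (hex size and points are made up; this shows the general technique, not the actual PostGIS pipeline):

    import math
    from collections import Counter

    HEX_SIZE = 0.5  # hex "radius" in projected units; an illustrative value

    def point_to_hex(x, y, size=HEX_SIZE):
        """Snap a projected (x, y) point to the center of its pointy-top hexagon."""
        # Pixel -> fractional axial coordinates.
        q = (math.sqrt(3) / 3 * x - y / 3) / size
        r = (2 * y / 3) / size
        # Round to the nearest hex using cube-coordinate rounding.
        cx, cz = q, r
        cy = -cx - cz
        rx, ry, rz = round(cx), round(cy), round(cz)
        dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
        if dx > dy and dx > dz:
            rx = -ry - rz
        elif dy > dz:
            ry = -rx - rz
        else:
            rz = -rx - ry
        # Axial -> pixel: the hex center in projected coordinates.
        return size * math.sqrt(3) * (rx + rz / 2), size * 1.5 * rz

    # Aggregate: count points per hex center, then style the counts in TileMill.
    counts = Counter(point_to_hex(x, y) for x, y in [(0.1, 0.2), (0.15, 0.22), (3.0, 4.0)])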

I haven't read the nanocubes paper yet, but it sounds much much faster than my approach using PostGIS. It took me many hours to import and process tens of millions of points in my visualisation. It'd be great to be able to turn something like this around in <10 mins.

Would this nanocubes approach work with hexes? (they look cooler IMO)

Off-topic: after all this, all I can think about is making a Settlers of Catan MMO. ;-)


Heh, I had a conversation a few weeks ago with Jeff Heer, co-author of a very similar paper, in which we talked about how to make this stuff work with hexes. I think it can be done, but the hierarchical part is a pain, and I'll need more than an HN comment to explain :) Look me up offline if you're interested and we can chat.


Or, as we in the graphics field call it: mipmapping, geoclipmapping, megatexturing, virtual texturing, or slippy maps.

There's a related capability called "interpolation", which could be linear, bicubic, etc. Looks nice too; try it.


(Hey, I'm a big fan of your WebGL work!)

This is really not about interpolation or mipmapping; it's about creating a data structure that lets you quickly answer the queries with which you build visualizations. For example, you might want to see a heatmap of all tweets that happened in 2013, or only a particular week of them, or a month of tweets just from Instagram. You might also want to see any of these over the entire world, or over a single metro region. What nanocubes enables is answering these queries in time essentially proportional to the size of the output (we take a polylog hit, but that's kind of to be expected). We don't even touch all of the input data, which, if you ask me, is kind of nice.
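
To make the query shape concrete, here's a hypothetical client-side sketch. The /count endpoint, port, and parameter names are assumptions for illustration, not the real nanocubes HTTP API; the request/response pattern is the point.

    import json
    from urllib.request import urlopen

    BASE = "http://localhost:29512"  # assumed host/port, not the real server config

    def count_query(tile, t0, t1, device=None):
        """Ask the server for an aggregate count over a tile, a time range,
        and optionally a category; it answers by walking its in-memory
        structure rather than scanning the raw points."""
        url = f"{BASE}/count?tile={tile}&time={t0}:{t1}"
        if device:
            url += f"&device={device}"
        with urlopen(url) as resp:
            return json.load(resp)

    # e.g. one week of iPhone tweets over a single metro-area tile:
    # count_query("9/82/197", 1370000, 1370168, device="iPhone")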

I wish I had bigger public datasets to show you, but the experience stays this fast with datasets in the billions of points.

About your interpolation point: yes, we could (and sometimes do) use interpolation and smoothing. We chose not to do it here so that the visualization looks the same in the WebGL-less mode; when you run it on an iPhone or iPad, there isn't a big visual difference.


> it's about creating a data structure that allows you to quickly answer queries with which to build visualizations

Hm, I see, so this can output the virtual miplevels quickly? Also, would it work with a "full" dataset (i.e. no empty cells)?

> About your interpolation point: yes, we could (and do sometimes) use interpolation and smoothing.

Also consider miplevel interpolation; it makes the transitions between levels much easier on the eye (it doesn't work well with nearest sampling).
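
For what it's worth, a minimal sketch of that idea, assuming two precomputed levels as numpy arrays (coarse at half the resolution of fine) and a fractional zoom t in [0, 1]; this is generic trilinear-style filtering, not nanocubes code:

    import numpy as np

    def blend_miplevels(fine: np.ndarray, coarse: np.ndarray, t: float) -> np.ndarray:
        """Linearly interpolate between zoom levels (trilinear-style filtering).
        `coarse` is upsampled by pixel doubling for simplicity; bilinear
        upsampling would look smoother still."""
        upsampled = coarse.repeat(2, axis=0).repeat(2, axis=1)
        return (1.0 - t) * fine + t * upsampled

    fine = np.random.rand(8, 8)
    coarse = fine.reshape(4, 2, 4, 2).mean(axis=(1, 3))  # box-filtered parent level
    half_zoom = blend_miplevels(fine, coarse, 0.5)       # halfway between levels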


> Hm, I see, so this can output the virtual miplevels quickly?

Yes: when you see the tiles on (say) http://www.nanocubes.net/view.html#twitter, what you're getting is not a precomputed tile: the server is actually visiting the data structure every time (and I hope I can convince you that at that speed, it is not touching 210 million points :). The reason you can't precompute all choices is simply that there are too many of them and you'd run out of bits in the universe (the paper, http://www.nanocubes.net/assets/pdf/nanocubes_paper.pdf, has details).
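
A quick back-of-the-envelope count makes the point; the per-dimension sizes below are illustrative assumptions, but the order of magnitude is what matters:

    # Distinct (tile, time interval, category subset) query results,
    # with made-up but plausible sizes for each dimension.
    zoom_levels = 25
    tiles = sum(4 ** z for z in range(zoom_levels + 1))  # every tile at every zoom
    time_bins = 2 ** 16                                  # e.g. hourly bins over ~7 years
    time_intervals = time_bins * (time_bins + 1) // 2    # every [start, end] range
    category_subsets = 2 ** 5                            # subsets of 5 device types

    total = tiles * time_intervals * category_subsets
    print(f"{total:.1e} possible query results")         # ~1.0e+26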

> Also, would it work with a "full" dataset (i.e. no empty cells)?

Not exactly. We get away with an in-memory data cube precisely because in many practical cases the data is sparse over the address space. With dense data you'll take a much more significant memory hit (but it's one you'd have to take with any aggregation scheme; the paper, again, has details).

> Also consider miplevel interpolation, makes the transitions between levels much easier on the eye (doesn't work well with nearest sampling).

That's fair, although again we're hoping to keep the WebGL version feature-compatible with the leaflet.js version (which is canvas-only and has no zoom transition at all); the smooth version looks nicer, as you can see in the following video (a bit on the PR-heavy side, sorry): https://www.youtube.com/watch?v=8P9QA6TJwys#t=69


I think what's novel here is the data structure, not the type of visualization.

I also think the ability to take slices of the data (just Saturdays, just evenings) is pretty slick from a UI standpoint.


Author here, happy to answer questions.


> and in some cases it uses sufficiently little memory that you can run a nanocube in a modern-day laptop.

What's the limitation? (I admit I haven't attempted to read the paper yet.)

EDIT: To reword it, this sentence gives the impression that there could be more factors than just plain quantity, and I was wondering whether that is the case and what those factors could be.


Memory usage gets worse as the number of dimensions increases (because there are more ways in which to build summaries), and also as the number of unique keys increases (because there are more distinct things to store). The main advantage nanocubes has is that it's a sparse scheme: previous work on fast data cubes for visualization (imMens, for example) uses a dense storage scheme, so memory usage goes up in proportion to the size of the address space instead.
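
A toy contrast of the two schemes, in case it helps (this is just the dense-vs-sparse bookkeeping, not the nanocube structure itself; sizes and dimension names are made up):

    from collections import defaultdict

    N_CELLS = 2 ** 16                    # spatial address space (illustrative)
    DEVICES = ["iPhone", "Android", "Web"]

    # Dense scheme (imMens-style): one counter per address, occupied or not,
    # so memory is proportional to the address space.
    dense = [[0] * len(DEVICES) for _ in range(N_CELLS)]

    # Sparse scheme: counters exist only for (cell, device) keys actually
    # seen in the data, so memory is proportional to the distinct keys.
    sparse = defaultdict(int)

    for cell, device in [(42, "iPhone"), (42, "iPhone"), (99, "Web")]:
        dense[cell][DEVICES.index(device)] += 1
        sparse[(cell, device)] += 1

    print(len(sparse))                   # 2 keys stored vs 3 * 2**16 dense counters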


A high-speed multidimensional query capability seems like it would be useful for visualizing and analyzing server logs to look for attacks and other trends.


You're right, it is :)

The version with online demos uses latitude/longitude for the spatial dimension, but a new version we're working on (under the master branch on GitHub) allows arbitrary x,y addresses. With that, we've encoded IP addresses as locations on space-filling curves to play around with source IP/destination IP datasets. It's particularly nice because the hierarchical nature of the spatial addresses ends up mapping to larger and larger subnets of the IPv4 space.
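
To sketch the encoding (a Z-order/Morton interleave here; treat it as one concrete possibility, not necessarily the exact curve that branch uses):

    def ip_to_xy(ip: str) -> tuple:
        """Interleave an IPv4 address's 32 bits: even bits -> x, odd bits -> y."""
        value = 0
        for octet in ip.split("."):
            value = (value << 8) | int(octet)
        x = y = 0
        for bit in range(16):
            x |= ((value >> (2 * bit)) & 1) << bit
            y |= ((value >> (2 * bit + 1)) & 1) << bit
        return x, y

    # Addresses in the same /16 share their high-order bits, so they land in
    # the same quadtree square; zooming out aggregates over larger subnets.
    print(ip_to_xy("10.0.0.1"), ip_to_xy("10.0.255.255"))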


Would this make sense as a type of index in PostGIS?


It's open source (http://github.com/laurolins/nanocube), so one could certainly try. That said, it is currently an in-memory store with no disk backing, so it would take a significant amount of work to get it going.

At the same time, I certainly believe that some sort of hierarchical, low-footprint data cube would be a great addition to PostGIS.



