
Nanocubes: Fast visualization of large spatiotemporal datasets - orbifold
http://www.nanocubes.net
======
CyberDildonics
I can't figure out what is supposed to be new here.

Spatio-temporal only means 4 dimensions at most I would think.

They give a time of 3.5 minutes for 4.5 million objects (not sure what that
means). I've already personally been able to rasterize Gaussian-filtered
kd-trees at a rate of 10 million points in a few seconds.

Are people really paying for this? If so where can I find them?

~~~
cscheid
(Author here, sorry I didn't catch this earlier) The 3.5 minutes is the
_preprocessing_ time (think of it as index-building time).

After the index is built, the goal is to be able to generate outputs in time
bounded (up to a poly-log factor) by the size of your screen, and _not_ by the
size of your data.

After the index is built you get sub-millisecond answers for the 2-D
histograms over reasonably-sized screens, and that turns out to be pretty
useful for the use cases we're going for.

~~~
CyberDildonics
I was including the indexing time too; I think 10 million points in 4 seconds
on one core was what I saw.

These were single high-dimensional points; I'm guessing that the data being
dealt with here is different?

The link in your post doesn't work, but I did look at the paper and couldn't
really see the uniqueness. I wasn't clear on what 'data cubes' was supposed to
mean.

When you say 'screen' you are talking about ranges or bounds of area right?

~~~
cscheid
We use "data cubes", as per Jim Gray's paper,
[http://web.stanford.edu/class/cs345d-01/rl/olap.pdf](http://web.stanford.edu/class/cs345d-01/rl/olap.pdf).
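As a rough illustration of Gray's data-cube idea (a toy sketch, not the paper's algorithm): aggregates are precomputed for every subset of the grouping dimensions, with a wildcard standing in for rolled-up dimensions, so any group-by query becomes a lookup instead of a scan.

```python
from itertools import combinations
from collections import Counter

# Toy records: (device, region) pairs.  A data cube precomputes aggregates
# over every subset of the grouping dimensions, using "*" for dimensions
# that are rolled up.
records = [("iphone", "us"), ("iphone", "eu"), ("android", "us"),
           ("android", "us"), ("iphone", "us")]
dims = 2  # device, region

cube = Counter()
for rec in records:
    # Each record contributes to 2^dims aggregate cells.
    for k in range(dims + 1):
        for kept in combinations(range(dims), k):
            key = tuple(rec[i] if i in kept else "*" for i in range(dims))
            cube[key] += 1

# Any group-by query is now a dictionary lookup, not a scan:
print(cube[("iphone", "*")])   # records from iPhones, any region -> 3
print(cube[("*", "us")])       # records from the US, any device  -> 4
print(cube[("*", "*")])        # grand total                      -> 5
```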

When I say "screen", I mean (e.g.) your laptop's screen resolution. If you're
going to display a heatmap on a screen with p pixels, we (roughly) touch only
O(p) memory cells at query time, independent of the dataset size.
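A hypothetical sketch of that claim (not the nanocube structure itself): once points are binned into a pixel grid at index-build time, rendering a heatmap reads exactly p cells, no matter how large the raw dataset was.

```python
import random

# Hypothetical sketch: points are binned into a fixed pixel grid once, at
# index-build time.  Producing a heatmap then touches O(p) cells, where p
# is the number of screen pixels -- independent of the raw point count N.
W, H = 64, 32          # "screen" resolution, p = W * H pixels
N = 100_000            # dataset size; only affects build time, not queries

random.seed(0)
grid = [[0] * W for _ in range(H)]
for _ in range(N):                      # one-time preprocessing pass
    x, y = random.randrange(W), random.randrange(H)
    grid[y][x] += 1

def heatmap():
    # Query time: reads exactly W*H cells, however large N was.
    return [[grid[y][x] for x in range(W)] for y in range(H)]

hm = heatmap()
assert sum(map(sum, hm)) == N           # every point accounted for
```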

When you say "you don't see the uniqueness", I'm not sure what you're
comparing it against, so I can't say anything more. You mentioned gkd-trees
earlier: is that the comparison you mean? In that case, these are two
completely different data structures. For example (at least as described in
the 2009 SIGGRAPH paper), you can't subset on a categorical dimension of the
pixels. In the case of the data structure we created, we can report (for
example) a heatmap of all geolocated tweets generated by an iPhone, or all
geolocated tweets generated by a Windows phone, or all geolocated tweets
irrespective of device, _without having to scan the 200M tweet database_.
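A minimal sketch of that categorical-subsetting idea (hypothetical names and a flat bin index, not the actual nanocube tree): each spatial bin keeps a per-category count plus a rollup, so a device-filtered heatmap is a lookup per pixel, with no scan over the raw tweet table.

```python
from collections import defaultdict

# Hypothetical sketch: at index-build time, keep a per-category count in
# every spatial bin.  A filtered heatmap ("all geolocated tweets from an
# iPhone") is then one lookup per pixel, with no scan of the raw records.
tweets = [((3, 5), "iphone"), ((3, 5), "android"),
          ((3, 5), "iphone"), ((7, 1), "windows")]

index = defaultdict(lambda: defaultdict(int))  # bin -> device -> count
for bin_xy, device in tweets:                  # one-time build pass
    index[bin_xy][device] += 1
    index[bin_xy]["*"] += 1                    # rollup: any device

def heat(bin_xy, device="*"):
    """Count for one spatial bin, optionally filtered by device."""
    return index[bin_xy].get(device, 0)

assert heat((3, 5), "iphone") == 2    # iPhone tweets in this bin
assert heat((3, 5)) == 3              # irrespective of device
assert heat((7, 1), "android") == 0   # no Android tweets here
```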

------
baconner
So, how is this different from an in-memory OLAP queryable column store like
VertiPaq?

~~~
cscheid
(author here, sorry I didn't see this earlier)

One of the principles we tried hard to follow is that after building the
indices, the query times should be proportional to the resolution of the plots
you're generating.

So the difference here is building a data structure that's fundamentally
well-suited to interactive visualization. We push hard to get low latency,
and we take trade-offs that let us generate query results well-suited to
creating histograms, heatmaps, etc. These are typically multiresolution
(sometimes you want mile-wide pixels, sometimes yard-wide pixels; day-wide
pixels vs. second-wide pixels) and require some index-building tricks that
we hadn't seen combined in the way we did it.
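One common way to get that multiresolution behavior (a generic pyramid sketch, not the specific nanocube scheme): build a stack of count grids where each coarser level sums 2x2 blocks of the finer one, so both "mile-wide" and "yard-wide" pixels read precomputed cells.

```python
# Generic sketch of multiresolution aggregation: a pyramid of count grids,
# each level summing 2x2 blocks of the level below, so coarse-zoom and
# fine-zoom queries both hit precomputed cells.
def coarsen(grid):
    h, w = len(grid), len(grid[0])
    return [[grid[y][x] + grid[y][x + 1] +
             grid[y + 1][x] + grid[y + 1][x + 1]
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

fine = [[1, 0, 2, 1],
        [0, 3, 1, 0],
        [4, 0, 0, 2],
        [1, 1, 0, 0]]

pyramid = [fine]
while len(pyramid[-1]) > 1:
    pyramid.append(coarsen(pyramid[-1]))

assert pyramid[1] == [[4, 4], [6, 2]]   # each cell sums a 2x2 block
assert pyramid[2] == [[16]]             # total count at the coarsest level
```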

