
Visualizing Large Datasets on the GPU with Vega and MapD - tmostak
https://www.mapd.com/blog/2017/07/22/vega-makes-visualizing-big-data-easy/
======
coherentpony
> MapD uses Vega to drive the rendering engine directly on the result set of a
> SQL query without ever requiring the data to leave the GPU

How big is the dataset? If it can never leave the GPU, then it is at most a
few GB, or N * (a few GB) if several GPUs are in play. Even with a few GPUs,
the dataset would fit into DDR3 RAM on a single mainstream Xeon node, or
entirely into MCDRAM on a Xeon Phi node.

Please correct me if I'm wrong.

~~~
tmostak
MapD customers typically run our product on multiple servers with multiple
GPUs per node. So 4 servers with 8 Nvidia P40s each have 4 x 192 GB = 768 GB
of VRAM. Note that MapD compresses data and also keeps data in CPU RAM as
needed. Even two servers with these GPUs, or four servers with gamer GPUs, are
enough to query and visualize an 11B-record shipping dataset without a hitch
([https://www.mapd.com/demos/ships](https://www.mapd.com/demos/ships)); that
demo runs on four servers with 8 Nvidia 1080 Tis each.
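
As a rough check on the arithmetic above, here is a minimal sketch. It assumes 24 GB per P40 (implied by the 192 GB per-server figure) and 11 GB per 1080 Ti (Nvidia's published spec, not stated in this thread); the function name is purely illustrative.

```python
def total_vram_gb(servers: int, gpus_per_server: int, gb_per_gpu: int) -> int:
    """Aggregate GPU memory across a cluster, before any compression."""
    return servers * gpus_per_server * gb_per_gpu

# 24 GB per P40 is implied by the 192 GB-per-server figure quoted above.
p40_cluster = total_vram_gb(servers=4, gpus_per_server=8, gb_per_gpu=24)

# 11 GB per GTX 1080 Ti is an assumption taken from Nvidia's published spec.
ti_cluster = total_vram_gb(servers=4, gpus_per_server=8, gb_per_gpu=11)

# Back-of-envelope VRAM budget per record for the 11B-record ships demo,
# ignoring compression and any data held in CPU RAM.
bytes_per_record = (ti_cluster * 10**9) // (11 * 10**9)

print(p40_cluster)       # 768
print(ti_cluster)        # 352
print(bytes_per_record)  # 32
```

Under those assumptions the 1080 Ti cluster has roughly 32 bytes of VRAM per record, which suggests why compression and spilling to CPU RAM matter at this scale.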

Other customers with smaller datasets (i.e. fewer than a few hundred million
records) are able to run with a single GPU.

We're not going after petabyte-size datasets (where <100ms querying is rarely
important), so the ability to scale has rarely been an issue.

~~~
tmostak
Note that although we can run on CPU, CPUs do not have the graphics pipeline
and the memory bandwidth necessary for interactive server-side visualizations
like this.

~~~
coherentpony
Cool, thanks for the explanation.

------
greyskull
At first, I thought it was referring to AMD's new Vega GPU family. I was
hoping they found a particularly good use case for it.

~~~
microcolonel
Yeah, I feel tricked frankly, given the proximate mention of GPUs.

~~~
tmostak
Sorry, as the OP, that wasn't the intention (the Vega rendering API has been
around for some time and predates our porting it to GPUs).

~~~
microcolonel
Fair enough, name collisions are becoming very hard to avoid.

------
jarmitage
@tmostak large dataset visualisation like this looks great, but one of the
most appealing parts of Vega for me is interaction. It's just as easy with
Vega to create composite interactions for filtering and navigating data as it
is to visualise it. Is there any scope for this type of architecture to
support more than just serving rendered PNGs? (Can it do that at 60fps? :P)

~~~
tmostak
We've considered video rendering using the H264 encoders on the GPU. Another
approach might be to create a format that has all the info needed for
interactions and that could be decoded on the frontend (e.g. in WebGL). So
it's something we're definitely looking at.

We already do provide some support for hover interactions in our charting
library on top of Vega, but it would be nicer to do this in Vega.

Right now customers still get a lot of value from server-side rendering simply
because we can render visualizations of billions of records interactively and
in real time, which is difficult or impossible to achieve on other platforms
without some sort of pre-computation.

~~~
jarmitage
Thanks for your reply, and yes, I can imagine your tools are already very
useful. Are there any real-time examples online? It would be interesting to
see what your SoTA is!

~~~
tmostak
We do support real-time ingest (helped by the fact that we do not need to
index on insert). The only example of that we have online right now is our
Tweetmap demo
([https://www.mapd.com/demos/tweetmap/](https://www.mapd.com/demos/tweetmap/)).

------
edejong
A lot of these visualisations would benefit greatly from using 2D or 3D kernel
density estimates instead of a simple scatter plot. See for example:
[https://youtu.be/Xz_7Ej6JsMY](https://youtu.be/Xz_7Ej6JsMY)

~~~
tmostak
We can already do simple histograms (heatmaps). More complicated features such
as Gaussian weighting and hexagonal bins are coming.
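
To illustrate what a simple histogram heatmap means here, a minimal CPU-side sketch in plain Python follows. It is not MapD's implementation (which runs on the GPU); the function name, grid size, and sample points are all hypothetical.

```python
from collections import Counter

def bin_points(points, x_range, y_range, nx, ny):
    """Count points per cell of an nx-by-ny grid over the given ranges."""
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # skip points outside the viewport
        # Map each coordinate to a cell index; clamp to guard against
        # floating-point rounding at the upper edge.
        col = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        row = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        counts[(row, col)] += 1
    return counts

# Hypothetical sample data: two points in the lower-left cell,
# three in the upper-right cell of a 2x2 grid.
points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (0.95, 0.8), (0.5, 0.5)]
grid = bin_points(points, (0.0, 1.0), (0.0, 1.0), nx=2, ny=2)

print(grid[(0, 0)])  # 2
print(grid[(1, 1)])  # 3
```

Each cell count would then be mapped to a color. A kernel density estimate differs in that every point contributes a weighted amount to nearby cells (for example with a Gaussian falloff) rather than a single count to one cell.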

------
chenster
Can I use it on a MacBook Pro? My concern is that it doesn't have a dedicated
GPU.

~~~
shusson
You can use MapD on your MacBook Pro in CPU mode, but it will not be able to
do any Vega rendering because that requires a GPU.

