

MapD: Massive Throughput Database Queries with LLVM on GPUs - bsprings
http://devblogs.nvidia.com/parallelforall/mapd-massive-throughput-database-queries-llvm-gpus/

======
paulasmuth
Are problems that fit in memory on a single machine (96GB in this article)
considered "big data" now? This buzzword is becoming absolutely meaningless.
But yeah, "Hyper-interactive visualytics at scale." Go Nvidia PR.

Also, while it's absolutely amazing that they are able to scan 240B rows/sec,
I wonder what one would use this capability for if one can only keep a few
hundred million records around at a time. The difference between taking 10ms
or 100ms to scan the dataset should hardly matter to a user who is running
"interactive analytics queries".

~~~
reitzensteinm
You can rent a 128GB machine at Hetzner for $130 a month.

I think we can all agree to draw a line in the sand and say if a kid doing a
paper run could afford your servers, you don't get to call it big data.

(not to take away from the linked project, which looks technically awesome)

~~~
Smerity
I disagree with the "big data" line in the sand being whether "a kid doing a
paper run could afford your servers". Using spot instances on AWS, you can
have a 100 machine cluster with 1.5TB RAM for only $3 per hour.

Working at Common Crawl, some of the coolest projects I've seen have been side
projects or weekend projects by interested volunteers. WikiReverse[1] cost
only $64 USD to parse the metadata of all 3.6 billion pages (even cheaper if
you avoided EMR fees), whilst Yelp extracted 748 million US phone numbers from
2 billion pages for $10.60 USD[2].

These days, big data (regardless of how you define the term) is within the
reach of the kid with a paper route.

[1]: [https://wikireverse.org/](https://wikireverse.org/)

[2]: [http://engineeringblog.yelp.com/2015/03/analyzing-the-web-for-the-price-of-a-sandwich.html](http://engineeringblog.yelp.com/2015/03/analyzing-the-web-for-the-price-of-a-sandwich.html)

------
rektide
Spark SQL, with its new Catalyst optimizer, compiles queries to code at
runtime (assembling Scala quasiquoted fragments into an AST and compiling
that). No ridiculously awesome GPU backend here, but some of the same general
kind of tech: compiled queries.
[https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html](https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html)
[http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_s...](http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf)
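
The general trick, stripped of both Catalyst and LLVM specifics: generate
source for exactly the query at hand, compile it, and load it. A toy C++
sketch of the idea (the filter_count function and the shell-out to cc are
invented for illustration; Catalyst actually splices quasiquotes and compiles
to JVM bytecode, and MapD lowers to LLVM IR in-process):

    // Toy "query compiler": emit C for one specific predicate, build a
    // shared object, and dlopen() it. Build with: c++ jit.cpp -ldl
    #include <cstdio>
    #include <cstdlib>
    #include <dlfcn.h>

    using FilterFn = long (*)(const int*, long, int);

    int main() {
        // "Plan" for: SELECT count(*) FROM t WHERE x > :threshold
        FILE* src = std::fopen("/tmp/query.c", "w");
        std::fprintf(src,
            "long filter_count(const int* col, long n, int threshold) {\n"
            "    long count = 0;\n"
            "    for (long i = 0; i < n; ++i)\n"
            "        if (col[i] > threshold) ++count;\n"
            "    return count;\n"
            "}\n");
        std::fclose(src);

        // A real engine would lower to IR in-process instead of shelling
        // out to the system compiler.
        if (std::system("cc -O3 -shared -fPIC -o /tmp/query.so /tmp/query.c"))
            return 1;

        void* so = dlopen("/tmp/query.so", RTLD_NOW);
        auto filter = reinterpret_cast<FilterFn>(dlsym(so, "filter_count"));

        const int col[] = {1, 5, 10, 42, 7};
        std::printf("%ld rows match\n", filter(col, 5, 6));  // 3 rows match
        dlclose(so);
    }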

~~~
threeseed
I use SparkSQL every single day and it is not anywhere close to this.

You could combine SparkSQL with Tachyon but it is still pretty far from what
MapD seems to be accomplishing.

Would love to see parts of Spark skip the JVM and call out to Rust or LLVM for
some aspects.

------
vardump
I wonder how much effort went into the CPU implementation. I'd expect a
1-socket, 4-memory-channel Haswell CPU to be able to filter rows at 40±10
GB/s, if the row data can be stored in column-major order. SIMD code is
pretty often memory-bandwidth limited, whether we're talking about CPUs or
GPUs.
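
That back-of-the-envelope is easy to sanity-check: a branch-free scan over a
packed int32 column, like the sketch below, gets auto-vectorized by gcc/clang
at -O3 and ends up limited by DRAM bandwidth rather than ALUs on a
Haswell-class part. The sizes here are illustrative, not a rigorous
benchmark:

    // Minimal bandwidth-bound filter over a column-major int32 column.
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        const size_t n = size_t(1) << 28;    // 2^28 rows * 4 B = 1 GiB column
        std::vector<int32_t> col(n, 3);

        auto t0 = std::chrono::steady_clock::now();
        int64_t matches = 0;
        for (size_t i = 0; i < n; ++i)       // branchless, so SIMD-friendly
            matches += (col[i] > 2);
        auto t1 = std::chrono::steady_clock::now();

        double s = std::chrono::duration<double>(t1 - t0).count();
        std::printf("%.1f GB/s (%lld matches)\n",
                    double(n) * sizeof(int32_t) / s / 1e9,
                    (long long)matches);
    }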

~~~
etrain
Yeah, I agree this is surprising. If all the data in their CPU implementation
were stored in a well-packed columnar format, I'd expect something close to
theoretical memory throughput on the query they describe (the intermediate
counts table fits easily into L1/L2 cache, and the aggregation is commutative
and associative). Cache-coherence issues in the aggregation code are a
possible cause.
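
To illustrate the structure being described: since count is commutative and
associative, each thread can accumulate into a private counts table that
stays hot in its own L1 and merge at the end, instead of all threads
hammering one shared table and paying coherence traffic. A simplified sketch
with an invented 256-key domain:

    // GROUP BY count(*) with per-thread partial tables and a final merge.
    // Compile with: c++ -O3 -pthread groupby.cpp
    #include <cstdint>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        const size_t n = size_t(1) << 26;       // 64M rows
        const int groups = 256;                 // counts fit easily in L1
        std::vector<uint8_t> keys(n);
        for (size_t i = 0; i < n; ++i) keys[i] = i & 0xff;

        unsigned nt = std::thread::hardware_concurrency();
        if (nt == 0) nt = 4;
        std::vector<std::vector<int64_t>> partial(
            nt, std::vector<int64_t>(groups, 0));

        std::vector<std::thread> workers;
        for (unsigned w = 0; w < nt; ++w)
            workers.emplace_back([&, w] {
                auto& local = partial[w];       // private: no coherence traffic
                for (size_t i = n * w / nt; i < n * (w + 1) / nt; ++i)
                    ++local[keys[i]];
            });
        for (auto& th : workers) th.join();

        std::vector<int64_t> counts(groups, 0); // cheap threads*groups merge
        for (auto& p : partial)
            for (int g = 0; g < groups; ++g) counts[g] += p[g];
        std::printf("group 0: %lld rows\n", (long long)counts[0]);
    }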

If I were building a speed-freak analytics database I'd be focusing on making
my CPU implementations as fast as possible, since that's what 99% of potential
customers are already running. Assuming you get to 40GB/s on this type of
query, that's only a factor of 5 slower than the GPU implementation. I'd
imagine that for most workloads, a factor of 5 speedup that requires new
hardware and lots of energy is kind of a non-starter.

~~~
tmostak
Hi, one of the original authors here.

You're confusing rows per second with bytes per second. We're measuring here
in rows per second, with random data that takes four bytes per record. So the
group-by is showing around 24 GB/sec for CPU and roughly 1 TB/sec for the
GPUs. Admittedly we are a small startup and there are some optimizations still
to be made on CPU (we're working on being NUMA-aware, for example), but the
CPU performance is not bad, and it is still much higher for group-by than what
you see in other databases.
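
To make the unit conversion explicit, with the four-byte records used in the
benchmark:

    \[
      240 \times 10^{9}\ \text{rows/s} \times 4\ \text{B/row}
        \approx 0.96\ \text{TB/s (GPU)},
      \qquad
      24\ \text{GB/s} \div 4\ \text{B/row}
        = 6 \times 10^{9}\ \text{rows/s (CPU)}
    \]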

You have to remember that we're building a full database and visualization
system, not just optimizing for a single benchmark. In addition, we're trying
to make the point that you can hit this 1-2TB/sec (depending on the query) on
a single server with GPUs, which means that, assuming the compressed data
fits in 192GB of GPU RAM, you can get much higher performance than you would
see out of a whole rack of beefy CPU servers, particularly when you take into
account the network overhead that distributed databases suffer. Furthermore,
Nvidia's Pascal architecture (to be released next year) should have at least
2X the bandwidth by using High Bandwidth Memory, and likely significantly
larger memory sizes, so our speedups will only increase.

~~~
pdeva1
Have you considered the cost of your system? The AWS g2.8xlarge comes with
16GB of VRAM and costs almost $1,900 per month; the 192GB of VRAM you are
talking about would cost $22,000 per month! A system with 192GB of normal RAM
will cost almost an order of magnitude less. How does it make sense to run
MapD given that enormous cost?
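
For reference, the arithmetic behind that figure:

    \[
      192\ \text{GB} \div 16\ \text{GB/instance} = 12\ \text{instances},
      \qquad
      12 \times \$1{,}900/\text{mo} \approx \$22{,}800/\text{mo}
    \]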

~~~
Bedon292
It doesn't make sense to run it on AWS, and not just because of cost. That's
why it is run on bare metal, where buying the hardware outright costs
approximately the same as two months of AWS.

------
pella
Similar: PG-Strom (past = CUDA; now = OpenCL)

"GPGPU Accelerates PostgreSQL"

Slides: [http://www.slideshare.net/kaigai/gpgpu-accelerates-postgresql](http://www.slideshare.net/kaigai/gpgpu-accelerates-postgresql) (Dec 20, 2014)

HN:
[https://news.ycombinator.com/item?id=8787414](https://news.ycombinator.com/item?id=8787414)

and Open Source! [https://github.com/pg-strom/devel](https://github.com/pg-strom/devel)

------
pella
more info ( from : 2013 )

[http://data-informed.com/fast-database-emerges-from-mit-clas...](http://data-
informed.com/fast-database-emerges-from-mit-class-gpus-and-students-
invention/)

now:

Demo: [http://tweetmap.mapd.com/desktop/](http://tweetmap.mapd.com/desktop/)
Blog
[http://www.mapd.com/blog/2015/06/04/mapd/](http://www.mapd.com/blog/2015/06/04/mapd/)

------
smegel
> that maps OpenGL primitives onto SQL result sets

Here is a set of vertices; now draw me a 2D image of the 3D scene those
vertices describe, from a certain viewpoint.

It boggles my mind that this logic can be twisted into running arbitrary SQL.
Does it actually do something as magical as overlaying SQL set logic onto a
graphics scene, where the results of the SQL query are equivalent to the set
of vertices calculated not to be visible when rendering the scene? Amazing
stuff.

~~~
paulasmuth
I believe these guys are actually going straight from SQL to GPU machine
code. So, as I understand it, the SQL query execution is not some hack
layered on top of the programmable shader pipeline (which is what's
responsible for the "here is a list of vertices, now draw me a 3D scene" part
you mentioned); it runs directly on the more or less general-purpose
processors in the graphics card.

I read the part where they "map the result set to OpenGL primitives" as
meaning that they write the output of the SQL query into some data structure
that, after the query has executed, can easily be used from OpenGL to render
pretty stuff. That part shouldn't be much different from bringing texture or
vertex data in from, e.g., files on disk or a MySQL database into
OpenGL-specific buffers so that you can subsequently use it for drawing in
your program.
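
If that reading is right, the handoff can even be zero-copy: the query's
output buffer and the GL vertex buffer are the same device memory, registered
with both APIs. A hedged sketch of that plumbing using standard CUDA-GL
interop (publish_results, d_results, and the float2 point layout are all
invented here; this is a guess at the mechanism, not MapD's actual code):

    // Hand a device-resident result set to OpenGL without touching host RAM.
    // Assumes a current GL context and a VBO already sized for the results.
    #include <GL/gl.h>
    #include <cuda_runtime.h>
    #include <cuda_gl_interop.h>

    void publish_results(GLuint vbo, const float2* d_results, size_t n_points) {
        cudaGraphicsResource* res = nullptr;
        // Let CUDA see the GL buffer object (one-time per buffer in practice).
        cudaGraphicsGLRegisterBuffer(&res, vbo,
                                     cudaGraphicsRegisterFlagsWriteDiscard);

        // Per query: map the buffer, copy device-to-device, unmap.
        cudaGraphicsMapResources(1, &res);
        void* ptr = nullptr;
        size_t bytes = 0;
        cudaGraphicsResourceGetMappedPointer(&ptr, &bytes, res);
        cudaMemcpy(ptr, d_results, n_points * sizeof(float2),
                   cudaMemcpyDeviceToDevice);
        cudaGraphicsUnmapResources(1, &res);

        cudaGraphicsUnregisterResource(res);
        // The renderer can now glDrawArrays(GL_POINTS, 0, n_points) from vbo.
    }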

