
AresDB: Uber’s GPU-Powered Open-Source, Real-Time Analytics Engine - manojlds
https://eng.uber.com/aresdb/
======
twotwotwo
Any good resources on where GPU DBs offer significant wins? Especially, but
not only, if it wins in cost _efficiency_ for some workload.

My naïve impression working at smaller scales is that in a reasonably balanced
system, storage is often the bottleneck, and if it can deliver data fast
enough the CPU probably won't break a sweat counting or summing or so on. The
amount of interest in GPU databases, though, suggests that's not the case for
some interesting workloads.

~~~
jandrewrogers
GPU databases are brilliant for cases where the working set can live entirely
within the GPU's memory. For most applications with much larger (or more
dynamic) working sets, the PCIe bus becomes a significant performance
bottleneck. This is their traditional niche.

That said, I've heard anecdotes from people I trust that heavily optimized use
of CPU vector instructions is competitive with GPUs for database use cases.
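As a toy illustration of that point (my own sketch, not the anecdotes referred to above): a columnar filter + aggregate is exactly the kind of loop CPUs chew through with SIMD instructions, which is one reason a tuned CPU engine can stay competitive.

```python
import numpy as np

# Toy illustration only -- column names and data are made up.
rng = np.random.default_rng(7)
fare = rng.uniform(5.0, 50.0, size=100_000)   # hypothetical fare column
city = rng.integers(0, 100, size=100_000)     # hypothetical city-id column

# SELECT SUM(fare) WHERE city = 42, as one vectorized pass over the columns
vector_total = fare[city == 42].sum()

# The same query row-at-a-time: same answer, far fewer elements per cycle
row_total = 0.0
for f, c in zip(fare, city):
    if c == 42:
        row_total += f

assert abs(vector_total - row_total) < 1e-6
```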

~~~
burk96
I'm clueless about the amount of bandwidth needed for larger applications;
will the eventual release of PCIe 4 & 5 have a big impact on this? Or will it
still be too slow?

~~~
vkaku
PCIe 3.0 x16 (current generation) has a bandwidth of 15.75 GB/s.

Assuming you have 16 GB of RAM on the GPU, theoretically ~1 second is all it
would take to load the GPU with that amount of data. Unfortunately, when you
take huge data sets into consideration, you'd need about five M.2 SSDs, each
running at 3200 MB/s, to saturate that link (assuming Disk<-RAM DMA->GPU).
Those would also require at least five PCIe 2.0 x8 ports on a pretty
performant setup. RAM bandwidth is assumed to be around ~40-60 GB/s, so
hopefully no bottlenecks there.

This assumes your GPU could swizzle through 16 GB of data in a second; GPUs
have a theoretical memory bandwidth of between 450 and 970 GB/s.

Now, realistically, per vendor marketing materials, the fastest DB I've seen
allows one to ingest at 3 TB/hour, i.e. ~1 GB/second.

So there must be more to it than the theoretical 16 GB/second business. PCIe
4.0 x16 doubles the speed to 32 GB/s, but at this time it looks pointless to
me.
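The arithmetic above can be sanity-checked in a few lines (all figures are the assumptions stated above, i.e. vendor-sheet numbers, not measurements):

```python
# Back-of-envelope check of the figures above.
pcie3_x16_gbps = 15.75            # GB/s for PCIe 3.0 x16
gpu_ram_gb = 16.0                 # GB of GPU memory
ssd_gbps = 3.2                    # GB/s per M.2 NVMe SSD

fill_time_s = gpu_ram_gb / pcie3_x16_gbps   # time to fill GPU RAM over the bus
ssds_needed = pcie3_x16_gbps / ssd_gbps     # SSDs to saturate the PCIe link
ingest_gbps = 3000 / 3600                   # claimed 3 TB/hour ingest in GB/s

print(f"fill 16 GB over PCIe 3.0 x16: {fill_time_s:.2f} s")    # ~1 s
print(f"SSDs to saturate the link: {ssds_needed:.1f}")         # ~5
print(f"3 TB/hour ingest: {ingest_gbps:.2f} GB/s")             # ~0.83
```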

------
based2
[https://github.com/uber/aresdb](https://github.com/uber/aresdb)

------
samaparicio
Liked this post. Would love more detail on GPU query offloading and on how
different the engine looks compared to one that would run queries on the CPU,
as that seems to be the primary innovation

~~~
dragontamer
This article has a great overview, but it is dense and uses a lot of unique
terminology:
[https://moderngpu.github.io/join.html](https://moderngpu.github.io/join.html)

In many ways, this paper is easier to read instead:
[http://www.csee.usf.edu/~tuy/pub/SSDBM17.pdf](http://www.csee.usf.edu/~tuy/pub/SSDBM17.pdf)

But I'm pretty sure the "moderngpu" version of merge-join is more optimal.

------
jclay
Interesting. MapD and this project both use Thrust under the hood, and from
what I gather, both are attempting to address the same issue. Can anyone speak
to the differences?

While I originally didn't get the case for GPU-accelerated databases, it
makes more and more sense: bandwidth between GPU and CPU is steadily
increasing and the latency of GPU<->CPU syncs is diminishing, making the GPU
an increasingly attractive option.

~~~
tmostak
The MapD (now OmniSci) execution engine is actually built around a JIT
compiler that translates SQL to LLVM IR for most operations, but it does use
Thrust for a few things like sort. One performance advantage of using a JIT is
that the system can fuse operators (like filter, group by and aggregate, and
project) into one kernel without the overhead of calling multiple kernels (and
particularly the overhead of materializing intermediate results, versus
keeping these results in registers). Outputting LLVM IR also makes it easy for
us to run on CPU as well, by for the most part bypassing the need to generate
GPU-specific code (for example, in CUDA).

It does make coding on the system a bit more difficult, but we found that the
performance gains were worth it.
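To make the fusion point concrete, here's a rough sketch with plain Python standing in for generated kernels (names and data are made up, and this is my illustration, not MapD's code): the unfused version runs one pass per operator and materializes an intermediate, while the fused version does filter + group-by-aggregate in a single pass, the way a JIT can emit one kernel and keep running values in registers.

```python
# Illustrative only: plain Python standing in for generated GPU kernels.
rows = [(1, 10.0), (2, 5.0), (1, 7.5), (3, 2.0), (1, 1.5)]  # (key, value)

def unfused(rows):
    filtered = [r for r in rows if r[1] > 2.0]   # "kernel" 1: filter, materialized
    groups = {}
    for k, v in filtered:                        # "kernel" 2: group-by aggregate
        groups[k] = groups.get(k, 0.0) + v
    return groups

def fused(rows):
    groups = {}
    for k, v in rows:                            # one pass, no intermediate result
        if v > 2.0:
            groups[k] = groups.get(k, 0.0) + v
    return groups

assert unfused(rows) == fused(rows) == {1: 17.5, 2: 5.0}
```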

Also I would add that this system seems, at least for now, geared toward
solving a specific set of problems Uber is grappling with around time-series
data, whereas OmniSci is trying to tackle general purpose OLAP/analytics
problems. Not that solving specific use cases is not great/valid, or that they
don't plan to expand the functionality of AresDB over time to encompass
broader use cases.

~~~
jclay
Does the LLVM IR output from the JIT compiler then have to be compiled to
CUDA PTX with nvcc? I looked a while back for a JIT compiler for CUDA, but
didn't find much.

~~~
tmostak
LLVM actually has a native PTX backend, NVPTX, but since the Nvidia ISA is
proprietary, we use the CUDA driver API to generate the binary for the target
GPU arch (see here in MapD where it generates the PTX from the LLVM IR:
[https://github.com/omnisci/mapd-
core/blob/ae88c486d1d790db54...](https://github.com/omnisci/mapd-
core/blob/ae88c486d1d790db54885605349b882203b2bb73/QueryEngine/NativeCodegen.cpp#L613),
and here for where it calls the driver API to generate the native code:
[https://github.com/omnisci/mapd-
core/blob/568e77d9c9706049ee...](https://github.com/omnisci/mapd-
core/blob/568e77d9c9706049eeccd32846387f4e042588d0/QueryEngine/NvidiaKernel.cpp#L83))

------
polskibus
I wonder if they considered adding GPU support to ClickHouse instead of
building another database.

~~~
arnon
Adding GPU support to an existing database is incredibly hard, and usually
doesn't pay off very well.

We were considering doing that some years ago, but opted for writing our own
(GPU accelerated) database too.

We talked about it recently at a CMU Database Group talk (relevant parts start
at around 17:00 -
[https://youtu.be/bSYFwd50EcU?t=1028](https://youtu.be/bSYFwd50EcU?t=1028))

------
mandeepj
They could not find anything existing in 2019, so they had to build a new
DB?

~~~
sixo
I'm guessing they didn't start this project in 2019.

~~~
mandeepj
Correct, as 2019 just started. But even if it was 2018, that's not long ago.

------
alfalfasprout
I'm surprised GPUs were the right call for this use case. Not to say GPUs
aren't useful in DBs, but I/O and keeping the GPUs working hard quickly
becomes a bottleneck. I'd assume the workloads must be non-trivial in order
for this to be superior to general purpose CPUs using SIMD + in-memory data.

~~~
samstave
I have what may be a completely stupid question, but the imagination in me
just posited it to me, so I thought I'd ask:

Can you translate raytracing to DB relations?

You have a raytracing engine, and many proven approaches to be able to
determine what is visible from any given point.

What if the "point" was, in fact, "the query", and from that point you apply
the idea of raytracing to see which pieces of information are the relevant
fields in a space of a DB that pulls in all those fields - and the shader
translates to the "lens" or "answer" you want to have from that point's
perspective?

Is this where a GPU would help? Why? Why not?

~~~
alfalfasprout
In theory, yes. Rays get bounced around from a light source until they hit a
camera (or decay below a threshold). The net intensity is recorded for all the
rays in the camera plane. Computing the net intensity is a reduction over all
the rays hitting that pixel in the camera plane. Then you have NUM_BOUNCES
many map steps to compute the position and intensity of the ray after each
bounce. So in theory these map and reduce operations could be expressed in a
database context.

In practice, does it make sense? Not really, since each ray is not the same
amount of work. One ray can go straight to the camera. Another could bounce
many, many times going through many objects before hitting the camera or dying
out altogether. GPUs are terrible at handling workloads with high thread
divergence (some threads run much slower than others).
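A toy sketch of that map/reduce framing (the "physics" here is entirely faked; decay rates and bounce probabilities are made-up numbers):

```python
import random

random.seed(0)

def trace(intensity, decay=0.7, threshold=0.01):
    """'Map' step for one ray: bounce until the ray decays below threshold.
    Bounce counts vary wildly per ray -- the thread divergence that makes
    real ray tracing awkward on GPUs."""
    bounces = 0
    while intensity > threshold and random.random() > 0.3:
        intensity *= decay        # lose energy at each bounce
        bounces += 1
    return intensity, bounces

results = [trace(1.0) for _ in range(1000)]    # map: independent per-ray work
pixel_intensity = sum(i for i, _ in results)   # reduce: accumulate at one pixel
bounces = [b for _, b in results]
print(min(bounces), max(bounces))              # uneven work per ray
```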

------
swittmer
A GPU-accelerated database with a custom API for analytics? Jedox has been
doing that since 2007, and some of the biggest global companies use it. Jedox
is not only an in-memory DB with optional GPU acceleration; it is a complete
software suite with powerful tooling for ETL, Excel integration, web-based
spreadsheets and dashboarding. See [https://www.jedox.com](https://www.jedox.com) and
[https://www.jedox.com/en/resources/jedox-gpu-accelerator-
bro...](https://www.jedox.com/en/resources/jedox-gpu-accelerator-brochure/)

------
kerng
Interesting post with lots of architectural details.

I like how they highlight the different existing things they tried and what
the shortcomings were that led to building this, although I'm not sure where I
could use this to solve a problem.

------
subhajeet2107
How does it compare to ClickHouse? Isn't creating a proprietary query
language going to be a problem for ad hoc queries? Why create yet another
language when the industry is standardizing on SQL or a subset of SQL?

~~~
jamesblonde
ClickHouse is interesting - it's also Apache-licensed. I am surprised Uber
didn't build AresDB on it. The stigma, unfortunately, of coming from Russia
makes it hard for the project to gain mindshare in the Valley.

~~~
youngtaff
AFAIK Cloudflare runs it for their analytics platform, so it'd be interesting
to see why it works for them

~~~
jgrahamc
Yes we do. We moved our entire analytics backend to ClickHouse some time ago.
It works well for us: [https://blog.cloudflare.com/http-analytics-
for-6m-requests-p...](https://blog.cloudflare.com/http-analytics-
for-6m-requests-per-second-using-clickhouse/)

~~~
bithavoc
Which ClickHouse engines does CF use?

~~~
bithavoc
I just re-read the blog post, is the Kafka engine in use at CF?

~~~
dqminh
We mostly use replicated MergeTree, SummingMergeTree, AggregatingMergeTree,
etc. And yes, we do use the Kafka engine to consume directly from Kafka, in
addition to standard ClickHouse inserters.

------
rhdev7150
Does Uber use Salesforce? I'm asking because I'd say no way, given all the
work their engineering team does - custom this and custom that. I know there
is a random blog on the internet that says they did in 2016, but I think the
source is wrong. They have smart people and a need for big data quickly;
throwing money at enterprise Salesforce just seems backwards compared to how
I see them work. I think it's cool they push technology, and I would love to
know what the politics of that are like internally and how the higher-ups
trust the engineers' vision.

~~~
huac
Uber has lots of sales and ops folks, so it makes lots of sense to use
Salesforce.

~~~
rhdev7150
They show up at Dreamforce, but I can't imagine what they are tracking with
it; it seems odd to put your drivers and users in it, so that leaves some odd
selling of the data they collect.

~~~
bpicolo
They have a ton of other relationships to manage. Restaurants (Uber Eats),
Uber for business, etc

~~~
rhdev7150
Interesting, thanks

------
tanilama
Any benchmarks against traditional data warehouse solutions?

~~~
reilly3000
[https://tech.marksblogg.com/benchmarks.html](https://tech.marksblogg.com/benchmarks.html)
It paints a clear picture of how performant GPU analytics can be.

------
davidhyde
> GPU technology has advanced significantly over the years, making it a
> perfect fit for real-time computation and data processing in parallel.

I wouldn’t call GPUs perfect for performing analytics functions. It can be
painful to force your data-processing algorithms through shaders. What GPUs
are is cheap, in terms of dollar cost per FLOP and power consumption.

~~~
nostrademons
Presumably they write the algorithm-to-shader munging _once_, in the query
engine, and then clients of the database don't have to worry about it at all.
It looks like they didn't even write it themselves, outsourcing it to
Nvidia's Thrust.

------
usgroup
This is cool to see. Check out pg-strom for Postgres if you haven't heard
about it, especially if you're curious about perf gains with a GPU in
like-for-like comparisons.

One peeve: I'm not loving the idea of having to use AQL and not having a
command-line analytical interface. Does AQL support windowing functions,
CTEs, or subqueries?

------
mbesto
> software powered by machine learning models leverages data to predict rider
> supply and driver demand; and data scientists use data to improve machine
> learning models for better forecasting.

Why does any of that require real-time analysis? What's the business need for
this being instantaneous? Feels very NIH...

~~~
nemothekid
A friend who works for Uber was once explaining the difference between
Airbnb's tech stack and Uber's, even though they are both "marketplaces".

In short, Uber has a ton more infrastructure because they need to do more in
real time. Calculating the price of a trip from point A to point B needs to be
done in real time. Then there are other factors like Pool, surge, and various
rider preferences that all need to be accounted for.

Uber is sort of unique in how much of their stack necessitates real-time
analytics.

~~~
mbesto
I don't disagree but the use cases Uber is talking about is not app
functionality but for their analytics team (from the article):

 _\- Build dashboards to monitor our business metrics

\- Make automated decisions (such as trip pricing and fraud detection) based
on aggregated metrics that we collect

\- Make ad hoc queries to diagnose and troubleshoot business operations
issues_

None of those require real-time information, because there isn't real-time
decision-making required from the information presented.

With the exception of the "automated decisions", then I don't see the need.

------
yingw787
Ooh nice! I work at a GPU database company (kinetica.com) and we are targeting
larger enterprises and trying to replace their legacy store and analytics
stacks. We’re releasing our 7.0 database stack for general availability
tomorrow.

------
sagichmal
Could someone with more context compare and contrast this with typical OLAP
stuff?

------
xiaodai
I wonder if someone should design a DPU for database workloads. I think the
GPU is used because it's available, not because it's optimized for the type
of work people want to do with data.

~~~
Vekz
Sounds like a great FPGA prototype idea

------
lyxsus1
How does it compare with pg-strom ( [https://github.com/heterodb/pg-
strom](https://github.com/heterodb/pg-strom) )?

------
arnon
It's a bit of a bummer they didn't choose to use a common language like some
flavour of SQL, but opted to create another proprietary API

------
low_tech_love
Cool, another GPU application that has nothing to do with graphics; isn’t it
time we start building new hardware for these things?

~~~
johnhenry
It turns out that "graphics processing units" are good for more applications
than just processing "graphics"... perhaps it would make sense to rename them
something more inclusive?

~~~
elephantum
Like Tensor Processing Units?

------
vxxzy
When will we finally see some OpenCL (AMD) GPU database software?

~~~
yayr
[https://en.wikipedia.org/wiki/OpenCL](https://en.wikipedia.org/wiki/OpenCL)
"When releasing OpenCL version 2.2, the Khronos Group announced that OpenCL
would be merging into Vulkan in the future." I guess CUDA is a safer and more
stable bet for the near-term future if you need to choose between the
alternatives...

~~~
slavik81
Comments on this matter from the Khronos president:

> Khronos has talked about the convergence between OpenCL and Vulkan - a
> little clumsily as it turns out - as the message has been often
> misunderstood - our bad. To clarify:

> a. OpenCL is not going away or being absorbed into Vulkan - OpenCL Next
> development is active

> b. In parallel with a. - it is good for developer choice if Vulkan also
> grows its compute capabilities

> c. OpenCL can help b. through projects like clspv that test and push the
> bounds of what OpenCL kernels can be run effectively over a Vulkan run-time
> - and maybe inspire Vulkan compute capabilities to be expanded.

Related article:
[https://www.phoronix.com/scan.php?page=news_item&px=Vulkan-O...](https://www.phoronix.com/scan.php?page=news_item&px=Vulkan-
OpenCL-Interop-2019)

------
subprotocol
Very cool! I love geeking out on analytics tech and look forward to studying
its design further. My take as I see it so far (please correct me if I'm
wrong)-

* As a datapoint Pinot/Druid/Clickhouse can do 1B timeseries on one server. AresDB sounds like it's in the same ballpark here

* Pinot/Druid don't do cross-table joins, whereas AresDB can. My understanding is these are at (or near?) sub-second, which would be a very distinguishing feature. I'm not sure how this will translate when distributed mode is built out, as shuffling would become the bottleneck. Maybe there would be some partitioning strategy that allows arbitrary joining within a partition, or something?

* ClickHouse can do cross-table joins, but they aren't going to be sub-second

* AresDB supports event-deduping. I think this can easily be handled by the upstream systems (samza, spark, flink, ..) in lambda

* Reliance on fact/dimension tables:

  \- This design/encoding is probably to help overcome transfer from memory to GPU, which in my limited experience with Thrust was always the bottleneck.

  \- High-cardinality columns would make dimension tables grow very large and could become unmanageable (unless they are somehow trimmable?)
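A rough sketch of why dimension-style dictionary encoding helps the transfer problem (my own illustration, not AresDB's actual layout): a low-cardinality string column is replaced by compact integer codes, so only the codes need to cross the PCIe bus while the dictionary stays small.

```python
# Illustrative only -- column values and sizes are made up.
cities = ["SF", "NYC", "SF", "LA", "NYC", "SF"] * 100_000

dictionary = {}              # dimension table: value -> small integer code
codes = bytearray()          # one byte per row suffices for <= 256 distinct values
for c in cities:
    codes.append(dictionary.setdefault(c, len(dictionary)))

raw_bytes = sum(len(c) for c in cities)                  # raw string payload
encoded_bytes = len(codes) + sum(len(k) for k in dictionary)
print(raw_bytes, encoded_bytes)                          # encoded is a fraction of raw

# The catch raised above: a high-cardinality column blows up the
# dictionary itself, erasing the win.
```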

~~~
peferron
Regarding your second point: your intuition seems good, as Alipay apparently
extended Druid to perform joins this way, with good performance [1].
Unfortunately it looks like they won't finish open-sourcing it, but it at
least validates the idea.

[1] [https://github.com/apache/incubator-
druid/issues/6457](https://github.com/apache/incubator-druid/issues/6457)

------
northerdome
"However, these existing solutions did not provide an adequate way for an
engineer to get promoted. Thus, we decided building our own was necessary."

~~~
dang
"Please don't post shallow dismissals, especially of other people's work. A
good critical comment teaches us something."

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

~~~
datumdadum
Seems like a fairly benign humorous comment to me.

