Hacker News
AresDB: Uber's GPU-Powered Open Source, Real-Time Analytics Engine (ubere.ng)
191 points by Recovery2020 16 days ago | 83 comments

When I worked at Uber, this project was openly mocked. One of the CTO’s biggest failures was implementing a promotion scheme where you needed to create a new service in order to be considered “innovative”. This promotion scheme marked what I consider the end of Uber’s engineering excellence and the start of what made Uber turn into a bureaucratic mess.

One of the VPs of engineering called it “toil vs talent”. People who “toiled” at work, meaning doing good maintenance work, would be rewarded with good bonuses but those with “talent” would be rewarded with promotions. Of course this drove people to come up with fake new services so that they could demonstrate “talent”. This also led to an explosion of new services that overlapped or did nothing useful. Instead of working with existing service owners, groups would build new services because they needed to justify writing a new service. It was sickeningly transparent.

This project was one of those projects. It has no real use case because why the fuck would we want to use GPUs except to look cool on your resume. The sad thing is that the project is overstating how well it’s being used internally. Internally people use Pinot instead of this.

For all you future CTOs, consider your incentive schemes carefully and don’t be so far removed from the action that you can’t see when your org is rotting. This is what the CTO did, and like I said, it was one of his biggest failures because it gutted the engineering org. Instead of working together, every team was looking to get promotions at the expense of the company and it showed.

I wish I had understood this earlier. My past company made a push to hire from top companies like Uber for some key positions. Some of them were great people who were relieved to be out of the FAANG rat race. Others were single-mindedly focused on rewriting everything they touched with cleverly-branded project names, regardless of whether or not it made business sense.

Early on, I made a harmless comment in Slack about how one person’s pet project wasn’t a good fit for our needs so our team would be using the older, more proven solution. Later that evening the person pulled me aside, almost in tears, begging me to never say anything critical about his project in a Slack channel again. He explained that at his previous role, success or failure depended entirely on the perception of one’s personal projects and that seemingly innocent comments could tank someone’s promotion chances for years.

I felt bad for him because he had clearly come out of a toxic situation. However, one of his teammates later warned me that he was keeping a journal of potentially incriminating things that I had said in Slack and a detailed log of every issue that he could find with our team’s project in case he “had to use it against me later”.

I could never tell if this was a unique experience or the norm at some companies like Uber.

What you describe was definitely not a common occurrence. Engineering wasn’t toxic for many years until the last 9-12 months or so I would say. Pre-Susan Fowler memo, it was the best company I had ever worked at. From 2017-2019 we sort of stalled because of the internal drama and it didn’t get really bad until the last 9 months, where attrition of our best engineers and vile political maneuvering from the dregs made it too much for me to stick around.

The engineer you describe sounds like they have mental health issues. There may be some teams with terrible managers but all companies have this, and I’ve seen similar or worse situations at Amazon.

Most engineers I worked with were great but there were many engineers that “played the game” in order to get a promotion and more money. It was sickening but if that’s the way the CTO sets the incentive scheme, who can blame an engineer for following it? It’s more on the CTO for setting the terrible culture than the engineers.

Being blind to politics rarely ever means politics isn't going on.

Pretty much every company out there has a concept of a 'promotion packet'; it's basically building a case for one's promotion. Of course in a company the budgets are fixed, and so are promotion cycles (yearly in most places). If you miss a turn, you could lose a year, or even risk losing two. In that case it's fairly common for managers to build a list of accomplishments (file/packet), and rival managers to build an anti-case/defence for the same. Stack ranking eventually is all about a combination of merit+advocacy+lobbying+counter-lobbying at so many levels that I'd say the engineer who cried wasn't wrong at all.

This is the case in nearly every company. We just wish to delude ourselves that politics is absent at some places.

This sort of power play comes with the territory in a large people structure.

One thing that has been crystal clear in the past few years is that just because someone has the competence for high-level work like software engineering doesn't mean they can't be immature and crazy at the same time.

And these people are louder and more common now because it's easy to hide criticism and claim accomplishments with all the politics, buzzwords and general sensitivity these days.

> begging me to never say anything critical about his project in a Slack channel again

Because he discourages critical feedback - it leads to 2 major problems:

1) He does not improve [because of lack of feedback].

2) He discourages team around him from improving [because of lack of feedback].

Why not explain that to him (and then if he does not understand - fire him)?

> one of his teammates later warned me that he was keeping a journal of potentially incriminating things that I had said in Slack and a detailed log of every issue that he could find with our team’s project in case he “had to use it against me later”.

That is not cool.

What happened to this person?

His strategies worked at first, at least before he had really delivered anything of value. It started to unravel later when he couldn’t deliver on all (or really any) of his promises.

I think he was operating under the assumption that he could build rapport quickly and then hire a team underneath him to get the work done. That might work at a hyper growth startup that values growth over profit, but we were a mature and profitable company looking to keep headcount reasonable. Most of his plans were so over engineered that they would have taken 5-10x the engineers to actually finish on time, so they ended up being half-finished projects that required constant on-call attention.

That was the tipping point for a lot of us in the old guard. There was an exodus around that time, including myself. It’s not worth fighting those political battles day in and day out while walking on eggshells in every Slack channel.

Sounds like the engineering higher ups turned a blind eye to his antics.

Good on you for exiting.

I can concur. Most of Uber's traffic for real-time analytics was served by Uber's Elasticsearch clusters with a customized querying layer. My team was also a user of the ES service. The service was sometimes unstable, but in general was fast enough for us, and scalability was never a problem. My team got paged only once by the ES team, because we drove a huge traffic spike to their service and they ended up scanning more than 500 million records per second, for which they had to scramble to scale out. But even at that time, there was no visible degradation on our side, let alone an outage. The owning team also presented their pain points, and none of the pains had to do with CPU not being fast enough.

By the way, Uber already had two real-time analytics systems before AresDB. One is the aforementioned ES-based service, and the other is Pinot, which was owned by a Pinot contributor. I was in one of those so-called alignment meetings about using AresDB. Engineers from both the ES service and Pinot were there. It was a disaster. The engineers simply asked what AresDB was supposed to solve, and presented charts over charts to show that computation or lack of join operator was never a problem (because for analytics, data streams can be pre-joined), while efficient IO was. The AresDB team simply repeated that join was important, and parallel computation was critical.
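For anyone unfamiliar with the "pre-joined" argument above, here is a minimal sketch of the idea: attach the dimension attributes to each event at ingest time, so the analytics store never has to run a join at query time. The field names, the sample rows and the `pre_join` helper are all hypothetical and are not taken from Uber's pipelines.

```python
from typing import Dict, Iterable, Iterator


def pre_join(trips: Iterable[dict], drivers_by_id: Dict[int, dict]) -> Iterator[dict]:
    """Denormalize each trip event with its driver's attributes at ingest time."""
    for trip in trips:
        # Unknown drivers yield an empty lookup, so their attributes become None.
        driver = drivers_by_id.get(trip["driver_id"], {})
        # Copy the dimension fields into the event so later queries need no join.
        yield {
            **trip,
            "driver_city": driver.get("city"),
            "driver_rating": driver.get("rating"),
        }


# Hypothetical sample data, for illustration only.
drivers = {1: {"city": "SF", "rating": 4.9}, 2: {"city": "NYC", "rating": 4.7}}
trips = [
    {"trip_id": 10, "driver_id": 1, "fare": 12.5},
    {"trip_id": 11, "driver_id": 2, "fare": 30.0},
    {"trip_id": 12, "driver_id": 3, "fare": 8.0},
]
flat = list(pre_join(trips, drivers))
```

The trade-off is the usual one for denormalization: wider rows and stale attributes if the dimension changes later, in exchange for scan-only queries.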

I left the company soon after, so I'm not sure how many critical use cases AresDB has been serving since then. Hopefully they do find some sweet spot to justify the cost of developing such a system.

Not a big fan of AresDB because of the glaring IO issues. That being said, pushing 'lack of join operator was never a problem' is textbook Sapir–Whorf. 'We've trained our captive users to never ask a question that requires a join, or use convoluted workarounds if they absolutely must do so, and we are proud of it' just leaves a bad taste. I wouldn't be surprised if Pinot were to add join support in the near future...

True that join is nice. I was just not sure if real-time join is truly necessary for Uber’s use cases, or whether AresDB is so much better than a query engine like Presto as to justify the effort. Or more fundamentally, is GPU the solution to expensive joins?

Love the shoutout to Gairos. But man that on-call was a pain in the arse.

The culture at Uber is weird. In my opinion, there was way too much emphasis on titles. An employee's title is literally in your face in almost every internal site. This creates a lot of situations of bias because every time you get a message on slack, one of the first things you see is their pay grade.

In combination with what you listed above, and the rigid, narrow pay bands, it's no wonder everyone is fishing for constant promotions.

Wow. This is such a good comment. I remember when I was a junior engineer and knew more than many of the "senior" engineers at a job a few years back. It didn't matter at all, as seniority outranked sensibility.

I was thinking about this a bit after reading your comment. Now, I say this as having recently negotiated pretty hard for a "Staff" title, and previously had a "Team Lead" title. I think in certain situations it makes sense to have the title authority to shoot obviously bad engineering problems down, but this has been the exception, rather than the rule, in most of my 12 years of being a software engineer.

So your comment makes me wonder: would it make sense to have a system where everyone would simply be called "Engineer", but allow engineers to vote secretly after having worked with another engineer, on various aspects of their colleague's technical expertise. Engineering Managers or perhaps HR would be aware of the engineering votes, but engineers wouldn't, which would remove much of the implicit bias in engineering meetings. Rather, pay grades would be determined by votes, but no one would ever just "Leave it to Yakaaccount, she's the principal engineer". Everyone would have responsibility to be a solid engineer.

What you describe reminds me of the peer review system in place at a previous company I worked at. The review period was every 6 months. The way it worked was that as an individual you'd rank (and comment on) everyone you had worked with the previous 6 months based on your perception of how much value they added to the company over those previous 6 months. It could be based on some project they delivered, or them simply always answering your questions about something you needed to know to move your own projects forward, or them participating thoughtfully in company wide discussions. Once everyone had submitted their rankings, an algo would figure out everyone's final rank. The various pay level cutoffs would then be decided by eng management.
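The comment does not say which algorithm combined the individual rankings. As one plausible illustration only, here is a sketch using a normalized Borda count, which is an assumption on my part and not the company's actual method. The names and ballots are made up.

```python
from collections import defaultdict
from typing import Dict, List


def borda_rank(ballots: Dict[str, List[str]]) -> List[str]:
    """Combine per-reviewer rankings (best first) into one overall order."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for reviewer, ranking in ballots.items():
        n = len(ranking)
        for position, peer in enumerate(ranking):
            if peer == reviewer:
                continue  # ignore self-votes
            # Normalize to [0, 1] so long and short ballots weigh the same.
            score = 1.0 if n == 1 else (n - 1 - position) / (n - 1)
            totals[peer] += score
            counts[peer] += 1
    # Sort by average score (highest first), breaking ties by name.
    return sorted(totals, key=lambda p: (-totals[p] / counts[p], p))


# Hypothetical ballots: each person ranks the colleagues they worked with.
ballots = {
    "ana": ["bo", "cy", "di"],
    "bo": ["cy", "ana", "di"],
    "cy": ["bo", "ana", "di"],
}
final = borda_rank(ballots)
```

Averaging rather than summing means someone ranked by only a few colleagues is not penalized for having fewer reviewers.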

It so happens that at that company everyone's title was simply 'Software Engineer'. There was no ego bs so it was a great place to work. I think how helpful you were to others being part of the review is another reason for that. There was a way to see some people's 'rank' by looking at the org chart if you really wanted to. In general swe and senior swe would be under an eng manager, if someone was directly under a director that told you something, and if someone was directly under a vp that also told you something.

I think it's a good idea to remove implicit bias wherever you can. But doing that by making a structure that does exist invisible is dangerous. That hierarchy does exist. Even if it's not in your face. It's human nature and it's a good thing in some situations. I think the better angle on this is to have an open culture where the discussion can happen and everyone listens and asks the questions without fear, consensus is built and those with power (because there is always someone with power) wield it sparingly. This is much harder to get to, but it's more sustainable.

If members of the team trusted each other enough to trust the votes that determine pay grade, you probably wouldn’t need the votes in the first place.

> So your comment makes me wonder: would it make sense to have a system where everyone would simply be called "Engineer", but allow engineers to vote secretly after having worked with another engineer, on various aspects of their colleague's technical expertise.

I worked at a place that had a system like this. Started great, but turned toxic after a hiring spree.

> Everyone would have responsibility to be a solid engineer.

You would think so, but people with other intentions try to find ways to manipulate the system to achieve various strange goals. They're often successful.

Post-Fowler, HR became very concerned about income inequality. They implemented this by severely compressing the range of compensation at each level. For a year or two, there was no way to meaningfully reward a high performer in her level other than to promote her.

Since mid-2019 or so, the company realized this mistake and the pendulum swung back towards TK’s more Ayn Randian compensation philosophy. But the damage was done. FWIW I never saw much of a power dynamic around title. But it is definitely the most important task of an engineering manager to secure promotion-worthy projects and hand them out intelligently. Since promotion is based on impact, it’s almost entirely determined by the project charter. Difficulty or skill deployed in execution is a tertiary concern.

Also given the precipitous drop in equity value, promotions with hefty raises are required just to keep people’s TC in moderate decline instead of freefall.

I've been wondering why the UI/UX is broken in so many ways that would be easy to fix while the engineers are busy engineering for the sake of engineering. This project, the time series database, etc. Meanwhile, the Android app can't do basic drawing of a car and a moving line on a map, for example. Your explanation of the dynamics explains the state of things quite well!

I wish this kind of shop talk was part of more sci-fi worldbuilding. "No-one liked the P-class light patrol missile gunships, they were only commissioned because the space engineering division was incentivised to deliver new projects..." On second thoughts, zzzz

> because why the fuck would we want to use GPUs except to look cool on your resume.

Well, there are reasons, but it's true that GPUs in DBMSes is still something that's very immature.

> Internally people use Pinot instead of this.

Can you elaborate a bit regarding the choice of Pinot over other analytic DBMSes?

This reminds me of what Eddie Lampert did to Sears. He changed the company's internal structure such that internal teams had to compete against each other for bonuses and relevancy. There was a great write-up that I can't find about his strategy and why it spectacularly failed.

Management theory now tells you to be competitive over small things. If you're doing competitiveness, do it over lunch or something else.

Honestly mind blowing to me that someone could get to the CTO level of any organization - never mind something of the scale/pay grade of Uber - and think that incentive structure was going to yield anything desirable.

Hey, at least you didn't have to use TChannel and Hyperbahn ;)

oh my god.

Where did you land after Uber?

> Like Pinot, Elasticsearch is a JVM-based database, and as such, does not support joins

Uh. What does the JVM have to do with the data model’s ability to handle joins?

Yeah that's a very odd statement, I mean, PrestoDB, Impala, KSQL...

GPUs and analytic DBMSes / query engines are actually my own field of research (https://eyalroz.github.io); and it's obviously beyond what a comment would encompass, but:

1. There are very few analytic DBMSes which are actually fast (and compare against reasonable baselines). Most claims of speed are bogus. Or rather, might be better than what's otherwise available to use, but are still slow.

2. Designing an analytic DBMS to properly utilize a massively-parallel processing device is a monumental task, and I would claim that it has not yet been undertaken. Existing research and production systems graft such use onto a system whose fundamental design dates back to the 1980s in many ways.

3. CPU-utilizing analytic DBMSes are typically faster than GPU-based ones, to a great extent due to the above - but also since we've had decades of work on optimizing them.

4. GPUs are artificially handicapped on Intel-architecture systems, because they are placed "far" from main memory relative to the CPU. More literally - the bandwidth you get between your GPU and main memory is typically 0.25x the bandwidth a CPU socket has with main memory. This is critical for analytic query processing (as opposed to neural network simulation which is more computation-heavy and can tolerate this handicap much better).
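To put point 4 in numbers, here is a back-of-the-envelope sketch for a query that has to stream a column out of main memory once. Only the 0.25x ratio comes from the comment above. The 100 GB/s socket bandwidth and the 50 GB column size are hypothetical values chosen for illustration.

```python
def scan_seconds(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to stream num_bytes once at the given sustained bandwidth."""
    return num_bytes / bandwidth_bytes_per_s


CPU_MEM_BW = 100e9      # hypothetical: 100 GB/s between a CPU socket and main memory
GPU_LINK_RATIO = 0.25   # from the comment: the GPU gets roughly 0.25x of that
column_bytes = 50e9     # hypothetical: a 50 GB column resident in main memory

cpu_time = scan_seconds(column_bytes, CPU_MEM_BW)
gpu_time = scan_seconds(column_bytes, CPU_MEM_BW * GPU_LINK_RATIO)
slowdown = gpu_time / cpu_time
```

Under these assumptions the GPU spends four times as long just receiving the data, before any of its compute advantage comes into play.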


PS - Always glad to discuss this further with whoever is interested.

As someone interested in esoteric databases, I’d be curious to hear which ones you’re aware of that in your opinion don’t live up to claims or do. For instance, I’ve been very impressed with Clickhouse and have messed around with Jd (from Jsoftware ). Any other good ones I should check out?

How much would the trade offs change if GPUs shared the same main memory as CPU?

Not sure I understand exactly which trade-off you're referring to, but on systems without the GPU-handicapping (e.g. IBM Power; and also when you link up many GPUs together with NVLink) - there is still a significant design and implementation challenge to produce a full-fledged analytic DBMS, competitive vis-a-vis the state-of-the-art CPU-based systems.

There are also other considerations such as: The desire to combine analytics and transactions; performance-per-Watt rather than per-processor; performance-per-cubic-meter; existing deployed cluster hardware; vendor lock-in risk; etc.

Here’s also a big misconception. When a CPU runs it has access to the entire memory (or more specifically, a process can chew up almost all available memory if allowed). The cost of copying from RAM to the L1/L2/L3 caches is a lot lower than the cost of copying to a GPU.

GPU on the other hand is slightly complex. It behaves like lots of small CPUs with their own local memory. They can access the full swath of memory but in parts and copying between the two is much more expensive. If the problem can be boiled down to map on GPU and reduce, then GPU excels. If the problem is serial or can be parallelized with SIMD instructions, CPU will run circles around GPU.
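The "map on GPU and reduce" shape mentioned above looks roughly like this. The sketch runs on the CPU in plain Python and only shows the structure: independent chunks that could each go to a separate core, followed by a cheap combine step. The chunk count and the fare data are hypothetical.

```python
from functools import reduce
from typing import List


def chunked(values: List[float], num_chunks: int) -> List[List[float]]:
    """Split values into at most num_chunks contiguous slices."""
    size = max(1, -(-len(values) // num_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def filtered_sum(values: List[float], threshold: float, num_chunks: int = 4) -> float:
    # Map: every chunk is filtered and summed independently of the others.
    partials = [
        sum(v for v in chunk if v > threshold)
        for chunk in chunked(values, num_chunks)
    ]
    # Reduce: combine the handful of per-chunk partial sums.
    return reduce(lambda a, b: a + b, partials, 0.0)


fares = [5.0, 12.0, 7.5, 30.0, 2.0, 18.0]  # hypothetical data
total = filtered_sum(fares, 10.0)
```

A query such as "sum of fares above a threshold" fits this shape well. A computation where each step depends on the previous result does not, which is the serial case the comment describes.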

SIMD has come pretty far.

What are the best case scenarios (roughly) for GPU-based speedups of typical DB workloads? I have no sense of whether we're aiming for 2x, 10x, 1.1x.

If a GPU has K_1 times the integer or floating-point operations per second of a CPU, and K_2 times the memory bandwidth to its on-GPU memory that a CPU has to overall system memory, then the absolute best improvement you can reasonably hope for is somewhere between K_1 and K_2. For a typical CPU and typical corresponding GPU (note the highly exact terminology here...) K_1 and K_2 are, give or take, 10x or so.
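As a rough sketch of that bound: compute K_1 as the ratio of arithmetic throughput and K_2 as the ratio of memory bandwidth, and treat the pair as the range for the best-case speedup. The hardware figures below are hypothetical round numbers, not measurements of any particular CPU or GPU.

```python
from typing import Tuple


def speedup_bounds(cpu_ops: float, gpu_ops: float,
                   cpu_mem_bw: float, gpu_mem_bw: float) -> Tuple[float, float]:
    """Return (low, high) for the best-case GPU speedup over a well-tuned CPU system."""
    k1 = gpu_ops / cpu_ops        # compute-throughput ratio (K_1)
    k2 = gpu_mem_bw / cpu_mem_bw  # memory-bandwidth ratio (K_2)
    return min(k1, k2), max(k1, k2)


# Hypothetical figures: 1 vs 10 Tops/s of compute, 100 vs 900 GB/s of bandwidth.
low, high = speedup_bounds(cpu_ops=1e12, gpu_ops=1e13,
                           cpu_mem_bw=100e9, gpu_mem_bw=900e9)
```

A bandwidth-bound scan would sit near K_2 and a compute-bound kernel near K_1. Both ratios assume the CPU baseline is itself well optimized.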

Now, it might be a lot less than 10x if your use of the GPU is suboptimal in any way, which means you don't really have a lot of leeway for non-optimality in your design. Plus, there's the handicap I mentioned in main-memory bandwidth - while the GPU's on-board memory is much smaller.

On the other hand - whoever said that the system you're measuring against is making optimal use of CPU resources? It may very well not, in which case the computation and memory throughput ratios are not upper bounds at all.

Just remember that if someone tells you "I got a 50x speedup by porting this to a GPU!" - then more likely than not, their baseline was a massively sub-optimal system. Which is not to say their work is without merit: Improving the performance of a real-life system does make actual work go faster, today, rather than pursuing a dream of future optimality.


How do you see your general observations applying to AresDB?

I have not undertaken a serious analysis of AresDB; and on their part, Uber has not - to my knowledge - published any in-depth design description or benchmark results. I definitely see some key useful concepts mentioned in that blog post, but there are others which I don't really see presented (or that are associated with what's described as future work).

I realize it's FOSS, but - I can't just go read their sources.

My intuition says that they're probably doing a decent-but-not-optimal job for their own use-case and are not planning on developing it into something more general. Specifically, the fact that they only accept their own query language is fishy.

I also note that the public repository on GitHub has not been updated for about half a year.


Caveat: Ares might be brilliantly designed and implemented, I can't really fault them for anything with any certainty. Just speculating here.

Curious - what’s the use case for an organization like Uber needing real time analytics at high frame rates? I noticed the emphasis on dashboards but was curious what a real-time dashboard at this scale actually ends up being used for.

Maybe my question is more around, what business decision would be impacted by not having real-time instantly reserved dashboards.

Honest question here not trolling.

GPUs are highly specialized CPUs, especially good at some types of number crunching. In analytics workloads, they are used to run queries that can benefit from that specialization. This is not for your typical GPU graphics based use (games/graphics). Has nothing to do with what you see on the screen/frame rate.

All sorts of departments at Uber use real-time queries (operations, marketplace, eats, new mobility, their data science and growth group, finance, communications, legal.) Marketplace in particular has a demand for real-time prediction, matching and dispatching, and pricing queries.

I understand why any marketplace-based system would need real time data but why would any of "growth group, finance, communications, legal" require real-time data to do their jobs?

It must have been fun to be at Uber 2017-2019. They seemed to have an unlimited appetite (and funding) for “invent it here”, and a lot of those projects made it into open source.

I'm sure it was fun, but it does seem to point at a lack of focus. I suppose that's hard to resist when there's an endless pipeline of money.

And now I would be scared to even accept an Uber offer if given because even Microsoft of all people now have a better track record of not laying people off :(

There was a pervasive problem of “not invented here” at the company for a long time. I used to laugh at all the “It doesn’t scale!” justifications that were thrown around.

On the other hand, having enough people or money to throw at something was never a problem, so…

I worked at Uber before. The team and the project were pretty much gutted after the last couple of layoffs. Check the commit history/contributors and go figure.

It was some amazing tech, but it falls into the category of "when all you have is a hammer, everything looks like a nail". Sometimes you really need a company culture that rewards people for creating value instead of deliverables for promotion.

Does anybody with Clickhouse experience at scale know if AresDB is better on some use cases?

It's hard to say, though I think the UPSERT capability looks useful because it simplifies handling duplicates. On the other hand it does not appear that Ares offers clustering, which is critical for large datasets.
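To illustrate why upsert semantics simplify duplicate handling, here is a minimal in-memory sketch. It is not AresDB's API or storage format. The `upsert` helper, the keys and the event fields are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def upsert(table: Dict[str, dict], rows: Iterable[Tuple[str, dict]]) -> None:
    """Insert rows by primary key, or update the existing row if the key is known."""
    for key, fields in rows:
        # setdefault creates the row on first sight; update overwrites the given fields.
        table.setdefault(key, {}).update(fields)


table: Dict[str, dict] = {}
events = [
    ("trip-1", {"status": "requested"}),
    ("trip-2", {"status": "requested"}),
    ("trip-1", {"status": "requested"}),                # duplicate delivery
    ("trip-1", {"status": "completed", "fare": 12.5}),  # later update
]
upsert(table, events)
```

With append-only ingestion the duplicate delivery would produce an extra row that every query has to filter or collapse. With upsert, replaying the same event is harmless.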

(I work on ClickHouse and enjoyed this article when it came out.)

It should be mentioned, to ClickHouse's credit, that they made an effort to publish relatively detailed benchmark results for some data sets and queries, when they first came out. They even got in contact with my research group at the time (the MonetDB group at CWI) to make an effort to present the MonetDB results in a fair manner.

Thank you for the comment. I will take a look then.

What databases are comparable to ClickHouse, say for:

* Ease of single node deployment

* Super fast basic analytics, filter/group/count without a lot of optimization

* Fabulous compression

SQL Server, KDB+, MemSQL, Kinetica, etc.

> GPU databases are brilliant for cases where the working set can live entirely within the GPU's memory. For most applications with much larger (or more dynamic) working sets, the PCIe bus becomes a significant performance bottleneck. This is their traditional niche.

> That said, I've heard anecdotes from people I trust that heavily optimized use of CPU vector instructions is competitive with GPUs for database use cases.

This comment is important imo. Also related to applied ML inference in applications... the memory needs can grow quite a bit and this data transfer cost, including the memory size limitations vs RAM, becomes very real very fast.

Not sure I understand the scale of the use case or where it's mentioned as well as in comparison to the big data tools mentioned.

Disclosure: I work on Google Cloud (but you don’t need to rent GPUs from us).

Absolutely, but as last year’s discussion highlights, a bunch of GPUs connected via NVLINK kind of gives you the aggregate memory of the set for some of these database applications (large-scale ML training has also gone this way).

That’s why our A100 system design is 16 A100s in a single host. 16x40 GB gives you 640 GB of aggregate memory, which is pretty attractive for many applications.

The question as always is cost vs benefit. If there’s something that a GPU backed <noun> can do that you “couldn’t” with a large Intel/AMD CPU box, or is actually a large integer multiple cheaper, it’s probably worth the development effort.

Interesting. Much of my work here is on production GCP workloads. We've landed on C2s and cpu optimizations for our inference engine but hadn't really considered NVLINK. Now wondering if I can distribute batch inference across multi-gpus.

We'll be on premium support soon ... hoping we can get access to folks like yourself for some of this.

I should clarify: A100s are likely a bad price/performance trade off for inference. The T4 part is designed for that, but doesn’t have NVLINK.

Do you have models >16 GB that you’re trying to do real-time inference against?

Feel free to send me an email regardless! (In my profile)

Edit: https://news.ycombinator.com/item?id=23800049 was my writeup for a recent Ask HN about cost efficient inference.

A100s support MIG (multi instance GPUs), so you can carve each A100 up into 7 separate GPUs for inference of small models. If your inference workload is too small to feed this beast all at once, it can be pretty handy: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/inde...

Yeah, the sweet spots are landing as:

* CPUs: medium data, and queries that are small / slow / irregular

* GPUs: general analytics over data that is small (in-memory) or large / streaming data (replace Spark)

Data perspective:

-- small data (100MB - 512GB): all in GPU memory, so question if boring queries ("select username from django_table" better in psql) or compute ones ("select price where ..." better in GPU SQL)

-- medium data: data sits in CPU RAM / SSD with compute nodes, and in a preorganized / static fashion, e.g., time series DB. Too much data for GPU RAM, yet little enough for local SSD, so the PCI bus is the bottleneck (8-32 GB/s)

-- large data (ex: 10TB spread through S3 buckets) + streaming (ex: 10GB/s netflow): you'll be waiting on network bandwidth anyways, so network link of 10GB/s -> PCI of 10GB/s -> GPU wins out over CPU equiv anyways. Good chance, instead of the pricey multi-GPU V100/A100s, you'll want a fleet of wimpy T4 GPUs.
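The medium vs. large distinction above comes down to which link in the data path is slowest. A minimal sketch, assuming the stages are chained in series so that end-to-end throughput equals the slowest stage. The stage names and GB/s figures are hypothetical, loosely based on the ranges quoted in the list.

```python
from typing import Dict, Tuple


def pipeline_bottleneck(stage_gb_per_s: Dict[str, float]) -> Tuple[str, float]:
    """Return (slowest stage, end-to-end GB/s) for stages chained in series."""
    name = min(stage_gb_per_s, key=stage_gb_per_s.get)
    return name, stage_gb_per_s[name]


# Hypothetical "medium data" path: data already sits in CPU RAM next to the GPU.
medium = {"cpu_ram": 100.0, "pcie": 16.0, "gpu_memory": 900.0}
# Hypothetical "large data" path: data arrives over the network first.
large = {"network": 10.0, "pcie": 16.0, "gpu_memory": 900.0}

medium_stage, medium_rate = pipeline_bottleneck(medium)
large_stage, large_rate = pipeline_bottleneck(large)
```

In the medium case the PCI bus throttles the GPU while a CPU could read RAM directly. In the large case both CPU and GPU wait on the same network link, so the PCI bus no longer decides the outcome.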

As network/disk<>GPU high-bandwidth hw rolls out and libs automate their use, the current medium data sweet spot of CPU analytics systems goes away.

Compute perspective:

-- The category of 'irregular' (non-vectorizable) compute has been steadily shrinking for the last ~30 years as it's an important + fun topic for CS people. Even CPU systems now try to generally optimize for bulk-fetches (cacheline, ...) & SIMD-compute (e.g., SIMD over columns), and that inherently can only go so far until it's effectively a GPU alg on worse hw.

I see other areas in practice like crazy-RAM CPU boxes and FPGA/ASIC systems that I'm intentionally skipping as these end up pretty tailored, while my breakdown above is increasingly common for 'commodity HPC'.

I don't think the "large data" case holds true, and I would not expect it to be economical to use GPUs for that.

First, this is essentially limiting the scope of "analytics" to selection/aggregation centric operations which are memory bandwidth bound. Many types of high-value analytic workloads and data models don't look like that. Even when 90% of your workload is optimal for GPUs, I've often seen the pattern that the last 10% is poor enough that it largely offsets the benefit.

Also, GPUs have better memory bandwidth than CPUs but people overlook that CPUs can use their limited memory bandwidth more efficiently for the same abstract workload, so the performance gap is smaller than memory-bandwidth numbers alone would suggest.

Second, 10TB is tiny; this is around the top-end of what we consider "small data" at most companies where I work. For example, in the very broad domain of sensor and spatiotemporal analytics, we tend to use 10 petabytes as the point where data becomes "large" currently, and data sets this size are ubiquitous. This data is stored with the compute when at all possible for obvious reasons -- it ends up looking more like your "medium" case in practice, albeit across a small-ish number of machines. The cost of processing tens of petabytes of data on GPUs would be prohibitive.

Lastly, a growing percentage of analytics at every scale is operational real-time, so new data needs to be integrated into the analytical data model approximately instantly. GPUs are not good at this type of architecture.

GPUs have their use cases but their Achilles' heel is that their performance sweet spot is too narrow for many (most?) real-world analytic workloads, and for some workload patterns the performance can be much worse than CPUs. CPUs provide much more consistent and predictable performance across diverse and changing workload requirements, which is a valuable property even if it means worse performance for some workloads. Databases give considerable priority to minimizing performance variability because users do.

- RE:First: selection/aggregation-centric-only -- see my comment on irregularity. If there is little compute, CPU vs GPU is moot (go tiny ARM..), but as soon as there is, the scope of vectorizable compute is ever-increasing. In my world, we do a lot of graph + ML + string, which are all now GPU-friendly. They were all iffy when I first started with GPUs. Feel free to throw up examples you're skeptical of. The list is shrinking, it's pretty wild..

- RE:Gap, sort of. Ultimately, it's still typically there though, in three important ways. The set of interesting compute that needs that CPU arch there is increasingly small relative to workloads ("super-speculative thread execution on highly branchy..."). Multi-core CPU vs. single GPU is more 2-10X for most tuned code: most 100X claims are apples/oranges b/c of that. When you get beyond those workloads, 100X becomes real again for multi-GPU / multi-node b/c of the bandwidth. Yeah, your real-time font library might still win out on CPU SIMD, but you have to dig for stuff like that, while the more data/compute, the more this stuff matters & gets easier.

- RE:scale, storing 10PB in CPU RAM is also expensive, so we're back to streaming... and thus back to where GPUs increasingly win. Even if you could afford that in CPU RAM, you can probably afford making that accessible to the GPUs too, and then save not just on the hw, but the power (which becomes the dominant cost.) Your example of large-scale & real-time spatiotemporal data seems very much leaning towards GPU, all the way from ETL to analytics to ML. It's still hard to write that GPU code as the frameworks are all nascent, so I wouldn't fault anyone for doing CPU on production systems here for another few years.

- RE:real-time: writing is on the wall, mostly around (again) getting the unnecessary CPU bandwidth bottleneck out of the way in HW, and (harder), the efforts to use that in SW.

A critical aspect being ignored is the economics. Highly optimized analytical database code saturates bandwidth on a surprisingly cheap CPU (usually lower-mid range). While a GPU may be 2-4x faster for some operations, it usually pencils out to be at least as expensive operationally, never mind CapEx, for the same workload performance as just using CPUs. This has been a reliable pattern. In which case, why wouldn't you just use CPUs? When you build systems at scale, these cost models are a routine part of the design specs because the bills are steep.

No one stores 10PB in RAM that I know of. A good CPU database kernel will run out of PCIe lanes driving large gangs of NVMe devices at theoretical throughput without much effort. The performance for most workloads is indistinguishable from in-memory, but at a fraction of the cost. It would be slower to insert GPUs anywhere in this setup. (In modern database kernels generally, "in-memory" offers few performance benefits because storage has so much bandwidth that a state-of-the-art scheduler can exploit.) An interesting open research question is the extent to which we can radically reduce cache memory entirely, since state-of-the-art schedulers can keep query execution fed off disk for the most part, even in mixed workloads. Write sparsity still recommends a decent amount of cache for mixed workloads but probably much less than Bélády's optimality algorithm superficially implies.

Almost nothing is CPU-bound in databases these days in reasonable designs, not even highly compressive data model representations, parsing, or computational geometry. Which is great! A lot of analytics is join-intensive, but that is more about latency-hiding than computation. I would argue that the biggest bottleneck at the frontier right now is network handling, and GPUs don't help with that, though FPGAs/ASICs might.

I'm not sure how a GPU would help with operational real-time. Is it even possible to parse, process, and index tens of millions of new complex records per second over the wire concurrent with running multiple ad hoc queries on a GPU? I've done this many times on a CPU but I've never seen a GPU database that came within an order of magnitude of that in practice, and I've used a few different GPU databases plus some custom bits. GPUs work better in a batch world.

I use GPUs, just not for analytical databases. I am biased in that GPU databases have consistently failed to deliver credible workloads across many scenarios in my experience and I understand at a technical level why they didn't live up to their marketing. Every time one gets stood up in a lab, and I see many of them, they fail to distinguish themselves versus a state-of-the-art CPU-based architecture. Most of them actually underperform in absolute terms. Almost everyone I know that has designed and delivered a production GPU database kernel eventually abandoned it because CPUs were consistently better in real-world environments.

GPU capabilities are improving, but I have seen limited progress in directions that address the underlying issues. They just aren't built to be used that way, and there are other applications for which they are exceedingly optimal that we wouldn't want to sacrifice for database purposes. CPU developments like AVX-512 get you surprisingly close to the practical utility of a GPU for databases without the weaknesses.

Anyway, this is a really big, really large conversation. It doesn't fit in the margin of an HN post. :-)
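For anyone unfamiliar with the Bélády reference above: Bélády's algorithm (often called MIN or OPT) evicts the cached item whose next use lies furthest in the future, so it needs the whole access trace up front and mostly serves as a theoretical lower bound on misses. A minimal sketch of the idea follows; the function name and the idea of feeding it a plain list of page IDs are illustrative assumptions, not anything from the systems discussed in this thread.

```python
def belady_misses(trace, capacity):
    """Count cache misses under Belady's optimal (MIN) replacement.

    `trace` is the full, known-in-advance sequence of page IDs
    (a hypothetical input format chosen for illustration).
    """
    cache = set()
    misses = 0
    for i, page in enumerate(trace):
        if page in cache:
            continue  # hit: nothing to do
        misses += 1
        if len(cache) >= capacity:
            # Find when each cached page is next used (infinity if never).
            def next_use(p):
                for j in range(i + 1, len(trace)):
                    if trace[j] == p:
                        return j
                return float("inf")

            # Evict the page whose next use is furthest in the future.
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses
```

Running the same trace with a smaller `capacity` shows how slowly the optimal miss count can grow as cache shrinks, which is the sort of comparison the cache-sizing remark is gesturing at.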

Yeah so it sounds like you are thinking about I/O bound workloads for what you consider analytical workloads, not compute bound. Traditional GPU (or your proposal of ASIC/FPGA) doesn't matter almost by definition: external systems can't feed them. No argument there, Spark won't replace your data warehouse's lookup engine :)

Assuming the analytic workload does have some compute, however, that's more of a comment about traditional systems having bad bandwidth than GPUs themselves. GPUs are already built for latency hiding, so it's more like CPUs are playing catchup to them. Two super interesting things have been happening here going forward, IMO:

- Nvidia finally got tired of waiting for the rest of the community to expose enough bandwidth. $-wise, they bought Mellanox and are now trying for ARM. In practice, this means providing more storage->device and network->device through improved hw+sw like https://developer.nvidia.com/blog/gpudirect-storage . I'm not privy to hyperscaler discussions on bandwidth across racks/nodes, but the outside trend does seem to be "more bandwidth", and I've been watching for straight-from-network input from cloud providers.

- Price drops for more hetero hardware. E.g., the T4 on AWS is ~6X cheaper for less choice in stuff like 64b vs 32b than the V100, yet has similar memory and perf within that sweet spot of choices. Nvidia pushes folks to DGX (big SSD -> local multi-GPU), which works for some scales, but in the wild, I see people often land on single-T4 / many-node once you take network bandwidth + cost into consideration in larger and more balanced systems.

For our own workloads, we don't trust GPU DBs enough to get it right, so we found RAPIDS to be a nicer sweet spot where even junior devs can code it (Python dataframes), while perf people can predictably tune it and appropriately plug in the latest & greatest. Out-of-memory / streaming / etc. only became a thing starting in ~December, e.g., see recent https://github.com/rapidsai/tpcx-bb results + writeups, so it's been a wild couple of years. We still stick to single-GPU / in-memory for our workloads as we care about sub-second, but have been experimenting & architecting it for as ^^^^ smooths out for our use (and to help our customers who have different-shaped workloads). I've been impressed by stuff like the T4-many-node experience as layers like dask-cudf and blazingsql build up.

> GPUs work better in a batch world.

This categorical insight should be front and center of any discussion about the relative merits of using GPUs vs CPUs.

That seems inaccurate. GPUs are used for stuff like real-time games, video, and ML inference: the hardware is explicitly built for heavy streaming.

I would agree that productive GPU data frameworks in streaming modes are nascent, e.g., https://medium.com/rapids-ai/gpu-accelerated-stream-processi... .

The next-gen Nvidia 30x0 series can have direct access to SSD without hitting the CPU. In that case, would they be any worse than CPUs on any workloads? I guess you could still have larger RAM amounts on the CPU, albeit usually slower RAM.

That's for gaming (new DirectX feature); GPUDirect Storage is supported on older GPUs as well for compute. https://developer.nvidia.com/blog/gpudirect-storage/

GPUDirect is only on the data center cards afaik.

Interesting, then the only difference is that a CPU easily has 256 GB RAM and a high-end GPU typically has 16 GB or so?

Maybe Nvidia will start to create ML-type cards with memory expansion options at some point.

Hmm - in our case all of the embeddings we process exist on SSD already. Idk enough here honestly but will see what I can learn.

Wow nvidia is selling SBCs now?

No, there's a new thing about giving GPUs some kind of DMA to storage. And it's pointless on HDDs, so it's only discussed in terms of SSDs.

Microsoft is bringing the DirectStorage API from Xbox to Windows; Nvidia calls theirs RTX IO. I think they're the same class of idea, like Vulkan vs. Metal.

They do have SBCs, I think, but other than being the basis for the Nintendo Switch I haven't heard much about them.

> Like Pinot, Elasticsearch is a JVM-based database, and as such, does not support joins and its query execution runs at a higher memory cost.

What does the JVM have to do with joins?

> In the past, we have utilized many third-party database solutions for real-time analytics, but none were able to simultaneously address all of our functional, scalability, performance, cost, and operational requirements.

Completely original excuse for an over-staffed engineering organization to justify doing some crazy stuff.

Does anyone know how this compares to the RapidsAI project called BlazingSql?

Howdy, full disclosure I'm the CEO at BlazingSQL (BSQL).

I'm not incredibly familiar with Ares save the linked article, but we aren't a DBMS or manage data in any way.

BlazingSQL is a SQL engine, it's easier to think of it similar to SparkSQL, Presto, Drill, etc.

We're core contributors to RAPIDS cuDF (CUDA DataFrame), which is a Python and C++ library for Apache Arrow in GPU memory. The Python library follows a pandas-like API, and the compute kernels are in C/C++.

BSQL binds to the same C++ as the pandas-like cuDF. What this enables users to do is interact with a DataFrame with either SQL or pandas depending on their needs or preferences. This interoperability means that the rest of the RAPIDS stack can be applied to a variety of different use cases (data viz, ML, Graph, Signal Processing, DL, etc), with the same DataFrame.

The DataFrame also has performant libraries for IO, Joins, Aggregations, Math operations, and more.

Here is an example of running a query on ~1TB on a single GPU in under 9 minutes. The data was stored on AWS S3 in Apache Parquet. https://twitter.com/blazingsql/status/1303370102348361729

Here is an example of scaling that same query up to 32 GPUs and running it in 16 seconds. https://twitter.com/blazingsql/status/1304450203030880257

Again, think of BSQL as a query engine, that runs queries on data wherever and however you have it. Here is a BSQL user running 1-2 minute queries on 1.5TB of CSV files using 2 GPUs. https://twitter.com/tomekdrabas/status/1303824164273270789

Let me know if that helps at all (or not).
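Taking the two timings quoted above at face value (~1TB in under 9 minutes on one GPU, the same query in 16 seconds on 32 GPUs), here is a rough back-of-envelope on what they imply. This is only a sketch: it assumes "~1TB" means 10**12 bytes, treats "under 9 minutes" as exactly 540 seconds (so the single-GPU throughput is a lower bound and the scaling efficiency an upper bound), and the helper names are made up for illustration.

```python
DATA_BYTES = 1e12          # assumption: "~1TB" read as 10**12 bytes
T_SINGLE_S = 9 * 60        # assumption: "under 9 minutes" taken as 540 s
T_MULTI_S = 16             # 32-GPU run, as quoted
N_GPUS = 32


def throughput_gb_per_s(data_bytes, seconds):
    """End-to-end scan rate in GB/s (decimal gigabytes)."""
    return data_bytes / seconds / 1e9


def scaling_efficiency(t_single, t_multi, n_workers):
    """Speedup divided by worker count; 1.0 means perfectly linear scaling."""
    return (t_single / t_multi) / n_workers


single_rate = throughput_gb_per_s(DATA_BYTES, T_SINGLE_S)
multi_rate = throughput_gb_per_s(DATA_BYTES, T_MULTI_S)
efficiency = scaling_efficiency(T_SINGLE_S, T_MULTI_S, N_GPUS)

print(f"single GPU: {single_rate:.2f} GB/s")
print(f"32 GPUs:    {multi_rate:.2f} GB/s")
print(f"scaling efficiency (upper bound): {efficiency:.2f}")
```

Under those assumptions the scaling comes out roughly linear. Note that both figures include reading Parquet from S3, so they describe end-to-end throughput rather than GPU compute alone.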

This should have the 2019 tag in the title
