It’ll require software strategies such as splitting the working set into multiple pieces and processing them one by one. In some extreme cases, for example when a huge hash table used in a hash join exceeds the maximum GPU memory, most systems simply report OOM.
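As a rough sketch of that chunking idea, assuming cuDF’s pandas-like API and assuming both sides of the join have been pre-split into per-partition Parquet files on the join key (the file names, column name, and partition count here are hypothetical):

    # Chunked ("partitioned") hash join when the build side is too large for GPU
    # memory. Assumes cuDF's pandas-like API; the file names, column name, and
    # partition count are hypothetical, and both sides are assumed to be pre-split
    # into per-partition Parquet files on the same join key.
    import cudf

    NUM_PARTITIONS = 8  # chosen so each build partition fits in GPU memory

    results = []
    for part in range(NUM_PARTITIONS):
        # Each file holds only the rows whose join key maps to this bucket, so
        # the hash table built for the join stays bounded.
        build = cudf.read_parquet(f"build_part_{part}.parquet")
        probe = cudf.read_parquet(f"probe_part_{part}.parquet")

        results.append(probe.merge(build, on="join_key", how="inner"))
        del build, probe  # drop references so device memory can be reclaimed

    joined = cudf.concat(results, ignore_index=True)

In a real system the per-partition results would be spilled or streamed out rather than concatenated at the end, but the point is that no single partition’s hash table has to exceed device memory.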
However, we’ve noted that NVIDIA recently announced the “Grace” CPU, which seemingly breaks the memory wall between host and device. This may be a game changer for such cases.
I wonder what proportion of analytic queries are compute-bound rather than disk-bound? I would guess most are disk-bound, so using a GPU seems like it will only benefit a small class of queries. (Though perhaps that class will grow larger if GPU acceleration is widely available... maybe SQL:2025 will add functions to train an ML model in a query!)
For example, analytical queries tend to have a large portion of joins and aggregations, which makes them more CPU-intensive than disk-bound, especially when complicated data types such as decimal are involved. Furthermore, traditional databases have a buffer pool that is expected to absorb most disk accesses.
But if you have to hit the buffer pool (host memory), then you still have to cross the PCIe bus to get the data to the GPU. With fast NVMe storage you might get similar speeds going directly to storage (e.g. DirectStorage): https://news.ycombinator.com/item?id=25956670
Actually, you have the reasoning backwards for many common use cases like web analytics. To the extent that queries are doing on-the-fly aggregation, you are doing column scans, which are by definition heavy on I/O.
>To avoid I/O operations amortizing the overall performance improvement, we used the Coprocessor Cache to buffer all the intermediate results of the Table Scans. It effectively makes TiDB a "hypothetical" in-memory database.
Basically, this article does something similar to BlazingSQL. In fact, both are built on top of the cuDF project (a minimal sketch of what cuDF provides is shown below). The difference is that TiDB is an HTAP database, and this article describes how we empower TiDB’s analytical processing by leveraging a GPU.
And the comparison is between TiDB running on a GPU and TiDB running on a CPU. No other database is involved in this comparison; that may come in the future once this project has been productionized.
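For context, here is a minimal sketch of the kind of columnar, GPU-resident work cuDF provides to engines like these; the data and column names are made up for illustration:

    # Minimal illustration of the kind of columnar work cuDF runs on the GPU;
    # the data and column names are made up.
    import cudf

    df = cudf.DataFrame({
        "region": ["east", "west", "east", "west"],
        "amount": [10.0, 20.0, 30.0, 40.0],
    })

    # Group-by aggregation executes on the GPU and returns a cuDF DataFrame.
    totals = df.groupby("region").agg({"amount": "sum"})
    print(totals)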
The question is: how does it perform when the working set doesn't fit in GPU memory?