
Disk Spilling in a Vectorized Execution Engine - asubiotto
https://www.cockroachlabs.com/blog/disk-spilling-vectorized/
======
arielweisberg
What about aggregation? I assume it's also sort based?

~~~
asubiotto
We haven't implemented disk spilling for aggregations yet. It's different from
the algorithms discussed in this article in that the amount of memory used is
proportional to the number of aggregation groups (buckets), rather than the
number of input tuples. This makes it less likely that aggregations would need
to spill to disk and requires a slightly specialized solution to figure out
how to serialize intermediate aggregation results.

Sorting is a good idea. The in-memory aggregator could keep track of the
aggregation columns of each bucket, sort the buckets and input that has not
yet been processed and then perform an ordered aggregation on the sorted input
using the already-computed intermediate result as a starting point for the
group's aggregation result.

Another option is to partition the buckets and input, which would subdivide
the aggregation and avoid a sort.

