Could you do something like computing the aggregate groups in background jobs instead, perhaps staggered, to take that out of the critical path?

Yup, that would reduce write IOPS, but it would also increase the latency before the data becomes available to our users.
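
To make the trade-off concrete, here is a rough sketch of what a staggered background aggregation could look like. It is not our actual setup: `DeferredAggregator`, `record`, `run_due`, the 60-second interval, and the CRC-based slot assignment are all hypothetical names and choices made up for illustration, and it assumes the raw events get written either way, so only the aggregate updates are deferred.

```python
from collections import defaultdict
import zlib


class DeferredAggregator:
    """Hypothetical sketch: aggregates are rebuilt by a background job, not on write."""

    def __init__(self, interval_s: int = 60):
        self.interval_s = interval_s          # assumed job period, in seconds
        self.pending = defaultdict(list)      # group -> raw values not yet aggregated
        self.totals = defaultdict(int)        # group -> aggregate visible to readers
        self.aggregate_writes = 0             # number of writes to the aggregate store

    def record(self, group: str, value: int) -> None:
        # Critical path: buffer the raw value, do not touch the aggregate.
        self.pending[group].append(value)

    def offset(self, group: str) -> int:
        # Stagger: each group gets a deterministic slot inside the interval.
        return zlib.crc32(group.encode()) % self.interval_s

    def run_due(self, now_s: int) -> None:
        # Background job: fold pending values for groups whose slot matches now.
        for group in list(self.pending):
            if now_s % self.interval_s == self.offset(group):
                values = self.pending.pop(group)
                self.totals[group] += sum(values)
                self.aggregate_writes += 1    # one write per batch, not per event


agg = DeferredAggregator(interval_s=60)
for v in (1, 2, 3):
    agg.record("eu", v)
print(agg.totals["eu"])                        # 0: readers see stale data until the job runs
agg.run_due(agg.offset("eu"))
print(agg.totals["eu"], agg.aggregate_writes)  # 6 1
```

Three events end up as a single aggregate write, which is the IOPS saving, but the total stays at 0 until the group's slot comes around, which is the added latency for users.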

