Why is the naive solution n^3? Generating one partial sum costs n, and you need to do it n times, so n^2. Even if you need to store all the intermediate sums, you get them as you sum the entire partial sum. No memoization needed. What am I missing here?
Let me correct my self, there are n^2 results in total, when the algorithm ends. The naive algorithm is doing n times more work. That is redundant, because there is clearly an overlap of some partial sums.
OK, now I don't understand what you are trying to say.
There are n^2 results, each result takes O(n) to compute, so the naive algorithm is O(n^3). That's what you asked - why does the naive algorithm take O(n^3) - and that seems to answer your question.
Yes, clearly there is an overlap of some partial sums, and that's why a less naive algorithm is sub-O(n^3).
So, what are you asking, or what additional point are you making?
Especially if you are doing it in frequency space, all you would need to do is 1 forward FFT and n reverse FFTs.
Alternatively you can just perform a prefix sum operation and subtract each (n - k)th element from nth element for results in the kth iteration. This could be even faster.