Exactly, this kind of thing always happens with these loops, which is why I think programs that allocate are fairer. But then people point out that the C allocator is terrible, so we can't make that point :')
Might be worth seeing if e.g. jemalloc is enough less terrible for the sort of examples you're looking at to help with that.
(though note that I mention jemalloc because I've had "huh, this C code now magically runs faster" experiences with it, make no claim it's the right one to look at, and am very sure that I don't know what I'm talking about sufficiently wrt allocators to be able to recognise the right one if it bit me on the leg)