It makes sense that a simulation game like Factorio would be memory-bandwidth limited: each tick it needs to update the state of a large number of entities using relatively simple operations. The main trick to making it fast is reducing the amount of data you need to update each tick and arranging the data in memory so you can load it fast (i.e. predictably and sequentially so the prefetcher can do its job) and only need to load and modify it once each tick (at least in terms of loading into and eviction from cache). The complexity is in how best to do that, especially both at once. For example, in the blog post they have linked lists for active entities. This makes sense from the point of view of not loading data you don't need to process, but it limits how fast you can load data: you only learn the address of the next item once the current one has been loaded, so you can only prefetch one item ahead (compared to an array, where you can prefetch as far ahead as your buffers will allow).
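
To make the access-pattern difference concrete, here is a rough sketch of the two layouts. It is not Factorio's actual code: `Node`, `ActiveSet`, `tick_linked` and `tick_dense` are names made up for illustration, and the dense array with swap-remove is just one common alternative to a linked list, not something the blog post describes. Python won't show the cache effect itself; the point is only which address each step depends on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical active entity stored as a linked-list node."""
    value: int
    next: Optional["Node"] = None


def tick_linked(head: Optional[Node]) -> int:
    """Update every active entity by chasing next pointers."""
    updated = 0
    node = head
    while node is not None:
        node.value += 1
        # The location of the next entity is only known after this
        # node has been loaded, so each load depends on the previous one.
        node = node.next
        updated += 1
    return updated


class ActiveSet:
    """Dense array of active entity ids with O(1) swap-remove (illustrative)."""

    def __init__(self, capacity: int) -> None:
        self.dense: List[int] = []              # active ids, stored contiguously
        self.slot: List[int] = [-1] * capacity  # id -> index in dense, -1 if inactive

    def activate(self, eid: int) -> None:
        if self.slot[eid] == -1:
            self.slot[eid] = len(self.dense)
            self.dense.append(eid)

    def deactivate(self, eid: int) -> None:
        pos = self.slot[eid]
        if pos == -1:
            return
        last = self.dense.pop()
        if last != eid:
            # Move the last active id into the hole so the array stays dense.
            self.dense[pos] = last
            self.slot[last] = pos
        self.slot[eid] = -1


def tick_dense(values: List[int], active: ActiveSet) -> int:
    """Update every active entity by walking the dense id array."""
    for i in range(len(active.dense)):
        # Entry i + k of the id array can be read without waiting for
        # entry i, so the loads are independent of each other.
        values[active.dense[i]] += 1
    return len(active.dense)


if __name__ == "__main__":
    active = ActiveSet(capacity=5)
    for eid in (0, 2, 4):
        active.activate(eid)
    active.deactivate(0)
    values = [0] * 5
    tick_dense(values, active)
    print(active.dense, values)
```

Both versions skip inactive entities. The trade-off is that swap-remove reorders the active set, which may or may not be acceptable if update order matters for determinism.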