The "fanout" or "wasteful duplication of a single message 30 million times" is only required because they are using tiny underpowered hardware to begin with. The approach that they claim can't possibly work actually does work. You just can't do it on a $2000 "server".
That still means 30 million writes instead of one. That the size of each write is smaller is not going to help you all that much if each write still worst case ends up forcing you to rewrite at least one disk sector.
While I'm inclined to favour a fanout approach, a "full" fanout can easily be incredibly costly on the write side.