I profile aggressively with Superluminal. All of the passes are O(N); the cost mostly comes down to how long it takes to walk through and lay out a few thousand boxes with their constraints and configuration flags set. There aren't really any 'bottlenecks'. It's more that the CPU time is spread across the whole algorithm.
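To illustrate why a profile of linear layout passes tends to look flat rather than spiky, here is a minimal sketch. It is not the actual engine being discussed. It assumes a hypothetical flat array of boxes stored in pre-order (every child appears after its parent), horizontal-only layout, and simple min/max width constraints. `Box`, `size_pass`, `position_pass` and `layout` are illustrative names. Each pass touches every box a constant number of times, so the total cost scales with the box count and no single step dominates.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical box record; boxes live in a list in pre-order.
    parent: int                 # index of the parent box, -1 for the root
    fixed_w: float = 0.0        # intrinsic width (leaves); 0 for containers
    min_w: float = 0.0          # minimum width constraint
    max_w: float = float("inf") # maximum width constraint
    pad: float = 0.0            # horizontal padding on each side
    content_w: float = 0.0      # computed: sum of child widths
    w: float = 0.0              # computed: final clamped width
    x: float = 0.0              # computed: final x position
    cursor: float = 0.0         # computed: next free x for this box's children


def size_pass(boxes):
    """Bottom-up sizing. Reverse pre-order visits every child before its parent: O(N)."""
    for b in boxes:
        b.content_w = 0.0
    for b in reversed(boxes):
        want = b.fixed_w + b.content_w + 2 * b.pad
        b.w = min(max(want, b.min_w), b.max_w)  # apply min/max constraints
        if b.parent >= 0:
            boxes[b.parent].content_w += b.w    # O(1) contribution to the parent


def position_pass(boxes):
    """Top-down positioning. Pre-order visits parents first and siblings in order: O(N)."""
    for b in boxes:
        if b.parent < 0:
            b.x = 0.0
        else:
            parent = boxes[b.parent]
            b.x = parent.cursor       # place this child at the parent's cursor
            parent.cursor += b.w      # advance the cursor for the next sibling
        b.cursor = b.x + b.pad        # children start inside this box's padding


def layout(boxes):
    size_pass(boxes)
    position_pass(boxes)


if __name__ == "__main__":
    # A root with 10px padding holding three leaves; the last leaf is clamped to 20px.
    boxes = [Box(-1, pad=10), Box(0, fixed_w=50), Box(0, fixed_w=80), Box(0, fixed_w=30, max_w=20)]
    layout(boxes)
    print([b.w for b in boxes], [b.x for b in boxes])
```

Because neither pass has a single hot inner loop, a sampling profiler would show time spread across the per-box work (clamping, accumulation, cursor updates) in proportion to the number of boxes. The usual levers then are doing less per box or touching fewer boxes, rather than fixing one hotspot.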