I think the stuff about nested loops performing poorly is a misunderstanding. There's nothing inherently bad about nested loops. The motivation given in the previous post[0] is:
> The big breakthrough I had was after looking at the hot function I realised that storing the game events in a List of Arrays meant that we had a loop in a loop. It was however possible to flatten this into a single array of integers and reset the game every 11 turns with a simple if check.
The issue here wasn't nested loops, it was data structure overhead and poor data locality from having a lot of tiny lists (11 items each, as I understand it).
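To make the distinction concrete, here is a minimal Go sketch of that flattening (not the author's actual code; the 11-turn constant and the per-game maximum are assumptions for illustration). The nested version chases many tiny heap-allocated slices, while the flat version walks one contiguous slice and "resets the game" with a simple if check:

```go
package main

import "fmt"

// Hypothetical constant: the post mentions resetting the game every 11 turns.
const turnsPerGame = 11

// Nested form: a slice of small per-game slices. Each tiny slice is a separate
// allocation, so the inner loop keeps jumping between scattered chunks of memory.
func bestPerGameNested(games [][]int) []int {
	best := make([]int, 0, len(games))
	for _, game := range games {
		b := 0
		for _, event := range game {
			if event > b {
				b = event
			}
		}
		best = append(best, b)
	}
	return best
}

// Flattened form: one contiguous slice of ints walked linearly. The per-game
// "reset" becomes a cheap if check on the turn counter.
func bestPerGameFlat(events []int) []int {
	var best []int
	b, turn := 0, 0
	for _, event := range events {
		if event > b {
			b = event
		}
		turn++
		if turn == turnsPerGame { // reset the game every 11 turns
			best = append(best, b)
			b, turn = 0, 0
		}
	}
	return best
}

func main() {
	// Two "games" of 11 events each, stored both ways.
	games := [][]int{
		{1, 5, 3, 2, 8, 1, 1, 4, 2, 6, 7},
		{2, 2, 9, 1, 3, 3, 5, 0, 1, 2, 4},
	}
	var flat []int
	for _, g := range games {
		flat = append(flat, g...)
	}
	fmt.Println(bestPerGameNested(games)) // [8 9]
	fmt.Println(bestPerGameFlat(flat))    // [8 9]
}
```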
Unsurprisingly, the biggest performance gains came from algorithmic changes, not micro optimisations (like if instead of switch; see the sketch below).
The lesson is always the same: leave micro optimisations for later and spend more time learning about data structures that can improve your data locality.
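For what it's worth, the if-versus-switch swap mentioned above is the kind of change that only matters inside a very hot loop, and even then the win is small next to the data layout change. A hedged sketch of what such a swap looks like in Go (illustrative only, not scc's actual code):

```go
package main

import "fmt"

// Switch-based byte classification inside a hot loop.
func countSwitch(buf []byte) (newlines, spaces int) {
	for _, b := range buf {
		switch b {
		case '\n':
			newlines++
		case ' ', '\t':
			spaces++
		}
	}
	return
}

// The same logic as an if/else chain, checking the most common byte first.
// For trivial cases the compiler may emit identical code; any difference only
// shows up under a profiler on genuinely hot paths.
func countIf(buf []byte) (newlines, spaces int) {
	for _, b := range buf {
		if b == ' ' || b == '\t' {
			spaces++
		} else if b == '\n' {
			newlines++
		}
	}
	return
}

func main() {
	buf := []byte("a b\tc\nd e\n")
	fmt.Println(countSwitch(buf)) // 2 3
	fmt.Println(countIf(buf))     // 2 3
}
```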
Correct. However, I did find that just working on the codebase and getting small wins was enough to put my mind into that way of thinking, and hence I was able to make the leap to the larger gains.
For the super tight loops scc uses, the micro optimisations still add up, so it wasn't wasted effort, and in the end it all came together nicely.
Once you are somewhat experienced with optimization, the difficult part isn't getting into the mindset of optimizing, but holding off on it until your features are in.
Before you have the features, you mostly want to find the boundaries on the problem so that you can slice it up into modules and optimize those independently. It's unusual in contemporary programming to have to do a whole-program micro-optimization pass, but that's the only kind you can do within a small program, so it's common to get the wrong idea and start shaving off bytes and cycles before the architecture is ready for it. In a large program, it's specific combinations of features that create performance spikes, and those tend to be addressed by building a new custom code path. When you scale up, the data changes, so your view of the problem changes too, and you start being able to optimize by creating a code path that combines a whole bunch of specific processing steps.
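A rough sketch of that last point, under the assumption that the per-item steps can be merged: instead of running each processing step as its own pass over the whole dataset, a custom code path does all the work in one traversal. The step names here are made up for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// Separate passes: each step walks the whole dataset on its own. Easy to write,
// but the data is traversed (and pulled through the cache) three times.
func processSeparate(lines []string) (trimmed, nonEmpty, comments int) {
	for _, l := range lines {
		if l != strings.TrimSpace(l) {
			trimmed++
		}
	}
	for _, l := range lines {
		if strings.TrimSpace(l) != "" {
			nonEmpty++
		}
	}
	for _, l := range lines {
		if strings.HasPrefix(strings.TrimSpace(l), "//") {
			comments++
		}
	}
	return
}

// Fused code path: one pass that does all three steps per line.
func processFused(lines []string) (trimmed, nonEmpty, comments int) {
	for _, l := range lines {
		t := strings.TrimSpace(l)
		if l != t {
			trimmed++
		}
		if t != "" {
			nonEmpty++
		}
		if strings.HasPrefix(t, "//") {
			comments++
		}
	}
	return
}

func main() {
	lines := []string{"  // comment", "code()", "", "   more()  "}
	fmt.Println(processSeparate(lines)) // 2 3 1
	fmt.Println(processFused(lines))    // 2 3 1
}
```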
The caveat of working that way is that you can end up working towards a local optimum, and lose sight of the forest for the trees when it comes to making the bigger algorithmic changes.
No, Go has about the fastest loops you could want, implemented in the obvious way at the asm level. (That is, Go's loops aren't magically faster than any other compiled language's, but they aren't slow the way loops in some dynamic languages are.) The focus is there because that's where you get wins, in any language.
That's 99% of the reason I didn't put ripgrep in there. Ripgrep, sift, Ag, pt and ack solve a totally different problem to scc, tokei, loc, polyglot and cloc.
The last 1% was because I didn't want any comparisons made against ripgrep: early in scc's development I was comparing walk performance a little and ended up having a chat with burntsushi over Twitter. To avoid making more work for him I was careful to avoid any comparisons, as the last thing we want is him distracted from working on ripgrep and his other tools.
0: https://boyter.org/2017/03/golang-solution-faster-equivalent...