Hacker News

"enabling frame pointers is a 1-2% performance loss, which translates to the loss of about 1 or 2 years of compiler improvements"

Wait, are we really that close to the maximum of what a compiler can optimize that we're getting barely 1% performance improvements per year with new versions?






As a part time compiler author I'm extremely skeptical we're getting a global 1–2%/yr. I'd've thought more like a tenth to half that? I've not seen any numbers, so I'm just making shit up.

However, for sure, if compiler optimizations disappeared, HW would pick up the slack in a few years.


There’s likely a lot of performance still on the table if compilers were permitted to change data structure layout, but I think doing this effectively is an open problem.

Current compilers could do a lot better with vectorization, but it will often be limited by the data structure layout.
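To make the layout point concrete (a minimal sketch, not from the thread; the `PointAoS`/`PointsSoA` names are illustrative): an array-of-structs layout forces a strided access pattern that resists vectorization, while the equivalent struct-of-arrays layout makes the same loop unit-stride — but the compiler can't make that change for you.

```cpp
#include <cassert>
#include <vector>

// Array-of-structs: x and y are interleaved in memory, so a loop over
// only x reads twice the cache lines and has a stride-2 access pattern.
struct PointAoS { float x, y; };

// Struct-of-arrays: each field is contiguous, so the same loop becomes
// a straightforward unit-stride candidate for SIMD.
struct PointsSoA {
    std::vector<float> x, y;
};

float sum_x_aos(const std::vector<PointAoS>& pts) {
    float s = 0.0f;
    for (const auto& p : pts) s += p.x;  // stride-2: skips over y each step
    return s;
}

float sum_x_soa(const PointsSoA& pts) {
    float s = 0.0f;
    for (float v : pts.x) s += v;        // unit-stride: vectorizes easily
    return s;
}
```

The two functions compute the same sum; only the memory layout of the inputs differs, which is exactly the transformation compilers aren't permitted to make on their own.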


Proebsting’s Law suggests 4% per year, but even as satire it seems to have underdone its cynicism.

https://gwern.net/doc/cs/algorithm/2001-scott.pdf


Yeah, compilers are already pretty close to the limit of what is possible, unless your code is unusually poorly written.

Clearly this isn't the case. Plenty of neat C++ "reference implementation" code ends up 5x faster when hand-optimized, parallelized, vectorized, etc.

There are some transformations that compilers are really bad at: rearranging data structures, switching out algorithms for equivalent ones with better big-O complexity, generating and using lookup tables, bit-packing things, using caches, hash tables and Bloom filters for time/memory trade-offs, etc.

The spec doesn't prevent such optimizations, but current compilers aren't smart enough to find them.
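One concrete illustration of the lookup-table trade-off mentioned above (a hedged sketch; the function and table names are made up): a byte-wise popcount can be computed bit by bit, or with a 256-entry table that trades 256 bytes of memory for a single load. Compilers will constant-fold either version for known inputs, but they won't invent the table transformation themselves.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Bit-by-bit popcount: zero memory cost, up to 8 iterations per byte.
int popcount_loop(uint8_t v) {
    int n = 0;
    while (v) { n += v & 1; v >>= 1; }
    return n;
}

// 256-entry table built at compile time: trades 256 bytes for an O(1)
// lookup per byte. This is the kind of rewrite a human does by hand.
constexpr std::array<uint8_t, 256> make_table() {
    std::array<uint8_t, 256> t{};
    for (int i = 0; i < 256; ++i) {
        int n = 0, v = i;
        while (v) { n += v & 1; v >>= 1; }
        t[i] = static_cast<uint8_t>(n);
    }
    return t;
}
constexpr auto kPopcountTable = make_table();

int popcount_table(uint8_t v) { return kPopcountTable[v]; }
```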


Imagine the outcry if compilers switched algorithms. How can the compiler know my input size and input distribution? Maybe my dumb algorithm is optimal for my data.

Compilers can easily runtime-detect the size and shape of the problem, and run different code for different problem sizes. Many already do this for loop unrolling: e.g. if you memcpy 2 bytes, they won't even branch into the fancy SIMD version.

This would just be an extension of that. If the code creates and uses a linked list, yet the list is 1M items long and being accessed entirely by index, branch to a different version of the code which uses an array, etc.
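A minimal sketch of that kind of size dispatch, modeled loosely on how libc memcpy implementations branch (the function name and the size-8 cutoff are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Dispatch-on-size copy: tiny copies take a cheap scalar path with no
// setup cost, larger ones go to a bulk routine (plain memcpy stands in
// here for the "fancy SIMD version").
void copy_dispatch(void* dst, const void* src, size_t n) {
    if (n <= 8) {
        // Small path: simple byte loop; branching to SIMD wouldn't pay off.
        auto* d = static_cast<uint8_t*>(dst);
        auto* s = static_cast<const uint8_t*>(src);
        for (size_t i = 0; i < n; ++i) d[i] = s[i];
    } else {
        // Large path: the bulk routine amortizes its setup over many bytes.
        std::memcpy(dst, src, n);
    }
}
```

The linked-list-vs-array case in the comment above would be the same idea one level up: check the shape of the data at runtime and branch to the code specialized for it.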


If I know my input shape in advance and write the correct algorithm for it, I don't want any runtime checking of the input and the associated costs for branching and code size inflation.

Presumably you could tell the compiler of those constraints, e.g. via assert(n<100000).
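One wrinkle (a sketch, with compiler-specific assumptions): a plain assert() compiles away under NDEBUG and tells the optimizer nothing, so GCC and Clang offer builtins that actually propagate the constraint; C++23 standardizes this as [[assume(expr)]]. The `assume` wrapper name here is made up.

```cpp
#include <cassert>

// assert() vanishes in release builds; these builtins survive and feed
// the constraint to the optimizer. On other compilers this is a no-op.
inline void assume(bool cond) {
#if defined(__clang__)
    __builtin_assume(cond);
#elif defined(__GNUC__)
    if (!cond) __builtin_unreachable();
#else
    (void)cond;
#endif
}

int sum_small(const int* a, int n) {
    assume(n < 100000);  // the constraint from the comment above
    int s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}
```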

That's my question. I'm also under the impression that optimizations CAN be made manually, but I find it surprising that "current compilers aren't smart enough to find them" isn't improving.

The percentage of all software engineers working on compilers is probably lower now than it ever has been...


