Hacker News

"enabling frame pointers is a 1-2% performance loss, which translates to the loss of about 1 or 2 years of compiler improvements"

Wait, are we really that close to the maximum of what a compiler can optimize that we're getting barely 1% performance improvements per year with new versions?






As a part time compiler author I'm extremely skeptical we're getting a global 1–2%/yr. I'd've thought more like a tenth to half that? I've not seen any numbers, so I'm just making shit up.

However, for sure, if compiler optimizations disappeared, HW would pick up the slack in a few years.


There’s likely a lot of performance still on the table if compilers were permitted to change data structure layout, but I think doing this effectively is an open problem.

Current compilers could do a lot better with vectorization, but it will often be limited by the data structure layout.
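To make the layout point concrete (a minimal sketch, not from the thread; the `PointAoS`/`PointsSoA` names are illustrative): an array-of-structs layout forces a strided access pattern that resists vectorization, while the equivalent struct-of-arrays layout makes the same loop unit-stride — but the compiler can't make that change for you.

```cpp
#include <cassert>
#include <vector>

// Array-of-structs: x and y are interleaved in memory, so a loop over
// only x reads twice the cache lines and has a stride-2 access pattern.
struct PointAoS { float x, y; };

// Struct-of-arrays: each field is contiguous, so the same loop becomes
// a straightforward unit-stride candidate for SIMD.
struct PointsSoA {
    std::vector<float> x, y;
};

float sum_x_aos(const std::vector<PointAoS>& pts) {
    float s = 0.0f;
    for (const auto& p : pts) s += p.x;  // stride-2: skips over y each step
    return s;
}

float sum_x_soa(const PointsSoA& pts) {
    float s = 0.0f;
    for (float v : pts.x) s += v;        // unit-stride: vectorizes easily
    return s;
}
```

The two functions compute the same sum; only the memory layout of the inputs differs, which is exactly the transformation compilers aren't permitted to make on their own.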


Proebsting’s Law suggests 4% per year, but even as satire it seems to have underdone its cynicism.

https://gwern.net/doc/cs/algorithm/2001-scott.pdf


Yeah, compilers are already pretty close to the limit of what is possible, unless your code is unusually poorly written.

Clearly this isn't the case. Plenty of neat C++ "reference implementation" code ends up 5x faster when hand-optimized, parallelized, vectorized, etc.

There are some transformations that compilers are really bad at: rearranging data structures, switching out algorithms for equivalent ones with better big-O complexity, generating and using lookup tables, bit-packing things, using caches, hash tables and Bloom filters for time/memory trade-offs, etc.

The spec doesn't prevent such optimizations, but current compilers aren't smart enough to find them.
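One concrete illustration of the lookup-table trade-off mentioned above (a hedged sketch; the function and table names are made up): a byte-wise popcount can be computed bit by bit, or with a 256-entry table that trades 256 bytes of memory for a single load. Compilers will constant-fold either version for known inputs, but they won't invent the table transformation themselves.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Bit-by-bit popcount: zero memory cost, up to 8 iterations per byte.
int popcount_loop(uint8_t v) {
    int n = 0;
    while (v) { n += v & 1; v >>= 1; }
    return n;
}

// 256-entry table built at compile time: trades 256 bytes for an O(1)
// lookup per byte. This is the kind of rewrite a human does by hand.
constexpr std::array<uint8_t, 256> make_table() {
    std::array<uint8_t, 256> t{};
    for (int i = 0; i < 256; ++i) {
        int n = 0, v = i;
        while (v) { n += v & 1; v >>= 1; }
        t[i] = static_cast<uint8_t>(n);
    }
    return t;
}
constexpr auto kPopcountTable = make_table();

int popcount_table(uint8_t v) { return kPopcountTable[v]; }
```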


Imagine the outcry if compilers switched algorithms. How can the compiler know my input size and input distribution? Maybe my dumb algorithm is optimal for my data.

Compilers can easily runtime-detect the size and shape of the problem, and run different code for different problem sizes. Many already do this for loop unrolling: e.g. if you memcpy 2 bytes, they won't even branch into the fancy SIMD version.

This would just be an extension of that. If the code creates and uses a linked list, yet the list is 1M items long and being accessed entirely by index, branch to a different version of the code which uses an array, etc.
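A minimal sketch of that kind of size dispatch, modeled loosely on how libc memcpy implementations branch (the function name and the size-8 cutoff are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Dispatch-on-size copy: tiny copies take a cheap scalar path with no
// setup cost, larger ones go to a bulk routine (plain memcpy stands in
// here for the "fancy SIMD version").
void copy_dispatch(void* dst, const void* src, size_t n) {
    if (n <= 8) {
        // Small path: simple byte loop; branching to SIMD wouldn't pay off.
        auto* d = static_cast<uint8_t*>(dst);
        auto* s = static_cast<const uint8_t*>(src);
        for (size_t i = 0; i < n; ++i) d[i] = s[i];
    } else {
        // Large path: the bulk routine amortizes its setup over many bytes.
        std::memcpy(dst, src, n);
    }
}
```

The linked-list-vs-array case in the comment above would be the same idea one level up: check the shape of the data at runtime and branch to the code specialized for it.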


If I know my input shape in advance and write the correct algorithm for it, I don't want any runtime checking of the input and the associated costs for branching and code size inflation.

Presumably you could tell the compiler of those constraints, e.g. via assert(n<100000).
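One wrinkle (a sketch, with compiler-specific assumptions): a plain assert() compiles away under NDEBUG and tells the optimizer nothing, so GCC and Clang offer builtins that actually propagate the constraint; C++23 standardizes this as [[assume(expr)]]. The `assume` wrapper name here is made up.

```cpp
#include <cassert>

// assert() vanishes in release builds; these builtins survive and feed
// the constraint to the optimizer. On other compilers this is a no-op.
inline void assume(bool cond) {
#if defined(__clang__)
    __builtin_assume(cond);
#elif defined(__GNUC__)
    if (!cond) __builtin_unreachable();
#else
    (void)cond;
#endif
}

int sum_small(const int* a, int n) {
    assume(n < 100000);  // the constraint from the comment above
    int s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}
```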

That's my question. I'm also under the impression that optimizations CAN be made manually, but I find it surprising that "current compilers aren't smart enough to find them" isn't improving.

The percentage of all software engineers working on compilers is probably lower now than it ever has been...


