- Splitting AsmPrinterHandler (used for unwind info) and DebugHandler (used also for per-instruction location information) to avoid two virtual function calls per instruction (https://github.com/llvm/llvm-project/pull/96785)
- Remove several maps from ELFObjectWriter, including some std::map (changed locally, need to make PR)
- Faster section allocation, remove ELF "mergeable section info" hash maps (although this is called just ~40 times per object file, it is very measurable in JIT use cases when compiling many small objects) (planned)
- X86 encoding in general; this consumes quite some time and looks very inefficient -- having written my own x86 encoder, I'm confident that there's a lot of improvement potential. (not started)
Some takeaways on a higher level -- most of these aren't really surprising, but nonetheless are very frequent problems(/patterns) in the LLVM code base:
- Maps/hash maps/sets are quite expensive when used frequently, and sometimes can be easily avoided, e.g., with a vector or, for pointer keys, a pointer dereference
- Virtual function (/abstraction) calls come at a cost, especially when done frequently
- raw_svector_ostream is slow, because writes are virtual function calls and don't get inlined (I previously replaced raw_svector_ostream with a SmallVector&: https://reviews.llvm.org/D145792)
- Frequent heap allocations are costly, especially with glibc's malloc
- Many small inefficiencies add up (=> many small improvements do, too)
Big thanks for the recent performance changes!
The "many small inefficiencies" point resonates – it definitely shows how performance is hurt in many small areas.
(I aim to write blog posts every 2-3 weeks, but this latest one was postponed...
I wrote this in a relatively short time so that the gap would not be too long, and I really should take time to refine the post.)
Side note, but I was looking for pre-built binaries in the releases of the LLVM project. Specifically, I was looking for clang+llvm releases for x86_64 Linux (Ubuntu preferably), to save some time (I've always had trouble compiling it) and to put them into my own `prefix` directory. It's kind of wild to see aarch64, armv7, powerpc64, x86_64_windows... but not something like this. I am aware of https://apt.llvm.org/ and its llvm.sh - but as I said, I'd prefer it to live in its own `prefix`. Does anyone know where else there might be pre-builts? There used to be something just like that for v17, like https://github.com/llvm/llvm-project/releases/download/llvmo...
Why not just an LLM-based interpreter that directly executes a PDF spec plus edits received by email? No need to recompile and restart the app. A DB is also not required - the LLM will naturally remember all user requests and figure out the current state. (We'll solve the limitations of context later.)
Have it emit plausible-looking x64 instructions by training on lots of executables, get a program out which has some behaviour. Might be worth seed funding at the moment.
Yeah, though this is the same kind of discussion as back in the day, when Assembly developers didn't trust FORTRAN compilers - so it is a matter of time, and funding.
The Fortran compilers were trying to get the answer right whereas the proposed funding void would at best be trying to avoid a segv.
What probably does have real merit is tying a superoptimiser to an LLM, provided you've got the SAT solver included in the mix as well to know if it worked.
You missed out the input to the LLM, which would presumably be a requirements spec with all behaviour specified in exact detail, including all the tricky corner cases where someone has to think hard about which solution is most useful and least confusing to the customer. Natural language isn't great for expressing such things. A formal notation would be easier. Perhaps something that makes it easy to express if-this-then-that kinds of things. I wonder if a programming language would be good for that.
Indeed, that is why, based on offshoring experience, I see a future where the developers of tomorrow are mostly technical architects, with Star Trek style "Computer, do XYZ".
This has been tried before with UML, see Rational, Together or Enterprise Architect, however LLMs bring an additional automation step to the whole thing.
If you have a verification step behind the LLM that proves semantic equivalence between the original code and the LLM output, I could imagine scenarios where it can be beneficial.
- Removing per-instruction timers, which add a measurable overhead even when disabled (https://github.com/llvm/llvm-project/pull/97046)