Hacker News

I see a lot of misconceptions about using ML for compilers. You don't ask the model what instructions to emit. Instead, you prepare a set of passes which are guaranteed to preserve correctness (we already have hundreds of them). Then you ask the model which passes to apply, and in what order.
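
A minimal sketch of that split, on a hypothetical toy IR (the pass names, IR tuples, and `run_schedule` helper are all made up for illustration): the model only picks an ordering from a whitelist of transforms we already trust, and can never emit instructions directly.

```python
def const_fold(ir):
    # Fold ("add", 2, 3) into ("const", 5); leave every other op alone.
    return [("const", op[1] + op[2])
            if op[0] == "add" and isinstance(op[1], int) and isinstance(op[2], int)
            else op
            for op in ir]

def dead_code_elim(ir):
    # Drop ops explicitly marked dead in this toy IR.
    return [op for op in ir if op[0] != "dead"]

# The whitelist of correctness-preserving transforms.
PASSES = {"const-fold": const_fold, "dce": dead_code_elim}

def run_schedule(ir, schedule):
    # An unknown pass name raises KeyError, so a model can only choose
    # the order of trusted passes, never invent new transformations.
    for name in schedule:
        ir = PASSES[name](ir)
    return ir

ir = [("add", 2, 3), ("dead",), ("mul", "x", "y")]
print(run_schedule(ir, ["const-fold", "dce"]))
# [('const', 5), ('mul', 'x', 'y')]
```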

Writing code to unroll a loop is trivial. The real limitation is that almost all existing languages are too low-level for the optimizer: intent the programmer had is lost by the time the compiler sees the code. ML has the potential to recover that lost information, basically the opposite of lowering an IR.
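
As a toy illustration of "the opposite of lowering" (the IR shapes and `raise_to_memset` name are hypothetical, not from any real compiler): recognize a lowered store-in-a-loop pattern and rebuild the high-level intent it came from.

```python
def raise_to_memset(ir):
    # Hypothetical "raising": a loop that stores 0 to p[i] for i in
    # 0..n is really a memset, but lowering erased that fact.
    if ir == [("loop", "i", "n", [("store", "p", "i", 0)])]:
        return [("memset", "p", 0, "n")]
    return ir  # anything we don't recognize is left untouched

lowered = [("loop", "i", "n", [("store", "p", "i", 0)])]
print(raise_to_memset(lowered))
# [('memset', 'p', 0, 'n')]
```

Real raising is pattern recognition over far messier IR, which is exactly where a learned model could plausibly help.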




ML for phase ordering is just one problem that ML could solve within compilers.

Heuristic replacement (like loop unrolling) is another big one. For the specific case of loop unrolling, I would think lower-level factors would matter more: how much i-cache pressure the unrolling creates, or whether the loop still fits in the DSB (decoded stream buffer).
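
A sketch of the kind of heuristic a model could replace (the function name and the budget constant are invented for illustration; real budgets depend on the microarchitecture): gate the unroll factor on whether the expanded body still fits a size budget.

```python
def should_unroll(body_bytes: int, factor: int, icache_budget: int = 1024) -> bool:
    # Hypothetical heuristic: unroll only if the expanded loop body
    # still fits the assumed instruction-cache budget (number made up).
    return body_bytes * factor <= icache_budget

print(should_unroll(64, 8))   # 512 bytes of body: fits the budget
print(should_unroll(64, 32))  # 2048 bytes: too much i-cache pressure
```

A learned replacement would take the same inputs (plus trip counts, branch structure, and so on) and output the factor directly, while the transform that performs the unrolling stays hand-written and correct.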

For your point about existing IRs being too low-level, there has been a large push to address that. MLIR has been used pretty extensively for that problem in ML applications, and languages like Rust have multiple higher-level IRs. There's also a preliminary implementation of ClangIR for C/C++, and there's even been some work on higher-level representations within LLVM IR itself.


> prepare a set of passes which are guaranteed to preserve correctness (we already have hundreds of them)

Hundreds of passes, sure. Guaranteeing that they preserve correctness is a bit more dubious; that's pretty hard to establish for most transforms. Passes that make no assumptions about prior passes are tricky too, since compilers tend to work in terms of a lowering pipeline.

If the compiler has N correct passes that can be combined in arbitrary order without compromising compiler termination, exponentially increasing code size, or generally making the output much worse, then you've already built a really good compiler. The subtask of then shuffling the order of passes to see if you missed anything is trivial; using machine learning to control your sort & test loop doesn't seem very compelling here.
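
For what it's worth, here is what that "sort & test loop" looks like on a deliberately tiny example (toy IR and pass names are hypothetical). Even here, ordering matters: folding must run before constant cleanup to shrink the IR, and brute force over orderings stops scaling after a handful of passes.

```python
from itertools import permutations

def fold(ir):
    # ("add", 2, 3) -> ("const", 5)
    return [("const", op[1] + op[2]) if op[0] == "add" else op for op in ir]

def drop_consts(ir):
    # Toy cleanup: in this IR, constants have no users, so remove them.
    return [op for op in ir if op[0] != "const"]

def search(ir, passes):
    # Brute-force "shuffle & test": try every ordering, keep the
    # smallest IR. O(n!) orderings, and that's without repetition.
    best = ir
    for order in permutations(passes):
        out = ir
        for p in order:
            out = p(out)
        if len(out) < len(best):
            best = out
    return best

ir = [("add", 2, 3), ("ret",)]
print(search(ir, [fold, drop_consts]))
# [('ret',)]  -- only the fold-then-drop order gets here
```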

My hunch is that the low-hanging fruit in compiler dev using LLMs is driving a fuzz tester with one. Other things seem worthwhile but difficult.


> The subtask of then shuffling the order of passes to see if you missed anything is trivial, using machine learning to control your sort & test loop doesn't seem very compelling here.

This is called the "phase ordering problem", and it's neither trivial nor solved.
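
A quick sense of scale for why exhaustive "shuffling" is off the table: even restricted to running each pass exactly once, the schedule space grows factorially, and real pipelines allow passes to repeat, which makes it worse.

```python
import math

# Orderings of just 20 distinct passes, each run once:
print(math.factorial(20))  # 2432902008176640000
```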


> You don't ask the model what instructions to emit.

You could still do that, you'd just also need to ask the model for a proof. (But I guess that's much harder than heuristically picking which passes to apply.)


This is the right way to interact with LLMs in general. Ask for what you want, but independently verify the result. Don't be like that lawyer "...but I asked ChatGPT if it was telling the truth, and it said yes!"
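
That pattern generalizes to a propose-and-check loop (a sketch; `propose_and_verify` and the toy generator/checker are invented here): the untrusted generator's output is accepted only once an independent checker validates it.

```python
def propose_and_verify(generate, verify, max_tries=3):
    # Accept the generator's candidate only if an independent checker
    # passes it; give up after max_tries attempts.
    for _ in range(max_tries):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None  # caller decides: retry with feedback, or discard

# Toy usage: a flaky "model" that gets it right on the second try,
# checked by actually evaluating the claimed equation.
answers = iter(["2 + 2 = 5", "2 + 2 = 4"])
result = propose_and_verify(lambda: next(answers),
                            lambda s: eval(s.replace("=", "==")))
print(result)  # 2 + 2 = 4
```

The checker here is trivial; for emitted instructions it would be a proof checker or translation validator, which is where the hard work lives.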


What you describe is useful, but it's not really what my comment described.

My comment was describing the wacky idea of asking the model to come up with a formal, machine-checkable mathematical proof of correctness, too. That's hard in general.

The idea in the article of just letting the model pick between different, already proven-correct, optimization passes is much saner most of the time.


What do you do when you check the proof and discover it is false? Just ask again for a proof or discard the emitted instructions?


Depends on your use case, I guess?





