Hacker News

I see a lot of misconceptions about using ML for compilers. You don't ask the model what instructions to emit. Instead, you prepare a set of passes which are guaranteed to preserve correctness (we already have hundreds of them). Then you ask the model which passes to apply, and in what order.
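
A minimal sketch of that split, on a hypothetical toy IR (the pass names, IR tuples, and `run_schedule` helper are all made up for illustration): the model only picks an ordering from a whitelist of transforms we already trust, and can never emit instructions directly.

```python
def const_fold(ir):
    # Fold ("add", 2, 3) into ("const", 5); leave every other op alone.
    return [("const", op[1] + op[2])
            if op[0] == "add" and isinstance(op[1], int) and isinstance(op[2], int)
            else op
            for op in ir]

def dead_code_elim(ir):
    # Drop ops explicitly marked dead in this toy IR.
    return [op for op in ir if op[0] != "dead"]

# The whitelist of correctness-preserving transforms.
PASSES = {"const-fold": const_fold, "dce": dead_code_elim}

def run_schedule(ir, schedule):
    # An unknown pass name raises KeyError, so a model can only choose
    # the order of trusted passes, never invent new transformations.
    for name in schedule:
        ir = PASSES[name](ir)
    return ir

ir = [("add", 2, 3), ("dead",), ("mul", "x", "y")]
print(run_schedule(ir, ["const-fold", "dce"]))
# [('const', 5), ('mul', 'x', 'y')]
```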

Writing code to unroll a loop is trivial. The real limitation is that almost all existing languages are too low-level for the optimizer: intent the programmer had is lost by the time the compiler sees the code. ML has the potential to recover that lost information, basically the opposite of lowering an IR.
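
As a toy illustration of "the opposite of lowering" (the IR shapes and `raise_to_memset` name are hypothetical, not from any real compiler): recognize a lowered store-in-a-loop pattern and rebuild the high-level intent it came from.

```python
def raise_to_memset(ir):
    # Hypothetical "raising": a loop that stores 0 to p[i] for i in
    # 0..n is really a memset, but lowering erased that fact.
    if ir == [("loop", "i", "n", [("store", "p", "i", 0)])]:
        return [("memset", "p", 0, "n")]
    return ir  # anything we don't recognize is left untouched

lowered = [("loop", "i", "n", [("store", "p", "i", 0)])]
print(raise_to_memset(lowered))
# [('memset', 'p', 0, 'n')]
```

Real raising is pattern recognition over far messier IR, which is exactly where a learned model could plausibly help.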




ML for phase ordering is just one problem that ML could solve within compilers.

Heuristic replacement (like loop unrolling) is another big one. For the specific case of loop unrolling, I would think lower-level factors would matter more: how much i-cache pressure the unrolling creates, or whether the loop still fits in the DSB (decoded stream buffer).
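
A sketch of the kind of heuristic a model could replace (the function name and the budget constant are invented for illustration; real budgets depend on the microarchitecture): gate the unroll factor on whether the expanded body still fits a size budget.

```python
def should_unroll(body_bytes: int, factor: int, icache_budget: int = 1024) -> bool:
    # Hypothetical heuristic: unroll only if the expanded loop body
    # still fits the assumed instruction-cache budget (number made up).
    return body_bytes * factor <= icache_budget

print(should_unroll(64, 8))   # 512 bytes of body: fits the budget
print(should_unroll(64, 32))  # 2048 bytes: too much i-cache pressure
```

A learned replacement would take the same inputs (plus trip counts, branch structure, and so on) and output the factor directly, while the transform that performs the unrolling stays hand-written and correct.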

For your point about existing IRs being too low-level, there has been a large push to address that. MLIR has been used pretty extensively for that problem in ML applications, and languages like Rust have multiple higher-level IRs. There's also a preliminary implementation of ClangIR for C/C++, and there's even been some work on higher-level representations within LLVM IR itself.


> prepare a set of passes which are guaranteed to preserve correctness (we already have hundreds of them)

Hundreds of passes, sure. Guaranteeing that they preserve correctness is a bit more dubious; that's pretty hard to establish for most transforms. Passes that make no assumptions about prior passes are tricky too, since compilers tend to work in terms of a lowering pipeline.

If the compiler has N correct passes that can be combined in arbitrary order without compromising compiler termination, exponentially increasing code size, or generally making the output much worse, then you've already built a really good compiler. The subtask of then shuffling the order of passes to see if you missed anything is trivial; using machine learning to control your sort & test loop doesn't seem very compelling here.
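
For what it's worth, here is what that "sort & test loop" looks like on a deliberately tiny example (toy IR and pass names are hypothetical). Even here, ordering matters: folding must run before constant cleanup to shrink the IR, and brute force over orderings stops scaling after a handful of passes.

```python
from itertools import permutations

def fold(ir):
    # ("add", 2, 3) -> ("const", 5)
    return [("const", op[1] + op[2]) if op[0] == "add" else op for op in ir]

def drop_consts(ir):
    # Toy cleanup: in this IR, constants have no users, so remove them.
    return [op for op in ir if op[0] != "const"]

def search(ir, passes):
    # Brute-force "shuffle & test": try every ordering, keep the
    # smallest IR. O(n!) orderings, and that's without repetition.
    best = ir
    for order in permutations(passes):
        out = ir
        for p in order:
            out = p(out)
        if len(out) < len(best):
            best = out
    return best

ir = [("add", 2, 3), ("ret",)]
print(search(ir, [fold, drop_consts]))
# [('ret',)]  -- only the fold-then-drop order gets here
```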

My hunch is that the low-hanging fruit in compiler dev using LLMs is driving a fuzz tester with one. Other things seem worthwhile but difficult.


> The subtask of then shuffling the order of passes to see if you missed anything is trivial, using machine learning to control your sort & test loop doesn't seem very compelling here.

This is called the "phase ordering problem", and it's neither trivial nor solved.
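
A quick sense of scale for why exhaustive "shuffling" is off the table: even restricted to running each pass exactly once, the schedule space grows factorially, and real pipelines allow passes to repeat, which makes it worse.

```python
import math

# Orderings of just 20 distinct passes, each run once:
print(math.factorial(20))  # 2432902008176640000
```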


> You don't ask the model what instructions to emit.

You could still do that, you'd just also need to ask the model for a proof. (But I guess that's much harder than heuristically picking which passes to apply.)


This is the right way to interact with LLMs in general. Ask for what you want, but independently verify the result. Don't be like that lawyer "...but I asked ChatGPT if it was telling the truth, and it said yes!"
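
That pattern generalizes to a propose-and-check loop (a sketch; `propose_and_verify` and the toy generator/checker are invented here): the untrusted generator's output is accepted only once an independent checker validates it.

```python
def propose_and_verify(generate, verify, max_tries=3):
    # Accept the generator's candidate only if an independent checker
    # passes it; give up after max_tries attempts.
    for _ in range(max_tries):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None  # caller decides: retry with feedback, or discard

# Toy usage: a flaky "model" that gets it right on the second try,
# checked by actually evaluating the claimed equation.
answers = iter(["2 + 2 = 5", "2 + 2 = 4"])
result = propose_and_verify(lambda: next(answers),
                            lambda s: eval(s.replace("=", "==")))
print(result)  # 2 + 2 = 4
```

The checker here is trivial; for emitted instructions it would be a proof checker or translation validator, which is where the hard work lives.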


What you describe is useful, but it's not really what my comment described.

My comment was describing the wacky idea of asking the model to come up with a formal, machine-checkable mathematical proof of correctness, too. That's hard in general.

The idea in the article of just letting the model pick between different, already proven-correct, optimization passes is much saner most of the time.


What do you do when you check the proof and discover it is false? Just ask again for a proof or discard the emitted instructions?


Depends on your use case, I guess?





