I've been wondering for a while whether LLMs perform better on languages that are more restrictive than Python out of the box. Is anyone using Claude, GPT, Cursor, Aider, etc. for Ada or functional languages, and do they produce fewer errors than code generated in Python, even if the impression is subjective? The question is partly inspired by this paper: https://arxiv.org/pdf/2410.01215 "From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging". Sometimes LLMs are fantastic, and sometimes it feels like a battle to get simple things done or to pick up where I left off. I'm trying to get a feel for how much of the errors is a function of a language's flexibility to do things in different ways versus the quality of the LLM, and whether tools like the one in the paper will be needed for future tooling improvements.
I've tried various LLMs with Haskell and have generally been disappointed, but maybe I should give it another go.
By its restrictive nature, the Haskell compiler will admit fewer buggy programs than most other languages' compilers. But that doesn't mean an LLM will be better at producing code (as opposed to plausible nonsense).
I always preferred the Idris approach to code generation, which predates this deep learning fad. It's a proof search: you supply the input and output types, and Idris finds a type-correct implementation between the two.
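Roughly, and just as an illustrative sketch (the function and hole names below are made up, not from the thread): you declare the type, leave a hole in the definition, and ask the compiler's expression search to fill it.

    -- Idris sketch: type-driven code generation via expression search.
    -- Declare the type, leave a hole, and run proof/expression search
    -- on the hole (from the REPL or an editor command).
    swap : (a, b) -> (b, a)
    swap p = ?swap_rhs
    -- Expression search on ?swap_rhs finds something like: (snd p, fst p)

The search only guarantees the result has the right type, of course, but for types precise enough to pin down the behaviour that's already a strong correctness guarantee.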