Also think about the program-synthesis approach proposed by Poetiq.ai.
Python programs are generated and evaluated against previous examples.
Then in-context learning is done programmatically via prompt concatenation.
If you can "score" working and non-working examples online, you have a very strong reward signal.
Scaffolding is all you need. I am absolutely certain about that.
It's about finding good ways to approximate, at inference time, the reward function used during post-training. A general enough reward that can score candidates well will inevitably improve the abilities of LLMs when put inside scaffolds.
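The generate-score-concatenate loop is simple to sketch. Here's a minimal toy version (the candidate "programs" are plain Python callables, and `best_of_n` is a hypothetical name, not anything from Poetiq): the score over held-out examples is the online reward, and the winning program's source is what you'd concatenate back into the prompt.

```python
from typing import Callable

Example = tuple[int, int]  # (input, expected_output) pairs

def score(candidate: Callable[[int], int], examples: list[Example]) -> float:
    """Fraction of examples the candidate gets right: the online reward."""
    hits = 0
    for x, y in examples:
        try:
            hits += candidate(x) == y
        except Exception:
            pass  # crashing programs score zero on that example
    return hits / len(examples)

def best_of_n(candidates, examples):
    """Keep the highest-scoring program; its source would then be fed
    back into the prompt (in-context learning via concatenation)."""
    return max(candidates, key=lambda c: score(c, examples))

# Toy run: three candidate "programs" for doubling a number.
examples = [(1, 2), (3, 6), (10, 20)]
cands = [lambda x: x + 1, lambda x: x * 2, lambda x: x * x]
winner = best_of_n(cands, examples)
```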
I’m one of the maintainers of skfolio, and I wanted to share something I’ve been tinkering with lately.
This project was heavily inspired by Andrej Karpathy’s autoresearch pattern. I wanted to see if I could apply that same "loop" to quantitative finance—specifically, using LLM agents to autonomously iterate on portfolio construction and risk strategies.
Turns out that GLM-5 improved the deflated Sharpe ratio significantly, hitting scores up to 0.93 in my testing.
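For context, the deflated Sharpe ratio (Bailey & López de Prado) is a probability in [0, 1] that the observed Sharpe beats what the best of N unskilled trials would produce by luck, so 0.93 is a strong reading. A minimal stdlib sketch of the formula follows; it is not skfolio's implementation, and it assumes unit variance across trial Sharpe ratios as a simplification:

```python
import math
from statistics import NormalDist

def deflated_sharpe(sr_hat: float, n_obs: int, n_trials: int,
                    skew: float = 0.0, kurt: float = 3.0) -> float:
    """Deflated Sharpe Ratio: probability that the observed per-period
    Sharpe `sr_hat` (over `n_obs` returns) exceeds the expected maximum
    Sharpe of `n_trials` skill-less strategies."""
    nd = NormalDist()
    euler = 0.5772156649015329  # Euler-Mascheroni constant
    # Expected max Sharpe across n_trials random trials
    # (assumes unit variance of trial Sharpes: a simplification).
    sr_star = ((1 - euler) * nd.inv_cdf(1 - 1 / n_trials)
               + euler * nd.inv_cdf(1 - 1 / (n_trials * math.e)))
    # Probabilistic Sharpe Ratio with sr_star as the benchmark,
    # adjusted for skew and (non-excess) kurtosis of returns.
    denom = math.sqrt(1 - skew * sr_hat + (kurt - 1) / 4 * sr_hat ** 2)
    z = (sr_hat - sr_star) * math.sqrt(n_obs - 1) / denom
    return nd.cdf(z)
```

The more strategies the agent loop tries, the larger the deflation, which is exactly why it's the right metric for an autonomous search over portfolio strategies.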
I've also added a section about how to run Claude Code for free using OpenRouter free-tier models.