Hacker News | Linello's comments

Also worth considering is the program-synthesis approach proposed by Poetiq.ai: Python programs are generated and evaluated against previous examples, and in-context learning is then done programmatically via prompt concatenation. If you can score the working and non-working examples online, you have a very strong reward signal.
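A minimal sketch of that kind of loop, under stated assumptions: `propose_program` below is a deterministic stub standing in for an LLM call, and in a real system the scored history would be concatenated back into the next prompt (the programmatic in-context learning step).

```python
# Generate-and-score program-synthesis loop, sketched with a stub generator.
# `propose_program` is a hypothetical stand-in for an LLM call; a real system
# would concatenate the scored (program, score) history into the next prompt.

def propose_program(history):
    # Deterministic stub: cycle through a few candidate programs.
    candidates = [
        "def f(x): return x + 1",
        "def f(x): return x * 2",
        "def f(x): return x * x",
    ]
    return candidates[len(history) % len(candidates)]

def score(program_src, examples):
    """Online reward: fraction of input/output examples the program matches."""
    namespace = {}
    try:
        exec(program_src, namespace)
    except Exception:
        return 0.0
    f = namespace.get("f")
    hits = 0
    for x, y in examples:
        try:
            if f(x) == y:
                hits += 1
        except Exception:
            pass
    return hits / len(examples)

def synthesize(examples, budget=6):
    history = []  # (program, score) pairs: the in-context learning signal
    for _ in range(budget):
        prog = propose_program(history)
        history.append((prog, score(prog, examples)))
    return max(history, key=lambda item: item[1])

best_prog, best_score = synthesize([(1, 2), (3, 6), (5, 10)])
print(best_score)  # the doubling program matches all examples, so 1.0
```

The key design point is that the scorer runs candidate programs against held examples, so the reward is checkable rather than model-judged.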

Scaffolding is all you need. I am absolutely certain about that. It's about finding good ways to approximate, at inference time, the reward function used during post-training. A reward general enough to score candidates well will inevitably improve the abilities of LLMs when they are placed inside scaffolds.

I'm one of the maintainers of skfolio, and I wanted to share something I've been tinkering with lately. This project was heavily inspired by Andrej Karpathy's autoresearch pattern: I wanted to see if I could apply that same "loop" to quantitative finance, specifically by using LLM agents to autonomously iterate on portfolio construction and risk strategies. It turns out GLM-5 improved the deflated Sharpe ratio significantly, hitting scores up to 0.93 in my testing. I've also added a section on how to run Claude Code for free using OpenRouter's free-tier models.
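As a rough illustration of the inner scoring step in such a loop, here is a sketch that ranks candidate weight vectors by a plain Sharpe ratio on historical returns (the deflated Sharpe ratio additionally corrects for the number of strategies tried). All names and numbers below are made up, not skfolio's API:

```python
import statistics

# Reward signal sketch for an agent loop over portfolios: score each candidate
# weight vector by the Sharpe ratio of its historical portfolio returns and
# keep the best. Toy data; a deflated Sharpe ratio would further penalize
# for the number of candidates evaluated.

def sharpe(returns):
    mean = statistics.fmean(returns)
    std = statistics.stdev(returns)
    return mean / std if std > 0 else 0.0

def portfolio_returns(weights, asset_returns):
    # asset_returns: one per-period list of returns, one entry per asset.
    return [sum(w * r for w, r in zip(weights, period)) for period in asset_returns]

def best_weights(candidates, asset_returns):
    return max(candidates, key=lambda w: sharpe(portfolio_returns(w, asset_returns)))

# Two assets over four periods (made-up numbers).
asset_returns = [
    [0.01, 0.03],
    [0.02, -0.02],
    [0.01, 0.05],
    [0.02, -0.01],
]
candidates = [
    [1.0, 0.0],  # all in the low-volatility asset
    [0.0, 1.0],  # all in the high-volatility asset
    [0.5, 0.5],
]
print(best_weights(candidates, asset_returns))  # [1.0, 0.0] wins on this data
```

An agent in the loop would play the role of proposing new candidate weight vectors each iteration, with the score fed back as context.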
