Show HN: Llama-8B Teaches Itself Baby Steps to Deep Research Using RL (github.com/dcaples)
39 points by diegocaples 16 days ago | 3 comments
I've been tinkering with getting Llama-8B to bootstrap its own research skills through self-play. The model generates questions about documents, searches for answers, and then learns from its own successes and failures through RL (using a hacked-up version of Unsloth's GRPO code). It started at just 23% accuracy on Apollo 13 mission report questions and hit 53% after less than an hour of training. Everything runs locally using open-source models. It's cool to see the model go from completely botching search queries to iteratively researching its way to the right answer.
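Roughly, the training side looks like the minimal sketch below, assuming Unsloth's GRPO patch over TRL's GRPOTrainer. The model checkpoint, reward function, dataset entries, and hyperparameters here are illustrative assumptions, not the repo's actual code.

    from unsloth import FastLanguageModel, PatchFastRL
    PatchFastRL("GRPO", FastLanguageModel)  # patch TRL's GRPO trainer before importing it

    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    # Load the base model in 4-bit; this checkpoint name is an assumption.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
        max_seq_length=4096,
        load_in_4bit=True,
    )

    def correctness_reward(prompts, completions, answer, **kwargs):
        # Score 1.0 when the reference answer string appears in the rollout,
        # else 0.0. The real reward shaping is presumably richer (search-tool
        # use, answer formatting, etc.).
        return [1.0 if a.lower() in c.lower() else 0.0
                for c, a in zip(completions, answer)]

    # Self-generated QA pairs over the corpus; this entry is illustrative.
    # Extra dataset columns ("answer") are passed to the reward function.
    train_dataset = Dataset.from_list([
        {"prompt": "According to the mission report, when did the oxygen "
                   "tank anomaly occur?",
         "answer": "55 hours"},
    ])

    trainer = GRPOTrainer(
        model=model,
        processing_class=tokenizer,
        reward_funcs=[correctness_reward],
        args=GRPOConfig(
            output_dir="outputs",
            num_generations=8,  # rollouts per prompt for the group-relative advantage
            max_completion_length=512,
            learning_rate=5e-6,
        ),
        train_dataset=train_dataset,
    )
    trainer.train()

The interesting part in the repo is presumably the environment that lets rollouts issue search queries against the document before answering; the sketch above only shows the GRPO plumbing.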

Questions:
-- Does training on one body of data carry over to better performance on subsequent bodies of data, since you should also be training meta-skills?
-- Your benchmark showed an improvement from 23% to 53% after an hour. What happens with further training? If accuracy plateaus, why?