diegocaples's comments

Thanks! This is more of an engine to optimize an *LLM to use* an interface over a dataset. End-to-end reinforcement learning of entire agent pipelines will be an important way to increase their reliability.

I haven't tried switching the dataset, but I am fairly certain the LLM is learning meta-skills. It seems that most of what the model learns is to behave more reasonably and to stop hallucinating and misusing tools, not to memorize the data in the body of knowledge.

During the first hour of training, Llama picks up most of the low-hanging fruit (it stops messing up function calls and stops hallucinating). After that, learning slows down.

