
I think this is an excellent way to think about LLMs and any other software-augmented task. Appreciate you putting the time into the article. That said, the points supported by the graph of training steps vs. response length would be stronger with a graph of response length vs. loss, or response length vs. task performance. The number of training steps does correlate with model performance, but that relationship weakens as the step count grows large. A rough sketch of the kind of plot I mean is below.
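
Something like this, assuming you have per-step logs with a mean response length and some task metric (the file and column names here are hypothetical placeholders, not from the article):

    # Minimal sketch: response length vs. task performance from training logs.
    # "training_logs.csv" and its column names are assumptions for illustration.
    import matplotlib.pyplot as plt
    import pandas as pd

    logs = pd.read_csv("training_logs.csv")  # one row per logged training step

    fig, ax = plt.subplots()
    ax.scatter(logs["mean_response_length"], logs["task_accuracy"], s=10)
    ax.set_xlabel("Mean response length (tokens)")
    ax.set_ylabel("Task accuracy")
    ax.set_title("Response length vs. task performance")
    plt.show()

Plotting performance directly against length would make it clear whether the longer responses are actually buying anything, rather than inferring it through the step count.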

There was a paper not too long ago showing that reasoning models will keep increasing their response length more or less indefinitely while working on a problem, but the return from doing so asymptotes toward zero. Apologies, I couldn't find the link.



