I feel like humans don’t have great intuitions around the ideal format for an LLM. But an LLM probably does have a good sense for what would work well for itself.
I also think we shouldn’t necessarily expect LLM-targeted documentation to be intuitive or helpful to a human reader.
Ultimately you could empirically measure how well one distilled documentation file works compared to another for a given LLM. Just give it tasks that rely on the documentation thoroughly enough to give good coverage (it would be easy to have Claude generate such tasks from the docs).
Then see how well the LLM does on a battery of coding tasks when a particular distilled documentation file is in its context window along with each task request.
The distilled documentation file that gets the best average score at the shortest length is the one you should use.
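To make that concrete, here’s a rough sketch of what such an eval harness could look like in Python. `run_llm_task` and `score_result` are placeholder helpers standing in for your model call and your grading step (e.g. running generated code against tests), and the length penalty is just one way to break ties toward shorter docs:

```python
import statistics

def evaluate_distilled_doc(doc_text, tasks, run_llm_task, score_result):
    """Score one distilled documentation file against a battery of tasks.

    run_llm_task(prompt) sends the doc plus a task request to the model
    and returns its output; score_result(output, task) grades that output.
    Both are assumed helpers, not a real API.
    """
    scores = []
    for task in tasks:
        prompt = f"{doc_text}\n\n---\n\nTask: {task['request']}"
        output = run_llm_task(prompt)
        scores.append(score_result(output, task))
    return statistics.mean(scores)

def pick_best_doc(candidate_docs, tasks, run_llm_task, score_result,
                  length_penalty=1e-5):
    """Prefer the doc with the best average score, nudging near-ties
    toward the shortest candidate via a small length penalty."""
    def utility(doc):
        avg = evaluate_distilled_doc(doc, tasks, run_llm_task, score_result)
        return avg - length_penalty * len(doc)
    return max(candidate_docs, key=utility)
```

Even a crude version of this gives you a score-vs-length curve to compare distillation strategies against each other.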
Yeah, agree that one could / should definitely put it through evals if it were going to be a product. There’s quite a bit to a good evals program, though.
I think the main implementation challenge here - as in most of these things - would be how you retrieve / traverse the distilled documents. I think it’s hard to disentangle your distillation method from your preferred retrieval method. Surfacing stuff that’s relevant to a query is pretty trivial, surfacing that in a way that takes account of complex cross-document references, less so. All old-school search skills, though - suddenly (or soon to be) in big demand, no doubt.
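For what it’s worth, here’s a rough sketch of the kind of retrieval step I mean: surface the chunks that match the query, then pull in whatever they cross-reference. The `[[ref]]` link syntax and the `rank` function are placeholders for whatever your distillation format and search backend (BM25, embeddings, etc.) actually use:

```python
import re

# Assumed cross-reference syntax inside distilled docs, e.g. [[api/auth]]
CROSS_REF = re.compile(r"\[\[([\w./-]+)\]\]")

def retrieve(query, docs, rank, max_hops=1):
    """Surface docs relevant to the query, then follow their
    cross-references so the model sees the full dependency chain.

    docs maps doc id -> text; rank(query, docs) returns the top-k
    matching ids. Both are stand-ins for a real search backend.
    """
    selected = list(rank(query, docs))
    frontier = list(selected)
    for _ in range(max_hops):
        next_frontier = []
        for doc_id in frontier:
            for ref in CROSS_REF.findall(docs.get(doc_id, "")):
                if ref in docs and ref not in selected:
                    selected.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return [docs[doc_id] for doc_id in selected]
```

The hard part is exactly what this glosses over: deciding how many hops to follow and how to budget context when the reference graph fans out.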