
> Llama 2 might by some measures be close to GPT 3.5, but it’s nowhere near GPT 4

I think you're right about this, and benchmarks we've run at Anyscale support this conclusion [1].

The caveat there (which I think will be a big boon for open models) is that techniques like fine-tuning makes a HUGE difference and can bridge the quality gap between Llama-2 and GPT-4 for many (but not all) problems.
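A big part of fine-tuning work in practice is just preparing the training data in the format the model expects. As a minimal sketch (the template string follows the Llama-2 chat convention, but field names like `instruction`/`response` and the single-`text`-field JSONL layout are assumptions; adjust to whatever your fine-tuning framework actually wants):

```python
import json

# Llama-2 chat-style prompt template (an assumption; verify against your
# fine-tuning framework's expected format before using).
TEMPLATE = "<s>[INST] {instruction} [/INST] {response} </s>"

def to_training_record(instruction: str, response: str) -> str:
    """Render one (instruction, response) pair as a single training string."""
    return TEMPLATE.format(instruction=instruction.strip(), response=response.strip())

def write_jsonl(pairs, path):
    """Write pairs as JSONL, one {"text": ...} record per line."""
    with open(path, "w") as f:
        for instruction, response in pairs:
            f.write(json.dumps({"text": to_training_record(instruction, response)}) + "\n")

if __name__ == "__main__":
    pairs = [
        ("Summarize: The cat sat on the mat.", "A cat sat on a mat."),
    ]
    write_jsonl(pairs, "train.jsonl")
```

The quality of these pairs tends to matter more than their quantity, which is part of why fine-tuning can close the gap on narrow tasks but not in general.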

[1] https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...




Frankly, the benchmarks you're using are too narrow. These are "old world" benchmarks, easy to game through fine-tuning, and we should stop using them altogether for LLMs. Why are you not using BIG-Bench Hard or OpenAI Evals?


Can I fine-tune it on, say, 2,000 repos at a corporation (codebases) and have it understand the architecture?


I don't think you can do that with any AI model. It almost feels like a fundamental misunderstanding of how they work.

You could fine-tune a conversational AI on your codebase, but without loading said codebase into its context it is "flying blind," so to speak. It doesn't understand the data structures of your code or the relationships between files, and it probably doesn't confidently understand the architecture of your system. Without portions of your codebase loaded into the 'memory' of your model, all that your fine-tuning can do is replicate surface characteristics of your code.
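The usual workaround is retrieval: select the most relevant source files for a given question and load them into the prompt. A minimal sketch (the keyword-overlap scoring and character budget are toy assumptions; real systems use embeddings and count tokens, not characters):

```python
import re

def score(query: str, text: str) -> int:
    """Crude relevance score: number of query terms that appear in the text."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words)

def build_context(query: str, files: dict, budget_chars: int = 8000) -> str:
    """Pick the most relevant files and concatenate them into a prompt context.

    `files` maps path -> source text; `budget_chars` stands in for the
    model's context-window limit.
    """
    ranked = sorted(files.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    parts, used = [], 0
    for path, text in ranked:
        snippet = f"# file: {path}\n{text}\n"
        if used + len(snippet) > budget_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "".join(parts)
```

This is exactly the distinction the parent draws: retrieval puts actual code in front of the model at answer time, while fine-tuning only shifts its priors.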


TypeChat-like tools might provide the interface control for future context-driven architectures, acting as a kind of catalyst. Self-reflective modeling of that sort is a form of contextual insight.



