
> Llama 2 might by some measures be close to GPT 3.5, but it’s nowhere near GPT 4

I think you're right about this, and benchmarks we've run at Anyscale support this conclusion [1].

The caveat there (which I think will be a big boon for open models) is that techniques like fine-tuning makes a HUGE difference and can bridge the quality gap between Llama-2 and GPT-4 for many (but not all) problems.
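A big part of fine-tuning work in practice is just preparing the training data in the format the model expects. As a minimal sketch (the template string follows the Llama-2 chat convention, but field names like `instruction`/`response` and the single-`text`-field JSONL layout are assumptions; adjust to whatever your fine-tuning framework actually wants):

```python
import json

# Llama-2 chat-style prompt template (an assumption; verify against your
# fine-tuning framework's expected format before using).
TEMPLATE = "<s>[INST] {instruction} [/INST] {response} </s>"

def to_training_record(instruction: str, response: str) -> str:
    """Render one (instruction, response) pair as a single training string."""
    return TEMPLATE.format(instruction=instruction.strip(), response=response.strip())

def write_jsonl(pairs, path):
    """Write pairs as JSONL, one {"text": ...} record per line."""
    with open(path, "w") as f:
        for instruction, response in pairs:
            f.write(json.dumps({"text": to_training_record(instruction, response)}) + "\n")

if __name__ == "__main__":
    pairs = [
        ("Summarize: The cat sat on the mat.", "A cat sat on a mat."),
    ]
    write_jsonl(pairs, "train.jsonl")
```

The quality of these pairs tends to matter more than their quantity, which is part of why fine-tuning can close the gap on narrow tasks but not in general.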

[1] https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...




Frankly, the benchmarks you're using are too narrow. These are "old world" benchmarks, easy to game through fine-tuning, and we should stop using them altogether for LLMs. Why are you not using BIG-Bench Hard or OpenAI Evals?


Can I fine-tune it on, say, 2,000 repos at a corporation (codebases) and have it understand the architecture?


I don't think you can do that with any AI model. It almost feels like a fundamental misunderstanding of how they work.

You could fine-tune a conversational AI on your codebase, but without loading said codebase into its context it is "flying blind," so to speak. It doesn't understand the data structures of your code or the relationships between files, and it probably doesn't confidently understand the architecture of your system. Without portions of your codebase loaded into the 'memory' of your model, all that your fine-tuning can do is replicate surface characteristics of your code.
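The usual workaround is retrieval: select the most relevant source files for a given question and load them into the prompt. A minimal sketch (the keyword-overlap scoring and character budget are toy assumptions; real systems use embeddings and count tokens, not characters):

```python
import re

def score(query: str, text: str) -> int:
    """Crude relevance score: number of query terms that appear in the text."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words)

def build_context(query: str, files: dict, budget_chars: int = 8000) -> str:
    """Pick the most relevant files and concatenate them into a prompt context.

    `files` maps path -> source text; `budget_chars` stands in for the
    model's context-window limit.
    """
    ranked = sorted(files.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    parts, used = [], 0
    for path, text in ranked:
        snippet = f"# file: {path}\n{text}\n"
        if used + len(snippet) > budget_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "".join(parts)
```

This is exactly the distinction the parent draws: retrieval puts actual code in front of the model at answer time, while fine-tuning only shifts its priors.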


TypeChat-like tools might provide the interface control for future context-driven architectures, acting as a kind of catalyst. Self-reflective modeling of that sort is a form of contextual insight.



