Show HN: Finetuning LLMs: Open-source vs. Closed-source
2 points by rohit_saha on Oct 8, 2023 | 1 comment
Hello all,

I have been working on benchmarking different LLMs -- both open-source and closed-source.

Repo: https://github.com/georgian-io/LLM-Finetuning-Hub

Specifically, I am comparing their out-of-the-box capabilities (prompting) with those of their fine-tuned counterparts!
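To make the comparison concrete, here is a minimal sketch of how zero-shot and few-shot prompts for a classification task could be built and scored. The function names (`build_prompt`, `accuracy`) and the "Text: / Label:" template are assumptions for illustration only, not the repo's actual code; the same `accuracy` helper could in principle score a fine-tuned model's outputs as well.

```python
def build_prompt(instruction, query, examples=()):
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples).

    The "Text: ... / Label: ..." template is a hypothetical format chosen
    for illustration; real benchmarks may use a different one.
    """
    parts = [instruction]
    for text, label in examples:
        # Each demonstration shows the model an input and its expected label.
        parts.append(f"Text: {text}\nLabel: {label}")
    # The query ends with an open "Label:" for the model to complete.
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


def accuracy(predictions, references):
    """Fraction of predictions matching the references (case/whitespace-insensitive)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)
```

Calling `build_prompt(instruction, query)` with no examples gives the zero-shot prompt; passing a few `(text, label)` pairs gives the few-shot variant.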

So far, the following models have been benchmarked:

Open-Source:
- Flan-T5-large (780M)
- RedPajama (3B & 7B)
- Falcon 7B
- Llama2 (7B & 13B)
- Mistral 7B
- Mosaic MPT 7B

Closed-Source:
- AI21 Jurassic-2 (Light, Grande, Jumbo)
- Writer Palmyra 30B
- GPT 3.5 Turbo 154B

The following trends have emerged:

- For out-of-the-box zero-shot & few-shot prompting, GPT-3.5-turbo takes the cake! Being the biggest model of all very likely helps with generalizability!
- Even among other closed-source LLMs such as Jurassic-2 and Palmyra, GPT-3.5-turbo wins!
- Open-source models, however, do not fare very well on out-of-the-box tasks. Among all open-source models, Mistral-7B achieves the best results!
- When it comes to finetuning, things get very competitive! We notice that much smaller models such as Llama2-7B, Mistral-7B, and Falcon-7B are able to compete with the likes of GPT-3.5-turbo 154B and the Jurassic-2 models.

The last point makes me very hopeful that smaller LLMs, when finetuned on narrow use-cases / data, can give tough competition to the much larger models.

I am aiming to benchmark more models, including Anthropic Claude 2, Google PaLM 2, Cohere Command and Inflection Pi. My hunch is that these huge closed-source models will generally perform better out of the box than (smaller) open-source models. However, finetuning will change the game: smaller models will compete with or even out-compete the larger models!

Since there are so many LLMs out there, I would love to get some help, in case anyone is considering contributing :)



Put it in a blog post or, even better, a reproducible GitHub repo.

https://news.ycombinator.com/showhn.html



