I just spent the past 5 hours comparing LLMs
13 points by mlashuel 14 days ago | hide | past | favorite | 7 comments
My top 2 takeaways from this:

1. No one should still be using GPT-3.5 or Gemini 1.0. They may get the job done, but 98% of the time one of these LLMs (Gemini 1.5 Pro, Bing Copilot, ChatGPT-4o, and Perplexity) will give you a better response.

2. The best way to consistently get the best response is to compare multiple AI chatbots and choose the best answer. This is because:

If you are only using ChatGPT-3.5, Llama 3, Claude Sonnet, or Mistral, 100% of the time there is another LLM with a better answer.

If you are only using Gemini 1.0, 96% of the time there is another LLM with a better answer.

If you are only using Perplexity, 86% of the time there is another LLM with a better answer.

If you are only using ChatGPT-4o, 73% of the time there is another LLM with a better answer.

If you are only using Gemini 1.5 Pro, 73% of the time there is another LLM with a better answer.

If you are only using Bing Copilot, 73% of the time there is another LLM with a better answer.

I personally used chatplayground.ai to compare all of them.

I stopped using ChatGPT-3.5 a long time ago because it gives you all the pro LLMs for the same price as GPT-4.

It is very important to note this research was done on a very small dataset of 22 questions.
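
For anyone who wants to check the arithmetic behind "X% of the time there is another LLM with a better answer", here is a minimal sketch. It assumes exactly one winning LLM per question (no ties). The win counts are hypothetical, not recorded tallies from this comparison; they were picked only because they sum to 22 and roughly reproduce the percentages quoted in this post. The names `wins` and `pct_beaten` are made up for illustration.

```python
# Sketch: turn per-LLM win counts into "another LLM was better" percentages.
# ASSUMPTION: one winner per question, 22 questions total.
# ASSUMPTION: the win counts below are hypothetical, chosen to roughly
# match the percentages stated in the post; they are not the real tallies.
TOTAL_QUESTIONS = 22

wins = {
    "ChatGPT-4o": 6,
    "Gemini 1.5 Pro": 6,
    "Bing Copilot": 6,
    "Perplexity": 3,
    "Gemini 1.0": 1,
    "ChatGPT-3.5": 0,
    "Claude Sonnet": 0,
    "Llama 3": 0,
    "Mixtral 8x7b": 0,
    "Mistral Large": 0,
}


def pct_beaten(win_count, total=TOTAL_QUESTIONS):
    """Percent of questions where some other LLM gave the best answer."""
    return 100 * (total - win_count) / total


if __name__ == "__main__":
    for name, count in wins.items():
        print(f"{name}: {pct_beaten(count):.1f}% of the time another LLM was better")
```

With these assumed counts, 6 wins works out to about 72.7% and 3 wins to about 86.4%, which match the 73% and 86% figures after rounding. One win gives about 95.5%, slightly under the 96% quoted for Gemini 1.0, so the real tallies or rounding may differ. With only 22 questions, a single question moves any figure by roughly 4.5 percentage points.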

The best LLM answer for each question was decided by me simply putting myself in the shoes of the person asking that question and deciding which output was the most helpful.

Perplexity is good at giving you a complete answer. Most LLMs will give you bullet-point answers, but Perplexity prefers to write out a complete answer rather than just giving bullet points.

Bing Copilot is great because it cites its sources and will even recommend videos to help you.

ChatGPT-4o has really descriptive and usually longer answers, but prefers to answer in bullet points

Gemini 1.5 Pro is great because its conversational tone makes it feel like it understands the context of your question better.

Bing Copilot can be great, but 20-30% of the time the answers it gives are not even usable.

Because Bing Copilot uses sources to give you answers, the answers feel much more human-like and at times more useful than those of other LLMs that just list a bunch of basic bullet points.

There should be no reason anyone is still using ChatGPT-3.5 today.

Gemini 1.0 gives good answers but not as detailed and helpful as Gemini 1.5 Pro

ChatGPT-4o was able to generate really nice data tables that Gemini 1.5 Pro wasn't able to.

LLMs compared: ChatGPT-3.5, ChatGPT-4o, Gemini 1.0, Gemini 1.5 Pro, Bing Copilot, Claude Sonnet, Llama 3, Mixtral 8x7b, Mistral Large, Perplexity




Your number one shouldn't be treated as universal. It seems you're basing everything on LLMs being knowledge retrieval systems. You are dismissing situations where someone might want more creative output without having to guide the LLM with a stricter prompt.


Thanks. Can you share a sample of the questions you used? Were you comparing the LLMs to answers you already knew, or comparing the new answers output by the LLMs against each other?

GPT-4T/o is the smartest

Opus has the best vocabulary/apparent creativity

Gemini is the most "neutral"


What's the reasoning behind the data? Show the reference, table, or graph to back it up.


you also recently did a post on reddit, didnt u? you're associated with chatplayground.ai where u did 80k views and got 8 meaningful conversions in the end or so...is that u?

GPT 3.5 and haiku are great for prototyping

3.5 great for summarizing


