
For the most part, I don't build chatbots, aside from a couple of RAG-based ones. It's more behind-the-scenes stuff like image understanding, categorization, nuanced sentiment analysis, semantic alignment, etc.

I've created a framework that lets me test quality in an automated way across prompt changes and models, and compare cost, speed, and quality (a rough sketch is at the end of this comment).

The only thing out of all of those that requires humans to judge the quality is the RAG results.
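
A minimal sketch of what a harness along those lines can look like, assuming the Bedrock converse API; the model IDs, per-token prices, and the naive substring scorer below are illustrative placeholders, not the actual framework:

    import time
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Placeholder model IDs and per-1K-token prices; adjust to the models under test.
    MODELS = {
        "amazon.nova-lite-v1:0": {"in": 0.00006, "out": 0.00024},
        "amazon.nova-pro-v1:0":  {"in": 0.0008,  "out": 0.0032},
    }

    def run_case(model_id, prompt_template, case):
        """Run one test case against one model and record speed, cost, and quality."""
        start = time.time()
        resp = bedrock.converse(
            modelId=model_id,
            messages=[{"role": "user",
                       "content": [{"text": prompt_template.format(**case["input"])}]}],
            inferenceConfig={"maxTokens": 256, "temperature": 0},
        )
        latency = time.time() - start
        text = resp["output"]["message"]["content"][0]["text"]
        usage = resp["usage"]
        price = MODELS[model_id]
        cost = (usage["inputTokens"] * price["in"]
                + usage["outputTokens"] * price["out"]) / 1000
        # Naive scorer: substring match against the expected answer; swap in an
        # LLM judge or a task-specific metric for real use.
        quality = 1.0 if case["expected"].lower() in text.lower() else 0.0
        return {"model": model_id, "latency_s": latency,
                "cost_usd": cost, "quality": quality}

    def compare(prompt_template, test_cases):
        """Cross every model with every test case and print per-model averages."""
        rows = [run_case(m, prompt_template, c) for m in MODELS for c in test_cases]
        for m in MODELS:
            sub = [r for r in rows if r["model"] == m]
            n = len(sub)
            print(m,
                  "latency %.2fs" % (sum(r["latency_s"] for r in sub) / n),
                  "cost $%.6f" % (sum(r["cost_usd"] for r in sub) / n),
                  "quality %.2f" % (sum(r["quality"] for r in sub) / n))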

So who is the winner using the framework you created?

It depends. Amazon's Nova Lite gave me the best speed-to-performance tradeoff when I needed really quick real-time inference for categorizing a user's input (think call centers); there's a sketch of that kind of call at the end of this comment.

One of Anthropic's models did the best with image understanding, with Amazon's Nova Pro slightly behind.

For my tests, I used a customer’s specific set of test data.

For RAG I don't recall which won. But it's much more subjective, so I just gave the customer the ability to configure the model and modify the prompt so they could choose.
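
A minimal sketch of the real-time categorization call mentioned above, again assuming Bedrock's converse API; the model ID, label set, and prompt are illustrative assumptions, not the configuration described in the comment:

    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Assumed label set for a call-center routing use case.
    CATEGORIES = ["billing", "technical_support", "cancellation", "other"]

    def categorize(utterance: str) -> str:
        """Classify one caller utterance into a single category with Nova Lite."""
        resp = bedrock.converse(
            modelId="amazon.nova-lite-v1:0",  # assumed Bedrock model ID for Nova Lite
            system=[{"text": "Reply with exactly one label from: "
                             + ", ".join(CATEGORIES)}],
            messages=[{"role": "user", "content": [{"text": utterance}]}],
            inferenceConfig={"maxTokens": 10, "temperature": 0},
        )
        label = resp["output"]["message"]["content"][0]["text"].strip().lower()
        return label if label in CATEGORIES else "other"

    print(categorize("I was double charged on my last invoice"))  # e.g. "billing"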


Your experience matches mine then... I haven't noticed any clear, consistent differences. I'm always looking for second opinions on this (bc I've gotten fairly cynical). Appreciate it



