Hacker News | gaflo's comments

Can you elaborate what kind of system you built? I'm curious what specific prompts are getting worse responses with the newer models.

Linguistics, specifically as it pertains to language learning

Edit: Whoops, I read your question wrong. I do a bunch of NLP on different languages and use LLMs to pad out and interpret the data: asking for things like translations, alternatives, and transliterations; associating and validating data; transferring data from one language to another; segmentation and cross-lingual alignment; the list goes on.
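(For the "asking for translations/transliterations and validating" step, a minimal sketch of what that kind of pipeline stage might look like. Everything here is hypothetical: the prompt wording, the `fake_llm` stub standing in for a real chat-completion call, and the required fields are all illustrative, not the commenter's actual code.)

```python
import json

# Hypothetical prompt for one task mentioned above: ask the model for a
# translation plus transliteration as JSON, so the reply can be validated
# before it enters the rest of the pipeline.
PROMPT = (
    "For the Japanese word '猫', return JSON with keys "
    "'translation' (English) and 'transliteration' (romaji)."
)

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call; returns a canned JSON reply.
    return '{"translation": "cat", "transliteration": "neko"}'

def validate(raw: str, required=("translation", "transliteration")) -> dict:
    """Parse the model's reply and reject structurally invalid output."""
    data = json.loads(raw)
    missing = [k for k in required if not data.get(k)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

result = validate(fake_llm(PROMPT))
print(result["translation"])  # cat
```

The validation step matters because, as noted above, there is no single "right answer" — but malformed or empty fields are unambiguously wrong and can be rejected mechanically.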

I did manage to get higher quality in the end, so it’s not entirely a regression. But older LLMs were much more capable at interpreting disparate data and tying it together with less prompting.

Most of the work I do does not really have a “right answer,” just a lot of wrong ones, which I think is what trips up LLMs. If I turn on reasoning for any step in my pipeline, the token count goes up 100-fold and the quality gets cut in half.

Edit 2: I did have to move off of GPT, though, to get the improvements mentioned. Go Mistral!

