But new models are popping up every few months, which means they're retrained every couple of months.
I don't know if there's a correlation between what an LLM would choose now and how your product should look to most likely end up in an LLM's training set.
In the YC video I mentioned in the post body, they discuss a tool called Resend - something like an email gateway for receiving/sending mail. What's interesting: there are a lot of tools like that, but LLMs would choose shiny new Resend every time.
Seems like there's something more to it than just being on the internet for a long time :)
I wrote this post because of exactly those corner cases. If I'm building something agents would use - how do I know which tool they'd actually choose?
For example, say you're building an API provider for image generation. There are thousands of them on the internet.
I wonder if there's a tool that would basically simulate an agent choosing between your product/service and your competitors'.
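A minimal sketch of what such a simulator could look like: present a task plus competing product descriptions to a judge many times, shuffling order each trial to average out position bias, and tally the picks. The judge here is a stub with a made-up preference rule; in a real tool it would be an LLM call, and all names below are hypothetical.

```python
import random
from collections import Counter

def stub_judge(task, descriptions):
    # Placeholder policy: pick the first description mentioning "API".
    # A real judge would be an LLM choosing by its own criteria.
    for name, desc in descriptions:
        if "API" in desc:
            return name
    return descriptions[0][0]

def simulate(task, products, judge, trials=100, seed=0):
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        order = list(products.items())
        rng.shuffle(order)  # shuffle to surface/average out position bias
        tally[judge(task, order)] += 1
    return tally

# Hypothetical product names and descriptions:
products = {
    "YourTool": "Image generation API with simple REST endpoints.",
    "RivalTool": "Image generation service, dashboard-first workflow.",
}
shares = simulate("generate a product image", products, stub_judge)
print(shares)  # -> Counter({'YourTool': 100})
```

Swapping the stub for a real model and re-running the same harness across description rewrites is essentially the "market share" experiment the article describes.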
It seems interesting in the context of OpenClaw and the rising popularity of agents in general.
Most interesting takeaways from the article:
- Agents ignore most products and pile onto 1-2 "modal" picks
- Sponsored tags hurt selection. Agents penalize ads.
- Position bias flips direction between model versions
- Simple description rewrites moved market share +8–15 p.p.
The last one is the most interesting; I think the era of AI-custdev is near.
Be prepared - agenprobe.sarm.solutions