I'm sure what you can share is limited, since I'm guessing this is a cat-and-mouse game. That being said, is there anything you can share about your implementation?
We’re working on a bot filtering system that blocks all non-browser traffic by default. Alongside that, we’re building a directory of verified bots, and you’ll be able to opt in to allow traffic only from those trusted sources. Hopefully shipping soon.
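For anyone curious what "block by default, allowlist verified bots" could look like, here's a minimal sketch. It is not their actual implementation; the verified-bot directory, UA patterns, and opt-in mechanism are all made up for illustration:

```python
# Hypothetical sketch of a default-deny bot filter with an opt-in
# allowlist of verified bots. Names and patterns are illustrative only.
import re

# Hypothetical directory of verified bots: UA token -> bot owner.
VERIFIED_BOTS = {
    "Googlebot": "Google",
    "bingbot": "Microsoft",
}

# Crude "looks like a browser" heuristic. UA strings are trivially
# spoofable, so a real system would need stronger signals than this.
BROWSER_RE = re.compile(r"Mozilla/5\.0 \(")

def allow_request(user_agent: str, opt_in: set[str]) -> bool:
    """Default deny: let browsers through, plus bots the site opted in to."""
    for ua_token, owner in VERIFIED_BOTS.items():
        if ua_token in user_agent:
            return owner in opt_in          # verified bot: honor the opt-in
    return bool(BROWSER_RE.match(user_agent))  # everything else: browsers only

print(allow_request("Mozilla/5.0 (compatible; Googlebot/2.1)", {"Google"}))  # True
print(allow_request("python-requests/2.31", set()))                          # False
```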
Verified bots? You mean the companies that got big reading your info, so now you know who they are, but you won't allow any newcomers? So the people that were taking the data all this time get rewarded by having their competition killed off. lol.
You have it exactly right, except for the reason to allow them in the first place: they're bots that provide reciprocal value to the site owner. Otherwise, why even bother letting them through?
It's wild how people don't get that Facebook and Googlebot get let through paywalls and such because they bring the site real, tangible revenue. If you want the same privileges, you have to start with the monetary value you provide to the sites you index. Lead gen is hard, and the major search engines provide crazy value for next to nothing.
Do they? AI bots provide me with nothing (best-case scenario), or they serve my content in their pages without "read more" links, thus lowering my number of visitors.
Search bots, and especially Google, provide my site a lot of value. They respect robots.txt, I can see that about half my visits come from search, and they identify themselves properly as bots. It's almost impossible to even notice a search bot in the traffic graphs.
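The "identify properly" part is checkable, too. Google documents verifying Googlebot by reverse-DNS'ing the client IP, checking the hostname, then forward-confirming it. A rough sketch (needs DNS access; the sample IP is just one believed to be in Googlebot's published range):

```python
# Sketch of the documented Googlebot verification: reverse DNS the client
# IP, check the hostname ends in googlebot.com or google.com, then confirm
# with a forward lookup that the hostname resolves back to the same IP.
import socket

def is_verified_googlebot(client_ip: str) -> bool:
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)    # reverse DNS
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return socket.gethostbyname(hostname) == client_ip  # forward-confirm
    except socket.gaierror:
        return False

# Example IP from what is believed to be Googlebot's published range.
print(is_verified_googlebot("66.249.66.1"))
```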
But AI bots suck. They don't even read robots.txt, they hit the site as hard as it can bear, and when they receive a 5xx, a 444, or a 426 they interpret it as "keep requesting hard until you get a 200". They can easily DoS or bankrupt a small site, and they use fake user agents. As the OP's post shows, their activity can be clearly seen in the log graphs as huge spikes coming from a single client. OpenAI scanned 100% of one of my sites (more than 20,000 individual pages) in two days, causing intermittent DoS, while Google is at 80% of the sitemap.xml. And as the cherry on top, I still can't see a single visit in my logs that comes from their services.
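For contrast, here's roughly what the well-behaved version looks like: check robots.txt before fetching anything, and back off on errors instead of hammering until you get a 200. A sketch only, with placeholder URL and user agent, not any vendor's actual crawler:

```python
# A polite fetch: honor robots.txt, bound the retries, and back off
# exponentially on 5xx instead of retrying at full speed.
import time
import urllib.error
import urllib.request
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetches and parses robots.txt (needs network access)

def polite_fetch(url: str, user_agent: str = "ExampleBot/1.0") -> bytes | None:
    if not rp.can_fetch(user_agent, url):
        return None                       # disallowed: don't request at all
    delay = 1.0
    for _ in range(5):                    # bounded retries, not "until 200"
        try:
            req = urllib.request.Request(url, headers={"User-Agent": user_agent})
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code < 500:
                return None               # 4xx (incl. 426): the answer is no
        except urllib.error.URLError:
            pass                          # connection dropped (e.g. a 444)
        time.sleep(delay)                 # exponential backoff between tries
        delay *= 2
    return None
```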
I think you might be confusing search bots with AI bots.