While I don’t disagree with the idea that all crawlers should have equal access, we also need to address the quality of many crawlers.
Google and Microsoft have never hammered any website I’ve run into the ground. Crawlers from other, smaller search engines have, to the point where it was easier to just block them entirely.
Part of the problem is that sites want search engines to index their site, but don’t want random people scraping the entire thing. So they do the best they can and forget that Google isn’t the web. I doubt it’s shady deals with Google; it’s just small teams doing the best they can, and sometimes they forget to think ideas through because it’s good enough.
I think this is a problem that should be solved by automatic rate limiting and throttling at the application/caching layer (or just the individual web server for smaller sites). Requests with a non-browser UA get put into a separate bots-only queue that drains at roughly one request per second. If the queue fills up, you start sending 429s, with random early failures for bots (UA/IP/subnet pairs) that are overrepresented in the traffic flow.
I don't know if such software exists, but it should. It would be a hell of a lot healthier for the web than "everyone but Google f*ck off", and it creates an incentive for bots to throttle themselves (a well-behaved bot gets faster responses than one that requests as fast as possible).
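To make the idea concrete, here's a minimal sketch of that bots-only queue, assuming a single-process Python server; everything in it (BotThrottle, admit, BOT_QUEUE_SIZE, the thresholds) is made up for illustration, not an existing library:

```python
import random
import threading
import time
from collections import Counter

BOT_QUEUE_SIZE = 50         # max queued bot requests before we start shedding load
DRAIN_RATE = 1.0            # bot requests served per second
EARLY_FAIL_THRESHOLD = 0.3  # queue share above which a UA/IP pair risks random 429s

class BotThrottle:
    def __init__(self):
        self.lock = threading.Lock()
        self.queue_depth = 0
        self.per_client = Counter()   # (user_agent, ip) -> requests currently queued
        self.next_slot = time.monotonic()

    def admit(self, user_agent, ip):
        """Return (ok, wait_seconds). ok=False means respond 429 immediately."""
        key = (user_agent, ip)
        with self.lock:
            if self.queue_depth >= BOT_QUEUE_SIZE:
                return False, 0.0
            share = self.per_client[key] / max(self.queue_depth, 1)
            # Random early failure for clients hogging the bot queue.
            if share > EARLY_FAIL_THRESHOLD and random.random() < share:
                return False, 0.0
            self.queue_depth += 1
            self.per_client[key] += 1
            # Hand out serving slots spaced 1/DRAIN_RATE seconds apart.
            now = time.monotonic()
            self.next_slot = max(self.next_slot, now) + 1.0 / DRAIN_RATE
            return True, self.next_slot - now

    def release(self, user_agent, ip):
        with self.lock:
            self.queue_depth -= 1
            self.per_client[(user_agent, ip)] -= 1
```

The non-browser request path would call admit(), sleep for the returned delay (or send a 429 if rejected), serve the request, then call release(). A real deployment would need this state shared across workers and some care in deciding what counts as a "browser" UA, both of which this sketch skips.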
I suspect that at least some of the bots use web server response times and response codes as part of the signal for ranking. If your website does not appear capable of handling load then it won't rank as highly, because it is not in their best interests to have search results that don't load.