Google is the kingmaker and effectively holds a monopoly on search.
If you're going to light dollars on fire serving a bot, let it be Google's bot, which on a lucky day might decide you are king (because you let them index your site).
I would presume Google obeys robots.txt directives as well.
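For what it's worth, checking a site's robots.txt yourself is trivial with Python's standard library. A minimal sketch, where the user agent string and URLs are placeholders I've made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Placeholder values: swap in your own crawler's user agent
# and the page you actually intend to fetch.
USER_AGENT = "example-crawler"
TARGET_URL = "https://example.com/some/page"

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

if rp.can_fetch(USER_AGENT, TARGET_URL):
    delay = rp.crawl_delay(USER_AGENT)  # None if no Crawl-delay directive
    print(f"Allowed; requested crawl delay: {delay}")
else:
    print("Disallowed by robots.txt")
```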
But I would agree it's a real, outstanding problem, and a YC-worthy one: sharing structured webpage data with trusted partners in a generic and efficient way. I've heard of AI companies that do this kind of scraping and structuring with models (I forget the names); it's many notches in sophistication above a headless-Selenium-type driver. If only HTML were split neatly into model, view, and controller, and users were allowed to bring their own views and controllers.
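On the MVC point: the closest thing the web already has to a separable "model" is embedded structured data such as JSON-LD (schema.org), though many sites don't include any. A rough stdlib-only sketch of pulling it out of a page, with a placeholder URL:

```python
import json
import urllib.request
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_ldjson = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ldjson = True
            self._buf = []

    def handle_data(self, data):
        if self._in_ldjson:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ldjson:
            self._in_ldjson = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # malformed block; skip it

# Placeholder URL for illustration.
html = urllib.request.urlopen("https://example.com/").read().decode("utf-8", "replace")
parser = JSONLDExtractor()
parser.feed(html)
for block in parser.blocks:
    print(block)  # each block is the page's own machine-readable "model"
```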
Are there specific laws that deal with rate limits? Honest question - I get that hitting a site too fast could be considered a denial-of-service attack, but so long as it's below a certain threshold, wouldn't it be okay (though I'm not sure how that threshold would be determined)?
In the US, the CFAA prohibits causing "damage", which includes any "impairment to the integrity or availability" of data or systems. But as with many other things in law, it boils down to the court trying to assess your intent, whether you could've reasonably anticipated the outcome, and what that outcome ended up being.
There's no law that says "you can't send more than n packets per hour".
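Which is why, in practice, the safe move is to self-impose a conservative rate rather than hunt for a statutory threshold. A minimal sketch of a fixed-delay fetch loop; the delay value and URL list are arbitrary placeholders, not numbers from any statute:

```python
import time
import urllib.request

# Arbitrary, conservative self-imposed limit; no law prescribes this number.
SECONDS_BETWEEN_REQUESTS = 2.0

def polite_fetch(urls):
    """Fetch each URL in turn, sleeping between requests to keep load trivial."""
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            body = resp.read()
            print(f"{url}: {len(body)} bytes")
        time.sleep(SECONDS_BETWEEN_REQUESTS)

# Placeholder URL list for illustration.
polite_fetch(["https://example.com/"])
```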