Stuff like this is why Cloudflare launched the AI Audit feature and the ability to block "AI bots". We're about to launch a feature that'll enforce your robots.txt.
I’m working on a platform[1] (built on Cloudflare!) that lets devs deploy well-behaved crawlers by default: respecting robots.txt, backing off on 429s, and so on. The hope is that we can introduce a centralized caching layer to alleviate network congestion from bot traffic.
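For anyone wondering what "well-behaved by default" looks like in practice, here's a rough Python sketch (not the platform's actual code; polite_get and the user-agent string are just illustrative): check robots.txt before fetching, and back off when the origin answers 429.

    import time
    import urllib.robotparser
    from urllib.parse import urlparse, urlunparse

    import requests  # assumed dependency; stdlib urllib.request works too

    USER_AGENT = "polite-crawler/0.1"  # hypothetical UA string

    def polite_get(url, max_retries=3):
        # 1. Respect robots.txt for the target host.
        parts = urlparse(url)
        robots_url = urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(robots_url)
        rp.read()
        if not rp.can_fetch(USER_AGENT, url):
            raise PermissionError(f"robots.txt disallows {url}")

        # 2. Back off when the origin signals overload with a 429.
        for attempt in range(max_retries):
            resp = requests.get(url, headers={"User-Agent": USER_AGENT})
            if resp.status_code != 429:
                return resp
            retry_after = resp.headers.get("Retry-After")
            try:
                # Assumes Retry-After is given in seconds; fall back to
                # exponential backoff if it's missing or an HTTP-date.
                delay = float(retry_after) if retry_after else 2 ** attempt
            except ValueError:
                delay = 2 ** attempt
            time.sleep(delay)
        raise RuntimeError(f"still rate-limited after {max_retries} attempts")

A real crawler would also cache the parsed robots.txt per host and honor Crawl-delay, but that's the basic shape of it.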
I love the sentiment, but the real issue is one of incentives, not ability. The problem isn't that crawlers lack the technical ability to minimize their impact; they just don't have a reason to care right now.