But many people feel that the very act of incorporating your copyrighted words into their for-profit training set is itself the bad behavior. It's not about rate-limiting scrapers; it's about letting them in the door in the first place.
Why was it OK for Google to incorporate their words into a for-profit search index which has increasingly sucked all the profit out of the system?
My Ithaca friends on Facebook complain incessantly about the very existence of AI, to the extent that I would not want to admit that I ask Copilot how to use Windows Narrator, ask Junie where the CSS is that makes this text bold, or sometimes have Photoshop draw an extra row of bricks in a photograph for me.
The same people seem to have no problem with Facebook using their words for all things Facebook uses them for, however.
They were okay with it when Google was sending them traffic. Now Google often doesn't. It has broken the social contract of the web. So why should the sites whose work is being scraped be expected to continue upholding their end?
Not only are AI bots scraping without sending traffic, they're doing so much more aggressively than Google ever did. Google, at least, respected robots.txt and kept to the same user-agent; it didn't want to index something that a server didn't want indexed. AI bots, on the other hand, want to index every possible thing regardless of what anyone else says.
> Why was it OK for Google to incorporate their words into a for-profit search index which has increasingly sucked all the profit out of the system?
It wasn't okay, it's just that the reasons it wasn't okay didn't become apparent until later.
> The same people seem to have no problem with Facebook using their words for all things Facebook uses them for, however.
Many of those people will likely have a problem with it later, for reasons that are happening now but that they won't become fully aware of until later.
Sure. But we're already talking about a presumption of free and open here. I'm sure people are also reading my words and incorporating them into their own for-profit work. If I cared, I wouldn't make them free and open in the first place.
But that is not something you can protect against with technical means. At best you can block the little fish and give even more power to the mega corporations, who will always have a way to get to the data: by operating crawlers you cannot afford to block, by incentivizing users to run their browsers and/or extensions that collect the data, and/or by buying the data from someone who does.
All you end up doing is participating in the enshittification of the web for the rest of us.