I agree with the conclusions of the article, but I'd add that people see search and ChatGPT differently because search ultimately drives traffic to their own site, so it's almost like a kind of advertising, while LLMs learn from their content and don't give anything back. LLMs arguably could remove human traffic from your site if people don't need to go there to get the information.
I see what LLMs are doing as completely legit, it's not copying, it's more like getting a small nudge towards learning something when they scrape your site, but it's not analogous to search.
> search ultimately drives traffic to their own site
Google is notorious for having slowly whittled away traffic, in that they are constantly adding “tools” at the top of their results to give users answers without having to click through.
More egregiously, Google uses their data to launch or acquire competing businesses in industries that have high-value keywords so they can drive up the CPC of those keywords. Oftentimes Google gives their own businesses built-in tools at the top of search, once again diverting traffic (example: Flights).
Even when they drive traffic to sites, it’s because Google dominates the market and therefore SEO, so businesses are always jumping through Google’s hoops and using their services, further entrenching them as the market leader.
Ultimately, I’m not sure if LLMs/ChatGPT will make this better or worse, but maybe for once content creators can actually get paid for their content if it’s used to train the LLMs (though that’s doubtful; it’s a problem in tech not limited to Google).
I’d even add that LLMs not only give nothing back to content creators, they also take from them. Thieves had to do actual work to steal ideas from people in the past, so the barrier to entry was high. Now I can have an LLM plagiarize people, run the output through a synonym tool, and publish it without doing much of anything.
Arguments like these are an example of why we need to be reminded of the principle of copyright: to protect artists and creatives from having to compete against themselves in the form of someone simply taking their work and profiting from it as if it were their own without fair compensation or consequences.
No one cares that Google is scraping their website because Google isn’t competing with them, or at the very least not using said site content to compete with them.
Well, this is a pretty meaningless argument. People let Google in because Google actually drives revenue for publishers. Does it look to you like ChatGPT cites every single source it got its information from? It does not.
I'll also add that the author is being pretty ignorant/infuriating with his opinions and I wouldn't be surprised if this gets flagged just on these merits alone.
At the same time, no human author cites all their sources, either; they're not even consciously aware of all the other materials they've blended together from their lifetime of reading and experience to form the new work.
I think there's a clear distinction between scraping data that potentially results in traffic to your product vs. scraping data that potentially results in a competing product.