> If you want summaries from my website, go to my website.
I will. Through Perplexity. My lifespan is limited, and I have better ways to spend it than digging out information while you make a buck from making me miserable (otherwise there isn't much reason to complain, other than some anti-AI ideological stance).
> I want a way to deny any licence to any third-party user agent that will apply machine learning on my content, whether you initiated the request or not.
That's not how the Internet works. Enforcing that would mean killing user-generated-content sites, optimizing proxies, corporate proxies, online viewers and editors, caches, and possibly desktop software too.
Also, my browser probably already does some ML on the side. A rule like that would catch a lot of regular browsing too.
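To make the enforcement problem concrete, here's a rough sketch (Python stdlib, placeholder URL) of why a "no ML user agents" licence term can't actually be verified: the server only sees whatever headers the client chooses to send, and claiming to be an ordinary browser takes one line.

```python
import urllib.request

# The server cannot distinguish an ML-driven agent from a human on
# Firefox; it only sees the headers the client chooses to send.
# (example.com is a placeholder URL.)
req = urllib.request.Request(
    "https://example.com/article",
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) "
                           "Gecko/20100101 Firefox/120.0"},
)
with urllib.request.urlopen(req) as resp:
    html = resp.read().decode("utf-8", errors="replace")

# Nothing past this point is visible to the server: the caller is free
# to hand `html` to any local or remote model.
```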
Ultimately, the rules of the road are what they always have been: whatever your publicly accessible web server spouts out on a request is fair game for the requester to consume however they like, in part or entirely. If you want to limit access for particular tools or people, put up a goddamn paywall. All the noise about scraping and the like is attention economy players trying to have their cake and eat it too. As a user in (i.e., a victim of) the attention economy, I don't feel much sympathy for that plight.
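And "limit access" at the protocol level is not complicated. A minimal sketch of the paywall approach (Python stdlib, placeholder credentials): authenticate the requester instead of trusting whatever the client claims to be.

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder credentials, for illustration only.
EXPECTED = "Basic " + base64.b64encode(b"subscriber:hunter2").decode()

class PaywallHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Gate on authentication, not on the User-Agent string.
        if self.headers.get("Authorization") != EXPECTED:
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="subscribers"')
            self.end_headers()
            return
        body = b"Paid content goes here."
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), PaywallHandler).serve_forever()
```

Whether the gate is Basic auth, session cookies, or a payment provider, the point is the same: access control happens before the bytes leave the server, not after.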
Also:
> LLMs — and more importantly the companies that train and operate them — should not be trusted at all, especially for so-called "summarization"
That's not your problem. That's my problem. If I use a shitty tool from a questionable vendor to parse your content, that's on me. You should not care. In fact, being too interested in what I use for my Internet consumption can be seen as surveillance, which is not nice.
I addressed this in a different response: I do not care if your browser does local ML, or if there is an extension that takes content you have already downloaded and applies ML to it (as long as the results of the ML on my licensed content are not stored in third-party services without respecting my licence). I do care that an agent controlled by a third party (even if it acts on your behalf) browses instead of you browsing.
My goal is to license my content for first-party use, not third-party derivative use.
Your statement "Ultimately, the rules of the road are what they always have been: whatever your publicly accessible web server spouts out on a request is fair game for the requester to consume however they like" is both logically and legally incorrect in pretty much every jurisdiction in the world, even if it cannot be enforced without expensive legal proceedings.
> > LLMs — and more importantly the companies that train and operate them — should not be trusted at all, especially for so-called "summarization"
> That's not your problem. That's my problem. If I use a shitty tool from a questionable vendor to parse your content, that's on me. You should not care. In fact, being too interested in what I use for my Internet consumption can be seen as surveillance, which is not nice.
Actually, it is my problem, because it's my words that have been badly summarized.
If the LLM produces a so-called summary that is the exact opposite of what I wrote (as the link I shared previously shows can happen), and that summary is then used to write something about what I supposedly wrote, then I have been misrepresented at best.
I have moral rights in the work that I have created (under Canadian law and most European laws), which allow me to ensure that my work is not misrepresented. The best way I can do that is to forbid its consumption by machine learning companies, including Perplexity.
> The moral rights include the right of attribution, the right to have a work published anonymously or pseudonymously, and the right to the integrity of the work. The preserving of the integrity of the work allows the author to object to alteration, distortion, or mutilation of the work that is "prejudicial to the author's honor or reputation". Anything else that may detract from the artist's relationship with the work even after it leaves the artist's possession or ownership may bring these moral rights into play. Moral rights are distinct from any economic rights tied to copyrights. Even if an artist has assigned his or her copyright rights to a work to a third party, he or she still maintains the moral rights to the work.
Of course, Perplexity operates under the Wild West of copyright law, where they and their users truly do not care one whit about the damage they cause. Eventually, this will be their downfall, because they are going to find themselves on the wrong side of legal judgements for their unwillingness to play by rules that have been in place for a fairly long time.