This statement from Microsoft is just asking for a copyright infringement lawsuit because the courts have been very clear that web "content" is copyrighted unless it is explicitly placed in the public domain or old enough to no longer be under copyright.
Authors of open source code should consider adding explicit restrictions to their license barring the use of their code to train AI. This would make it easier to file lawsuits against Microsoft and others of their ilk who think they can train their AI with other people's work without fair compensation.
> Authors of open source code should consider adding explicit restrictions to their license barring the use of their code to train AI. This would make it easier to file lawsuits against Microsoft and others of their ilk who think they can train their AI with other people's work without fair compensation.
I see no reason to expect that this would alter or achieve anything. The wide-scale machine learning that’s been happening is entirely dependent on fair use exemptions from copyright. They’re not using it under your license (in fact they can’t: current machine learning techniques and open source licenses already make it fundamentally impossible for them to comply), so what you put in it should be completely irrelevant.
No, if the fair use exemption is ever struck down, the entire field is dead in the water until (a) a change in the legal system, or (b) services like GitHub start demanding an additional license as part of their terms of service for the purpose.
No one would let AI get shut down in the US; there’s just too much at stake. Even if we don’t like what’s going on, we’ll take a measured approach in regulating, because otherwise it will just go overseas.
Does the GPL do this already? Doesn't it already say that code derived from GPL code should be GPLed? So does that include any code produced by an LLM based on GPL code?
That would seem to be a logical implication assuming courts reject claims that "everything on the internet is public domain" or that training an LLM on copyrighted material constitutes "fair use" of the copyrighted material.
I suspect it would technically be infringement even for MIT licensed code because the original author's copyright notice would presumably be missing.
Any such lawsuit would be settled out of court, with no admission of guilt, and no damaging information coming out via introduction into public evidence.
Copyright exists, immediately upon creation (not publication) of a work.
It's different from trademark in that practical application, enforcement, registration, etc., do not affect the validity of the copyright.
Copyright can expire, at which point the work effectively becomes "public domain."
Registering a copyright doesn't create the copyright. It simply makes it easier to go after those that disrespect it.
I'm pretty sure that the only way to truly transfer the ownership of copyright of a work, is to have agreements in place, before it is created (like "work for hire" contracts).
As a creator you can also explicitly dedicate a piece of work to the public domain, thus relinquishing any copyright to it. That’s what licenses like CC0, WTFPL, and The Unlicense do.
However, even being in the public domain does not in itself mean you can do everything. For example, in France you still have to respect the “moral rights” of the author, meaning you have to include their name and original title.
The "moral rights" in France and Germany, or the "Urheberrecht" in Germany and Austria and others in Europe, prevent even the creator from putting things in the "public domain" to the full extent. There are pro and con debates about this, of course.
In the "monkey selfie" case, the monkey lost, and lost hard. Probably because PETA behaved like ... PETA ... They footgun themselves constantly, by acting way too extreme.
I am not sure what you are getting at, all property rights are made up agreements, as is what is defined as property, what can be privatized and what rights that affords you.
Take tangible land, your exclusive use of it has boundaries, for example airspace rights or mineral rights. It is all made up.
The difference is tangible v intangible, but in either case the rights are made up.
What is it with this new ontological wave on the Interwebs? For a mathematical axiom, do you need another axiom that tells us that the first one exists? And so forth?
How would you prove the existence of the universe? Do we not need a bigger universe that contains ours? And so forth? (Don't mention the big bang, which is a bunch of non-falsifiable formulas.)
Who has gotten jail or prison time for copyright violations in recent times?
I’m aware of recent cases in Canada where defendants chose to ignore a court ruling and attempted to republish material very similar to what the court had originally found to be infringing. They were then found to be in contempt of court, which is a criminal offence, and were ordered to serve jail time and pay substantial fines.
Copyright violations are not criminal offences in countries I’m aware of. Please tell me of any cases where a copyright violator faced jail time for the copyright violation and not for related criminal offences.
> Swedish prosecutors filed charges on 31 January 2008 against Fredrik Neij, Gottfrid Svartholm, and Peter Sunde, who ran the site; and Carl Lundström, a Swedish businessman who through his businesses sold services to the site. The prosecutor claimed the four worked together to administer, host, and develop the site and thereby facilitated other people's breach of copyright law.
>Ontologically, copyright doesn't exist. Copyright is an epistemology.
You keep using these words, ontology and epistemology. I don't think they mean what you think they mean.
>If copyright could exist, then a copyright for the copyright must be able to exist, and it'd be turtles all the way down
This doesn't make any sense.
First, not all things that exist are covered by copyright or have a copyright about them existing (air exists, but doesn't have a copyright. Neither do slugs, pebbles, Uranus, and other existing things).
Copyright is just sets of laws dictating ability to copy, distribute, and so on. It doesn't need a copyright for itself, and even if it did, the regular terms for reproducing any other legal code would suffice.
>Copyright, as intellectual property, is entirely made up as all other intellectual property is.
All human laws and conventions are made up. That doesn't mean anything: copyright is still enforceable with very real prison buildings, cells, and bars, and if you resist arrest for it, very tangible police batons, tasers, and bullets are not out of the question either.
I bet if Microsoft were not extracting value from someone else's content, but instead had their content being used to power someone else's business, they'd be singing a very different tune.
Without trying to take a stance on this, I do have to say I like the FastGPT feature that comes with Kagi. It basically does a search and uses those results to answer questions.
Now I'd just want it to have a better UI with history and some sort of notebook mode instead of chat. I'm not sure how, but I don't want to chat with AI, I want a different way to 'instruct' it.
I intend to use Mustafa Suleyman's likeness and name for my next project. It's part comic book/part novel and tells the story of a socially awkward tech CEO getting way out of his comfort zone by moonlighting as a male porn star. It ends with an OJ Simpson style police chase when it's discovered that Mustafa has been embezzling funds to support a drug habit and addiction to plastic surgery.
> But that means torrents of Windows are freeware!
For many, many years now, if you need Windows you can just download it from Microsoft and run a simple, non-intrusive activation procedure (not from Microsoft) after installation. No cracks needed. As much security as a hip-high front porch gate.
So even for MS the understanding was that these things are de facto freeware for anyone that wants them at all.
Feel free to start a business selling computers with pirated copies of Windows and Office pre-installed, or build out a corporate network or cloud service with them, and find out first-hand how much Microsoft really considers their products to be "de-facto freeware".
Not quite. They are trying to build a business around AI that they spent heaps of money to build and train. The free stuff serves the same purpose it does for all people on the planet, as examples of things that exist.
Conveniently ignoring that you may be sued into oblivion if you have enough money to make it worth it for them. Come on. Windows is only free for people not making significant amounts of money with it. If you do make money... surprise: https://www.bsa.org/
Your assertion that Microsoft allows everyone to use Windows for free is false. What you care or don't care about is irrelevant in this context. I have no clue why you brought it up.
Now if you wish to assert that Microsoft allows peons to use Windows for free, as long as it is convenient for them, I can agree with that. They're still a bunch of hypocrites.
Allowing Microsoft to selectively apply the law as it benefits them is not a good thing, you're confused.
If you do not allow people to do something, and yet hundreds of millions of people on Earth do it, and you do only as much as I described to prevent it, then you are de facto allowing it. Same thing the MS guy said. Whatever's published on a website is de facto freeware. "No copyright infringement intended." That's how it works outside of lawyers' offices.
Commercial policy is not the law so MS can be as hypocritical as they want. I'm happy that their hypocrisy is going in the right direction this time.
I agree, so please, Microsoft, shut your mouth if I grab your maps, wrap your services and so on, because they are web-based, so I am free to do whatever I like with them; the relevant licenses do not count.
Why not? If they want my data to train their LLMs, why not do the inverse with theirs, for business, as they do on their side? If for them all public stuff is free for commercial use...
If you provide content you created online for free, that content is now freeware.
If someone provides content that they didn't create that still has copyright restrictions in real life, that isn't freeware.
It's like all the photos uploaded to Facebook and Instagram are now free to use however the downloader wants (and Meta as well of course). It's true. But people don't like it.
> Don’t blame us, the Torment Nexus is established practice!
Well, it is. And I, for one, am absolutely delighted that some people with money finally have an incentive to accept that after three decades of copyright death throes.
I think saying it links its sources is a bit of a stretch. It links related articles which may or may not be the source for what it just said (also, may or may not be related :P)
What tools are you thinking of? I think saying it links its sources is absolutely not a stretch. My experience is with Kagi and with Perplexity, both of which have even returned messages saying something along the lines of the source documents not being able to answer the question.
Copilot doesn't link to the sources. It doesn't really know what its sources are. It links to articles that may be related to what it just said. Many times the sources even contradict what was said, so they are definitely not sources in that case.
Now that we have established that Microsoft information wants to be free, my next project is wget.ai:
wget.ai is a sophisticated real time LLM that trains itself while downloading "content". Like any LLM, it predicts the next output token (byte in this case) based on the statistical training. wget.ai is run at temperature zero. In this revolutionary setting it has arrived at the conclusion that the most likely output byte equals the input byte!
Armed with this theorem, wget.ai can transform and replicate a Windows 11 download in real time. No copying is involved, the advanced algorithms happen to arrive at input == output.
Users of Windows 11 can download activation keys (freeware) from the Internet.
Yes, laptops without a Windows license are pretty popular in at least some poorer countries. Most buyers install Windows anyway and activate it via massgrave and friends, which lets you save 40 to 100 USD, which is a pretty big deal.
GNU/Linux, ChromeOS (Google GNU/Linux), Android (Google Linux), MacOS, iOS (and iPadOS is a different thing, right?) are almost certainly collectively more popular than Windows. Even as a primary / exclusive computer. I think a lot of people are able to not pay for Windows 11 in $CURRENT_YEAR, probably most.
Each Windows version has regressed from Windows 7 onwards. To the point that Windows 11 can almost be construed as malware. I'll be using Ubuntu henceforth.
Windows "Teletubby Edition"? :-) No, Win2k was "peak Windows", imho.
Frankly, MS later ditched the quite ambitious Windows NT 5.0 project, which was the planned Win2k successor, for a Frankenstein monster made out of the super buggy WinME and Win2k. That became Windows XP.
Coming from Win98, Win98SE, or WinMe, WinXP was for sure quite good. But compared to the super stable, fast, and well-structured Win2k it was quite a disappointment. It had almost none of the advanced features planned for WinNT 5, it was much more unstable and buggy than Win2k, and it was quite chaotic, with "old Win95" parts, stuff coming from Win2k, and some things of its own placed randomly.
I like the fact that I can now reproduce any Microsoft content without paying for it. Cheers!
Incidentally, some AI chatbots do link to their sources. And it is a good idea to make that an explicit prompt if you're using one that doesn't. It's also worth prompting for how recent their information is.
I would argue that if I ask ChatGPT something, it doesn't "reproduce" what was written on a certain website (or at least it shouldn't, without attribution). It takes what it scraped before and retells it in its own words. That isn't reproducing; it looks like a grey area not yet addressed in copyright laws.
I would partially agree with the guy that yes, that was a social contract since the '90s, but that was before the AI era. Back then this use case wasn't anticipated.
Imagine training an LLM vs a group of people from birth on wrong information. The LLM will unquestioningly just repeat the wrong information in "its own words", whereas the group of people will of course believe some of the wrong stuff, but they will also doubt a lot of it.
You could say that an LLM is just not good enough yet so the comparison isn't fair. In other words that people are just even more LLM'ing than the LLM, but there simply is no mechanism for an LLM to go from wrong information to right information.
People on the other hand will always doubt, hypothesize, and compare and contrast whatever information they have to at least attempt to form correct answers from correct information. This in a sense is because they actually have their own words.
There has, as of today, never been a smart or creative thing an LLM has said that doesn't literally come from other people's words. If LLMs are smart, it's because people are smart.
There’s nothing ambiguous from a copyright perspective: it’s a derivative work. People seem to confuse plagiarism in an academic environment with copyright infringement. Simply using your own words doesn’t mean you’re free from copyright.
However even when something infringes copyright that doesn’t mean anything necessarily happens. Just look at YouTube’s early history or the mountains of fan fiction out there.
But something did happen. Viacom and others sued them, and then YouTube introduced their Content ID system so that they could pay copyright holders for content that others uploaded, as well as to take down videos belonging to copyright holders that did not agree to other people uploading their content.
Yes, it took 2 years after creation and truly massive amounts of copyright infringement before the lawsuits by copyright owners showed up. OpenAI is getting sued, but don’t expect that asking for a website to be rewritten will provoke anything unless you publish such rewritten posts at scale or something.
> However even when something infringes copyright that doesn’t mean anything necessarily happens. Just look at YouTube’s early history or the mountains of fan fiction out there.
This part is talking about uploading a copy of something verbatim, the way I read it.
Last time I used Copilot, the "sources" often didn't support what it said and it seemed like they were obtained by adding search results from feeding the answer into Bing after it had already been generated.
And there were of course tons of SEO slop links among them.
I asked ChatGPT for sources and it was impossible to determine whether they were real or not. It'd cite things like "Sky and Telescope magazine" with no edition, no page numbers, and no year, just a vague, unverifiable citation.
>I like the fact that I can now reproduce any Microsoft content without paying for it
Only if you have the same quality lawyers and financial backup to support them to get you off like MS has. Else what applies to MS doesn't apply to you :)
You are probably joking, but that is literally what MS said; they don't even hide it. A quote from The Register: "Suleyman (Head of MS AI) did allow that there's another category of content, the stuff published by companies with lawyers." (https://www.theregister.com/2024/06/28/microsoft_ceo_ai/)
Has this become any better? Every time I ask ChatGPT for sources it makes up papers, with fragments of real paper titles and topically related authors. The supposed paper itself, though, can't be found anywhere.
DRM and paywalls for thee, industrial-scale scraping for me. /s
It's time for us to build our own miniature versions of Internet Archive with the content that is personally important to us. The powers that be will take it down under the guise of defending copyright, while the bigcos continue to suck up every letter of every page that has a publicly available URL.
I find it good that the concept of IP is collapsing, but this shows clearly the corporate dishonesty around it. For decades corporate sites and APIs have pushed all sorts of illegal EULAs and ToSs in an attempt to e.g. ban scraping. Now suddenly all of this is scrapped, with of course no explanations given as to why.
IP isn't collapsing for anyone with the means and connections to enforce the law. Microsoft is essentially pickpocketing the peasantry, while steering clear of the feudal lords like Netflix and Google, who can hit back.
I think you may be interpreting the word “relevant” in a different light than the person you’re replying to.
It reads to me as if you’re saying physical media is important for humanity as a whole and the preservation of knowledge, while your parent comment is saying physical media is no longer significant to individual consumers because it’s not their preferred method of consumption.
Until that eBook inevitably gets uploaded to a piracy site. The implication is that if a web crawler can find it anywhere then it's fair game, regardless of provenance.