Quite uninteresting to read, as the article does not go into any depth; it feels like the "hacking agent" also wrote the blog post. Learned nothing.
I'm just trying to understand licenses, but doesn't the choice of MIT contradict the initial "non-commercial purposes" clause? MIT says 'including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software'. Therefore, isn't the non-commercial restriction actually void, and can't I use the software to the limits MIT defines? And because it is already MIT-licensed, can they only relicense future software, but not this piece anymore?
For one, the MIT license does not prohibit selling. And wrapping it in a "for non-commercial uses" clause creates a contradiction that is difficult, if not impossible, to enforce.
> OpenAI is committed to protecting our customers with built-in copyright safeguards in our systems. Today, we’re going one step further and introducing Copyright Shield—we will now step in and defend our customers, and pay the costs incurred, if you face legal claims around copyright infringement. This applies to generally available features of ChatGPT Enterprise and our developer platform.
So essentially they are giving devs a free pass to treat any output as free of copyright infringement? Pretty bold when the training data sources are largely unknown.
For large-scale usage, it doesn't matter what the devs want. If the lawyers show up and say "We can't use this technology because we're probably going to get sued for copyright infringement", it's dead in the water.
It's a logical "feature" for them to offer this "shield" as it significantly mitigates one of the large legal concerns to date. It doesn't make the risks fully go away, but if someone else is going to step up and cover the costs, then it could be worthwhile.
For large enterprises, IP is a big deal, probably the single biggest concern. They'll spend years and billions of dollars attempting to protect it (cough SCO/Oracle cough), right or wrong.
To add to this, indemnification of this type is pretty standard in sensitive fields. One example I recall was closed-captioning providers. If they caption some content incorrectly in a way that exposes their customers to legal action, they guarantee that they will take the blame, and they carry insurance specifically to handle any resulting settlements.
I would expect this is a critical piece for medium to large enterprises that want to adopt LLMs. There are organizations for which this kind of indemnification isn't a nice to have, it is a requirement before even considering a product.
The investors will only get their 1000x if OpenAI can convince people it's risk-free to use. So they'll happily cover the legal battle to prove it, or spend every last company penny trying.
It is my content, actually, from the last ~15 years of being on the internet. I don't care about it personally, and even if I did, it is really obviously fair use, so even if I find it objectionable, I don't actually get to legally compel anyone to stop.
What on earth is fair use about a public company deriving its whole valuation from the processing of content taken from the internet without any regard for licensing or robots.txt rules?
The technology is cool, I get it. But saying “I don’t mind, they can use my content” is on par with “I don’t need privacy, I have nothing to hide” in terms of statement quality.
I can see the specific argument you are making as something along the lines of: just because some people are OK with their content being used by large corporations to train their for-profit LLMs doesn't mean it should be OK for those companies to take my content and use it to train their LLMs without my permission.
But, I suppose I see it the other way as well. Just because you don't want large corporations to train their LLMs using your content doesn't mean that society has to settle on making it illegal. As an imperfect analogy: just because some people don't want to have their picture taken when they are out in public doesn't mean that taking pictures of people in public ought to be illegal.
So I think we have to get past the "I don't like this, so it is evil" kind of thinking. As in the analogy to pictures of people in public, there is some expectation of privacy that we give up when we go out in public. Perhaps there is some analogy there to content that we freely release to the public. Perhaps we need stricter guidelines on LLM attribution. I don't have an answer, but I'm not going to let this decision be made de facto by the strong emotions of individuals who have already made up their minds.
Approximately all of that content becomes significantly more useful to society when fed into a blender and released as ChatGPT, as long as the latter is generally available, even accounting for it being for-profit and for any near-term consequences of propping up an SV company. By significantly I mean orders of magnitude, and that's going by the most naive method of dividing the utility flowing from ChatGPT by the size of the training data.
So yeah, it may not be ideal, but it's also of such great public interest that bringing up copyright seems... in poor taste.
(Curiously, I don't feel the same about image models. Perhaps that's because image models compete with current work of real artists. LLMs, at this point, don't meaningfully compete with anyone whose copyright their training possibly infringed.)
> This isn’t a “copyright risk”, it’s a Silicon Valley corporation getting away with declaring copyright just… obsolete.
While this does not fully represent my views on what's a very complex issue, since you phrased it like this, I feel compelled to say: about damn time someone did it.
I am not a lawyer, but this doesn't seem quite "free". Note that they aren't indemnifying customers against all consequences of said legal claims, meaning that customers would still seem to bear the full brunt of those consequences should there be a credible copyright infringement claim.
But it does guarantee that any customer that can’t afford a big legal team uses their big legal team, reducing the chances of a bad (for them) precedent caused by an inept defense.
It also discourages predatory lawsuits against small users of their API by copyright trolls, which would likely end up settled out of court and not give them the precedent they want.