Are you a real person? Obviously there is more to it than that. Just because you’ve seen Mickey Mouse and can draw, should you get to sell anything you can make with his likeness? Or record any song you’ve heard?
Obviously there is some gray area in the training conversation, but let’s not pretend that these content owner arguments are baseless just to push progress ahead at all costs.
I get his point: intellectual "property" boundaries could be limited in favor of public or common benefit (I guess by extending the current interpretation of "fair use").
If we interpreted intellectual property strictly, we couldn't have platforms such as Google Search (and some people actually do think this way, such as news or image websites).
For now, I guess the main priority could be to fight patents, and software patents in particular. This system is completely obsolete and prevents innovation.
Imagine if Google had patented LLMs and decided to do nothing with them, or if OpenAI said "ok, nobody can create LLMs based on Transformers except us".
Even more so when it comes to medicine.
Today they patent the blood oxygen sensor, tomorrow it will be the glucose sensor in the Apple Watch.
I don’t know enough about biology, genetics, or evolution, but surely the millions of years of training hardcoded into our genes and expressed in our biology amount to much larger “training” runs.
This is a negotiation tactic by the NYT to drive up the licensing price. Period.
The Napster/Music Industry analogy has no resemblance to this situation.
The only meaningful question that might be answered as a result of this is what permissions and access rights crawlers have to content that is publicly and legally available.
Right. If a publisher found a specific Xerox machine was being used to copy and commercially distribute a book, in violation of copyright, they'd ask for an injunction against the person doing that. With OpenAI, the NY Times can see their copyrighted material on both the input (training) side and the distributed output (generated) side of a specific LLM implementation. So they cry foul on OpenAI's actions, not LLMs in general.
There appears to be an open question about whether an LLM can freely ingest copyrighted material and output it verbatim without violating copyright. That seems like an obvious "no" to me, unless we decide that LLMs get special treatment.
There is also the question of using the content under the terms on which it is provided on the web.
NYT is paywalled: you have to agree to a license to access it. There are exclusions in that agreement that I don't understand but that I think may be important in this discussion!
This wasn’t obvious, but it seems likely when you put it that way.
Unlike other iconic company/founder origin stories, OpenAI really felt like they hit a special team dynamic that was on the verge of something equally special.
In light of this, OpenAI still feels like they will be a relevant player, but I’ll be expecting more from Sam and Greg.
It happens everywhere and with everyone. I wouldn’t read into it that much. Personally I find the lack of capitalization more sincere, and I see everyone at every level on the totem pole do it.
yes, i was confused as well as i read more and more of that endless rant.
but it turns out the subject itself was interesting food for thought for me, not the article.
Obviously there is some gray area in the training conversation, but let’s not pretend that these content owner arguments are baseless just to push progress ahead at all costs.
His point stands.