It’s not possible to have a license over an ML model trained on other people’s works, since such models are uncopyrightable. They’re more like a phone book: a collection of facts produced by an entirely un-creative process. https://news.ycombinator.com/item?id=36691050
This hasn’t been proven in court, but it seems the most likely outcome.
Not saying that this applies to LLMs, but if you describe them as "a collection of facts [collected and] trained by an entirely un-creative process", then it begins to sound like one could argue for a Database Right.
Llama 2 is open source-ish. The weights are freely available and can be used commercially, but only if you have fewer than 700M users and agree to some "don't do naughty things" terms.
Nope. The limit is 700M monthly active users at the time Llama 2 was released, a weird catch clause aimed at a handful of Meta's competitors. The license doesn't satisfy OSS requirements, but it is quite reasonable.
If the JSON license isn't considered open due to requiring that "The Software shall be used for Good, not Evil.", then I don't see how tacking an additional user-count threshold onto it makes it more open. I don't think Meta even released the training dataset, so you can't replicate it (should you have the funds to do so).
There are other LLMs that don't have such restrictions, and publish their training data.
Open source is both a colloquial term for available/modifiable/distributable code (like Llama 2) and a strict OSI-approved list of licenses. I'd say open source-ish is a great fit here.
Edit: this is in fact a fairly interesting discussion, because LLMs are a new breed of digital product. Meta's terms are practical for limiting usage in commercial applications, and they are designed to protect the general population. It's not the worn-out "protecting us from ourselves"; it's actually preventing Llama users from harming non-users. Yes, we can be jaded and say it's about protecting the brand and dissociating from bad actors. My point is that it's hard to apply the usual arguments for open source and freedom of computing when you're defending the rights of people who want to harm other people.
Sure there is: BSD is bad because xyz, GPL is bad because zyx. That said, the Llama restrictions are rather harsh, and you are not allowed to improve other models with it. So no freedom there, just some OK beer.
From memory, the Llama 2 license does allow tuned models with suitable credit and license inclusion. They restricted using it to train other models, though (a bit like how people use GPT-4 to generate question/answer pairs to train their own models).
Edit: there's no mention of it being open source in the linked article. Maybe the title here is just wrong? @dang