Show HN: A fully open-source (Apache 2.0) implementation of LLaMA (github.com/lightning-ai)
158 points by osurits on March 28, 2023 | 52 comments
We believe that AI should be fully open source and part of the collective knowledge.

The original LLaMA code is GPL licensed, which means any project using it must also be released under GPL.

This "taints" any other code and prevents meaningful academic and commercial use.

Lit-LLaMA solves that for good.




I think implying that GPL is not "fully open source" is a hot take. It's specifically designed to ensure that you and anyone you distribute your code to get the same freedoms. Maybe you don't agree that it's a good license, but that is its intention. GPL vs. BSD-type licenses is a decades-long argument by now, I guess.

Maybe I'm a naive idealist, but IMO the GPL family of licenses is underrated. You can use them to make sure you don't work for free for someone who won't share their improvements.

I liked the choice of AGPL for AUTOMATIC1111 Stable Diffusion web UI. (https://github.com/AUTOMATIC1111/stable-diffusion-webui)

Commercial interests are very allergic to AGPL, which helps ensure the project stays community-run and that new features and fixes prioritize the ordinary user doing things for fun.


I think OP mischaracterized the issue with the license; it's more that the weights don't fall under the same scope. They're research use only, no commercial use allowed.


Yeah, this is weird because there are plenty of open source implementations of the LLaMA model on GitHub - alpaca.cpp in C++ is one, and there are many others in PyTorch, such as the one used by ChatLLaMA. But without the weights they're not very useful (unless you're going to try to train it yourself - good luck with that unless you've got a lot of compute power available).

A quick check on GitHub and I find this one, also with an Apache license: https://github.com/chris-alexiuk/alpaca-lora and alpaca.cpp with an MIT license: https://github.com/antimatter15/alpaca.cpp/blob/master/LICEN...


Not sure, but I think the point was that if you have something under a GPL license (like the code in this case) it's open source, but that doesn't mean you can use it for your business application. That's because GPL requires you to open source all derivative works, and most businesses don't want to/can't do that.


The AI ecosystem is almost entirely Apache 2/MIT/BSD, and GPL is just incompatible with it.

This is a blocker to mixing and matching; a simple Apache 2 rewrite fixes that problem.

Weights? That's another issue, but we're looking forward to fixing that too.


How is it incompatible? You can use code under all of those licenses in a GPLd work.


You can't close down GPL code as you see fit. Irrevocable openness is scary for the AI community, it seems.

Telling…


Does AI output constitute a derivative work of the AI? Would be interesting…


> They’re research use only, no commercial use allowed.

if you agreed to Facebook's contract. If you haven't, it's either public domain or copyright infringement.


As far as I'm aware the GPL/BSD license argument is basically dead now and people just use whatever. In retrospect it seems to have been less an argument about whether or not copyleft clauses are bad and more to do with Berkeley not wanting to deal with RMS.

>Commercial interests are very allergic to AGPL which ensures the project stays community-run

Mostly because AGPL is not a Free license unless you take great pains to build license compliance into the program that you ship. If you don't do this, then people who want to modify your code need to first build the license compliance mechanism before they can do anything else. This is not how any other Free license works. And compliance is not always obvious, either. Hector Martin has documented a few different cases of terrible AGPL uses. My favorite is an Ethernet PHY[0], which practically speaking cannot offer AGPL source in the way the license intends. AGPL only works for one particular use case, which is web[1] applications written in an interpreted language that can introspect its own source code. So Perl, PHP, and Python to varying degrees.
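
For a concrete sketch of what that introspection can look like (Flask is just an illustrative choice here, not something the AGPL prescribes):

    # Minimal sketch of an AGPL-style source offer in an interpreted
    # language: the running web app serves its own source via __file__.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/source")
    def source():
        # Introspection: hand out the code that is actually running.
        with open(__file__) as f:
            return f.read(), 200, {"Content-Type": "text/plain"}

    if __name__ == "__main__":
        app.run()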

Also, let's keep in mind that Stable Diffusion's weights are licensed under a moderate copyleft with a morality clause - CreativeML OpenRAIL-M. Morality clauses are incompatible with all flavors of GPL, and the "program" clause in GPL is vague enough to encompass the model weights. At least, assuming that the model weights are copyrightable, which they might not be. Morality clauses are also non-free, though I'll settle for "don't use this for political disinformation campaigns or porn" over "pony up for our hosted API where we can enforce new morality clauses whenever we like".

If you want a no-corpos license, then don't use a license at all[2]. Non-commercial clauses will also work since they effectively confer no rights[3]. Keep in mind that anyone who can gain sufficient copyright interest in the code can sue, and that AI art tends to be a bottomless well of scenesters. I'd rather not subject ordinary users to legal risk, though.

If you want a "service provider loophole-proof" license, use the OpenWatcom License. It is far less ambiguous and has a reasonable compliance path: if you use the software, you have to publish source. Period. It's simple, it does what the AGPL set out to do, and people would use it if it weren't for Stallman saying this:

> This is not a free software license. It requires you to publish the source code publicly whenever you "Deploy" the covered software, and "Deploy" is defined to include many kinds of private use.

This sounds like a fixable problem: just make the clause only trip on modification, so that if you use a modified version privately you have to publish those modifications, but unchanged software doesn't have to be published. Someone hosting unmodified versions of the software isn't a threat to software freedom, and we consider Freedom Three more violable than Freedom Zero - that's why we tolerate GPL and why AGPL was drafted. But as far as I'm aware such a license does not exist and the few people interested in Extremely Strong Copyleft just use AGPL despite its flaws.

[0] https://social.treehouse.systems/@marcan/110038008055623292

[1] Hector Martin has also posited working around the AGPL's requirement to provide source on network access by putting the web app behind a reverse proxy that hides the source. I am not willing to test this by getting sued by the Mastodon developers.

[2] The various Silly Licenses might work as sufficient corporate deterrent inasmuch as a court is willing to disregard them.

[3] Specifically, there is no copyright definition of noncommercial use, and most copyright laws assume that the mere utility of the work in question is inherently commercial. There is no "as long as they aren't making money off of it" license because not having to pay for the work is considered making money off of it.

To be pedantic, Creative Commons -NC does state that filesharing is non-commercial, so that can be interpreted as a "BitTorrent only" license clause.


FYI, there's something fishy going on in this thread. Multiple people from the Lightning AI team - theaniketmaurya (developer advocate for Lightning AI) and rasbt (developer at Lightning AI) - are shilling for this post without disclosing their affiliations. The account that submitted this (osurits) also has only two comments, with the same behavior.

Having interacted with the Lightning AI team in the past, this is unsurprising behavior.


If you suspect vote manipulation, email hn@ycombinator.com. Dang is good about replying to email and he has server-side logs available for more investigation.


IANAL, but this seems very fishy to me:

1) I don't understand how this isn't a derivative work of the original code, as I very highly doubt you've done a clean-room implementation. I doubt this would hold up in court.

2) Doesn't the original FB license also apply to the weights? Just re-implementing the code would not change the license on the weights. So while THE CODE may now be re-licensed, the weights would still fall under the original license.

I'd love if someone with more legal understanding could shed some light on this.


1) I've looked at both codebases and this one is definitely a derivative of nanoGPT. You can compare all three implementations yourself, as they are actually surprisingly compact and readable.

2) The issue of whether weights are copyrightable at all has not been settled yet. If they are, there is the fair use doctrine, which allows transformative works of a copyrighted work. The line is a bit blurry, but consider the Cariou v. Prince case[1], where the addition of colour to some black and white photos was considered enough to be transformative. Similarly, fully fine-tuning on current news or adding a visual modality could potentially create a brand new model in the eyes of the law.

[1] https://cyber.harvard.edu/people/tfisher/cx/2013_Cariou.pdf


>I don't understand how this isn't a derivative work of the original code

The original code is Apache 2 licensed. Derivatives are fine and allowed. This retains the same Apache 2 license as Facebook's code.

It's only the model that isn't covered by that permissive Apache 2 license. A model produced by a derivative of the permissively licensed code, or even by the original code itself, is not a derivative of the original non-permissively licensed model produced by the original code and is non-infringing even if it is a bit-perfect replica.

> Doesn't the original FB license also apply to the weights?

Again, there are different licenses for the code and the model, and neither license actually applies to the weights within the model, only to the actual exact model. If this project produced a bit-for-bit replica of Facebook's model it would still not infringe on that model's license.

But it doesn't produce a bit-for-bit replica. Even if Facebook were to re-run their same training code on their same hardware, they could not produce the exact same weights as before, since massively parallel matrix multiplications are not deterministic. Benign environmental noise like microscopic fluctuations in temperature makes a difference in the outcome.
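
To see why, consider that floating-point addition is not associative, and a parallel reduction sums in whatever order the hardware schedules. A minimal PyTorch sketch (nothing LLaMA-specific):

    import torch

    # Floating-point addition is not associative, so summing the same
    # values in a different order changes the low-order bits. GPU
    # reductions reorder sums freely, which is one reason re-running
    # training doesn't reproduce bit-identical weights.
    x = torch.randn(1_000_000)
    perm = torch.randperm(x.numel())

    a = x.sum().item()        # one summation order
    b = x[perm].sum().item()  # a different order
    print(a, b, a == b)       # a and b typically differ slightly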


> Apache 2

Isn't the original GPLv3[0]?

[0]: https://github.com/facebookresearch/llama/blob/main/LICENSE


Correct, the original is GPL 3.

To produce this implementation from the LLaMA paper we started from github.com/karpathy/nanoGPT, since the LLaMA architecture is really similar to GPT's. For instance, we added rotary positional encoding, starting from the original RoPE repo published with the paper.
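
(For the unfamiliar: a rough sketch of the rotary-embedding idea, using the split-half variant for illustration - not our exact code, which follows the original implementation:)

    import torch

    def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
        """Rotary positional embedding for x of shape (seq_len, dim)."""
        seq_len, dim = x.shape
        half = dim // 2
        # Per-pair rotation frequencies, as in the RoPE paper.
        freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
        # angles[m, i] = position m times frequency i
        angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[:, :half], x[:, half:]
        # Rotate each (x1, x2) coordinate pair by its position-dependent angle.
        return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)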

We finally ran the original model to make sure the two models were numerically equivalent.
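
(Roughly this kind of check - a toy sketch where tiny linear layers loaded with the same weights stand in for the two full implementations sharing a checkpoint:)

    import torch

    # Toy stand-ins for the two implementations loaded from one checkpoint.
    torch.manual_seed(0)
    original_model = torch.nn.Linear(128, 32000)  # stand-in for the reference
    lit_model = torch.nn.Linear(128, 32000)       # stand-in for the rewrite
    lit_model.load_state_dict(original_model.state_dict())  # "same weights"

    x = torch.randn(1, 128)  # stand-in for an embedded input
    with torch.no_grad():
        # Bit-exact equality is too strict across kernels; use a tolerance.
        assert torch.allclose(original_model(x), lit_model(x), atol=1e-5)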


Can a model itself be copyrighted?

https://www.copyright.gov/circs/circ33.pdf

> To register a work with the U.S. Copyright Office, you must identify the copyrightable subject matter forming the basis of your claim. To be copyrightable, a work must qualify as an original work of authorship, meaning that it must have been created independently and contain a sufficient amount of creativity. Most works meet these conditions. Some works, however, contain elements that either lack the required creativity or are placed outside the bounds of copyright by the law. This circular highlights different types of noncopyrightable subject matter. For more information, see chapter 300, section 313.3, of the Compendium of U.S. Copyright Office Practices.1

https://www.copyright.gov/comp3/chap300/ch300-copyrightable-...

> 313.2 Works That Lack Human Authorship

> Similarly, the Office will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author. The crucial question is “whether the ‘work’ is basically one of human authorship, with the computer [or other device] merely being an assisting instrument, or whether the traditional elements of authorship in the work (literary, artistic, or musical expression or elements of selection, arrangement, etc.) were actually conceived and executed not by man but by a machine.”

> Examples

> Reducing or enlarging the size of a preexisting work of authorship.

----

I'm going to go with the view that models themselves lack human authorship or any sufficient amount of creativity and thus aren't copyrightable.

They may fall under trade secret... but then they need to be treated as one.

This would also mean that licenses based on copyright (most FOSS licenses) wouldn't be applicable to them.


This is the clearest example of an attention grab I have seen - it does nothing for commercial use of LLaMA unless they provide a version of the weights produced by them and not Facebook (and they don't... they ask you to download them from Facebook's repo).


Bs.

> Prevents meaningful academic...

How the hell does AGPL prevent academic use? Commercial use, sure, because AGPL follows the four freedoms, and commercial users often want to take someone else's work and slap their brand on it without acknowledging the original. That, and the downstream is often closed source for "business reasons", which keeps their users from enjoying the fruits of the first party's licensing.

Where does academia come into it? Are researchers now keeping everything under wraps for "shareholders interests"?

Isn't academia supposed to be an open culture from the start, without any restrictions? So what am I missing, or are they mixing two unrelated things?

Also, I think I might be wrong, but isn't this merely converting LLaMA into their version? Uh...


I'm not saying this is how it should be, but a lot of the authors of published papers on the scaling properties of large language models have been employees of research divisions within big tech companies, or academics holding dual positions with those companies and with their university.

> Where does academia come into it? Are researchers now keeping everything under wraps for "shareholders interests"? Isn't academia supposed to be open culture from the start without any restrictions so what am I missing or are they mixing two unrelated things?

Yeah academia was never perfect, but it's becoming more and more like you describe. It's been happening for a while and that's a whole other thing.


>GPL...prevents meaningful academic and commercial use

WTF are you talking about?


GPL is a copyleft license which requires you to share anything that you build using the original software. This makes it difficult for commercial use.


> GPL is a copyleft license which requires you to share anything that you build using the original software.

That's not true.

> This makes it difficult for commercial use.

Yeah, too bad it's so difficult for companies to use Linux commercially. /s



llama.cpp is also MIT

https://github.com/ggerganov/llama.cpp

previously discussed here https://news.ycombinator.com/item?id=35100086

and one of the Rust wrappers: https://news.ycombinator.com/item?id=35171527 (also MIT)


But aren’t the weights still not for commercial use?


That's what I thought too; the source code was not the issue so much as that.

What we need is some sort of "Large Language Model at Home" (like SETI@home was) that could crowdsource the creation of the model, which would be free to use.


Right, so sort of like https://github.com/bigscience-workshop/petals but for the training phase. I suppose different training runs could be proposed via an RFC type of procedure. Then it's not only the open source model maintainers that put in the effort; supporters of the project can also "donate" their hardware resources.


Some form of that is very likely the future


So all the copyrighted stuff they trained it on is fair game, but the weights are not? This is legally uncharted territory, I think.

How much work would it be to create a "transformative" work from the LLaMA weights, so that Facebook has no claim?


If you hate GPL so much, then I assume you don't run any GPL-licensed code on your machines. I admire your resolve, because I would think that is pretty hard!


No, the GPL doesn't prevent meaningful academic or commercial use; rather, it seeks to prevent individuals from taking advantage of free software to limit the freedom of other users. It is important to note that if you live in a free country, there are laws that protect the liberties of all citizens and prevent actions that could restrict those freedoms.


> We believe that AI should be fully open source and part of the collective knowledge.

As do I.

> The original LLaMA code is GPL licensed which means any project using it must also be released under GPL.

Yep. This ensures that AI is "fully open source and part of the collective knowledge."

> This "taints" any other code and prevents meaningful academic and commercial use.

Taints? As in "makes fully open source"? Isn't that the goal?

> Lit-LLaMA solves that for good.

Lit-LLaMA helps people create proprietary closed-source AI instead of the fully open source AI required by LLaMA's license. Okay.


There are already a million ways to run LLaMA. This doesn't change the issue at all, which is that the weights aren't commercially licensed.


Yes, agreed that the weights aren't commercially licensed (yet)! The other ways to run LLaMA use GPL-licensed code, which makes commercial use difficult even if someone trains and uploads the weights publicly.

This could be a step toward that change :)


Please train your model on some texts about copyright, licenses, open source licenses, and the GPL, because it produces nonsense.

1) The GPL license is not an engine. It cannot run anything.

2) A product produced by code or a machine, i.e. mechanically, is either not copyrightable at all or carries the license of its original source. You can create GPL products on MS Windows using an MS compiler, and you can create proprietary products on Linux using a GNU compiler.

The binary output of a compiler such as GCC is a derivative of its source code, so it has the same license as the source code. The compiler just transforms the text source file into a binary output file.

For AI, the situation is the same: the AI engine transforms a source data set into a binary file with weights, so the binary weights are a derivative of the source data set, and thus their license is the same as that of the original data set.

When the AI engine runs the weights, it transforms an input query into output using data from the source data set as well, thus creating a product with mixed content, which must obey the licenses from both sources.


I think some businesses and people are worried about using GPL code in their code bases because that's incompatible with their own licenses.


Just noting that HuggingFace has a LLaMA code implementation[1]. It's also under an Apache 2 license.

While this seems to be nice code, I don't particularly see any reason to use it over HuggingFace transformers, where you can easily swap out alternative implementations.

Also, working around the legal restrictions on the Facebook LLaMA code when there are much stronger restrictions on the use of the model seems an odd thing to do. It's true that in some - not all - jurisdictions it is possible the model might not be copyrightable, but you'd need a bold legal department to rely on those arguments. It's also moderately likely that an instruction-tuned LLaMA (like Alpaca) would be copyrightable even in those jurisdictions.

TL;DR: Use the HuggingFace transformers library. You can experiment with LLaMA and switch to truly free models like GPT-J, or anything new that arrives, very easily.

[1] https://huggingface.co/docs/transformers/main/model_doc/llam...


LLaMA by FB is under a non-commercial license, not a GPL license, so I assume you are using a different base model. What model is that?



This isn't a new model, it's just new code.


So who cares if it's GPL licensed? It can never be put into production.


The GPL allows for commercial use. Just adhere to the license.


You probably should care if you integrate the inference code in other software packages that aren’t GPL licensed.


Ah, I see, so it's just a new way to run inference on the non-commercial model.


I'm still confused about this. Does it require you to have a ChatGPT API key for it to work?


I see this as a win for the AI community. The key for LLMs is to enable people to train collaboratively and innovate more quickly in this space. Are there any examples or demos available that showcase the capabilities of "lit-llama"?


I am in love with this implementation, considering the ability to run on 8 GB of VRAM and the Apache 2.0 license.


I am curious, though: how would the model weights work out?


I guess that means it's time to fire up a few GPUs later today and get some weights! We should maybe have a weight-exchange platform for that, haha.


You mean like a blockchain? I jest, but only a little.





