Question to AI/ML folks: Is there no comparable open source model? Is the future going to be controlled by big corporations who own the models themselves? If models are so computationally intensive to produce, does it mean that the more computational power a company has, the better its models will be?
RE: Open Source models, there is the AnthropicAI model making the rounds on Twitter[1] and Stability.ai (makers of Stable Diffusion) are working on one [2].
If we use recent history as an example, OpenAI announced DALL-E on Jan 5, 2021 [3], announced v2 and a waitlist for public use on July 20, 2022, and Stable Diffusion shipped an open source model on August 22, 2022 [4] using ~$600K of compute (at retail prices on AWS) [5].
I don't see how any company is likely to acquire a durable technology moat here. There are scale barriers to entry, but even VC-sized funding can overcome them.
This is the text of the license from this repo [1]. Seems pretty open to me?
About this license
The Responsible AI License allows users to take advantage of the model in a wide range of settings (including free use and redistribution) as long as they respect the specific use case restrictions outlined, which correspond to model applications the licensor deems ill-suited for the model or are likely to cause harm.
This isn't a minor point, as it was discussed explicitly and is also OSI's translation of Debian's translation of Richard Stallman's "freedom 0".
That is, it's an important, and explicit, tradition/consensus in FOSS that users aren't restricted in the purposes for which they may use the software.
I don't have an opinion on this one way or another, but if the RAIL license concerns you, then perhaps you can take it up with the organization behind it? https://www.licenses.ai/
No, the GNU licences place copyleft obligations on distribution/conveyance. But they allow you to run the programs for any purpose, without field of endeavour restrictions, or moral police. You don't even need to accept a licence to run a GNU program.
Out of interest, are there any copyleft-style neural network licenses, e.g. ones that require fine-tuned model weights to be published? (And an Affero-GPL-style one for servers, given what "distribution" mostly means these days?)
I know that people have criticized copyleft for decades as being unfree in some sense, and I suppose citing either FSF's definition or those derived from it would be circular in the sense that they were all written by people who assumed that copyleft was acceptable.
But those definitions are clear that the "right to run the program for any purpose" must not be restricted by copyright licensing terms, and that copyright licensing "must not restrict anyone from making use of the program in a specific field of endeavor". Neither of those are infringed by restrictions on further distribution. (In fact, even freeware licenses that prohibited redistribution entirely could be compatible with this specific rule.)
You might say that it was surprising or hypocritical not to have a corresponding freedom related to redistribution, which would then preclude copyleft licensing. The BSD projects have tended to act as though they recognized this additional rule (that it's important to allow sublicensing and not to attach the same conditions to derived works, including allowing the possibility that end users of derived works will get fewer rights). But even in this case, nobody has suggested that it was "free" or "open" to directly limit the purposes for which end users could run a program.
That is not open source. That's at most source available. Consider, what will they do if they think you're using ill-suited use cases? How would they even determine something like that?
The closest open source contender is BLOOM: https://huggingface.co/bigscience/bloom. It has an almost identical architecture to GPT-3 (and hence to ChatGPT), and in particular the same number of parameters (175B). It was also trained on a similar amount of data, as far as we can tell. Still, it's not like you can just "download it and run it": even just to _load_ the model into memory you need ~400GB of memory, and to run it at any decent speed you need a lot of GPUs, so it's not exactly consumer hardware. And training it cost about $2-4 million, so replicating it is definitely not for everybody. But it's also not just for "big corporations"...
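For a sense of what "download it and run it" actually involves, here's a minimal sketch with the Hugging Face transformers library; the model id is the real one, but the dtype and offload path are my own choices, and in practice you still need hundreds of GB of RAM/disk and a lot of patience:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# ~176B parameters: even in bfloat16 the weights alone are ~350GB,
# so device_map="auto" (requires the `accelerate` package) will spill
# layers into CPU RAM and then onto disk.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    offload_folder="./bloom-offload",  # illustrative path for layers that fit nowhere else
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```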
> Is the future going to be controlled by big corporations who own the models themselves?
On this subject, there is an effort stemming from BigScience to build an open, distributed inference network, so that people that don’t have enough GPUs at home can contribute theirs and get text generation at one word per second: https://github.com/bigscience-workshop/petals#how-does-it-wo...
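If I read their README correctly, the client side looks roughly like this (class and checkpoint names taken from the project docs at the time of writing, so they may well change):

```python
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM  # pip install petals

# Only a small client runs locally; the transformer blocks live on
# volunteer servers and activations are streamed over the network.
MODEL_NAME = "bigscience/bloom-petals"
tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer('A cat in French is "', return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```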
Getting a server with >400GB of RAM and a heap of GPUs can be done for somewhere in the $6,000-$10,000 range if you're scrappy. Not cheap, but also not out of reach for individuals.
I don't think that figure is correct: you need a "good heap" of GPUs, not just anything. In particular, even just to run inference you need at least 400GB of GPU memory, not just RAM. You can't plug in a dozen "cheap" GPUs and call it a day, because if I remember correctly consumer GPUs top out at 32GB each. Hence you'd need at least a dozen top-tier GPUs (which certainly don't come at $500 a piece), and probably more, because you can't split weights across GPUs that perfectly (you typically have to put an integer number of layers on each GPU).
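Quick back-of-the-envelope using the numbers above (both figures are the rough ones from this thread, not exact specs):

```python
import math

model_memory_gb = 400   # rough figure quoted above for loading the 175B model
per_gpu_gb = 32         # optimistic ceiling for a single consumer card
min_gpus = math.ceil(model_memory_gb / per_gpu_gb)
print(min_gpus)         # -> 13, before accounting for activations, the KV cache,
                        #    or the fact that whole layers must fit on one card
```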
In practice these models are typically run on top-tier A100 GPUs, which is apparently the cheapest thing you can do at scale: https://forum.effectivealtruism.org/posts/foptmf8C25TzJuit6/.... It looks like you can get away with just $10/hour, though I'm not sure I believe it. In one hour you can generate roughly 6 million English words this way, which is quite cheap.
But if you want to own the full hardware, it's quite a bit more expensive. You need 8 of those A100 GPUs, which come at $32k a piece, so you're in the ballpark of >$300k to build the server you need. Then there are running costs: these GPUs burn 250W a piece, and with the rest of the server you're at about 3kW of power. That's not much, maybe $0.50/hr, plus maybe another $1/hr to cool the room it's in, depending on where it is (and the season; I guess in winter a fan might suffice, since it's about as powerful as a couple of small electric heaters). So with an upfront expense of >$300k, you're maybe down from $10/hr to $1.5/hr, saving something like $8.5/hr, which is about $6k/month (minus the rent of wherever you put the server).
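Putting those ballpark numbers together, the break-even on owning the hardware versus renting at $10/hr works out roughly like this:

```python
upfront = 300_000       # 8x A100 plus the rest of the server, per the estimate above
rented_rate = 10.0      # $/hr on the cloud
owned_rate = 1.5        # $/hr power + cooling once you own the box

savings_per_hour = rented_rate - owned_rate      # ~$8.50/hr
savings_per_month = savings_per_hour * 24 * 30   # ~$6,100/month of 24/7 use
breakeven_months = upfront / savings_per_month   # ~49 months, i.e. roughly 4 years
print(round(savings_per_month), round(breakeven_months, 1))
```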
All in all, it's definitely feasible for a small start up as well, but not very much for an individual.
The closest you can get is probably Google's Flan-T5 [1].
It is not the size of the model or the text it was trained on that makes ChatGPT so performant; it is the additional human-assisted training that makes it respond well to instructions. Open source versions of that are just starting to see the light of day [2].
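As a point of comparison, the mid-sized Flan-T5 checkpoints actually run on ordinary hardware; here's a minimal sketch with transformers (the xl size is my choice for illustration):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# flan-t5-xl is ~3B parameters: slow on CPU, but it fits on a single consumer GPU.
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl")

prompt = "Answer the following question. What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```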
"This repository has gone viral without my permission. Next time, if you are promoting my unfinished repositories (notice the work in progress flag) for twitter engagement or eyeballs, at least (1) do your research or (2) be totally transparent with your readers about the capacity of the repository without resorting to clickbait. (1) I was not the first, CarperAI had been working on RLHF months before, link below. (2) There is no trained model. This is just the ship and overall map. We still need millions of dollars of compute + data to sail to the correct point in high dimensional parameter space. Even then, you need professional sailors (like Robin Rombach of Stable Diffusion fame) to actually guide the ship through turbulent times to that point."
I didn't know I needed permission when sharing things that I find publicly online. They should use a private repo and try to be more polite when making demands.
We really need this, because if only the absolute economic elite have access to it, 99.9999% of people are going to be dominated, psyopped, outworked, censored, and drowned in noise, whether on the "left" or "right", in the name of ultimate AI centrism, aka bootlicker discourse.
A free-for-all will still result in a dizzying paradigm shift, but there is no alternative. Like Gutenberg, but exponentially faster: locality will become central, and "corny pop culture hacker dungeons" / hackspaces could become important, as no one will know what's real and only locally controlled compute and algo power is to be trusted.
It is quite outrageous that despite the fact that nearly the entire tech world is built on FOSS, Open Source still doesn't get the recognition it deserves from governments.
Despite its recognition in techie circles, open source is largely unknown in both corporate and government environments, and there is probably a significant operational gap to overcome, as these entities have very rigid rules of engagement.
Among other things, they generally don't think for themselves. The picture could change significantly if the various intermediaries, consultants, etc. who live off these ecosystems found ways to make open source profitable for themselves.
It depends what you mean by “open source.” You can find countless open source implementations. The issue is training. Meta has released their LLM weights. I believe Google has too.
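For instance, the smaller OPT checkpoints Meta released are directly downloadable from the Hugging Face hub (the 1.3B size here is just an example; the 175B weights were only available on request):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# facebook/opt-1.3b is one of the freely downloadable OPT checkpoints.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

inputs = tokenizer("Open source language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```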
The existing open source models which compete at this point suck ass in comparison.
I'm likely to do less NLP research going forward and more CV research because I can't locally run most LLMs but I sure as shit can run most of the diffusion models at home.
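For what it's worth, "running a diffusion model at home" really is just a few lines with the diffusers library, assuming a GPU with ~8GB of VRAM (the v1.5 checkpoint here is the commonly used one, but any compatible checkpoint works):

```python
import torch
from diffusers import StableDiffusionPipeline

# fp16 weights keep the whole pipeline within the VRAM of a mid-range consumer card.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("an astronaut riding a horse, oil painting").images[0]
image.save("astronaut.png")
```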
It's not the model that's the edge here but the scale of compute. A PC user is not going to be able to run such a behemoth, with its hundreds of billions of parameters. This is not an ideal ML model.