
Question to AI/ML folks: Is there no comparable open source model? Is the future going to be controlled by big corporations who own the models themselves? If models are so computationally intensive to produce, does it mean that the more computational power a company has, the better its models will be?


RE: Open Source models, there is the AnthropicAI model making the rounds on Twitter [1], and Stability.ai (makers of Stable Diffusion) are working on one [2].

If we use recent history as an example: OpenAI announced DALL-E on Jan 5, 2021 [3], announced v2 and a waitlist for public use on July 20, 2022, and Stable Diffusion shipped an open source model on August 22, 2022 [4] using ~$600K of compute (at retail prices on AWS) [5].

I don't see how it's likely that any company can acquire a durable technology moat here. There are scale barriers to entry, but even VC sized funding can overcome that.

[1] https://twitter.com/goodside/status/1611556749726605312

[2] https://humanloop.com/blog/stability-ai-partnership

[3] https://en.wikipedia.org/wiki/DALL-E

[4] https://en.wikipedia.org/wiki/Stable_Diffusion

[5] https://twitter.com/emostaque/status/1563870674111832066?lan...


Does the AnthropicAI model have published weights? It is listed as closed in HELM[0].

[0]: https://crfm.stanford.edu/helm/v0.1.0/?models=1


Not that I know of. But my understanding is that's their intent.


What source is that understanding based on? I was not under the impression that Anthropic was intending to open source their models at all.


Stable Diffusion's code and model weights can be downloaded, but it's proprietary, not Open Source.


This is the text of the license from this repo [1]. Seems pretty open to me?

About this license

The Responsible AI License allows users to take advantage of the model in a wide range of settings (including free use and redistribution) as long as they respect the specific use case restrictions outlined, which correspond to model applications the licensor deems ill-suited for the model or are likely to cause harm.

[1] https://huggingface.co/CompVis/stable-diffusion-v-1-4-origin...


Compare https://opensource.org/osd

This isn't a minor point: the no-use-restrictions rule was an explicit decision, and it is OSI's translation of Debian's translation of Richard Stallman's "freedom 0".

That is, it's an important, and explicit, tradition/consensus in FOSS that users aren't restricted in the purposes for which they may use the software.


I don't have an opinion on this one way or another, but if the RAIL license concerns you, then perhaps you can take it up with the organization behind it? https://www.licenses.ai/


There's no point. RAIL are well aware their licenses are proprietary (see their FAQ), and they are happy with that.


Doesn't Stallman's "free" license dictate what you can and cannot do with the software?


No, the GNU licences place copyleft obligations on distribution/conveyance, but they allow you to run the programs for any purpose, without field-of-endeavour restrictions or moral policing. You don't even need to accept a licence to run a GNU program.


Out of interest, are there any copyleft-style neural network licenses, e.g. ones that require fine-tuned model weights to be published? (And Affero GPL-style ones that extend this to serving the model over a network, which is how distribution mostly happens these days?)


I am not an expert, but I'd just use the normal LGPL/GPL/AGPL licenses for the models.


>place copyleft obligations on distribution/conveyance

Ok, so a major restriction on what you can do with the software.


I know that people have criticized copyleft for decades as being unfree in some sense, and I suppose citing either FSF's definition or those derived from it would be circular in the sense that they were all written by people who assumed that copyleft was acceptable.

But those definitions are clear that the "right to run the program for any purpose" must not be restricted by copyright licensing terms, and that copyright licensing "must not restrict anyone from making use of the program in a specific field of endeavor". Neither of those are infringed by restrictions on further distribution. (In fact, even freeware licenses that prohibited redistribution entirely could be compatible with this specific rule.)

You might say that it was surprising or hypocritical not to have a corresponding freedom related to redistribution, which would then preclude copyleft licensing. The BSD projects have tended to act as though they recognized this additional rule (that it's important to allow sublicensing and not to attach the same conditions to derived works, including allowing the possibility that end users of derived works will get fewer rights). But even in this case, nobody has suggested that it was "free" or "open" to directly limit the purposes for which end users could run a program.


That is not open source. That's at most source available. Consider: what will they do if they decide your use case is one of the "ill-suited" ones? How would they even determine something like that?


This tweet has a lot of additional details with timelines for releasing weights of LLMs: https://twitter.com/swyx/status/1579551634832883712?s=20


The closest open source contender is BLOOM: https://huggingface.co/bigscience/bloom. It has an almost identical architecture to GPT-3 (hence, to ChatGPT), and in particular the same number of parameters (175B). It was also trained on a similar amount of data, as far as we can tell. Still, it's not like you can just "download it and run it": even just to _load_ the model into memory you need ~400GB, and to run it at any decent speed you need a lot of GPUs, so it's not really consumer hardware territory. The training run cost about $2-4 million, so replicating it is definitely not for everybody. But also not just for "big corporations"...
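For a sense of scale, here is a minimal sketch of what loading BLOOM looks like through the standard Hugging Face transformers API. It assumes the transformers and accelerate packages are installed, and enough combined GPU/CPU memory to hold the shards — which is exactly where consumer hardware falls over:

    # Minimal sketch of loading BLOOM via Hugging Face transformers.
    # device_map="auto" shards the ~350GB of bfloat16 weights across
    # whatever GPUs and CPU RAM are available (requires accelerate).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom",
        torch_dtype=torch.bfloat16,  # halves memory vs. float32
        device_map="auto",           # spread layers across available devices
    )

    inputs = tokenizer("Is there a comparable open source model?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))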


In terms of quality, I think BLOOMZ or mT0 are the best open-source ones.

The non-finetuned BLOOM does not compare favorably (in English) to GLM or OPT, which both have published weights: https://crfm.stanford.edu/helm/v0.1.0/?group=mmlu. And Flan-T5 is above OPT-IML: https://arxiv.org/pdf/2212.12017.pdf

> Is the future going to be controlled by big corporations who own the models themselves?

On this subject, there is an effort stemming from BigScience to build an open, distributed inference network, so that people who don't have enough GPUs at home can contribute theirs and get text generation at about one word per second: https://github.com/bigscience-workshop/petals#how-does-it-wo...
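For the curious, the client side looks roughly like the sketch below, per the Petals README at the time; the class and checkpoint names here are best-effort assumptions and may have changed since:

    # Rough sketch of client-side Petals usage (names are assumptions
    # based on the project README; verify against the current docs).
    from transformers import AutoTokenizer
    from petals import DistributedBloomForCausalLM

    MODEL = "bigscience/bloom-petals"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = DistributedBloomForCausalLM.from_pretrained(MODEL)

    # Each forward pass is routed through volunteer GPUs that each hold
    # a slice of the layers, hence the ~1 word/second generation speed.
    inputs = tokenizer("What is open source?", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0]))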


Getting a server with >400GB of RAM and a heap of GPUs can be done for $6,000-$10,000 if you're scrappy. Not cheap, but also not out of reach for individuals.


I don't think that figure is correct: you need a "good heap" of GPUs, not just anything. In particular, even just to run inference, you need at least 400 GB of GPU memory, not just RAM. You can't just plug in a dozen "cheap" GPUs and call it a day, because if I remember correctly consumer GPUs have at most 32GB of memory each. Hence you'd need at least 13 of those top-tier GPUs (12 x 32GB is only 384GB), and they certainly don't come at $500 apiece. Probably more than 13, because you can't split the weights across GPUs that perfectly (you probably have to put an integer number of layers on each GPU).
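The back-of-envelope memory math, where every number is an assumption carried over from the discussion above rather than a measurement:

    # Rough memory math behind the GPU count above.
    import math

    params = 175e9            # BLOOM / GPT-3-sized model
    bytes_per_param = 2       # float16/bfloat16 weights
    weights_gb = params * bytes_per_param / 1e9  # ~350 GB; ~400 GB with overhead
    gpu_vram_gb = 32          # "at most 32GB" per consumer card
    print(weights_gb)                    # 350.0
    print(math.ceil(400 / gpu_vram_gb))  # 13 GPUs minimum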

In practice these models are typically run on top-tier A100 GPUs, which apparently is the cheapest thing you can do at scale: https://forum.effectivealtruism.org/posts/foptmf8C25TzJuit6/.... It looks like you can get away with just $10/hour, though I'm not sure I believe it. In one hour you can generate roughly 6 million English words this way; that's quite cheap.

But if you want to own the hardware, it's quite a bit more expensive. You need 8 of those A100 GPUs, which come at $32k apiece, so you're in the ballpark of >$300k to build the server you need. Then there are of course running costs: these GPUs burn 250W apiece, and with the rest of the server we're at about 3kW of power. That's not much, maybe $0.50/hr, plus maybe another $1/hr to cool the room it's in, depending on where it is (and the season; I guess in winter a fan might suffice, since the server is about as powerful as a couple of small electric heaters). So with an upfront expense of >$300k, you're maybe down from $10/hr to $1.5/hr, saving something like $8.5/hr, which is $6k/month (minus the rent of wherever you put the server).
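The same arithmetic as a script, in case anyone wants to plug in their own numbers; all prices here are the assumptions from the paragraph above, not quotes:

    # Back-of-envelope ownership cost vs. renting at $10/hr.
    gpu_count = 8
    gpu_price = 32_000                   # $ per A100 (assumed)
    hardware = gpu_count * gpu_price     # $256k; ">$300k" with the server
    power_kw = 3.0                       # ~2 kW of GPUs plus the rest of the box
    electricity = power_kw * 0.15        # ~$0.45/hr at an assumed $0.15/kWh
    cooling = 1.0                        # assumed $/hr
    run_cost = electricity + cooling     # ~$1.5/hr vs. $10/hr rented
    savings = (10 - run_cost) * 24 * 30  # ~$6.2k/month before rent
    print(hardware, round(run_cost, 2), round(savings))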

All in all, it's definitely feasible for a small startup as well, but not so much for an individual.


Got it, thanks for the information! I hadn't known it was all VRAM for model serving.


Saw this on here the other day. Uses BLOOM. Not as good as ChatGPT, but it's something:

http://chat.petals.ml/


Closest you can get is probably Google's Flan-T5 [1].

It is not the size of the model or the text it was trained on that makes ChatGPT so performant. It is the additional human-assisted training (RLHF) that makes it respond well to instructions. Open source versions of that are just starting to see the light of day [2]. (A quick way to try Flan-T5 locally is sketched after the links below.)

[1] https://huggingface.co/google/flan-t5-xxl

[2] https://github.com/lucidrains/PaLM-rlhf-pytorch
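A minimal sketch with the standard transformers API, to show the instruction-following behavior; flan-t5-small is substituted here so it runs on a laptop (the xxl checkpoint from [1] needs tens of GB of memory):

    # Trying an instruction-tuned model locally with transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

    # Flan-T5 responds to plain natural-language instructions, which is
    # the "respond well to instructions" behavior described above.
    inputs = tokenizer("Translate to German: The weather is nice today.",
                       return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))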


From [2]:

"This repository has gone viral without my permission. Next time, if you are promoting my unfinished repositories (notice the work in progress flag) for twitter engagement or eyeballs, at least (1) do your research or (2) be totally transparent with your readers about the capacity of the repository without resorting to clickbait. (1) I was not the first, CarperAI had been working on RLHF months before, link below. (2) There is no trained model. This is just the ship and overall map. We still need millions of dollars of compute + data to sail to the correct point in high dimensional parameter space. Even then, you need professional sailors (like Robin Rombach of Stable Diffusion fame) to actually guide the ship through turbulent times to that point."

[p.s. ^ was just fyi/heads-up + https://github.com/CarperAI]


> This repository has gone viral without my permission

That's a bizarre thing to write for a public repository


I didn't know I needed permission to share things I find publicly online. They should use a private repo, and try to be more polite when making demands.


We really need this, because if only the absolute economic elite have access to this, 99.9999% of people are going to be dominated, psyopped, outworked, censored, and drowned in noise, whether on the "left" or "right", until all that's left is "AI centrism", aka bootlicker discourse.

A free-for-all will still result in a dizzying paradigm shift, but there is no alternative. Like Gutenberg, but exponentially faster: locality will become central, and "corny pop culture hacker dungeons" / hackspaces could become important, since no one will know what's real and only locally controlled compute and algo power is to be trusted.


The software seems fairly open.

The compute bill for large language models is extraordinary (hence MSFT Azure being a strategic partner).


It is quite outrageous that, even though nearly the entire tech world is built on FOSS, open source still doesn't get the recognition it deserves from governments.


Despite its recognition in techie circles, open source is largely unknown in both corporate and government environments, and there is probably a significant operational gap to overcome, as these entities have very rigid rules of engagement.

Among other things, they generally don't think for themselves. The picture could change significantly if the various intermediaries, consultants, etc. who live off these ecosystems found ways to make open source profitable for themselves.


People who thought AI would ever be "open source" and not "stolen from the labor of millions or even billions in order to replace them" are fools.


Why? Hugging Face is most likely worth as much as OpenAI today, and everything they do is open sourced, or basically open sourced.

Hugging Face is one of the most beloved companies in the world right now, too. There's a lot of interest in them within open source developer communities.


The content these models ingest, though, has licenses that they ignore. That's the "labor" being "stolen from millions or billions."


It depends what you mean by “open source.” You can find countless open source implementations. The issue is training. Meta has released their LLM weights. I believe Google has too.


The existing open source models that compete at this point suck ass in comparison.

I'm likely to do less NLP research going forward and more CV research, because I can't locally run most LLMs, but I sure as shit can run most of the diffusion models at home.

It's a sad situation.


It's not the model that's the edge here but the scale of compute. A PC user is not going to be able to run such a behemoth, which has billions of parameters. This is not an ideal ML model for local use.


What about https://github.com/karpathy/nanoGPT (currently trending on GitHub)?
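nanoGPT is a from-scratch training repo; the core of any GPT it trains is a stack of causal self-attention blocks, roughly like this (a schematic sketch of the technique, not nanoGPT's actual code):

    # Schematic causal self-attention, the building block of a GPT.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalSelfAttention(nn.Module):
        def __init__(self, d_model=64, n_heads=4, max_len=128):
            super().__init__()
            self.n_heads = n_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)
            self.proj = nn.Linear(d_model, d_model)
            # lower-triangular mask so tokens only attend to the past
            mask = torch.tril(torch.ones(max_len, max_len))
            self.register_buffer("mask", mask.view(1, 1, max_len, max_len))

        def forward(self, x):
            B, T, C = x.shape
            q, k, v = self.qkv(x).split(C, dim=2)
            # reshape into (batch, heads, time, head_dim)
            q = q.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
            k = k.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
            v = v.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
            att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
            att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
            y = F.softmax(att, dim=-1) @ v
            y = y.transpose(1, 2).contiguous().view(B, T, C)
            return self.proj(y)

    x = torch.randn(1, 16, 64)
    print(CausalSelfAttention()(x).shape)  # torch.Size([1, 16, 64])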



