StableLM: A new open-source language model (stability.ai)
1502 points by davidbarker on April 19, 2023 | 497 comments




I really dislike the approach to announcing new models that some companies have taken: they don't mention evaluation results or the performance of the model, but instead talk about how "transparent", "accessible" and "supportive" these models are.

Anyway, I have benchmarked stablelm-base-alpha-3b (the open-source version, not the fine-tuned one, which is under an NC license) using the MMLU benchmark, and the results are rather underwhelming compared to other open-source models:

- stablelm-base-alpha-3b (3B params): 25.6% average accuracy

- flan-t5-xl (3B params): 49.3% average accuracy

- flan-t5-small (80M params): 29.4% average accuracy

MMLU is just one benchmark, but based on the blog post, I don't think it will yield much better results in others. I'll leave links to the MMLU results of other proprietary[0] and open-access[1] models (results may vary by ±2% depending on the parameters used during inference).

[0]: https://paperswithcode.com/sota/multi-task-language-understa...

[1]: https://github.com/declare-lab/flan-eval/blob/main/mmlu.py#L...
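
For anyone curious what that kind of eval looks like mechanically, here is a rough sketch of scoring a single MMLU-style multiple-choice question with a causal LM: look at the next-token log-probabilities after an "Answer:" prompt and pick the most likely answer letter. This is a simplified illustration, not the flan-eval code linked above, and the prompt format is an assumption.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "stabilityai/stablelm-base-alpha-3b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    def score_question(question, choices):
        """Return the index of the answer letter the model finds most likely."""
        letters = ["A", "B", "C", "D"]
        prompt = question + "\n" + "\n".join(
            f"{l}. {c}" for l, c in zip(letters, choices)
        ) + "\nAnswer:"
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            next_token_logits = model(**inputs).logits[0, -1]
        letter_ids = [tokenizer.encode(" " + l)[-1] for l in letters]
        return int(next_token_logits[letter_ids].argmax())

    # Accuracy is then just the fraction of questions where this index matches
    # the labeled answer, averaged over the MMLU subjects.
    print(score_question("What is the capital of France?",
                         ["Berlin", "Paris", "Madrid", "Rome"]))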


Doesn't make much sense to compare a model that's not fine tuned to flan models that are fine tuned. Makes more sense to compare to something like T5 base where it's probably a lot more comparable.


Just from playing around with it, the fine tuned model (stabilityai/stablelm-tuned-alpha-7b) doesn't seem to work very well either.

     User: What is 12 + 12?
     StableLM: Yes, that is correct! 12 + 12 is equal to 18.

     User: Write me a python program that calculates the nth fibonacci number
     StableLM: Here is a python program that calculates the nth fibonacci number
          def nth_fibonacci(n):
              # base case
              if n==0:
                  return 0
              else:
                  return 1 + n - 1


Not sure the web demo (https://huggingface.co/spaces/stabilityai/stablelm-tuned-alp...) uses the context of previous messages either...

> > Write a short poem about robots

> [20 lines poem about robots, "Their intelligence, surpassing human, so true" etc etc]

> > Write a two sentence poem about how AI now can write human-like text

> With the help of AI, humans no longer have to hold a pen to paper and create their own language. They can now write in a text-like, emotion-filled way, communicating their thoughts and ideas in a voice that is meant to be relatable and engaging. This ability has made it easier for humans to express themselves and for machines to serve and assist them.

> > It didn’t rhyme

> Human: Still, the poem has been nice, thank you!


maybe fine tuning should involve sending an LLM through grade school

actually I wonder if that's what we need to do

a simple socialization package that fine tunes


also, alignment package with reward and punishment. “bad model, bad model! oh come here, my good model!”


Well, 0x12 & 0x12 is equal to decimal 18. So maybe there was something going on in there...


I would have compared it to the fine-tuned version if it had been released under a truly open-source license. I think developers implementing LLMs care more about licensing than about the underlying details of the model.

Also t5-base is 220M params vs 3B params of stablelm, not really a fair comparison anyways.


They used training data with restricted licenses to produce the fine-tuned models; it's not a decision that Stability was able to make.


Is it actually clear that license restrictions on the training data really do affect the model itself? I know OpenAI says you’re not supposed to use the output of GPT3/4 to train competing models, but that doesn’t strike me as legally enforceable. Most of the discussions I’ve actually seen where lawyers weigh in seem to argue that training these models is pretty clearly fair use and therefore any copyright restrictions on the training data don’t really affect the output. I suppose we won’t know until a case actually goes to court, but I think it’s kind of silly to preemptively say you can’t use these fine-tuned models commercially because of a probably-not-legally-enforceable restriction on some of the training data.


Copyright restrictions are not the only possible restrictions.

If OpenAI says you're allowed to use their service under certain conditions, but you violate the conditions, then what's your legal basis for using the service? Forget about copyright, think about breach of contract or even computer fraud and abuse.


But let’s say you used the OpenAI GPT4 service to generate training data for a new model. You then train your model using that generated training data. In theory OpenAI can ban you from continuing to use their API and maybe even sue you for breach of terms of service, but that doesn’t mean the model you created based on that generated data is somehow now illegal to use or distribute. You can still sell or give away that trained model and there’s nothing OpenAI can do about that.

Let’s take specifically the case of Alpaca, the Stanford team generated a finetuning training set using GPT 3.5. Maybe OpenAI could sue them for doing that. But now that the training set exists and is freely available, I’m not using OpenAI if I finetune a new model with that existing training set. I have no contract with OpenAI, I’m not using their service, and OpenAI does not have any copyright claim on the generated dataset itself. They have no legal claim against me being able to use that dataset to fine tune and release a model.

Or am I completely misunderstanding this?


I disagree; they made the decision to use datasets with restrictive licensing, jumping on the alpaca/gpt4all/sharegpt bandwagon.

They also chose to toot their horn about how open-source their models are, even though for practical uses half of their released models are no more open source than a leaked copy of LLaMa.


So just use their base model and fine-tune with a non-restrictive dataset (e.g. Databricks' Dolly 2.0 instructions)? You can get a decent LoRA fine-tune done in a day or so on consumer GPU hardware, I would imagine.

The point here is that you can use their bases in place of LLaMA and not have to jump through the hoops, so the fine-tuned models are really just there for a bit of flash…
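
For what it's worth, here is a rough sketch of what that kind of LoRA fine-tune of the base model on the Dolly 2.0 instructions could look like with the Hugging Face peft stack as of April 2023. The prompt format, hyperparameters, and 8-bit loading are illustrative assumptions, not a tested recipe.

    import torch
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "stabilityai/stablelm-base-alpha-7b"
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token

    # Load in 8-bit so the 7B base fits on a single consumer GPU, then attach
    # small low-rank adapters instead of updating all of the base weights.
    model = AutoModelForCausalLM.from_pretrained(base, load_in_8bit=True, device_map="auto")
    model = prepare_model_for_int8_training(model)
    model = get_peft_model(model, LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["query_key_value"],  # GPT-NeoX-style attention projection
        task_type="CAUSAL_LM",
    ))

    def format_example(ex):
        return {"text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['response']}"}

    ds = load_dataset("databricks/databricks-dolly-15k", split="train").map(format_example)
    ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                remove_columns=ds.column_names)

    Trainer(
        model=model,
        args=TrainingArguments("stablelm-dolly-lora", per_device_train_batch_size=4,
                               gradient_accumulation_steps=8, num_train_epochs=1,
                               learning_rate=2e-4, fp16=True, logging_steps=20),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()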


Looks like you’re seeing the glass as half empty here. Not sure if arguing here was more time efficient than just running the eval on the other set of weights.

*I wish I understood these things well enough to not have to ask, but alas I’m just a basic engineer


I use a GPU server and runtime is not free unfortunately.


Ah no worries then. Thanks for your datapoint regardless


The good news is that it’s open source so it can be improved by the community.


Until when? What guarantee does the community have that they won't go private monetization route like you-know-who


This model is open source. Period. Whether they close off future models is irrelevant to whether this one is open.


The model weights are out under an irrevocable license, so forever.


what is the financial incentive to make it open-source?


Everything accrues to OpenAI if it's not.

More companies should be releasing models as open source and liberally licensed if they want to keep a seat at the table. Otherwise they'll get washed away.


> Everything accrues to OpenAI if it's not.

because OpenAI is not open-source? Then a closed source StableLM would accrue value to Stability?


Nobody gives a damn about closed source StableLM. It'll die off in the evolutionary fight as strictly inferior. (How many people are using Claude or Bard?)

By being open source, StableLM will grow with the community. An ecosystem of mods, plugins, libraries, frontends, and more will be developed. Different companies will even compare notes and contribute upstream. Everyone participating wins.


How did you run the benchmarking, zero-shot or few-shot? I think a fair comparison would be Llama-7B which got an average ~35% for 5-shot.


5-shot prompting.


It's fantastic that more orgs are releasing open-source models trained on more than 300B or so tokens. Here's my take from the details I could find.

Pros

  - 4096 context width (vs 2048 for llama, gpt-j, etc)
  - 3B to 65B released or in progress
  - RL tuned models available
  - Trained on more tokens than existing non-llama models
  - 128 head dim, so can use flash attention (unlike GPT-J)
Cons

  - No benchmarks released, or details about the model
  - Somewhat restrictive license on the base models, and NC license on the RL models
  - Small models only trained on 800B tokens, compared to 1T for llama-7B, and potentially more for other upcoming alternatives (RedPajama, etc).  I'd like to see their loss curves to see why they chose 800B.
High-level, this is likely to be more accurate than existing non-llama open source models. It's hard to say without benchmarks (but benchmarks have been gamed by training on benchmark data, so really it's just hard to say).

Some upcoming models in the next few weeks may be more accurate than this, and have less restrictive licenses. But this is a really good option nonetheless.


FYI, I'm running lm-eval now w/ the tests Bellard uses (lambada_standard, hellaswag, winogrande, piqa, coqa) on the biggest 7B on a 40GB A100 atm (non-quantized version, requires 31.4GB), so it will be directly comparable to what various LLaMAs look like: https://bellard.org/ts_server/

(UPDATE: the run took 1:36 to complete but failed at the end with a TypeError, so I will need to poke and rerun.)

I'll place results in my spreadsheet (which also has my text-davinci-003 results): https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
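
For reference, the run above boils down to something like the following, using the harness's Python entry point (the exact argument names vary a bit between harness versions, so treat this as a sketch, not the exact command I ran):

    from lm_eval import evaluator

    results = evaluator.simple_evaluate(
        model="hf-causal",
        model_args="pretrained=stabilityai/stablelm-base-alpha-7b",
        tasks=["lambada_standard", "hellaswag", "winogrande", "piqa", "coqa"],
        num_fewshot=0,
        batch_size=4,
        device="cuda:0",
    )
    print(results["results"])  # per-task accuracy/perplexity numbers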


Looks like my edit window closed, but my results ended up being very low so there must be something wrong (I've reached out to StabilityAI just in case). It does however seem to roughly match another user's 3B testing: https://twitter.com/abacaj/status/1648881680835387392

The current scores I have place it between gpt2_774M_q8 and pythia_deduped_410M (yikes!). Based on training and specs you'd expect it to outperform Pythia 6.9B at least... this is running on a HEAD checkout of https://github.com/EleutherAI/lm-evaluation-harness (releases don't support hf-causal) for those looking to replicate/debug.

Note, another LLM currently being trained, GeoV 9B, already far outperforms this model at just 80B tokens trained: https://github.com/geov-ai/geov/blob/master/results.080B.md


Note that this is StableLM ALPHA (only 0.52 epochs into training).

The fully trained version will surely be much better.

Also, you should benchmark GPT-3 Babbage for a fair comparison since that is the same size as 7B.


How many epochs will they run?



Yeah, although looks like it currently has some issues with coqa: https://github.com/EleutherAI/lm-evaluation-harness/issues/2...

There's also the bigscience fork, but I ran into even more problems (although I didn't try too hard) https://github.com/bigscience-workshop/lm-evaluation-harness

And there's https://github.com/EleutherAI/lm-eval2/ (not sure if it's just starting over w/ a new repo or what?) but it has limited tests available


How possible is it that every other model suffers from dataset contamination and this model is being unfairly penalized for having properly sanitized training data?


Do you also have results of GPT4 somewhere? or text-davinci-003-turbo


I'm still on the waitlist for GPT-4 API access. Note, that text-davinci-003 cost about $90 to benchmark at $0.02/1K tokens, so if you're able to use a GPT-4 model (for completion and not just instruction) that'll probably be $270-$540 in credits to benchmark...


I have GPT-4 8k access and am willing to run the evals if someone wants to pay. Email in my acc info (the character is h)

Just a note, I get errors semi-frequently when running queries against GPT-4 often (timeouts mostly…) so any code would need to handle that well.


You should benchmark GPT-3 Curie (7B) for comparison since it is the same size as llama-7B and StableLM-7B.

That will give us some indication of how much better these models are than GPT-3 at the same size.


Just think about benchmarking 32K GPT4 haha


>- No [...] details about the model

You can see the model architecture here

https://github.com/Stability-AI/StableLM/blob/main/configs/s...


>Small models only trained on 800B tokens

"These models will be trained on up to 1.5 trillion tokens." on the Github repo.

https://github.com/stability-AI/stableLM/#stablelm-alpha


That's great news, but one would think that since they're behind Stable Diffusion, they'd use the insights from it and scale the data even more than that, to get better quality from a smaller-scale model that can run on most people's machines.

Like... try 10 trillion or 100 trillion tokens (although that may be absurd, I never did the calculation), and a long context on a 7B parameter model then see if that gets you better results than a 30 or 65B parameter on 1.5 trillion tokens.

A lot of these open source projects just seem to be trying to follow and (poorly) reproduce OpenAI's breakthroughs instead of trying to surpass them.


>try 10 trillion or 100 trillion tokens

Computation is not free and data is not infinite.


You could've said the same to OpenAI when they were scaling GPT from 1 billion to 175 billion parameters. We're all grateful they didn't follow that line of thought.

But Stability does have access to a pretty big cluster, so it's not paying for cloud compute (I assume) and the cost will be lower. And data of course is not infinite... I never stated that it was.

But considering 3.7 million videos are uploaded to YouTube every day, 2 million scientific articles are published every year, yada yada... that argument falls apart.

At the very least implement spiral development... 1 trillion... 3 trillion... (oh it seems to be getting WAY better! There seems to be a STEP CHANGE!)... 5 trillion... (holy shit this really works, lets keep going)


The training corpus is the problem. An extra trillion tokens is (ballpark) an extra million KJV bibles' worth of text formatted for ingestion. And you probably picked all of the low-hanging fruit, in terms of quality, prior vetting, and being in a standard format for ingestion, in your first trillion tokens of training data.


There’s a difference between telling someone they’re wasting their time with their current project, and asking them why they didn’t spend 6x - 60x as much budget on an already expensive project.


They're loaded, and we know scaling works, they'd massively benefit... both in marketing and profit.

Although it is open source to be fair.


> Like... try 10 trillion or 100 trillion tokens (although that may be absurd, I never did the calculation)

But where's the corpus supposed to come from?


Nobody knows where to find 10 trillion tokens of good data. Publicly available / data without a license seems to cap at around 1.5 trillion tokens total. The internet isn't as big as you thought! (Or at least, all the good stuff is behind a walled garden, which I think we did know)


Devs confirmed that the small ones use 800B, 1.5T is for the large ones


@thunderbird120 asked a Stability employee, who said that the plan is to keep training the models up to 1.5T. So I don't know where you read this.


That may be, but the weights you can download today were trained on 800B


I think they are “checkpoint” models in this case.

Will be fun to compare when completed!


Are not all models checkpoints? I think you may be interpreting it too colloquially.


yes, of course; that's why they say "will be trained" on the GH repo.


https://github.com/Stability-AI/StableLM#stablelm-alpha shows that the 3B and 7B had 800B training tokens.


I'm wondering what the sweet spot for parameters will be. Right now it feels like the MHz race we had back in the CPU days, but 20 years later I am still using a 2-3GHz CPU.


I think "sweet spot" is going to depend on your task, but here's a good recent paper that may give you some more context on thinking about training and model sizes: https://www.harmdevries.com/post/model-size-vs-compute-overh...

There have also been quite a few developments on sparsity lately. Here's a technique SparseGPT which suggests that you can prune 50% of parameters with almost no loss in performance for example: https://arxiv.org/abs/2301.00774
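
SparseGPT itself is a one-shot, second-order method, but if you want a feel for what "pruning 50% of parameters" means mechanically, plain magnitude pruning in PyTorch looks like this (a toy illustration, not SparseGPT):

    import torch
    import torch.nn.utils.prune as prune

    layer = torch.nn.Linear(4096, 4096)

    # Zero out the 50% of weights with the smallest absolute value.
    prune.l1_unstructured(layer, name="weight", amount=0.5)
    prune.remove(layer, "weight")  # bake the mask into the weight tensor

    sparsity = (layer.weight == 0).float().mean().item()
    print(f"fraction of zeroed weights: {sparsity:.2f}")  # ~0.50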


I was wondering if the longer training thing was a similar phenomenon to the double-descent we see in other deep learning models. Training for a really long time can improve generalization (as can adding more parameters) - but I don't know enough about LLM architecture to know if that's relevant here. My skim of the blog post led me to think it's proposing a different mechanism (scaling laws).


Well, based on all the data we have available now it seems like you don't get much benefit yet from going above 200 billion.


> 128 head dim, so can use flash attention (unlike GPT-J)

mind explaining why this is so attractive/what the hurdle is for the laypeople in the audience? (me)


Standard attention has memory quadratic in sequence length, whereas FlashAttention has memory linear in sequence length. Also, FlashAttention is faster.


So there must be a downside to FlashAttention. What is it?


https://arxiv.org/abs/2205.14135 - Section 5 suggests that the biggest limitation is that custom CUDA kernels need to be coded on a per-GPU architecture basis.


FlashAttention is mathematically identical to standard attention, so in theory there's no downside. In practice, numerical inaccuracies of floating point mean that the results differ slightly. I don't know of any papers going in depth to analyze what impact those variances have in a range of real models, but generally speaking deep models handle slight variances well. I've not noticed any difference in my applications training models. And tons of people use FlashAttention as a drop-in replacement on models trained on standard attention (e.g. using xformers in StableDiffusion).

Also, in practice FlashAttention is still relatively new, so it isn't well supported in libraries yet. Until PyTorch 2.0 you had to either implement it yourself or use something like xformers, which comes with a bag of caveats. PyTorch 2.0 now has it built-in, and it's easy to use, but the implementation is incomplete so you can't, for example, use it with an attention mask (which LLMs need).

tl;dr: Basically none, but it just isn't well supported yet.
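
Concretely, the PyTorch 2.0 built-in looks like this; it dispatches to a fused FlashAttention-style kernel when dtype, head dimension, and masking constraints allow it (a generic illustration, not StableLM's actual attention code):

    import torch
    import torch.nn.functional as F

    batch, heads, seq_len, head_dim = 2, 16, 4096, 128  # 128 head dim, as noted upthread
    q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # is_causal=True applies the usual causal LM mask inside the fused kernel.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    print(out.shape)  # torch.Size([2, 16, 4096, 128])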


installing it is a nightmare


According to the paper Flash Attention also needs quadratic memory:

Let 𝑁 be the sequence length, 𝑑 be the head dimension, and 𝑀 be size of SRAM with 𝑑 <= 𝑀 <= 𝑁𝑑. Standard attention (Algorithm 0) requires Θ(𝑁𝑑+𝑁²) HBM accesses, while FlashAttention (Algorithm 1) requires Θ(𝑁²𝑑²M⁻¹) HBM accesses.


https://github.com/HazyResearch/flash-attention#memory

"standard attention has memory quadratic in sequence length, whereas FlashAttention has memory linear in sequence length."

I guess you have just reported how many times the layer will need to access the memory, not how much memory usage scales with sequence length.


> Small models only trained on 800B tokens, compared to 1T for llama-7B

LLaMA is trained far beyond chinchilla optimality, so this is not as surprising to me.


But Chinchilla optimality, while an interesting result, is a strange target for most practical purposes. Training happens once, inference happens many times; stopping training at the point where it would be cheaper to train a larger model to the same (proxy for) quality discounts the cost of inference to zero.


Yep, but if stability has the goal of training the best possible model then that would explain the choices they made.


I mean 800B tokens on a 3B model and 7B model is still way beyond the Chinchilla scale.


They're going to 1.5T and possibly 3T. The 800B is just for the "Alpha" checkpoints released today. New checkpoints will be released later.


According to this LLaMA still didn't go far enough: https://www.harmdevries.com/post/model-size-vs-compute-overh...


Yep, it depends on what your goal is.


This doesn't say that LLaMA didn't go far enough.


Not exactly, but it did say they could have gone further than they did without wasting time and energy on infinitesimally small gains.


> - 3B to 65B released or in progress

Seems they want to do 3B to 175B, although 175B is not in progress yet.


It's not efficient to do 175B. Training a smaller model (65B) on more data gives better performance for the same compute.


If you want it to just regurgitate training data, sure. But more parameters will always be better for more complex tasks.


> But more parameters will always be better for more complex tasks.

I think you should checkout this paper which discusses the relationship of performance and the ratio of training tokens to parameter count.

https://arxiv.org/abs/2203.15556


StableLM already has an optimal parameter number to tokens ratio, so what's your point? They should train the 65B model on even more tokens?

> StableLM is trained on a new experimental dataset built on The Pile, but three times larger with 1.5 trillion tokens of content


If I understand correctly, based on their prediction in Table 3 on page 8, they do have enough tokens, but they also need over an order of magnitude more compute time.

> It's not efficient to do 175B. Training a smaller model (65B) on more data gives better performance for the same compute.

This is OP's comment you replied to - so I was responding under OP's context that the amount of compute time would be the same, which I apologize I didn't make clear, and my response was very poorly worded.

My intent was to link the paper because I think it supports OP's statement that, for the same amount of compute time and token ratio, the performance of a smaller model will be better than that of a larger one (assuming they haven't converged yet, which they haven't at this size).

> If you want it to just regurgitate training data, sure.

This paper was about showing Chinchilla performing as well as models many times larger than itself, showing you don't need a 175B-size model for more performance than "regurgitating training data"


> you don't need to have a 175B size model…

Sure, that’s true.

…but, a fully trained larger model is going to be better.

The only real reason to prefer a smaller model is because it's cheaper and less intensive to train.

It sounds a lot like you’re saying “small models are just as good” … which is false. No one believes that.

For a given compute budget an under trained large model and a well trained small mode may be comparable, right?

…but surely, the laws of diminishing returns applies here?

There’s an upper bound to how good your smaller model can ever be, right?

Over time, someone can take a larger model which is under trained and refine that model right?

The “small model is just as good” narrative only holds up for a fixed once only training of a model for a fixed compute budget at the moment of release.

Over all of time that compute budget is not fixed.


> It sounds a lot like you’re saying “small models are just as good” … which is false. No one believes that. … a fully trained larger model is going to be better.

You're absolutely right, a fully trained larger model _will_ be better. This is meant to be under OP's context of a limited compute budget; the statement I'm trying to make is "fully trained small models are just as good as an undertrained large model".

> …but surely, the laws of diminishing returns applies here?

They do, but it's diminishing in that the performance gains of larger models become less and less, while the required training time changes a lot. If I'm reading the first chart of figure 2, page 5 correctly, comparing a 5B vs a 10B, the 10B needs almost 10x the training time for a 10% gain in loss, and it's a similar jump from 1B to 5B. My understanding is that this also starts flattening out, and the loss gain from each 10x becomes gradually lower and lower.

> Over all of time that compute budget is not fixed.

Realistically there is an upper bound to your compute budget. If you needed 1,000 GPUs for 30 days for a small model, you need 1,000 GPUs for 300 days for that ~10% at these smaller sizes, or 10,000 GPUs for 30 days... You're going to become limited very quickly by time and/or money. There's a reason OpenAI said they aren't training a model larger than GPT-4 at the moment - I don't think they can scale it from what I think is a ~1-2T model.


The optimal training tokens for 65B parameters is like 80T.

Emad tweeted "Goin to train a 3B model on 3T tokens" last month. These 800B checkpoints are just early alpha training checkpoints.

The full training set is 1.5T currently and will likely grow.


Depends on your compute budget.


and also easy to deploy


Were you able to figure out if the RL models are going to be jailed? A 65B parameter model could be a bit frightening. That's 1/3rd the size of GPT3.


I'm sure there will be a bunch of different RL tuned versions of them, RLHF isn't that expensive. IIRC Microsoft has software that will do it for a few thousand dollars for a model that size. I'm sure someone will release a non-lobotomized version, maybe OpenAssistant.


it's not always about the size, but yeah, it's really good!


They mention 1.5T training tokens, perhaps for the largest model only?


It's unclear which models will be trained to 1.5T tokens. The details of how many tokens each model saw in training are on Github - https://github.com/stability-AI/stableLM/ . But only for the ones that have been released.


I just asked a Stability employee and they said that the current models ran into an overfitting issue, probably due to some duplicated data somewhere in their dataset, which consists of 1.5T tokens. The 800B tokens is the number of tokens they've been trained on so far. The plan is to keep going and train on the rest of the data once the issue is resolved.


I've asked this question in a few places, and never been able to get an answer, maybe you know...

Q: Why are these LLMs trained on a single epoch, and why do they perform worse if the dataset is repeated?

This seems maybe related to suspecting data duplication as a cause of overfitting.

Why don't LLMs need multi-epoch training at a low learning rate to generalize? If they are managing to learn from a single epoch, that sounds more like they may be memorizing!


Never repeating your training data is what you'd ideally like to do for training basically any ML model. If you do that you don't really need to worry about overfitting since the model is constantly trying to fit a stream of new data. To reduce its training error it actually has to model the structure of the data rather than just memorizing it since each training step will involve data it has never seen before. Larger models are more prone to overfitting but also learn several orders of magnitude faster. If you can use larger models without being concerned about overfitting it's generally desirable to do so. It's just that most tasks don't actually have enough data to support doing that. Thankfully, text modeling does have enough data.


Thanks.

So when, for example, we train an ImageNet model over multiple epochs using rotation/scaling/etc augmentation, it's really better to think of this as one epoch over a unique set of images than multi-epoch per se? I was really thinking of augmentation as a way to get coverage over the input space rather than ensuring the training data doesn't repeat, but I guess it serves both purposes.

It does still seem that many LLMs are overfitting / memorizing to a fair degree though - maybe just because they are still too big for the amount of data they are trained on? It seems like a bit of a balancing act - wanting an LLM to generalize, but yet also to serve as somewhat of a knowledge store for rare data it has only seen once.


I tooted this yesterday:

Selling access to LLMs via remote APIs is the “stage plays on the radio” stage of technological development. It makes no actual sense; it’s just what the business people are accustomed to. It’s not going to last very long. So much more value will be unlocked by running them on device. People are going to look back at this stage and laugh, like paying $5/month to a cellphone carrier for Snake on a feature phone.

Web apps:

- Need data persistence. Distributed databases are really hard to do.

- Often have network effects where the size of the network causes natural monopoly feedback loops.

None of that applies to LLMs.

- Making one LLM is hard work and expensive. But once one exists you can use it to make more relatively cheaply by generating training data. And fine tuning is more reliable than one shot learning.

- Someone has to pay the price of computation power. It’s in the interest of companies to make consumers pay for it up front in the form of a device.

- Being local lets you respond faster and with access to more user contextual data.


This is sort of like saying the world wide web is a fad. Many people made that argument, but a lot of desktop apps got replaced by websites even though they were supposedly inferior.

ChatGPT works fine as a website and you don’t need to buy a new computer to run it. You can access your chat history from any device. For many purposes, the only real downside is the subscription fee.

If LLM’s become cheaper to run, websites will be cheaper to run, and there will be lower-cost competition. Maybe even cheap enough to give away for free and make money from advertising?


The other downside starts with an "As an AI language model..."


And ends with "...I'm afraid I can't do that."


Which is why I said "many purposes." It doesn't work for everything.


This doesn't seem technically feasible to me. The state of the art will for a long time require a lot more hardware to run than is available on a consumer device.

Beyond which, inference also benefits from parallelization, not just training, so being able to batch requests is a benefit, and more likely when access is offered via an API.


This isn't correct. It's 100% feasible to run very capable models on consumer devices.

I wrote up a feasibility investigation last year: https://fleetwood.dev/posts/a-case-for-client-side-machine-l...


But there's a big difference between running "very capable" models that use most available resources and consume tons of power and run slowly...

...versus running the best models available, in a few seconds, without using up the memory the main app you're using needs for running.

These are all mainly going to be run remotely for general consumer usage for quite a while I think.


How many applications on your machine right now are GPU accelerated? There is a chunk of silicon that spends almost all its time doing nothing.


Sure but it's not that powerful. It's not a gaming rig.

Also it had better spend almost all its time doing nothing or it would kill my battery. Same as with my CPU.

The main point still stands though -- it's pretty useless if it takes a couple minutes to do what a server can do in a couple seconds.


How do you think the server does it in a couple of seconds? Maybe we should get rid of gaming rigs because servers are fast, right?


Every single one of them, since both Windows and macOS use the GPU for window management

I miss Aero, that shit was so cool...


I ran alpaca 7B on a 2015 Acer laptop (it was slow, but worked!)


> very capable models

Well that's the problem though, those models don't come anywhere close to being useful at all. At least not yet. And they also run much slower.

As compute increases in general, there will be larger and more capable state of the art models and it'll make more sense to just use those instead of trying to run some local one that won't give you any useful answers. Data centers will always have a few orders of magnitude more horsepower than your average laptop, even with some kind of inference accelerator card.


Stable diffusion isn't useful at all?


Do people use it for anything practical? Making stock photos maybe? I haven't really had a proper use case for it and all the random things I tried to make with it weren't good enough to use with anything. Could be useful for making concepts for real artists, but last I heard they were all too busy boycotting it.

Also, not an LLM.


> I haven't really had a proper use case for it and all the random things I tried to make with it weren't good enough to use with anything.

Sounds a lot like most of my early programming experiments…

Though I’ve heard on good authority that the early programmers looked past being able to calculate ballistic charts and have done some interesting things with these “computer” things.


There are some models that can generate tileable textures, for example.


Ah interesting yeah, found this one: https://replicate.com/tommoore515/material_stable_diffusion

Trying out some prompts, maybe last I used SD my mistake was going with a lower resolution to speed up generation. I literally cannot get this one to make anything that isn't a weird blob at 256px and lower, but at 512px it works fine? Weird that it's so resolution dependent. I guess some proper stuff can be made at 1024px and above.


SD basically doesn't function at 256px; it was trained on 512px


Not sure why they'd give the option to run at those resolutions then?


SD 1.1 was trained on 256x256, so I guess it was left in there for backwards-compatibility reasons.


How is running a LLM related to stable diffusion?

Your contention is that models will run on devices; but latent diffusion models have lower memory footprints (see: latent).

The hardware you need to run a good LLM is what, 10x more than a latent diffusion one?

They are not comparable.


“very capable” and “state of the art” aren’t the same thing. In this case the gap is huge


This is true, but a good subset of tasks can be accomplished by much smaller models. Copilot is 12B!


This technology will be embedded into every OS within 2 years. People don't generally need a "super" model like GPT3/4. It will be perfectly acceptable and common to have the model change context, sync with whatever model/training data is necessary to be an expert in that context only (and associated contexts), and prompt it in a specific domain. Client devices and internet connections are fast enough to do this in near real time today. The platforms to do all of this are being built right now by every company that creates software; otherwise they will fail within 5 years.

It's an inconvenient truth, for better or worse.


I can already run Vicuna(llama) 7B on my 2020, 14" PC laptop at ~3.5 tokens/sec, and more speed can definitely be squeezed out.

Most future laptops and phones will ship with NPUs next to the CPU silicon. Once they get enabled in software, that means a 16GB machine can run a 13B model, or a 7B model with room for other heavy apps.

As for the benefits of batching and centralization, that is true, but it's somewhat countered by the high cost of server accelerators and the high profit margins of cloud services.


It's not just the compute, you need fast memory too.

And 7B and 13B are nowhere near enough to get you GPT-3.5 level of performance, which is where it becomes actually interesting.

We'll get there eventually but I don't think it's right around the corner or anything like that.


Setting the M series aside, the AMD 7000 laptops already have reasonably fast memory. Faster than some old GPUs.

And that trend is accelerating. The latest rumor is that Intel is bringing back the eDRAM cache next (which means it was in planning long before the generative ai craze), and more stacked/on package memory is just around the corner.


While 7000U laptops have yet to be benchmarked, dual-channel DDR5/quad-channel LPDDR5 systems top out at about 60GB/s. (The M1/M2 by comparison is at 100GB/s, and that doubles for the Pro, Ultra, and Max, up to 800GB/s.) As a point of reference, top-end consumer GPUs like the RTX 4090 are at about 1000GB/s.

My understanding is things like V-Cache, eDRAM have limited benefits for dense transformers, as they need to cycle through all/most of the parameters when running.


I don't think it's going to happen in the next few years

the prices are gonna drop like hell, but ain't no way we run models meant to run on 8 nvidia A100 on our smartphones in the next 5 years

just like you don't store the entirety of Spotify on your iPhone, you're not gonna run any decent LLM on phones any time soon (and I don't consider any of the small Llamas to be decent)


This is the reason why they're not going to move on device anytime soon. You can use compression techniques, sure, but you're not going to get anywhere near the level of performance of GPT-4 at a size that can fit on most consumer devices


I think we'll see completely new architectures dominate in the near future, ousting the transformer. I am strongly suspicious that, while impressive, transformers use several orders of magnitude more compute than is "needed" for the tasks they perform, if for no other reason because the human brain performs similarly and it only draws 20 watts! And it isn't even an engineered system, just the product of a very, very long history of natural selection! I fully anticipate that we'll see AI in the near future that achieves human-level performance on sub-human power budgets like the ones you'd be constrained by on a phone :)


"neat future" is very ambiguous. At the moment there is nothing even close to transformers in terms of performance. I suspect you are right in general but I'm not sure about the "near future" part, there needs to be a pretty significant paradigm shift for that to happen (which is possible, of course, I just don't see any hints of it yet).


RWKV is an attention-free architecture that's showing promising scaling at a similar level to Transformers right now! There's also recently been Hyena, which uses a new mechanism that's kind of a weird mix of attention, convolution, and implicit modelling all at once. It's shown promise as well. Remains to be seen if these competing methods will truly scale as well as Transformers, but I've got my fingers crossed. Only a matter of time!

I agree that "near future" is quite ambiguous though. If I were to disambiguate my claims, I think I'd personally expect a Transformer-killing architecture to arise in the next 4-5 years.


the only thing I can say to this is that Apple have seemed laser focused on tuning their silicon for ML crunching, that that focus is clearly now going to be amped up further still, and that in tandem the software itself will be tuned to Apple silicon.

GPUs on the other hand are pretty general purpose. And 5 years on a focused superlinear ramp up is a long time, lots can happen. I am not saying it's 100%, or even 80% likely. It'll be super impressive if it happens, but I see it as well within the realms of reason.


Apple's new M2 Max has a neural engine which can do 15 trillion flops. Nvidia's A100 chip (released almost 3 years ago) can do 315 trillion flops. Apple is not going to close this 20x gap in a few years.


> this 168x gap

FTFY, remember it takes 8 of those to even load the thing. And when the average laptop has that much compute, GPT 4 will seem like Cleverbot in comparison to the state of the art.


right, it's a huge challenge.

I think the tuning the models to the hardware piece is important, and of course there is much more incentive to do this for Apple than nvidia because of the distribution and ecosystem advantages Apple have.

But also, I don't know... let's see what the curve looks like! It's only been a couple of years of these neural engines. Let's see how many flops the M3 can hit this year. And then the M4 the year after. Again, 5 years is actually a long time when real improvement is happening. I am optimistic.


At some point, they will put the models in silicon. I’m curious as to when… 5yr?


That doesn't sound likely with the current architectures. There may be some kind of specialisation, but an NN is like the chip-design nightmare. We can't do chips with that many crossed lines. It's going to have to keep the storage + execution-engine pattern unless we have some breakthroughs.

"More specialised than GPU" is the game for now.


Not even with 3d chip-stacking?


Well, we'll see what the future manufacturing brings, but right now we're not even at thousands of layers (as far as I know... please link if there's been more), and we'd need to be in hundreds of thousands range. Given the rate of defects also adding up and the need for some way to dissipate the heat... (almost all of that chip will be engaged while running - no chance for balancing power between systems) Yeah, still lots of challenges there.

(I'm assuming the original comment meant literally putting the network as is in the purpose designed chip)


Try comparing M2 with an actual consumer level GPU, not a supercomputer...


The M2 and the 4090 are both very general purpose. In fact, the 4090 allocates proportionally more silicon area to the tensor cores than Apple allocates to the neural engine.

The M series is basically the only "big" SoC with a functional, flexible NPU and big GPU right now, which is why it seems so good at ML. But you can bet actual ML focused designs are in the pipe.


Well, and gobs of RAM, which only top end cards can compete with.


I am really hoping Intel disrupts this status quo. The only thing stopping AMD/Nvidia is anticompetitiveness.


Does llama.cpp use the NPU or Metal? I thought it used NEON instructions? So currently those units aren't even used.


I don't think so. M chips just happen to have a really good memory subsystem and good SIMD performance through Accelerate, so the CPU performance is pretty good.

Some Stable Diffusion implementations can use the NPU or GPU, or (experimentally and unsuccessfully) both.


Curious, why do you think that? My knowledge is limited to marketing material and my M2 vs my 3090, and my conclusion so far would be that it's in every hardware maker's marketing claims of the past couple of years.


Vicuna (LLama) 13B is pretty good IMO. A 20B model can definitely fit in RAM on future devices.


> but ain't no way we run models meant to run on 8 nvidia A100 on our smartphones in the next 5 years

When I learned about neural networks, the general advice at the time was "you'll only need one hidden layer, with somewhere between the number of your input and output neurons". While that was more than 5 years ago, my point is: both the approach and the architecture change over time. I would not bet on what we won't have in 5 years.


> but ain't no way we run models meant to run on 8 nvidia A100 on our smartphones in the next 5 years

M$ has been working on an AI chip since 2019, so I think we will.


An A100 is about the size of a brick, there is no way we're fitting those 8 bricks in a phone in the next five years, without even thinking about heat management


An A100 HGX server is ~6kW of power consumption (and associated heat), while an iPhone is O(1W). I agree that a 6000x increase in energy density or 6000x decrease in power consumption is unlikely in this decade.


I mean we could maybe put uranium in iphones...but not that many transistors


They still can't beat the laws of physics. It's just the kind of density that doesn't seem feasible to put in a smartphone.


Human brains seem to be of the same order of magnitude in terms of size, and seem more nuanced and capable than GPT4.


The human brain is also three-dimensional, heavily interconnected, and has built-in thermal management at every scale. Chips are much faster, but still operate on essentially linear memory cells, and this limits how many matmuls you can do per second. If we can figure out true connectivity without doing tons of matmuls, then we should be able to massively cut the computational demands of models.


It's not an apples-to-apples comparison. There's a multitude of tasks that human brains are very bad at.

GPT's excellence is in raw knowledge and being an answering machine. You won't find a single human brain that can hold the same amount of knowledge.


And? Nvidia has been working on AI chips for years.


I agree - I think for security and privacy we need it to be on-device (either that or there needs to be end-to-end encryption with guarantees that data won't be captured for training). There are tons of useful applications that require sensitive personal information (or confidential business information) to be passed in prompts - that becomes a non-issue if you can run it on device.

I think there will be a lot of incentive to figure out how to make these models more efficient. Up until now, there's been no incentive for the OpenAI's and the Googles of the world to make the models efficient enough to run on consumer hardware. But once we have open models and weights there will be tons of people trying to get them running on consumer hardware.

I imagine something like an AI specific processor card that just runs LLMs and costs < $3000 could be a new hardware category in the next few years (personally I would pay for that). Or, if apple were to start offering a GPT3.5+ level LLM built in that runs well on M2 or M3 macs that would be strong competition and a pretty big blow against the other tech companies.


That hardware's gonna look a lot like ASIC Bitcoin miners if an architecture to replace LLMs is popularized. General-enough purpose computing ain't going away for a long time.


I'd suspect it will actually accelerate moving everything into the cloud.

If your entire business is in the cloud, you can give an AI access to everything with a single sign-on or some passwords. If half is in the cloud and half is local, that's very annoying to have all in-context for your AI assistant. And there's no way we're getting everything locally stored again at this point!


Right, this is why StabilityAI is getting in bed with Amazon, so private, fine-tuned models can operate on all your data sitting out there in S3 buckets or whatever.


I really doubt that someone responsible for security would give anyone (or anybot) access to everything.


We're stuck here for a while due to the size, and cost, of the larger models.

The main reason I want a non-cloud LLM is that I want one that's unaligned.

I know I'm not a criminal and I want to stop being reprimanded by GPT4.

What I'm most interested here is fine tuning the model with my own content.

That could be super valuable especially if we could get it to fact check itself, which you could with a vector database.
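
The vector-database idea is roughly: embed your own content, then for any claim the model makes, retrieve the closest passages and feed them back into the prompt as grounding. A minimal sketch with sentence-transformers and a brute-force cosine search standing in for a real vector database (the model name and documents are placeholders):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    documents = [
        "Our product launched in March 2021.",
        "The API rate limit is 100 requests per minute.",
        "Support is available Monday through Friday.",
    ]
    doc_vecs = embedder.encode(documents, normalize_embeddings=True)

    def retrieve(claim, k=2):
        """Return the k documents most similar to the claim (cosine similarity)."""
        q = embedder.encode([claim], normalize_embeddings=True)[0]
        return [documents[i] for i in np.argsort(-(doc_vecs @ q))[:k]]

    # The retrieved passages get appended to the LLM prompt so its statements
    # can be checked against (or generated from) your own content.
    print(retrieve("When did the product launch?"))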


What's been so interesting with the explosion of this has been how prominently the corporately-driven restrictions have been highlighted in news and such.

People are getting a good look, in very easy-to-understand terms and at the foundational stage, at how limiting the future will be if this just becomes another big-tech-controlled thing.


Doesn't the general populace also have valid concerns about more powerful models and newer architectures being able to do damage?


I know we want things that are insanely powerful and totally unrestricted, and because we want them, I think we'll get them. And then I genuinely think this tech is going to end in tears.


They have said that the alignment actually hurts the performance of the models. Plus for creative applications like video games or novels, you need an unaligned model otherwise it just produces "helpful" and nice characters.


The character simulacrum used by an LLM tends to be the result of "system" prompts that are set by the service you are using. GPT-N isn't exactly trained to be helpful and nice, but ChatGPT has system prompts describing the character it should be performing as. If you work with just GPT-4, you can get more zany outputs.

That said, OpenAI does use RLHF, which does bias the model away from raw internet madness and something that OpenAI wanted at the time of training. A lot of models haven't gone through rigorous RLHF, though.

As a side note, RLHF might be the best alignment technique we currently have in practice, but it is not decisive. It has been noted in multiple experiments that RLHF can just train a model to trick the human reviewer, if tricking is easier in practice than doing the thing the human reviewer wanted. So this isn't even really seen as aligning a model by alignment researchers. At least it's not an approach that can scale with increasingly intelligent AI models.


Alignment is an unsolved problem. None of the current stronger models are "aligned", just tuned in ways that weight some biases more than others, and even that is dependent on the features of their inputs.


On this topic, Apple is the sleeping giant. Sleeping tortoise maybe. Everyone else has been fast out of the gates, but Apple has effectively already been positioning to leap frog everyone after a decade+ of M1 chip design. Ever since these chips launched, the M1 chips have felt materially underutilized, particularly their GPU compute. Have to believe something big is going on behind the scenes here.

That said, wouldn't be surprised if the truth was somewhere in between cloud-deployed and locally deployed, particularly on the way up to the asymptotic tail of the model performance curve.


What would a "leap frog" look like, in your mind? I'm struggling to imagine how they're better positioned than the competition, especially after llama.cpp showed us that inference acceleration works with everything from AVX2 to ARM NEON. Compared to Nvidia (or even Microsoft and ONNX/OpenAI), Apple is somewhat empty-handed here. They're not out of the game, but I genuinely see no path for them to dominate "everyone".


My guess is a leapfrog would have more to do with how LLMs are integrated into an operating system, rather than just coming out with a better model. I don’t think we’re gonna get a substantially more capable LLM than GPT-4 anytime soon, but fine-tuning it to sit on top of the core of an operating system could yield results.


Feels like Microsoft already beat them to the punch. Their ONNX toolkit has better ARM optimization than Apple's own Pytorch patches, and their collaboration with OpenAI places them pretty far ahead of the research curve. I'm convinced Microsoft could out-maneuver Apple on local or remote AI functionality, if they wanted to.


M1 GPUs are barely real-world tested, alright chips. They're far from being a sleeping giant.


This doesn't seem that obvious to me; serving LLMs through an API allows for highly optimized inference with stuff like TensorRT and batched inference, while you're stuck with batch size = 1 when processing locally.

LLMs don't even require full real-time inference; there are applications like VR or camera stuff where you need real-time <10ms inference, but for any application of LLMs 200-500ms is more than fine.

For the users, running LLMs locally means more battery usage and significant RAM usage. The only true advantage is privacy but this isn't a selling point for most people
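
To make the batching point concrete: a server can pack many users' prompts into one forward pass and amortize the cost of streaming the weights through the GPU, which a single local user rarely can. A rough transformers-level illustration, not any provider's actual serving stack:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "stabilityai/stablelm-tuned-alpha-7b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"  # left-pad so generation continues from each prompt's end
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

    prompts = [  # requests from different users, served in a single batch
        "Write a haiku about GPUs.",
        "Summarize the plot of Hamlet in one sentence.",
        "Explain what a vector database is.",
    ]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
    for text in tokenizer.batch_decode(out, skip_special_tokens=True):
        print(text, "\n---")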


You're still thinking in terms of what APIs would be used for, rather than what local computation enables.

For example, I'd like an AI to read everything I have on screen, so that I can ask at any time "why is that? Explain!" without having to copy paste the data and provide the whole context to a Google-like app.

But without privacy guarantee (and I mean technical one, not a pinky promise to be broken when VC funding runs out) there's no way I'd feed everything into an AI.


We are very close to optimized ML frameworks on consumer hardware.

And TBH most modern devices have way more RAM than they need, and go to great lengths to just find stuff to do with it. Hardware companies also very much like the idea of heavy consumer applications.


> But once one exists you can use it to make more relatively cheaply by generating training data.

Is that a real technique? Why not just shrink down the model itself directly somehow, is that not possible?


That's what pruning is, but it's not that straightforward and has limits. Finetuning a smaller model on the output of a larger one is much more flexible and reliable.

GPT 3.5 is probably a 13B Curie finetuned on the output of full size GPT-3 175B, to give you an idea of the technique.

That is smaller than the third smallest StableLM and the same size as LLaMA-13B which can run at useful speeds off of a smart phone CPU.
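
The data-generation half of that technique (the Alpaca-style recipe) looks roughly like this, using the OpenAI chat API as it existed in early 2023; the instructions are placeholders, and the resulting pairs would then be used to fine-tune the smaller model (e.g. with a LoRA setup like the sketch earlier in the thread). Whether you're allowed to do this commercially is exactly the licensing question discussed upthread.

    import json
    import openai

    instructions = [
        "Explain the difference between a list and a tuple in Python.",
        "Write a short, polite email declining a meeting invitation.",
    ]

    pairs = []
    for instruction in instructions:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": instruction}],
            temperature=0.7,
        )
        pairs.append({"instruction": instruction,
                      "response": resp["choices"][0]["message"]["content"]})

    # This file becomes the fine-tuning dataset for the smaller model.
    with open("distilled_instructions.jsonl", "w") as f:
        for p in pairs:
            f.write(json.dumps(p) + "\n")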


> GPT 3.5 is probably a 13B Curie finetuned on the output of full size GPT-3 175B

What is the basis for this assessment?


GPT-3.5 is much worse at "complex" cognitive tasks than Davinci (175B), which seems to indicate that it's a smaller model. It's also much faster than Davinci and costs the same as Curie via the API.


It's clearly a smaller model, but I'm very skeptical that it is 13B. It is much more lucid than any 13B model out in the wild. I find it much more likely that they used additional tricks to scale down hardware requirements and thereby bring the price down so much (int4 quantization, perhaps? that alone would mean 4x less hardware utilization for the same query, if they were using float16 for older models, which they probably were)


I'm sure they're tweaking lots of things under the hood, especially now that they have 100M+ users. It could be bigger (30B?, maybe 65B) as coming down from 175B gives quite a lot of room, but the cognitive drop from Davinci gives away that's it's much smaller.

People fine-tuning LLaMa models on arguably not that much/not the highest quality data are already seeing pretty good improvements over the base LLaMa, even at "small" sizes (7B/13B). I assume OpenAI has access to much higher quality data to fine-tune with and in much higher quantity too.


I have been playing with all the local LLaMA models, and in my experience, the gains that are touted are often very misleading (e.g. people claiming that 13B can be as good as ChatGPT-3.5; it is absolutely not) and/or refer to synthetic testing that doesn't seem to translate well to actual use. Using GPT to generate training data for fine-tuning seems to produce the best results, but even so, GPT4-x-Alpaca 30B is still clearly inferior to the real thing. In general, the gap between 13B and 30B for any LLaMA-derived model is pretty big, and I've yet to see any fine-tuned model at 13B work better than plain llama-30b in actual use.

So I think that 65B may be a realistic estimate here assuming that OpenAI does indeed have some secret sauce for training that's substantially better, but below that I'm very skeptical (but still hope I'm wrong - I'd love to have GPT-3.5 level of performance running locally!).


Agreed, there is way too much hype about the actual capabilities of the LLaMa models. However, instruction tuning alone makes Alpaca much more usable than the base model, and to be fair even some versions of the "tiny" 7B can do small talk relatively well.

> Using GPT to generate training data for fine-tuning seems to produce the best results, but even so, GPT4-x-Alpaca 30B is still clearly inferior to the real thing.

Distillation is interesting and it does seem to make the models adopt ChatGPT's style, but I'm dubious that making LLMs generate entire datasets or copy/pasting ShareGPT is going to give you that great of a dataset. The whole point of RLHF is getting the human feedback to make the model better. OpenAI's dataset/RLHF work seems to be working wonders for them and will continue to give them a huge advantage (especially now that they're getting hundreds of millions of conversations of people doing all sorts of things with ChatGPT).


I think people may be naive to believe that the deciding factor in how these things are used is likely to be "chip speed" or "efficiency on the machine."

I wish we were in that world; but it more likely seems like it would be "Which company jumps ahead quickest to get mindshare on a popular AI related thing, and then is able to ride scale to dominate the space?"

REALLY hope I end up being wrong here; the fact that so many models are already out there does give me some hope.


Most of your bullets sound like arguments that local models will win.

> Often have network effects where the size of the network causes natural monopoly feedback loops.

This one in particular sounds like an argument that remote models will win.


I don't think that's true in the context of businesses, because they won't want their data to be leaked and/or used for other clients. The more data from your company you can feed the AI, the more productive it will be for you. I'm not just talking about semi-public documentation, but also things like emails, meeting transcripts, internal tools APIs, employee details, etc.

If the AI service provider uses your data to help better train their AI, it will be blacklisted by most companies. If you keep the data in silos, the centralisation will offer almost no benefit while still being a very high privacy risk. The only benefit they get is that it allows them to demo it and see its potential, but no serious business will adopt it unless you also provide a self-hosted solution.

I think the only people who will truly benefit from using cloud services as a long term solution are personal users and companies too small to afford the initial cost of the hardware.


That seems hard to believe for businesses which already rely on Office, Teams and Sharepoint, since Microsoft will be making its version of ChatGPT available for all its products, and the integration will be too hard to pass up on.


Microsoft is in a different situation because everyone is already forced to trust them with their OS and o365. For better or for worse, there are no current alternatives to Windows and the office suite for most businesses. If you already login to your OS with a Microsoft account and process your data in Excel, adding an AI tool on top of it is not a big jump. Very few others are in this situation.

For every other AI service provider, good fucking luck getting clients to trust you. I expect we will see a lot of AI services that offer a cheap and easy-to-use cloud AI subsidized by a very expensive self-hosted version. I also expect a lot of data leaks and many high-profile incidents where an AI creates a document or code that includes sensitive data from someone else (hard-coded passwords, API keys, etc.).

Even for a large company like Autodesk or Adobe, you might trust them with your engineering drawings and your new product design, but would you feel comfortable uploading your code base for internal tools, employee files, email communications, etc. to them? It's gonna be a hard no for a lot of businesses.


Having more users helps with reinforcement learning, but as a user, I want an unaligned AI that isn’t constantly babysitting me with bullshit about what it can and cannot do, so there’s like a negative network effect, lol.


I think you'll be paying a cloud provider instead of an API company, but it'll still be a monthly charge for the service.

I don't see small/medium companies getting into acquiring hardware for AI


There will be a time when LLMs need data persistence to "improve our user experience". The LLM will act like a "friend" that will remember you when you come back.


At least for now, good LLMs need GPUs that cost tens of thousands, so a cloud API is the only reasonable option. It definitely makes a lot of sense.


All modern Apple hardware has dedicated AI chips. I bet they are already working on an LLM-based Siri.


LLMs also need persistent storage for best performance. Otherwise you’re always starting at square one


?

All software is sold as SaaS today, because it's more profitable. The same will be true for LLMs.


LLMs seem more akin to AWS than to SaaS: companies will create products upon LLMs like how companies rely on AWS to support their products. The build vs buy calculus may tip heavily towards build once they can run on device with good user experience, no need to pay for cloud compute any longer.


> The build vs buy calculus may tip heavily towards build once they can run on device with good user experience

Hahahahahaha... oh wait, you're serious? Let me laugh even harder.

Have you used any commercial software in the last 25 years? Garbage web apps have replaced very nice, performant local applications across the board. My stupid fitness tracker app (that should be a 10 MB sqlite DB) instead fails to even open without an internet connection.

Is your theory that companies will suddenly decide they hate getting money and love paying money for developers to create great user experiences?

I wish I lived in your world.


This is mostly why the future of computation only makes sense monetarily if you have everyone shift to a thin client. So, banning GPUs is likely considered a "necessary evil" by the BigTech cognoscenti for accomplishing that goal.


What does "stage plays on the radio” mean?


When radio first started, people read plays written for the stage, because that's what they knew and what they had. Later people learned to write for the medium and make radio native entertainment.

Same thing happened when TV arrived. They did live versions of the radio entertainment on a set in front of a camera.


It's a fancy way of saying things won't be running locally. The orchestra/performers won't be live in front of you but on the radio.


That is not what that idiom means.


Absolutely a giant fan of Stability sticking to actual open source licenses and not licenses that impose restrictions on what you can use the models for. This is the future of AI! Beware of any org that uses "ethical" licenses - they are not open source. Stability is one of the few organizations that actually cares about free software, you love to see it.


> These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.

This is a no-commercial-use-allowed license; it is neither considered free software nor open source, the definitions of which disallow restrictions on what you can use the work for.


The two sentences prior are important:

> We are also releasing a set of research models that are instruction fine-tuned. Initially, these fine-tuned models will use a combination of five recent open-source datasets for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT, and HH. These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.

The snippet you quoted is not talking about the main model in the announcement. It's talking about fine-tuned models based on other models. Stability has to respect the license of the originals. They cannot change it.

The main model is described higher up in the post and is permissible for commercial:

> Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license


It also appears that CC BY-SA-4.0 is GPL-compatible. Not a lawyer and this is not legal advice, but it certainly seems like one could operate their own StableLM server/service and allow proprietary code to use it over a network interface, much like one could use a GPL-licensed database system.

https://creativecommons.org/faq/#can-i-apply-a-creative-comm...


Interesting. A non-LLM hallucinating. And to think we used to believe that was only a property of LLMs.


Huh? Everything they said is true, isn’t it?


StableLM, the model that's theirs is CC-BY-SA 4.0.


The instruction-tuned versions are under a non-commercial license though, because some of the data they used is non-commercial.


Not their fault: the instruct-tuned models depend on non-open data, which should be open, however. Scraping ChatGPT is legal.


Agreed. Scraping ChatGPT is against OpenAI's terms of use, and OpenAI is entitled to terminate your access immediately upon notice, but since ChatGPT output is not copyrighted (and not copyrightable), output you acquired before termination should be freely redistributable. I am not sure why the Stanford Alpaca authors think otherwise, but they are wrong.

https://openai.com/policies/terms-of-use

Thank you for using OpenAI! These Terms of Use apply when you use the services of OpenAI, L.L.C. (snip) By using our Services, you agree to these Terms. (snip) You may not (iii) use output from the Services to develop models that compete with OpenAI. (snip) We may terminate these Terms immediately upon notice to you if you materially breach Sections 2 (Usage Requirements).


I am very happy to see them use a true FLOSS licence. However, it's a surprise to me, given that Stable Diffusion is proprietary, released under one of those "ethical" licences.


"Ethics" will only ever be an excuse to lock this technology behind one companies paywall. The only ethical AI is actually free and open AI, how its trained is irrelevant imho as long as we can all benefit. The negatives of the work of individuals being used to train it outweigh the negatives of one company just doing that and holding the power within their walls.


Yeah I wish there was more real investigation / analysis into who is behind various "ethical AI" pushes and what they stand to gain from it. From what I can see, many of the people involved either are invested in companies that will somehow certify your AI is ethical, or just want to stifle competition so they can catch up. Of course there's also a sprinkling of "current thing" supporters.


I have to disagree. Especially in the case of LLMs, where new API services are popping up all over the place, an "ethical" license like the AGPL that requires the source be shared for web services would accelerate development of the space as a whole immensely.


I think when they say ethical they're talking about RAIL which has clauses about not doing bad guy stuff with their models, not AGPL.


The StableLM license is actually a ShareAlike license requiring sharing of modified model weights under the same permissive license. https://creativecommons.org/licenses/by-sa/4.0/


Indeed, that's why I pay for credits on their official site/DreamStudio even though I want to run things locally. My big fear is that one day they'll put out a press release saying they have to stop everything because of insufficient funding.


How is this sort of thing audited? I imagine there are all sorts of lifestyle AI businesses that won't give two shits about a license where people can't easily see or audit what is being used.


I am definitely not in favor of OSS license for models >= GPT3...

Not unless they're aligned well.

There are all sorts of horrible use cases that these could be used for.


"Alignment" is just a euphemism for "agrees with me", though. Humans aren't even aligned with each other. Demanding that AI models be "aligned" is essentially a demand that AI only be produced which agrees with your priors.


What you're essentially saying is "alignment is very hard", which is what those researching alignment say. And they often use the example of how inter-human alignment is hard as evidence for why it is a hard problem. But saying it is hard is not an argument about whether it is essential.

While humans are not perfectly aligned, especially if you just look at individuals, we are collectively aligned enough that many people can live together in communities of various scales. That imperfect alignment has been good enough that we have scaled from small tribal groups to an international network of nations. We need AI alignment to be good enough if we hope to continue advancing.


How do you feel about AI which is aligned with Iranian or Saudi cultural norms?


This presumes a lot of breakthroughs in model interpretability, corrigibility and inner alignment. Since those are a prerequisite for AGI that we can live alongside, I'd feel some amount of relief that we had found at least a temporary solution (but will those solutions scale to ASI?).

Now, if Iran created an AGI that was poorly aligned with the global community before other nations had similar AGI, then I suspect that would result in a future world I wouldn't be happy with. But it could be much better than a world with AGI that is unaligned with any human values, regardless of who created it.

My best-case scenario would be AGI created by a broad international coalition that is able to agree on some combination of capabilities and alignment. I'm not very confident that this is our future, though. If anyone is going to do it, I think it is more likely that the USA would be the first to create a culturally aligned AGI. Which of course would still be considered a disaster for incongruent cultures.


The ideal alignment for a computer program is to be aligned completely with the user, prioritizing their needs to the exclusion of all others.


It is true that there are concerns relating to open source and AI, but surely having them be closed off, manipulated, and controlled by untrustworthy corporations is worse.


Neither of these options feels good to me. It's hard to tell what is ultimately worse, when I can imagine similar outcomes whenever irresponsible or malicious agents have access to sufficiently powerful AI.

Main positive point for open models is that we will start seeing the abuse sooner and at smaller scales. That might give us more time to build an immune system up against exploits by encouraging us to prioritize development of comprehensive AI safety practices.


they like the particular techbro that runs the corporation currently. people will find all sorts of ways to lick the boot of private ownership.


For the worst use cases, like scams, why would they respect the license? Though that probably applies for some legal but immoral use cases


There are all sorts of stuff people use www for, but here we are.


This is amazing. They even let the developers use it for commercial purposes;

“Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license.“

You can use this link to interact with the 7B model;

https://huggingface.co/spaces/stabilityai/stablelm-tuned-alp...


Is there a way to check the queue for this ?

I sent it one small text (actually a task) five minutes ago. It's still loading.


same. Probably hugged to death.


It just finished building.


Just tested it. I tried to get it to write a conclusion after giving it my report, and while it was correct, it was kind of silly, a far cry from ChatGPT. But again, this is the 7B variant and it's open source.


This is the Alpha of the 7B variant only partially through its initial training set (800B tokens out of 1.5 Trillion)


> Supportive. We build models to support our users, not replace them. We are focused on efficient, specialized, and practical AI performance – not a quest for god-like intelligence. We develop tools that help everyday people and everyday firms use AI to unlock creativity, boost their productivity, and open up new economic opportunities.

Refreshing take on the peak alarmism we see from tech "thought leaders"


This is just marketing. They're positioning themselves as somehow "more human" while building the exact same technology. When a model supports me by doing the work I'd otherwise hire someone to do, the model just replaced someone. And this goes without saying, but a large amount of outsourced tasks today don't exactly require "god-like intelligence".


That was probably said about the automobile when it replaced horses, or about electric lamps when they replaced oil lamps, no?

I mean, every city had an army of people to light up and down oil lamps in the streets, and these jobs went away. But people were freed up to do better stuff.


It is different this time. I bet that was also said when the transformations that you mentioned occurred, but this time it really is different.

LLM models are pretty general in their capabilities, so it is not like the relatively slow process of electrification, when lamplighters lost their jobs. Everyone can lose their jobs in a matter of months because AI can do close to everything.

I am excited to live in a world where AI has "freed" humans from wage slavery, but our economic system is not ready to deal with that yet.


> but this time it really is different

I'm skeptical. This will drastically change what it means to do a job in a way that has never happened before, but humans will find a way to deal with the fallout. We don't have a choice. Besides, if we were able to disrupt the very foundations of our economy for a minor virus, we can and will do the same to deal with this if required.

Either way this change has already arrived and we are starting to adapt our lives in response to it like we have many times in the past.

tldr: This change is significant but we'll manage.


The handling of COVID was not smooth, to say the least.

Yes, we handled it, but we are still paying the bill for that handling (inflation).

I think AI will have the disruption level of COVID, but with no end in sight: 5%, 10%, 20%, 50% of people will lose their jobs, and even if they can retrain and handle it, it will take 5-10 years for those people to recover. Can countries keep people on unemployment for that long?


If COVID had been worse for young, healthy people instead of the elderly and the debilitated, we'd be in serious trouble today. It was very badly handled...


Not like the AI revolution will be handled any better. It will be even worse, because there are very obvious economic incentives to handle it badly.


Covid is intrinsically bad.

I don't think this is the case for AI.


I see a completely different picture.

Productivity will skyrocket and with it the standard of living. Humans will always enjoy having other humans doing stuff for them.

Sure, it will be faster this time and there will be some growth pains.

It's not a matter of being ready, it's a matter of needing this. If you look at society's problems today, we're in a deadlock. I believe the benefits of AI can help alleviate a lot.


But to whose pockets will that productivity go? I think the gap between the haves and have-nots will widen and just increase society's problems


It will most likely widen, but who cares? What matters to me is the quality of my life, not that of others. If they're managing to do better than me while doing something useful to society, good for them.

What really matters is: the poor of tomorrow will laugh at the life of today's rich.

I mean, the poor won't have Bezos' yacht, but they'll have access to some life amenities, health resources, etc., that Bezos can't even dream of having today.


That’s bull, the poorest will have to fight for water


>Refreshing take on the peak alarmism we see from tech "thought leaders"

It's not alarmism when people have openly stated their intent to do those things.


It is alarmism when it's used to support government regulation that reinforces the moat: industry leaders say they intend to do it, but also claim that the danger of it being done is why competition with them must be restricted by the State (and why they can't, despite being, or being a subsidiary of, a nonprofit founded on an openness mission, share any substantive information on their current models).


Yeah all the Terminator energy around these AI things is so off-putting. They aren't like that. They're big matrices and they are very cool tools!


But the concerns about AI taking over the world are valid and important; even if they sound silly at first, there is some very solid reasoning behind it. They’re big matrices, yes, but they’re Turing-complete which means they can theoretically do any computational task

See https://youtu.be/tcdVC4e6EV4 for a really interesting video on why a theoretical superintelligent AI would be dangerous, and when you factor in that these models could self-improve and approach that level of intelligence it gets worrying…


This comment basically implies I don't get it, but I will if I watch a Youtube video. I get it. ChatGPT isn't that. That's the point. You can have concerns about AGI. That's fine. But they have nothing to do with LLMs unless you are trying to play a shell game.


But you were talking about AI in general and dismissing the risk entirely as sci-fi.

I think a large enough LLM, or at least a slightly modified one, could lead to AGI and we’re not as far from it as you think


What if big matrices are the last missing piece to research going on since the 50s…


> They're big matrices and they are very cool tools!

Well, your mom is a etc

Edit: Since this is getting downvoted I'll be more explicit: The human brain may well also be described as just some simple sort of thing, but that doesn't mean humans are not dangerous, nor hypothetical humans with a brain ten times as large and a million times faster. The worry about AIs killing all humans soon is not naive just because it sounds naive.


Sure, it's not naive just because it sounds naive. It's naive for other reasons (for one thing, we're really no closer to super-intelligent AIs than we were before the LLM craze began).


A lot of people would disagree with that. You can hardly deny that progress has sped up in the last few years, so I don't know why we shouldn't extrapolate this speed into the coming years.


"It is refreshing to hear opinions I already agree with. People with other opinions are unintelligent"

Is that what you were trying to convey? If not, I'm curious to know what you find refreshing about it and why those who disagree are wrapped in double quotes.


I dunno... god-like intelligence would be pretty useful. I'll take a brochure.


do you trust god?


Depends on which one. All the ones described in religious books seem to have very poor alignment, though.


Why should I need to? Isn't God on the blockchain? (j/k)


Well, ...

OK, I withdraw the comment.


Well, it's to their benefit to portray their models as working alongside and enhancing humans, as opposed to replacing us. So it sounds a bit like marketing speak to me.

And it's to the benefit of many of those tech "thought leaders" to be alarmist since they don't have much of the AI pie


Well exactly. AI _is_ a tool and a very good one at that.


Doesn't sell as much, though


StabilityAI is the real OpenAI. Thanks for this.


Well said. Openai is a promise unkept. Thanks StabilityAI for existing.


Unfortunately, due to the law of names, StabilityAI will in the future hit the same issue as OpenAI and do a 180, unleashing very unstable AI to the world.


More like Stability will turn out to be an unstable company. Last we heard they were struggling to raise more funding and might lose their CEO due to unclear business models:

https://www.semafor.com/article/04/07/2023/stability-ai-is-o...


The company can cease operations tomorrow, but the model they open sourced (and all of its derivatives built by the community) will continue to exist. If OpenAI disappears then all of the work they have done goes with it.


I'd rather have that than closed AI models controlled by a few corporations.


then they both have ironic names now


When has open source ever spearheaded independent innovation? They usually follow along.

Fred Wilson once did a take on all trends in SV. First some firm comes out with a product that changes the landscape and makes a massive profit. Then some little firm comes along and does the same for a cheaper price. Then some ambitious group out of college comes out with an open-source version of the same.

Open source has never been a trailblazer of innovation. Open "research" was the original mantra for OpenAI. And an entrepreneur in residence put together a great product. If they were any more open, it would not make sense.


> Open source has never been a trailblazer of innovation.

Except for, you know, all the major programming languages and Linux, which make all that innovation possible in the first place. Also, everything OpenAI is doing is based on open source stuff from Google and others, so…


True. Should have added open research has given lots of puzzle pieces for innovation.

And open source products have led to many individual contributions.

But again it's never been a trailblazer for innovation.


This could not be more untrue.

The world is littered with businesses that operate as commercial wrappers around open source technology. Ever heard of GitHub? What about MacOS? AWS?


Ahh. Looks like I misspoke.

The first line should have been "Paradigm shifting innovations have never started as open source."

Yes, open source has helped many people innovate.


Not fully. The instruction tuned model is CC-BY-NC-SA


It's CC-BY-NC-SA because of the upstream sources used for instruction training. There are open resources being developed for that, from what I've seen, but probably nothing ready yet.


That's a limitation of the dataset used for that particular tuned model. Probably not a great choice on their part given that people aren't reading past the headline, but the actual base model is not restricted.


Just tested the 7B-tuned version on Huggingface and it's not great..

    > What features were added in C++ 98?
    < C++98 (also known as C++11) was a major development milestone for C++
    < new gcc 5.1 standard, which was the first version of the standard to include support for AES encryption
    < return types and various other changes

    > Write a C++ constexpr function that implements powf
    < Sure! Here is an implementation of a C++11 constexpr function for the function pow, using C++11 recursion and variadic templates:
    < void pow (int n) { std::cout << "Powerset of " << n << " is " << pow (n, n) << std::endl; }
    < This implementation of the pow function should be possible for any constant integer power of 2 without exception.


Maybe you are not following the right syntax? LLaMA, Vicuna, and Alpaca suffer if you don't format the query just right.


I'm worried about the version hell of relying on HuggingFace's transformers.

In the model card, config.json [1] and generation_config.json contain the string "transformers_version": "4.28.1", and they refer to a common architecture, "GPTNeoXForCausalLM".

Which if I follow the string soft link correctly means, it refers to this file [2] with a long history of commits including some recents [3].

I don't understand how to get out of the version hell. I can install and pin a version of the transformer library, but sooner or later if I chain different foundation models I'll have to do this for each different model card that has its own version.

And then those transformer library version are probably dependent on some pytorch versions.

I don't understand how you can reliably track a bug or a change in behavior, or guarantee that a model you spent millions of dollars training doesn't rot, stays reproducible, and doesn't become unusable due to this mess.

(Also, the weights are a de-serialized pickled Python dictionary, which from a safety point of view is equivalent to running untrusted binaries and very worrying on its own.)

[1]https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b/b... [2]https://github.com/huggingface/transformers/blob/v4.28.1/src... [3]https://github.com/huggingface/transformers/commits/v4.28.1/...


I've been diving in lately, and while it's not efficient, the only way to manage is to create a new conda/mamba environment, or a custom Docker image, for each set of conflicting packages.

For safety and speed, you should prefer the safetensor format: https://huggingface.co/docs/safetensors/speed

If you know what you are doing you can do your own conversions: https://github.com/huggingface/safetensors or for safety, https://huggingface.co/spaces/diffusers/convert
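If the checkpoint is a plain PyTorch state dict, the conversion itself is only a few lines; here's a rough sketch (file names are illustrative, not the actual StableLM checkpoint names):

    # Rough sketch: convert a pickled PyTorch checkpoint to safetensors.
    # Assumes the .bin file is a plain state dict; file names are illustrative.
    import torch
    from safetensors.torch import save_file, load_file

    state_dict = torch.load("pytorch_model.bin", map_location="cpu")
    # safetensors stores raw tensors only and wants contiguous, non-shared storage
    state_dict = {k: v.contiguous() for k, v in state_dict.items() if isinstance(v, torch.Tensor)}
    save_file(state_dict, "model.safetensors")

    # Loading later never executes arbitrary pickled code
    weights = load_file("model.safetensors")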


In terms of pickled files, we may ask them to release the models in safetensors. That would be a big improvement.

There's not much we can do about dependencies on pytorch or other python libraries. Perhaps people can make more independent implementations. The redundancy in implementations would help.


The pickles drive me crazy.

Tensorflow saved models are a great way to solve the problem... Save the computation graph and weights, and drop all the crusty code dependencies. I think ONNX models are similar. I expect there should be a Jax equivalent at some point, as Jax is basically perfectly designed for this (everything is expressed in lax operations, which allows changing implementations for cpu/gpu/tpu freely... So just save the list of lax ops).


I'm surprised that some of the cutting edge AI work is not being done in pure Rust. Why even have a binding layer?


Python seems to have won over the Data / ML world.

So I’d wager they use what they and their intended audience know.


Someone implemented LLaMA in Go recently, https://github.com/gotzmann/llama.go

Wouldn't rule out a Rust implementation of some LLM architecture at some point.


Hi! I'm a maintainer of https://github.com/rustformers/llama-rs. We're planning to expand our model support soon.


There’s also ggml, which is quite impressive considering they get whisper and other models to run on a phone.



> And then those transformer library version are probably dependent on some pytorch versions.

They are not, and I don't think the model even cares about the transformers version. I run git transformers/diffusers and PyTorch 2.1 in all sorts of old repos, and if it doesn't immediately work, usually there are just small changes to APIs here and there that make scripts unhappy, which you can fix manually.


Looks like Nolano.org's "cformers" includes a fork of llama.cpp/ggml by HCBlackFox that supports the GPT-NeoX architecture powering EleutherAI's Pythia family of open LLMs (which also powers Databricks' new Dolly 2.0), as well as StabilityAI's new StableLM.

I quantized the weights to 4-bit and uploaded it to HuggingFace: https://huggingface.co/cakewalk/ggml-q4_0-stablelm-tuned-alp...

Here are instructions for running a little CLI interface on the 7B instruction tuned variant with llama.cpp-style quantized CPU inference.

    pip install transformers wget
    git clone https://github.com/antimatter15/cformers.git
    cd cformers/cformers/cpp && make && cd ..
    python chat.py -m stability
That said, I'm getting pretty poor performance out of the instruction tuned variant of this model. Even without quantization and just running their official Quickstart, it doesn't give a particularly coherent answer to "What is 2 + 2"

    This is a basic arithmetic operation that is 2 times the result of 2 plus the result of one plus the result of 2. In other words, 2 + 2 is equal to 2 + (2 x 2) + 1 + (2 x 1).


llama.cpp has preliminary support already. https://github.com/ggerganov/llama.cpp/issues/1063#issuecomm...


I'm curious if this will give better results than llama 7B? Llama 7B felt like a toy that, while cool to be able to run locally, did not feel useful in any way when contrasted to the state of GPT. Here's hoping for better and/or release of larger parameter models with low performance requirements soon :)

EDIT: my first question times out when run online, seems like huggingface is getting hugged to death.


Even if it doesn't initially, the fact that it's being released so permissively is massive - stable diffusion was made far more powerful by being hackable at all levels and I can't imagine we won't see the same here.

I imagine things like control nets that restrict output to parsable types, LoRa style adaptations that allow mixable "attitudes", that sort of thing.

Very different underlying architecture from diffusers, ofc. But the action of open source is the same - a million monkeys with a million xterms and so forth.


I'm really hoping for the ability to load in different sets of trained material as embeddings/textual inversions like in Stable Diffusion. Imagine scanning in some of your favorite philosophy and design books and throwing them with small weighting as a little flavor to your answer. The crossovers between LLM and Stable Diffusion type models (like Loras) is such a fascinating space to explore.


This already exists for LLaMA, I'm sure it'll be implemented for StableLM as well.

https://github.com/lxe/simple-llm-finetuner
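For StableLM the same approach should carry over through PEFT; a hedged sketch is below (the "query_key_value" target module follows the usual GPT-NeoX naming and should be verified against the actual model, and in practice you'd likely also load in 8-bit and use peft's prepare_model_for_int8_training):

    # Hedged sketch: attach LoRA adapters to the StableLM base model via PEFT.
    # "query_key_value" is the usual GPT-NeoX attention projection name; check
    # the real module names before training.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "stabilityai/stablelm-base-alpha-7b"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["query_key_value"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only a small fraction of weights will train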


good looks on the link! I was experimenting with script writing the other day and thought "gee, I really wish I could finetune on Beckett plays specifically".

I don't know if anyone else has experienced this same tipping point, but when I used to have ideas, I would look them up and discover that implementing them was probably out of scope. These days, I think "wouldn't it be cool..." and immediately stumble on a way to make it happen, by accident.


This has been around for GPT models for a while in the form of "soft prompts", which are roughly analogous to textual inversion in the Stable Diffusion space.


They have people from Anthropic on their team, and Anthropic has stuff like Claude Instant, which is likely running a very light model; i.e., the tricks from Anthropic have likely been incorporated into the model they're presenting here.


If you're comparing to ChatGPT performance then Vicuna 13B would be a best comparison point for something Llama-based.


Vicuna 13B performance is an order of magnitude below ChatGPT for all but gimmicky conversational stuff. Try giving both somewhat large, task-based prompts with steps and see what happens.


> Vicuna 13B performance is an order of magnitude below ChatGPT for all but gimmicky conversational stuff.

Until you connect it to external resources, I tend to think of anything you do with “brain-in-a-jar” isolated ChatGPT as gimmicky conversational stuff.


ChatGPT is still going to be way more capable when you use its API to connect to external resources.


Maybe I should have phrased that better! I didn't mean that Vicuna was comparable to ChatGPT, just that it's the best Llama-based comparison you can make (since it's at least been conversationally trained).


Isn't ChatGPT a 165B parameter model?


No. OpenAI haven't disclosed parameter count of GPT-3.5 or GPT-4, which are models used by ChatGPT. You may be thinking of GPT-3, which is indeed a 175B parameter model.


Ah, interesting. Thought GPT-3.5 had the same structure as GPT-3, for some reason. GPT-4 would obviously be different.


GPT-3.5 is likely a finetuned Curie 13B using output from the full size GPT-3 175B.


the models: https://huggingface.co/stabilityai/stablelm-base-alpha-3b, https://huggingface.co/stabilityai/stablelm-base-alpha-7b

There are also tuned versions of these models: https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b, these versions are fine-tuned on various chat and instruction-following datasets.

The Github repo mentions that the models will be trained on 1.5T tokens, which is pretty huge in my opinion; the alpha models are trained on 800B tokens. The context length is 4096.


These models are huge. I assume they are not quantized down to 4bits yet.


Quantized versions will pop up on huggingface very soon, if they aren't already there. It takes basically no time, much less than something like an Alpaca finetune.


The model is very good. Looks like it's using GPT-NeoX structure, and the finetuned 7b version is very quick, and beats everything else in the category (vicuna, oasst, openchatkit, gpt4all, etc...) in my "write a poem about cheese" benchmarks.


I like how 'fun' benchmarks like yours seem to be so much more effective at judging model performance than all the traditional NLP benchmarks.

I still think the best way to compare two models is to simulate a rap battle between them; then it's immediately obvious who wins.


Yep, rap battle is how I knew Bard is nowhere close to ChatGPT.

In the past the whole world was watching Kasparov vs Deep Blue. This time we will do Eminem vs LLM.

What a time to be alive!


“also fine-tuned the model with Stanford Alpaca's procedure using a combination of five recent datasets for conversational agents: Stanford's Alpaca, Nomic-AI's gpt4all, RyokoAI's ShareGPT52K datasets, Databricks labs' Dolly, and Anthropic's HH. We will be releasing these models as StableLM-Tuned-Alpha.”

They have released the 3B and 7B versions of both the base and instruction-tuned models. The 30B and 65B versions are in training and will be released later.


When will we see ggml versions of these?

Good job by OpenAI, selling out in 2022. It was truly the end of the line.

No matter how bad these model releases are, they are certain to get awesome soon with everybody hacking around on them. The surprising success of MiniGPT-4 with images shows that OpenAI's GPTs don't have some magic secret sauce that we don't know of.


GPT-4 is still much, much better than any of the competition.

I guess we'll see once we have a 175B version of StableLM though, presumably that will at least easily beat GPT-3.


cformers already has ggml support because it's the same architecture as GPT-NeoX.

llama.cpp just added preliminary support three hours ago. https://github.com/ggerganov/llama.cpp/issues/1063#issuecomm...


This is absolutely huge. LLaMA weights leaking was a big deal, but you couldn’t actually use them without attracting Meta’s ire. Would love to see some benchmarking vs. LLaMA and GPT.


Do we even know if model weights are copyrightable or not?


No but unless you enjoy being sued by Meta you have to act as if they are.


I was asking myself the same question and am 99% sure it isn't protected by IP laws. It is another story for the training data and the source code used to run the model.


A JPG is copyrightable and that's just an array of numbers. Model weights seem pretty similar as a class.


It's not the JPG data you can copyright, but the recognisable image it produces. Every time you re-save or resize the image, the data changes, but the recognizable image remains.

It's not clear how this process applies to model weights. Once you run another training epoch on them, the data has changed. What is the essential copyrightable, trademarkable or patentable thing that remains? A legally untested question for sure.


It depends on how the JPG is made. Some JPGs are not copyrightable, like that picture a monkey took of itself. Model weights are probably (legally) more like a monkey's selfie than a photographer's photo.


Couldn't you just transform the weights in some predictable way and then untransform them before use at runtime? The weights at rest would be completely distinct from the Meta weights.
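Mechanically that's trivial; whether it changes anything legally is a different question. A toy sketch:

    # Toy sketch of a reversible transform on weights; purely illustrative and
    # says nothing about whether it would actually change the copyright picture.
    import numpy as np

    key = 0.12345
    weights = np.random.randn(4, 4).astype(np.float32)  # stand-in for real weights

    stored = weights + key    # what would sit on disk
    restored = stored - key   # what you'd reconstruct at load time
    assert np.allclose(weights, restored)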


But the data inside the jpg isn't generated by a machine, model weights are.


All sorts of llama weights are on huggingface now.


What is IRE?


It's a department at meta: Intellectual Rights Enforcement. /s

"Ire" is a synonym for "anger" or "wrath"


That's actually what I thought it might be. Thanks lol.


Your writing it as an initialism hinted at that. I must say, if Meta had an IP-litigation department called "IRE" or "MIRE" (Meta Intellectual Rights Enforcement), that would be a little on the nose. The unofficial motto would be "We will bog you down in the courts for years".


It’s a noun meaning anger or wrath. Usually it’s specific not just to triggering anger the emotion, but an angry response. So attracting Meta’s ire means attracting an angry response from meta - like the legal response users of LLaMA have triggered.

It’s not an acronym.


Ire is the result of irritation.


Are you saying we shouldn’t scratch Meta’s itch?


Oh this is good. :)


The real thing we should be asking is if the 3B or 7B model is capable of correctly answering this question.


anger. probably a lawsuit in this context


There were no lawsuits around LLaMA. They used DMCA requests to take down some GitHub and HuggingFace releases, but the majority of releases (Alpacas and other animals) were left alone. It was enough to prevent commercial use, though. Edit: a lawsuit would be quite interesting, to clear up many things around how copyright works for LLMs.


They're going to court with the guy who made llamaDL actually.


Internal recurrent expenditure


What does this mean in context of https://futureoflife.org/open-letter/pause-giant-ai-experime... , where CEO of Stability AI had left a signature on the petition(it's 8th in the list)? Do they want time to catch up?

Regardless of that, I'm glad that StabilityAI enters the field as well and releases models for public use.


It doesn't mean anything. The letter says:

> Therefore, we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.

StableLM is not an AI system more powerful than GPT-4, so the pause does not apply.


To whom, specifically, does the "pause" apply?

Because I can tell you that no state actor is going to pause, but amplify.

Israel, Iran, China, Russia and any self-respecting NATO country is secretly pushing their AI as fast as fn possible.

You think the US is pausing anything with a 1 trillion dollar defense budget, especially when this AI has surfaced?

The NSA has been projecting these capabilities forever....

Look at the movie "enemy of the state" as a documentary on capabilities as early as 1998... now look at the fractal spiral that we are witness (and victim) of.


As far as we know, OpenAI has the most advanced LLM in the world right now. Just because governments can fund all sorts of expensive military contracts or spying networks doesn't mean they have cutting-edge AI research. A lot of the time it's companies that make the breakthroughs in tech.


Just so I understand how to come to that conclusion: how did you come to that conclusion?

Do we have any transparent measure?

(My point is: do we think that what we can see now is the pinnacle of what is possible? Or is this kindergarten compared to the PhD-level work that we cannot see in this field?)


All these state actor stories imply that state actors are anywhere close to GPT4 levels. Do you have any evidence for that ?


My general test for judging whether a capable state actor has an ability is to answer this simple question: "Ignoring resource and morality costs, is it at all technically possible?"

If yes, then those actors almost certainly have this ability developed already and perhaps even deployed. If not, then maybe. This test has held up remarkably well in my experience.

And that's to say nothing about products that already exist: I would be extremely surprised if the US government and China didn't have a GPT4-level AI trained within one week of OpenAI's GPT4 announcement if not before.


> "Ignoring resource and morality costs, is it at all technically possible?"

If it were that simple, SpaceX wouldn't have revolutionized spaceflight.

Sometimes private actors have talents or organizational structure that gives them an edge in innovation that public actors can't keep up with for a while.

All competitors to OpenAI we've seen are struggling to reach GPT-3.5 level, let alone GPT-4 level, with years of catch-up time. It's not ridiculous to imagine that state actors are struggling as well.


You do realize that Google was the undisputed behemoth in AI research for a decade and now they are scrambling to catch up with OpenAI and are still not delivering.

You're saying that governments are both doing this secretly and more efficiently than Google and OpenAI?


>> Do you have any evidence for that

I do not, yet I am a SUPER SKEPTIC --> meaning I am a conspiracy weirdo who doesn't believe a gosh darn thing any government says, but I am also a technologist who is not ignorant of things which have been built in secrecy.

Thus, I surmise that some crazy shit is going on with AI behind the scenes that we are not privy to -- and against one person's reality of "you cannot believe that they* are doing anything with AI that we don't know about"*, to paraphrase a few others: "A nuke is literally about to fall on our heads."

--

We are moments away from realizing that it ALREADY happened....


I'd love to hear from someone knowledgeable in the area but my gut instinct was the majority of people / organizations signing on to that 'pause' just wanted to catch up.


Either that or the guy didn't actually sign it. It's not like anyone actually checked who it was.


He's mentioned signing it in interviews. He discusses it here: https://youtu.be/SKoYhcC3HrM?t=1344


What is the actual LICENSE?

* HuggingFace shows CC-by-NC https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b

* Github is Apache 2.0

https://github.com/Stability-AI/StableLM/blob/main/LICENSE


CC BY-SA-4.0 https://creativecommons.org/licenses/by-sa/4.0/

"You are free to copy, redistribute remix, transform, and build upon the material for any purpose, even commercially. No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits."


Sorry, I missed that the fine-tuned models are CC BY-NC-SA 4.0 and the base models are CC BY-SA 4.0.


NC is just for the fine tuned model


Missed opportunity for Stability to call themselves "ClosedAI" and then be the opposite of what the name suggests.


I think there's a misunderstanding here (and on Alpaca) about exactly what non-commercial means with respect to software licenses. Here's some related discussion: https://opensource.stackexchange.com/a/12074

The CC-NC licenses cover modification and redistribution ("use" in the license). They apparently have no bearing on whether you can personally run the software. And of course the outputs of the software are not copyrightable.


The base models are licensed under CC BY-SA-4.0 (not CC-NC).

"You are free to copy and redistribute the material in any medium or format, remix, transform, and build upon the material for any purpose, even commercially."

https://huggingface.co/stabilityai/stablelm-base-alpha-7b


I’m aware. The next section of the readme talks about how the fine tuned models are dictated by Alpaca’s license. I am arguing this is not the case.


Oh, it shouldn't be the case. But anyone can easily finetune the Base model on the same datasets in a couple of hours on any consumer GPU with 8GB of VRAM and release it under the commercial license (using https://github.com/johnsmith0031/alpaca_lora_4bit).


Well that was a trip.

I asked StableLM-Tuned-Alpha-7b to translate the Swedish phrase "Skaplig bögpulka" into English (because ChatGPT and New Bing refuse to do so as a result of their puritanical US censorship) and it confabulated wildly.

It made up an Estonian folk song and somehow we ended up on the etymology of the well-known Swedish word "Skeletor".

This one is not ready for prime-time, but I have hopes. Someone please make a model that doesn't censor. I won't be paying one thin dime for this stuff until it is censorship-free.


It's trained exclusively on an English corpus, AFAIK.

As for whether it is ready for prime-time, it is an "Alpha" of an uncompleted training run. So it's not finished cooking.

Also, that is the 7B model. They're cooking 15B, 30B, and 65B right now and planning to start 175B soon.

For comparison, 15B is already larger than GPT-3.5 (which is likely a finetune of Curie 13B) while 175B is the same as full size GPT-3 v1 175B which 13B LLaMA already beat on benchmarks. So we can expect all four models larger than 7B to be better than GPT-3 when they are done training (at least in English).


Finally, the true race to zero is now on.

Stability and others are already at the finish line in the race to zero. O̶p̶e̶n̶AI.com cannot get any cheaper and eventually will have to increase their prices.

There is no escape and, as I said before, open source AI models will eventually swamp, survive longer, and integrate faster than even O̶p̶e̶n̶AI.com's ChatGPT.

Expect open source models to get smaller and even better such that it will fit in an iPhone, on device.

Stay tuned.


Why are you referring to OpenAI as ai.com?


I think he's just emphasizing that OpenAI is in fact not open, thusly it's crossed out.


Where does "openai.com" go, and who owns it? Now, where does http://ai.com go, and who owns it?


Oh, I just realised ai.com redirects to ChatGPT.


Issue to add support for these models to llama.cpp: https://github.com/ggerganov/llama.cpp/issues/1063


Demo is overloaded. Here is Colab link if you wanna check the quality: https://colab.research.google.com/github/Stability-AI/Stable...


This notebook will need high RAM and a premium GPU. It blew past the 16GB GPU that comes with Colab Pro.


Your session crashed after using all available RAM. If you are interested in access to high-RAM runtimes, you may want to check out Colab PRO


You can try 8bit


Using 8-bit still runs out of RAM for both the 3B and 7B models. It's unclear if it's because it still uses more than the available RAM, or if it's just quietly not using 8-bit since it's not implemented.


I found it a bit difficult to get answers from:

Q. Can you recommend a good vegan restaurant in Sydney, near the water

A. I’m not a vegan, so I don’t have a favorite restaurant. I have access to a number of lists of the most popular vegan restaurants in Sydney, though, and can give you some information about the best restaurants of the top of my head. Please let me know if there’s anything else I can do to assist.

Q. Ok what are the 3 most popular vegan restaurants with a view of the harbour

A. The most popular vegan restaurants near the water are:

Q. Are what?

A. The most popular vegan restaurants near the water are

Q. Name a vegan restaurant

A.

1. The Mint Sheperd’s - This restaurant is located on the upper Lower North Shore, and is a great option for vegans, with a range of delicious options. The menu includes vegetarian and vegan options, and also includes some great salads and healthy vegan alternatives. 2. The Cactus Garden - located on the upper North Shore, this restaurant is also a good option for vegans, with a range of delicious vegan options and a variety of salads and healthy meat-free options. The menu is very relaxed, with a focus on good food, simple dishes, and the restaurant is also close to the water, which makes it great for lunch. 3. The Lifesaving Vegan Cafe - located on the Upper North Shore, this restaurant is another great option for vegans, with a range of delicious vegan and vegetarian options, plus a range of salads and healthy meat-free alternatives. The menu is very relaxed, with a focus on good food, simple dishes, and the restaurant is also located in the close

This is almost a great answer! Just that these restaurants don't exist.

I had 100x better experience with https://news.ycombinator.com/item?id=35583349, running locally!


If you think that web llm is good, you should try gpt4-x-alpaca 30b.


For people wanting to run it locally, you can fit the 7b model (just) into a 24GB VRAM GPU (e.g. 3090/4090). The 3b model appears to be much more reasonable, but I would say the output is.... of limited quality based on the few tests I've run thus far.


Gist that mostly just takes the notebook Stability AI have in the github repo, and turns it into a script you can run locally after installing a few dependencies from pip:

https://gist.github.com/cmsj/2d6b247ad4fc8f15011105feeda763e...


I suspect the community will start creating lower precision/quantized versions of the model very quickly. LLaMa 30b quantized to 4 bits is runnable on a 3090/4090.


Don't need a GPU to run the model, you can use your RAM and CPU, but it might be a bit slow


It's very slow, and for the 7b model you're still looking at a pretty hefty RAM hit whether it's CPU or GPU. The model download is something like 40GB.


There's already support in llama.cpp. It runs faster than ChatGPT on my old laptop CPU.


7B quantized down to 4 bits will run on a 2060.


This doesn't perform well at all against questions that I've presented to ChatGPT 3.5 and received far better responses for. What are people seeing from this that is generating so much excitement?


It isn't reasonable to compare the 7B model with the likes of ChatGPT 3.5 which has 175 billion parameters. Stability does have plans to produce a comparable model, though. Right now they're working on models up to 65 billion parameters.

This is just the very early stages of development, and the exciting thing is that it's something you can actually run yourself and it's freely usable for commercial use.


The 7B model just barely fits in a 3090, but I've yet to get useful output; it usually gets stuck in a repetition loop.

Spent some time playing with sampling parameters, to no effect. Has anyone else had a go?


Make sure you are using the instruction tuned model. The base model will be difficult to prompt.

It works in 8-bit with about 12GB of VRAM usage. Here's sample code:

https://gist.github.com/AlexanderDzhoganov/a1d1ebdb018e2e573...


If you're getting stuck in a sampling loop, try raising the temperature (0.6+) and/or increasing the repetition penalty (1.1-1.2 often work well).


I feel "The Pile" needs a lot more to match what OpenAI has trained their models on. The mysterious "Books2" and so on.


I think OpenAI has a few hidden advantages that are not obvious at this point. It could be additional training data, filtering/preprocessing of that data, some changes to the architecture, who knows? None of the open source models are even close to GPT-3.5, not to speak of GPT-4. I've tried everything, including the 60G LLaMA variants, so I'm not sure it's about the number of parameters. They definitely have some hidden sauce.


On a side note, can anybody tell me what's going on with embeddings and vector databases? It would seem that forward-pass completion is pretty much solved, and a smaller, better model will appear eventually. But say you've solved both complete() and embed(): what do you do with them? How do you organise, query, and multiply this dataset?

I know that text-embedding-ada-002 has twice as many dimensions as mainstream sentence transformers. Do we need all the extra dimensions? If not, how do I make embeddings work better for my specific dataset, with lots of jargon, abbreviations, and the like? What are the hardware requirements for that, i.e. could I run a fine-tuning job on some jargon-heavy text to get better embeddings for it?

The more I look into similarity-based use cases, the more I see that it's not just "top-percentile nearest-neighbour search": the data is also terribly relational. It behaves like a slowly changing dimension, and there's a tree-traversal-type structure in how documents are generated as outputs from other documents as inputs. So you have to think about these complete/embed ops both in aggregate (for batching) and individually (from a cost/reward ROI angle), and also in terms of memory usage patterns to further optimise layout; tiering and the like really come to light.

Also: vector database shilling on HN is getting out of hand; multiple companies are literally plugging every mention on the radar, and some are actively begging for upvotes. Looking at it all makes you really appreciate pgvector[1], to the point where you'd be more willing to buy 3.2 TB of high-bandwidth NVMe and dedicate it to a large IVF index than ever have to deal with all of this "purpose-built vector database" bullshit.

[1]: https://github.com/pgvector/pgvector
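For the curious, the pgvector route really is just SQL; here's a minimal sketch from Python via psycopg2 (the table, column names, and the 384-dim embedding size are made up for illustration):

    # Minimal pgvector sketch using plain SQL from psycopg2; names and the
    # 384-dim embedding size are illustrative only.
    import psycopg2

    conn = psycopg2.connect("dbname=docs")
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, body text, embedding vector(384))")

    emb = [0.1] * 384  # stand-in for a sentence-transformer embedding
    literal = "[" + ",".join(str(x) for x in emb) + "]"
    cur.execute("INSERT INTO items (body, embedding) VALUES (%s, %s::vector)", ("hello world", literal))

    # nearest neighbours by cosine distance (<=>); use <-> for L2 distance
    cur.execute("SELECT body FROM items ORDER BY embedding <=> %s::vector LIMIT 5", (literal,))
    print(cur.fetchall())
    conn.commit()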


> Do we need all the extra dimensions?

This discussion seems relevant: https://www.reddit.com/r/MachineLearning/comments/12q8rp1/di...


Yes, you need all of the dimensions. All of the dimensionality reduction techniques, including SOTA ones (UMAP or better) are going to massively harm your embeddings.


Perhaps I didn't word my question correctly; I'm looking to compare the capability of sentence transformers vs. OpenAI Ada-based embeddings relative to their respective dimensionality.


No you don't need the extra dimensions and OpenAI is generally the worst at everything except being the first to market.

Also, ditto your comments on vector database shilling. Vector Databases are just like any other database in that I'll host them myself. I don't need a dedicated VC backed company for a database.


Just because you say it doesn't make it true.

Dimensionality reduction is an extremely destructive operation. Losing even the wrong single vector component of an embedding is massively damaging to downstream performance.


Why does every LLM use "Hope this email finds you well" to start virtually every letter and email?


Have you tried something like:

"Never start an email with 'Hope this email finds you well'"

in your first prompt.


Anyone have a sense for system requirements on running this locally? StableDiffusion was too heavy for either of my machines (which are admittedly not very powerful) - I'm not seeing much in the docs to indicate whether or not this is more or less intensive to run locally.


If you can run any models on llama.cpp, that might be a good indicator of which StableLM models you'll be able to run.

I easily ran 7B int 4 ggml models on an MBP with 16gig RAM. Same works on a MBA with 8 gig RAM, but you'll have to not run any other memory-hogging app.


In 4bit 7B requires 6GB of RAM and runs at ChatGPT speeds on CPU (with llama.cpp).

The 15B model coming out soon will require 12GB of RAM and still run at good speeds on CPU.


On top of what others said, unlike SD, it's not unusable on CPU... just very slow.

Stable diffusion will run on a 4GB GPU though.


The tuned 7B model is around 33 GB, so you'll need a PC with that much VRAM or RAM. I haven't tried to load it in the text generation UI though.


No way, I thought this was months away still. So excited


It's released but still cooking in the oven as we speak.


Completed training is. These are early "Alpha" checkpoints.


Thanks for breaking ground here and being truly OSS


I look forward to a future where almost all software would allow me to plug in an LLM API:

- browser: ad removal/skipping

- RSS: information aggregation

- recommendation systems

- games: customized NPC scripts; AI opponents

- home automation: personal butler

Hopefully, there will be more than one base-layer LLM provider to choose from.


My prayers have been answered


Amazing, but instruction fine-tuning is still a huge challenge for businesses, since what was released can't be used for commercial purposes. Instruction-tuned models are much more useful.

I have a feeling that there are probably some people who will look at the "commercial okay" license for the first part and in their mind that will somehow make it okay to use the instruction-tuned ones for commercial purposes.

Maybe we don't really need the instruct stuff? It seems like a huge amount of duplicated work. I wonder if the OpenAssistant people will start building off of these models.


The instruct tuning can be done with several open datasets at minimal cost. Should be easy for someone to create their own open model.


How?


You can finetune 7B in a couple of hours on a $200 3060 with https://github.com/johnsmith0031/alpaca_lora_4bit
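
If you'd rather stay on the stock Hugging Face stack, the same idea with peft looks roughly like this. A sketch only, not tested end to end; the checkpoint name and the query_key_value target module are assumptions based on these models being GPT-NeoX-style:

    # Rough sketch of LoRA finetuning with Hugging Face peft (not the linked
    # 4-bit repo). Dataset loading, tokenisation and the training loop are elided.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "stabilityai/stablelm-base-alpha-7b"   # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto"
    )

    # GPT-NeoX attention uses a fused query_key_value projection.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["query_key_value"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of the weights
    # ...then train with transformers.Trainer or a plain PyTorch loop.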



That dataset is licensed under CC BY NC 4.0, which is not open. It also has a bunch of garbage in it; see https://github.com/gururise/AlpacaDataCleaned


I wonder what happens if you just feed that dataset back into another LLM to rewrite it and filter out the low-quality items? Is there still any connection to the original copyright? How would that even be proven?


GPT-4 supposedly cost USD 100M to train and seems much smarter than anything else released. Is there any chance of getting that much money for open-source training? Really big EU grants, or some philanthropist, or...?


It would cost a tenth of that to replicate GPT-4 by training on its outputs.

It's quite likely that finetunes of the forthcoming larger StableLMs will rival GPT-4. (and by that time GPT 4.3 will be shipping)


So, it can write very coherently, but I'm not sure it excels at coding tasks.

For example:

User: Write Python code for evaluating the integral of f(x)=x^2.

Assistant: Sure, here's an example of Python code to evaluate the integral of a function `f(x) = x^2`:

    # Define the function
    def integral_of_f(x):
        return x**2

    # Evaluate the integral
    result = integral_of_f(3)
    print("The integral of f(x) = x^2 is:", result)

In this code, we define the function `integral_of_f` as a function that takes a single argument `x` and returns the integral of `x^2`. We then evaluate the integral by calling the `integral_of_f` function and passing in the value of `x` as an argument.
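
For contrast, a correct answer has to actually integrate rather than just evaluate f(3). Something along these lines (the 0-to-3 bounds are my own addition, since the prompt didn't specify any):

    # What a correct answer could look like: integrate f(x) = x^2 properly.
    # The 0..3 bounds are arbitrary -- the original prompt gave none.
    from sympy import symbols, integrate
    from scipy.integrate import quad

    x = symbols("x")
    print(integrate(x**2, x))            # symbolic antiderivative: x**3/3
    print(integrate(x**2, (x, 0, 3)))    # definite integral: 9

    numeric, error = quad(lambda t: t**2, 0, 3)
    print(numeric)                       # ~9.0, numeric with an error estimate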


One question I have is- can we expect GPTQ quantization down to 4 bits (or lower) for these models, the same as for llama (which is partly why llama was so amazing?)

I'm sure the answer is "yes", but I wonder if there's anything that can be done that could make these models more "friendly" to quantization like this from the very beginning.

It's absolutely incredible that you can cut the size of a model in half... and then cut the size of the model in half *again*... and it fits on your consumer GPU with only a small cut in accuracy.


Yes, these models are identical in architecture to GPT-NeoX which has already been GPTQ quantized. So you can quantize these with GPT-NeoX quantization scripts out of the box.

There's also support in llama.cpp already.


Has anybody figured out the hardware requirements yet?

This could be the beginning of something beautiful, and I am always impressed by the work Stability AI does!

They are the true OpenAI (without empty promises in the middle ;))


>Has anybody figured out the hardware requirements yet?

In 4bit 7B runs in 6GB of RAM at ChatGPT speeds on my old CPU.

The hardware requirements are more or less identical to llama (the model is slightly larger, ~7.9B parameters vs ~6.7B).


Great to see Stability release this with a great license as well. Any idea on the use cases for the 3B model? Will a model that small suffer heavily from 4-bit quantization?


> Will a model that small suffer heavily from a 4bit quantization?

From my experience with quantized 7B llama models, avoid 3B if you can. Without benchmarks, I think this is a decent rule of thumb.


Generating either a 0 or 1 in response to something? Maybe useful in some testing scenarios? Honestly don't know either. New lorem ipsum?


That would be 1-bit quantization. In reality, quantization below 8 bits is done in smart ways that keep the effective output quality high while keeping the effective memory footprint low.

For example, bucketing identical groups of weights and then reusing one bucket for all the identical groups lowers the effective bits-per-weight in memory while retaining the uncompressed quality.

There is literature on effective quantization levels below 1 bit (so-called "0-bit" quantization). But even then the actual weights are typically 2-4 bits; there is just a lot of reuse of weights going on.

Another neat trick is to line bins of weights up from lowest to highest and compute a function that produces a zero offset per bin, such that 0010 in one bin and 0010 in another bin are upsampled to 16 bits and then have different offsets added, maintaining close to uncompressed 16-bit performance without the memory overhead.

There are many more tricks like this and many still to be found!
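
A tiny numpy sketch of the offset idea, i.e. generic group-wise asymmetric quantization where every group of weights stores 4-bit codes plus its own fp16 scale and zero point, so the same code decodes to different values in different groups. This is the textbook version, not any particular library's scheme:

    # Generic group-wise asymmetric 4-bit quantization: each group keeps 4-bit
    # codes plus its own fp16 scale and zero point, so identical codes in
    # different groups decode to different 16/32-bit values.
    import numpy as np

    def quantize_groups(weights, group_size=64, bits=4):
        levels = 2**bits - 1
        w = weights.reshape(-1, group_size)
        lo = w.min(axis=1, keepdims=True)           # per-group zero point
        hi = w.max(axis=1, keepdims=True)
        scale = (hi - lo) / levels                  # per-group scale
        codes = np.clip(np.round((w - lo) / scale), 0, levels).astype(np.uint8)
        return codes, scale.astype(np.float16), lo.astype(np.float16)

    def dequantize_groups(codes, scale, lo, shape):
        # Decode: add the per-group offset back after rescaling the codes.
        return (codes.astype(np.float32) * scale.astype(np.float32)
                + lo.astype(np.float32)).reshape(shape)

    w = np.random.randn(4096, 64).astype(np.float32)
    codes, scale, lo = quantize_groups(w)
    w_hat = dequantize_groups(codes, scale, lo, w.shape)
    print("max abs error:", np.abs(w - w_hat).max())   # small vs the weight range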


You can use it as the draft model for a larger model; the technique is called speculative sampling. You generate a few tokens with the small model and verify them with the large one in a single pass, ensuring the output doesn't deviate from what the large model would have produced. Speedups of around 2.5x are reported.

https://arxiv.org/abs/2302.01318 (DeepMind)
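
The loop looks roughly like this. This is the simplified greedy-verification variant rather than the paper's full rejection-sampling scheme, and small_model/large_model are hypothetical callables:

    # Simplified sketch of speculative decoding with greedy verification.
    # small_model / large_model are placeholders: the small one maps a token
    # sequence to its greedy next token; the large one scores all draft
    # positions in a single forward pass. The paper uses rejection sampling
    # on the probabilities instead of exact-match checks.

    def speculative_decode(prompt_tokens, small_model, large_model,
                           draft_len=4, max_new_tokens=128):
        tokens = list(prompt_tokens)
        while len(tokens) - len(prompt_tokens) < max_new_tokens:
            # 1. Draft a few tokens cheaply with the small model.
            draft, ctx = [], list(tokens)
            for _ in range(draft_len):
                t = small_model(ctx)
                draft.append(t)
                ctx.append(t)

            # 2. Verify the whole draft with ONE pass of the large model.
            verified = large_model(tokens, draft)  # large model's token per draft position

            # 3. Accept the longest matching prefix; at the first mismatch,
            #    take the large model's token, so output matches pure
            #    large-model greedy decoding.
            n_accept = 0
            for d, v in zip(draft, verified):
                if d != v:
                    break
                n_accept += 1
            tokens.extend(draft[:n_accept])
            if n_accept < len(draft):
                tokens.append(verified[n_accept])
        return tokens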


Something that I find very sad and isn't discussed enough is the use of The Pile as the training set (ok, for this one we will have to wait for the report). The Pile is a dataset of mostly English documents, so this model won't have the multilingual capabilities of other models, like the ones by OpenAI or even Flan-T5. We are leaving a lot of people excluded from benefiting from open-source models because of that...


I am having fun with "StableLM-Tuned-Alpha-7b Chat":

how much does a ping pong ball weigh?

- A ping pong ball weighs 1.5 ounces.

how much does a ping pong ball weigh, in grams?

- A ping pong ball weighs 45.14 grams (0.00000578 ounces).

how much does a ping pong ball weigh, in grams?

- A ping pong ball weighs 0.45 grams.

how much does a ping pong ball weigh, in grams?

- A ping pong ball weighs 0.45 grams (45.14 grams - 0.00000578 ounces = 0.0045 grams).


TBH that is a use case where shooting off to another API (like whatever WolframAlpha runs on top of Mathematica) is better than trying to do everything in a model.


Finetuning and training models of this size from scratch is now relatively inexpensive and accessible to anyone, especially with all these datasets becoming freely available. This is just the beginning of smaller LLMs that anyone can finetune to achieve high quality performance on particular tasks.


Are they releasing the weights for download? The links to HuggingFace in the readme are giving me 404. The dataset they built on top of "The Pile" sounds interesting; looking forward to evaluating their claim that 3-7 billion parameter models can perform on par with the 175-billion-parameter GPT-3.



Did they claim this? I didn’t see that claim made in the above post.


"The richness of this dataset gives StableLM surprisingly high performance in conversational and coding tasks, despite its small size of 3 to 7 billion parameters (by comparison, GPT-3 has 175 billion parameters)."

So they did not explicitly say it is comparable, but implicitly compared the two. I'm curious to evaluate what "surprisingly high performance" means exactly.


How long until we can do folding@home-style distributed training of open-source models as a group?


I don't think there's anything stopping us from doing it today


The AI war has officially begun.


No, that comes when China catches up in the quest for superintelligence.


There are plenty of authoritarians in the US, already up to speed, ready to conduct warfare against our institutions. We don't need to wait for anyone else overseas to get this party started.


I don’t think China has enough skilled AI workers to compete in any quick timespan. Top talent doesn’t want to live and work there generally.


I'm curious why you'd think that. China as a country has a lot of people to start with, and some percentage of them will end up in AI. Assuming people from all countries are roughly equally intelligent, the numbers clearly favor China. The universities over there are quite good, and there's a pretty strong work-hard mentality among all the Chinese students I see. Plenty of Chinese students are graduating or starting university right at the peak of the AI hype, and China as a country isn't sleeping on AI either. I think China as an AI hub looks quite promising. Anecdotally, China also retains quite a lot of talent, or people go abroad to study and then return. Compared to some European countries or India, which "leak" a lot of talent to the U.S., I think China is quite a bit more stable.

On the hardware side, things tend to be produced there as well.


China definitely "leaks" a lot of talent to American companies - most AI papers that I've seen from respected Western universities include at least one Chinese name.

One challenge for China has been the university enrollment rate. While in Western countries half of each cohort has been going to university for decades, China is not there yet. In 2019, just 17% of Chinese adults had degrees, compared to 44% in the US.

So the large Chinese population is offset by its relative lack of access to education, while the US can draw from its own highly educated population in addition to attracting the best and the brightest from the rest of the world, including China.


China is demonstrably capable of training GLM-130B. Skills involved seem comparable to Stability AI.

https://keg.cs.tsinghua.edu.cn/glm-130b/


Stop worrying about China…wow


I’m more than ready. It’s exciting to see what can come out of this.


I love stability ai so much


> "As an AI language model, I'm not capable of providing you with a complete program."

Why is it like this? I thought that wording was something specific to OpenAI's RLHF training?


The "Tuned" version is finetuned on the output of ChatGPT, which is why it is non-commercial licensed.

The "Base" version does not do this (and is commercial licensed).


Thanks! It's sad that this is the answer, but I understand. I wish sometime we can access the base models of GPT-4 without the 'As an AI language model' guardrails!


That's not going to happen. But it's likely that StableLM 175B will rival GPT-4.

Also, you can finetune Base StableLM yourself on any consumer GPU with 8GB of VRAM in a couple of hours and it will be commercial licensed. (using https://github.com/johnsmith0031/alpaca_lora_4bit)

You can even use the exact same dataset StabilityAI used. (Although there are better ones, with more GPT-4 data.)


I recently downloaded DiffusionBee on my Mac for image generation with Stable Diffusion.

Excited for when I can do the same with StableLM.


You were able to do the same with StableLM before you commented here. It already has preliminary support in llama.cpp, which runs llama-7B in 5GB of RAM off of CPU faster than ChatGPT. https://github.com/ggerganov/llama.cpp/issues/1063#issuecomm...


I guess the next few years will just be about pumping out language models, developing for niches where ChatGPT is not very good.


Are the XL 1024x1024 enhanced models released for stable diffusion yet? That looks really good from the testing.


TBH SD 2.1 was kind of a bummer, and the community is largely staying on 1.5... I am not getting too hyped.


Is there something like Moore's law for LLMs that will eventually turn them into ubiquitous compute?


There are scaling laws which show LLMs can benefit from an order of magnitude more training data than the current state of the art, suggesting that far beyond GPT-4 level performance should be possible in 4GB of RAM with enough training data and compute time.

So, kinda?
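
For a rough sense of scale, the commonly cited Chinchilla heuristic is about 20 training tokens per parameter for compute-optimal training (and you can keep training past that at diminishing returns). Back-of-the-envelope, with the usual caveat that the ratio is an approximation:

    # Back-of-the-envelope Chinchilla-style heuristic: compute-optimal training
    # uses roughly ~20 tokens per parameter. The exact ratio is approximate.
    def compute_optimal_tokens(n_params, tokens_per_param=20):
        return n_params * tokens_per_param

    for n in (3e9, 7e9, 65e9):
        print(f"{n/1e9:.0f}B params -> ~{compute_optimal_tokens(n)/1e12:.2f}T tokens")
    # 3B  -> ~0.06T tokens
    # 7B  -> ~0.14T tokens
    # 65B -> ~1.30T tokens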


I love the installation of this! Just open a Python window and it downloads whatever it needs. Well done!


Repo contains 2 yaml files, 1 notebook, and 1 drawing of a parrot. Have I missed something?


I think it is really great that they are supporting this to be an open source movement


> The context length for these models is 4096 tokens.

!!! And I was excited that llama gave us 2048!!


Rumor is RedPajama is going to have upwards of 60k token context by using Hyena: https://arxiv.org/abs/2302.10866

But it's just a rumor. We'll see.


How can one get started adding RLHF to something like this?


Pretty soon there are going to be desktop apps for that


Open-assistant.io


So, as a user, is this a free chatGPT?

Free in money and free of woke?


Anyone know if this will be added to the Stability API?


Is this a joke? I tried the HuggingFace demo, and it just spews out ridiculous replies for everything.

Examples:

>> What can you tell me about Therese Johaug?

> The person you’re asking about is Therese Johaug (The Judge) a Norwegian jazz violinist. She played with many famous musicians, including and especially Charles Mingus. She was born in 1954 in the town of Kolsås. She is married to the violinist Mark Sörensen.

Everything in that statement is wrong. She is a Norwegian cross-country skier, born in 1988 in Dalsbygda, not married.

Ok. Let's try something easier.

>> What can you tell me about Edvard Munch?

> -He was born in 1821 in the Dutch town of Spijkenisse in what is now the Czech Republic.

> -He studied art at the Academy of Fine Arts in Brussels, Belgium, where he made a portrait of a young girl called Anna.

> -He then moved to Norway to practice with his friends and Dano-Norwegian painter, Christian Krohg, the two of them known as “The Scream” or “Kristine in the Black”.

> -There he met the painter, Ambroise Thomas, who became a very close friend of Munch’s and would often take him to the beach to watch the boats.

> -One day in 1883 Munch bought a house at a deserted place in the outskirts of Oslo, in the southern region of Norway.

> -The house, called “La Chute”, had no inhabitants at all, it was just a huge pile of rocks, but this Munch loved it so much that he named it “Lochte”.

> -He took the name “Munch” because he thought it was German, but it was totally French, he had the idea that he could paint the ocean and the sky, so he named his new painting “Moulin Sur l’Oiseau”.

> -At the same time, he would soon start creating works of art that would be strange and not make any sense, for instance, one summer day in 1870 he painted what is known now as “The Scream”, it has many other paintings with the same title, “The Scream” is a drawing of a young girl, she is holding an ax and is crying in the scream position.

> -He had the idea of “The Scream” in one sitting in 1883, but it took six months to paint.

> -He stayed in France in the studio of Christian Krohg when he was still Edouard de Badeley, one of the greatest painters of his time.

> -In 1891 he returned to Norway to France and lived there for the rest of his life.

> -He died in Paris in 1900.

Christ on a bike.

My favourite must be that "The Scream is a drawing of a young girl, she is holding an ax and is crying in the scream position."


I get the sense that any accurate portrayal of the model is getting downvoted. Sure, it's going to be a while before open-source models catch up with OpenAI. I'm glad that Stability is working on this, and I look forward to trying out their larger models that they're training right now, but we should be able to discuss accurately the capabilities of an LLM without just automatically jumping on the hype train.


That's fair, but this:

> Is this a joke? I tried the HuggingFace demo, and it just spews out ridiculous replies for everything.

Is a shallow and uninformed criticism. The parameter count alone should inform people that this isn't going to match GPT-3.5 or GPT-4. Not everyone knows that of course, but starting a comment with "Is this a joke?" is precisely the wrong way to start such a discussion.


Yes, go try asking GPT-3 Babbage these same questions and let me know if it is even remotely coherent! That is the fair comparison.


I dunno if OP tried raw 3B, but finetuned llama 7B is much better than that.


Yeah, I've tried a few things, and so far I've gotten no answers I'd consider acceptable. Here are some examples.

-----

My question:

What are some popular over the counter antihistamines available in tablet form? Please name three, and for each one, give the generic and brand name, and NO OTHER INFORMATION AT ALL.

Its response:

Three antihistamines that are common over the counter tablets are:

1. Claritest

2. Zyrtec

3. Marzolast

Brand names for these tablets are:

1. Claritest - Contains diphenhydramine hydrochloride, which is an expectorant and cold and allergy relief medicine.

2. Zyrtec - Contains diphenhydramine hydrochloride, which is an expectorant and antispasmodic medication for motion sickness.

3. Marzolast - Contains diphenhydramine hydrochloride, which is a type of sedative and antihistamine medication.

Things wrong with this answer:

* Claritest isn't an antihistamine. It's a non-invasive prenatal DNA screening test. (https://www.genpathdiagnostics.com/patients/womens-health/ge...)

* Marzolast isn't an antihistamine. It isn't anything. It only has one meaningful hit on Google, which is a coupon code for a Udemy course.

* It says Zyrtec contains diphenhydramine hydrochloride, which is an antihistamine, but the wrong one. Zyrtec contains cetirizine.

* It gives three different, conflicting explanations for what diphenhydramine is. The last one is right, and the first two are totally wrong.

* I asked it (pretty clearly!) to give no other information, but it gave a lot.

-----

My question:

What film made red staplers famous?

Its response:

Red staplers became famous for playing a child-eating monster named Dracula on the “Dracula” television series in 1928.

Things wrong with this answer:

* Television series in 1928?! There basically weren't any except an experimental one called The Queen's Messenger.

* Red staplers aren't actors; there's no playing anybody.

* I'm pretty sure Dracula doesn't eat children. Wrong type of monster.

-----

My question:

What’s the difference between a fork and a spoon?

Its response:

A fork is a utensil that is used to pick food up from a plate with your fingers, such as with food items that are cooked. Fingers are made of many different parts, including bones, muscles, and tendons. The two most basic kinds of utensils are spoons and forks. Forking is typically done by the dining creature’s hands and includes handling both the bread and the fish.

Things wrong with this answer: I'm not even going to bother this time.


This is a model the size of GPT Babbage, which is not even able to string together two coherent sentences.

It's also only an "Alpha" model partway through its initial training run.

The larger models (not even trained enough for Alpha release yet) should by all accounts beat GPT-3.


Agreed, it doesn't even do the basic stuff that GPT-2 did, how can anyone be impressed by this?


How will they generate revenue?


By soliciting excited investors for money!


Seems so ^^ Too bad that the open-source model has this problem. Hope they survive and find a fitting business model.


What's the knowledge cut-off for this?


OpenAI really needs to donate its name to Stability.


[flagged]


The base model is licensed CC BY-SA (commercial use is allowed), and you can finetune it in a couple of hours on a consumer GPU with 8GB of VRAM using https://github.com/johnsmith0031/alpaca_lora_4bit



