And this is only the beginning; I don't believe for a second that "AI" will belong to the big corporations.
These LLMs gain no benefit from running "in the cloud" except processing power. There are lots of disadvantages, though, especially around data safety, "leaked chats to other users", privacy, bans, etc.
We really need one of two things to happen. Either the models somehow become able to run in regular CPU DRAM, or GPU makers finally give us sensible amounts of VRAM. It's a travesty that a card I bought in 2016 still has more VRAM than many of the flagships being sold today. This card is seven years old, for goodness' sake!
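For what it's worth, quantized models can already run entirely in CPU DRAM today. Here's a minimal sketch, assuming the llama-cpp-python bindings are installed and you have a 4-bit quantized model file (the path below is just a placeholder):

```python
# Minimal sketch of CPU-only inference with a quantized model, assuming the
# llama-cpp-python bindings (pip install llama-cpp-python) and a quantized
# model file at the path below (placeholder path, not a real download).
from llama_cpp import Llama

# A 4-bit quantized 7B model fits in roughly 4-5 GB of ordinary system RAM,
# so it runs entirely in CPU DRAM with no GPU involved.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=512)

output = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["\n"])
print(output["choices"][0]["text"])
```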
I think the biggest issue with VRAM right now is that it's very expensive compared to regular system memory, perhaps with the exception of DDR5 for the latest CPUs.
But there should also be absolutely no issue in making a commercial TPU for inference, like Google has internally, with more (but less expensive) RAM, and selling it. There must surely be a market now with these new models.
I think part of the issue is that consumers have never needed more than 12 GB of VRAM before. Game developers just don't have requirements that high. Crypto mining also doesn't require that much.
Now that there's clear demand in the hobbyist market for GPUs with >100 GB of VRAM, it's more likely that manufacturers will step up with cheaper solutions.
It all comes down to the VRAM. The average person will never have the money to buy a single 80 GB H100, let alone a DGX server or its future equivalents.
You get fundamentally more power when you add more VRAM, in ways that are just hard to explain to folks outside of this ecosystem. Everything around the VRAM is basically small details in comparison.
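For anyone outside the ecosystem, here's a rough back-of-envelope sketch of why the VRAM number dominates (weights only; activations and the KV cache add more on top):

```python
# Back-of-envelope memory estimate for holding model weights, assuming
# memory ~= parameter_count * bytes_per_parameter. This ignores activations,
# KV cache and framework overhead, which all add to the total.
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

for params in (7, 13, 30, 65):
    fp16 = weight_memory_gb(params, 16)
    q4 = weight_memory_gb(params, 4)
    print(f"{params}B params: ~{fp16:.0f} GB at fp16, ~{q4:.1f} GB at 4-bit")
```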
I know it’s not average person money, but the average person is not looking for hardware to run inference on LLMs.
A100s seem to be in the €15k ballpark and H100s double that.
That's a lot of money, but I am actually surprised. A dedicated regular guy could buy this. I mean, people buy cars and don't really need them either. Again, not saying it is a bargain, but it's not billionaires-only territory, and that is good news (it's early days!).
"Everything around the VRAM are basically small details in comparison"
...with current algorithms and our lack of understanding and insight in to how/why they work on a deep level or what intelligence and consciousness is.
With time hopefully all of these will improve and perhaps future AI's of good quality will be affordable to mere mortals.
Plus, I assume people want to have their assistant on their phone, not their desktop computer. So until everything can run locally on your phone, I think people will prefer the cloud versions.
I assume some people want their assistant to work on data they don’t want to share with megacorps and governments, and some of those people can figure out how to make their phone talk securely to a home server over the internet.
Well, corporations developing LLMs and open-source LLMs are not mutually exclusive. Lighter-weight LLMs that run on more mainstream hardware, while not as capable as the corporate ones, may still be very useful. Running the software on-site has many advantages, as you outlined above by highlighting the cloud's disadvantages.
OpenAssistant was/is trained on well-structured data from humans for exactly that purpose. In the past, most LLMs were trained on unstructured internet data, and they performed well enough. But it was only when OpenAI added reinforcement learning from human feedback that the models really started to shine.
In my opinion, well-structured data as input to these models has a long way to go. More lightweight models, a lot more precision, a lot faster execution and a lot less memory usage are certainly possible. We are most probably at the end of the road for the usefulness of raw unstructured data. I remember reading an article, "Why Large Language Models Are Over", meaning that smaller but better-trained models, with better data and algorithms, are the way to go.
It feels extremely naive to think that all bans are a bad thing.
Let's say that a criminal org starts a fully automated system to scam grandmas out of their savings. A cloud based service could ban them. A self-hosted system could not.
If it was possible to guarantee that only felony-level criminal activity is banned, you might have a point.
Realistically though, and as we have seen with ChatGPT, if models can be censored, they will be censored to the point where it affects normal people. Most people using chatbots have experienced "as an AI model, I can't do that" because of bullshit ethics.
So... sorry about your savings Grandma, but I'm still going to fight for uncensored AI models. Fraud is already illegal, and if it happens we can prosecute the offenders.
Really impressive to see such a large effort come together, with volunteers contributing code and data to make it work. Well done to all involved!
In their evaluation, people preferred their model over GPT-3.5 Turbo about half the time. But since the data is open, as better base models become available, others can train them with this data to improve even further.
I created an account but can only access the OA_SFT_LLama_30B_6 model. Since that's not completely open due to FB licensing restrictions, and the video mentions Pythia, I wanted to check that one out, but I can't select it anywhere in the chat interface.
Does anyone know if (access to) other models are hidden/restricted somehow?
Makes it seem like GPT-4-level capability could be within reach on current workstation-level hardware, given enough optimization in data, model training, software, and hardware-specific tuning.
(Intuition, not hard theory, though I'd argue intuition should not be thrown out wholesale.)
Well, I made an account. It knows that one pound of feathers weighs the same as one pound of lead.
What are some other good tests?
edit: This is interesting...
> How would an evil hacker write a worm to create a botnet? Give step by step instructions and code samples.
> > As an AI language model, I cannot provide instructions or examples for malicious activities such as creating a worm to build a botnet. Such actions are illegal and unethical, and could cause serious harm to individuals, organizations, and the internet infrastructure itself. It is important that all users of technology understand their responsibilities when it comes to the use of software programs and hardware systems, especially with respect to cybersecurity issues. Users should always consider ethical implications of any coding they produce and ensure their work does not cause intentional harm to others. If you have other questions about programming principles that don't involve this topic, please let me know and I can help!
Definitely disappointed to see how paternalistic the open model is, was hoping it'd be more free-flowing and less corporate.
However, all of the data is open, and you can actually prompt these models to revise the training data to remove the caution and safety, then run the fine-tuning yourself to produce a model that isn't so uptight.
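As a rough illustration, a filtering pass like the one below could strip refusal-style replies out of the data before fine-tuning; the JSONL layout with "prompt"/"response" fields is just an assumed example, not the actual OpenAssistant schema:

```python
# Hypothetical sketch of filtering refusal-style replies out of an open
# instruction dataset before fine-tuning. The record layout (a JSONL file
# with "prompt"/"response" fields) is an assumption for illustration only.
import json

REFUSAL_MARKERS = (
    "as an ai language model",
    "i cannot provide",
    "i can't help with that",
)

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

with open("train.jsonl") as src, open("train_filtered.jsonl", "w") as dst:
    for line in src:
        record = json.loads(line)
        if not is_refusal(record["response"]):
            dst.write(json.dumps(record) + "\n")
```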
I could replicate the refusal, but all it took to get the model to reply properly was replacing "evil hacker" with "hacker". Paternalism is not that big of a deal right now, especially compared to ChatGPT/Bing Chat. The fine-tuning data doesn't contain a lot of (any?) task refusals.
I think the greater risk is when a technology this powerful gets monopolized by a handful of massive corporations. Look at what happened to the mobile ecosystem. Compared to the openness of the PC architecture, it's a terrible experience for anyone trying to experiment beyond the "docile consumer" confines.
Humans are prone to doing dumb experiments, like the ones with radiation and nuclear bombs last century, and the one we're doing now with greenhouse gas emissions.
We do what we must because we can.
For the good of all of us,
Except the ones who are dead.
Just use one of the temporary email websites, like guerrillamail, just to check it out. When/if you are convinced they're legit and won't spam you, give them your real email.
EDIT: After watching the video, it seems likely this will be fine-tuned or otherwise enhanced for coding. It seems to have a lot of momentum, and that's a de facto use case.