
> OpenAI could open GPT 4 tomorrow and it wouldn’t meaningfully impact their revenue.

I find this very difficult to believe; GPT-4 is still the best public model. If they hand out the weights, other companies will immediately release APIs for it, cannibalizing OpenAI's API sales.




That’s the theory. In practice, running it requires immense infrastructure, let alone all the tooling and sales pipelines surrounding it. Companies are risk-averse by definition, and in practice the risks are usually different from the ones you imagine from first principles.

It’s dumb. The first company to prove this will hopefully set an example that will be noticed.


It didn't take long for Perplexity, Anyscale, Together.ai, Groq, DeepInfra, and Lepton to all host Mistral's 8x7B model, both faster and cheaper than Mistral's own API.

https://artificialanalysis.ai/models/mixtral-8x7b-instruct/h...


Hosting a 7B model is completely different from hosting a 150B+ model. I thought this would be obvious, but I should have been explicit.


It's not, really. And 8x7B is not a 7B model: it's a MoE with roughly 47B parameters that all have to be kept in memory, but only 2 experts are active per token, so it runs at roughly the speed of a 13B model.
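A back-of-envelope count from the published architecture (32 layers, d_model 4096, FFN dim 14336, 8 experts with 2 active per token, grouped-query attention) lands right around the official 46.7B total / 12.9B active figures; rough numbers, not an exact accounting:

  # Approximate parameter count for Mixtral 8x7B from its published architecture.
  layers, d_model, d_ff, n_experts, n_active = 32, 4096, 14336, 8, 2
  d_kv = d_model // 4                                  # 8 KV heads of 128 dims

  expert = 3 * d_model * d_ff                          # gate/up/down projections
  attn   = 2 * d_model * d_model + 2 * d_model * d_kv  # q,o plus k,v per layer
  embed  = 2 * 32_000 * d_model                        # input + output embeddings

  total  = layers * (n_experts * expert + attn) + embed
  active = layers * (n_active * expert + attn) + embed

  print(f"weights to keep in memory: ~{total / 1e9:.1f}B")   # ~46.7B
  print(f"params used per token:     ~{active / 1e9:.1f}B")  # ~12.9B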

All of the current frameworks support MoE and sharding across GPUs, so I don't see what the issue is.
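For instance, vLLM will load the MoE weights and shard them with tensor parallelism in a few lines; a minimal sketch, where the 2-GPU split and the exact model revision are just assumptions:

  # Serve Mixtral 8x7B sharded across two GPUs with vLLM's offline inference API.
  from vllm import LLM, SamplingParams

  llm = LLM(
      model="mistralai/Mixtral-8x7B-Instruct-v0.1",
      tensor_parallel_size=2,   # shard the ~47B of weights across 2 GPUs
      dtype="float16",
  )

  outputs = llm.generate(
      ["Explain mixture-of-experts routing in two sentences."],
      SamplingParams(max_tokens=128, temperature=0.7),
  )
  print(outputs[0].outputs[0].text)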


Ollama makes it pretty easy to run inference on a bunch of model-available releases. If a company is after code/text generation, finding a contractor to fine-tune one of those releases on its source code, and having IT deploy Ollama to employees' M3 MacBooks decked out with 64 GiB of RAM, is well within the abilities of a competent and well-funded IT department.
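Once Ollama is on the laptop, anything on the machine can hit its local HTTP API; a minimal sketch, where "internal-codegen" is a stand-in name for whatever fine-tuned model IT pushes out:

  # Query a locally running Ollama instance over its default API (port 11434).
  import requests

  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={
          "model": "internal-codegen",   # hypothetical fine-tuned model name
          "prompt": "Write a unit test for the invoice parser.",
          "stream": False,
      },
      timeout=120,
  )
  print(resp.json()["response"])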

What recognition has Facebook gotten for their model releases? How has that been priced into their stock price?


That's a completely different scale. You're not going to run GPT-4 like a random Ollama model. At that point you need dedicated external hardware for the service, and proper batching/pipelining to utilise it well. This is way out of "enough RAM in the laptop" territory.
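The arithmetic makes the gap obvious. A rough sketch, where the 150B size and the batch/context shape are just illustrative assumptions:

  # Back-of-envelope memory for serving a dense 150B-parameter model in fp16.
  params  = 150e9
  weights = params * 2 / 2**30                        # fp16 bytes -> GiB of weights alone

  layers, d_model, batch, ctx = 96, 12288, 8, 4096    # illustrative GPT-3-scale shape
  kv_cache = 2 * layers * d_model * batch * ctx * 2 / 2**30   # K+V in fp16

  print(f"weights:  ~{weights:.0f} GiB (vs. 64 GiB in the laptop)")   # ~279 GiB
  print(f"KV cache: ~{kv_cache:.0f} GiB at batch={batch}, ctx={ctx}") # ~144 GiB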



