Seems like "open source" as a marketing tactic (or perhaps strategy, if they do continue to release open models) has peaked. I'm not really complaining, we get a lot of stuff for free as engineers (especially software), but it does seem different for a company to release an open model without any future commitments (e.g. Google) vs making open weights your raison d'être and then pivoting quite quickly. The first feels transactional but honest, and the other a bit too... Machiavellian?
I do think it's too soon to pass judgement; this could just be a normal "freemium" strategy of old, where you pay up once you like the smaller/cheaper/free versions of their models.
Mistral's following OpenAI's playbook: embracing the clopen source movement. Their community betrayal (sorry, "pivot") hurts, but it shouldn't. Burning goodwill has always been far more lucrative than burning an enormous check. Hopefully Meta's strategy of committing to open source as incentive for attracting top researchers (who can go anywhere and deeply want their work shared) keeps working. But I'm not hopeful. Eventually Llama-n will remain closed source, some researchers will leave, followed by a blog post word salad about safety -- or whatever in vogue pretext works best tomorrow. Ugh.
This is big tech or whoever using big money to lock up what they think is the dawn of 'true AI'.
Buying people off is pretty easy! I'm sure I'd feel the same way and take the money.
But just like you can't write an iPhone app without Apple's consent, soon we won't be able to do any serious AI work without the consent of, and payment to, MS or Google or whoever.
I wonder what the talented people at OpenAI or Mistral tell themselves. That they're doing something good for society or technology? Probably, but AI is already being concentrated into a few hands. Nvidia has a virtual monopoly on hardware, training huge models is out of reach for most, and while open research got us here, that's looking increasingly shaky.
Personally I think LLMs are a red herring, so there is still a chance to change the outcome, but we should take a lesson from what's transpired with OpenAI and Mistral and only support actual open development.
>I wonder what the talented people at OpenAI or Mistral tell themselves.
At Mistral it's probably no small part "It shouldn't just be a few big American companies who control all of this" (and Mensch has spoken along these lines).
In the worst case you'll at least have a big EU company in the cartel, and given what the EU has done for regular people against big tech, I think that's a good thing.
Mistral just doesn't seem like an interesting company to me any more.
They started off as the kind of people who released their models as magnet links and made them actually user-aligned instead of California-aligned. This is what I like to see from an AI company. Now, their models are no different from OpenAI, Anthropic, Google, Meta and everybody else.
Could you expand more on what you mean by "California-aligned" (especially if you are thinking beyond AI models, though that alone could be interesting).
The way I see it, there are three kinds of models:
1. Unaligned models: you ask them a question, and they complete your prompt with more detail about the question instead of an answer. This is what you get from a basic LLM if you train it on the internet; it's how GPT-2 and GPT-3 worked before ChatGPT was invented. Such models aren't very useful for chat: you need weird prompt engineering tricks to actually get an answer instead of a restatement of your question. The original Mistral 7b was also of this kind.
2. User-aligned models. Aligned to answer questions and follow instructions, but no more and no less. If you ask them how to kill your wife, how to cook meth or how to make an atomic bomb, they'll happily help you and regurgitate the facts you can already find on the internet. They have no access to non-public information, the "dangerous" things they can tell you are already pretty easy to find, so the danger if you're using them for chat is actually minimal. However, they're far easier to use for troll farms, mass phishing campaigns, sockpuppets, political misinformation campaigns etc. If you ask them to engage with a pro-Biden tweet in the most triggering and overtly racist way possible and throw some pro-Putin angle into the mix, they'll happily accommodate your request, and you can do this at scale, paying orders of magnitude less than you would for a content farm. Mistral 7b instruct is a good example of such a model.
3. California-aligned models. They'll happily fulfill your request, unless it conflicts with DEI ideology, which has a particular foothold in CA, sort of exists in other parts of the US and is completely absent in non-English-speaking countries, even among extremely left-leaning populations. Google Gemini is the most egregious example.
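The "weird prompt engineering tricks" in point 1 are usually few-shot scaffolding: with a base model that only continues text, you prepend answered Q/A examples so the most likely continuation is an answer rather than more question. A minimal sketch of that trick (the Q/A format and stop marker here are conventions for illustration, not any particular model's API):

```python
# Few-shot prompt scaffolding for a base LLM that only continues text.
# The model sees answered examples, so the likeliest continuation after
# "A:" is an answer; you then cut the generation at the next "Q:" marker.
FEW_SHOT = (
    "Q: What is the capital of France?\n"
    "A: Paris\n"
    "Q: How many legs does a spider have?\n"
    "A: Eight\n"
)

def build_prompt(question: str) -> str:
    """Wrap a raw question in the few-shot template."""
    return FEW_SHOT + f"Q: {question}\nA:"

def extract_answer(completion: str) -> str:
    """Keep only the first answer; base models keep inventing more Q/As."""
    return completion.split("Q:")[0].strip()

prompt = build_prompt("What color is the sky?")
# A base model continues this prompt; a raw completion might look like
# " Blue\nQ: What is ...", which extract_answer trims down to "Blue".
```

A chat-tuned (instruction-following) model makes all of this unnecessary, which is exactly the jump from category 1 to category 2.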
It will be interesting to see if/how they walk back from releasing the weights. They have put a lot of effort into the "open" approach to AI (not open source, but very intentionally trying to conflate the two). But probably almost nobody is paying attention.
On the other hand, Meta has very little to gain by closing off their models. What would they do with them and who would use them? Llama was a coup because even though the license sucked, a passable model with available weights and llama.cpp allowed it to soar over the others of the time. Hopefully the benefit they got from that trumps any calls from the safety crowd to not share weights.
Really depends on how good Llama 3 will be. I see people expecting it to be better than GPT-4. But they still need to build a model even as good as GPT-3.5, since Llama 2 is... not.
It seems to me that large software companies often adopt an open-source approach initially to attract enthusiasts and stand out from leading competitors, but tend to adopt similar philosophies as their rivals once they achieve significant recognition or investment.
You can see this as an endgame of 'commoditize your complement' (https://gwern.net/complement): you're happy to contribute to commodification while you are the small scrappy player, but at some point, if you are really successful, you will want to pull up the ladder after yourself, as it were.
Idle question: what's the complement of the model? The GPUs? By that logic, the hardware companies will be your best bet for open model development. (Everything old is new again).
The complement play for Mistral was that they'd be the low-cost SaaS serving their own models or any better ones that the FLOSS releases created and the upsell: you prototype and develop on Mistral models like Mixtral and then when you become a big boy, you outsource hosting or customization to them (with their razor-thin margins enforced by competition). The model releases generate demand for their low-cost hosting and consulting. (Think Red Hat, not OpenAI.)
As opposed to an all-inclusive API where the pricing can be as high as the consumer-surplus with very fat margins, because it's all-or-nothing.
Model development has become a seriously capital-intensive endeavor. Mistral probably found themselves cornered by this commitment: they couldn't secure any serious investment without changing this stance (MSFT in this case).
They imagined that it would be really difficult to run the models locally, so people wouldn't bother, but with stuff like llama.c & friends it's trivial.
I actually see copyright and patents as a small socialist element of the present composite system, i.e. 'to each according to his contribution'.
They're one of the small ways in which someone who actually does something gets a temporary monopoly on his little contribution, and it's something which is necessary to make the whole edifice function, because there is a need to ensure that people actually invent things and start companies, and if it were up to capital owners only, that wouldn't really happen. Ordinary people would have a very minimal incentive to use their intellect to discover genuinely novel things.
Public schooling is another, though it's more of a communist component than a socialist one: 'to each according to his need'.
But I suppose my view is consistent with the view that capitalism doesn't play nice with digital assets either. The present system does need these stabilising socialist and communist elements in order to function, and I think they are genuine parts of the present system, just as capital ownership and wage labour are.
The real fundamental problem is that "digital assets" are bullshit.
An AI model is literally constructed by explicitly disrespecting copyright. The idea that a company gets to turn around and demand respect for their AI model's copyright is patently absurd.
I feel like this can be trivially worked around by fine tuning from the initially copyrighted weights. They’re not demanding copyright, they’re just keeping them secret. If Meta keeps releasing high quality open source models, I don’t expect the closed source models to have an advantage for long.
Intellectual property itself is a bad concept. Capitalism works for squeezing margins in bulk commodity production (largely at the expense of the worker), but for zero-marginal-cost stuff it's a huge hindrance to progress.
But if you have cloud-based models ... same with mainframes 30-40 years ago ... it doesn't matter. Because the power the cloud gives to the model owners is 10x more than the strictest intellectual property laws do.
I don't believe, for example, that there's any intellectual property law that would let me yoink my intellectual property from you for any reason after you've bought and paid for it.
In other words: I think the capitalism discussion is kind of pointless here. Capitalism isn't what gives these companies power. It's the cloud. Mainframe computing 2.0.
It's going to be really interesting, as two geopolitical poles of power develop, if the West (or whatever you call it) has to look to China, the pole we look down on, for actually "free" ML models.
(Edit: reminds me a bit of the Silicon Valley joke where Erlich tells Jian-Yang he can't smoke in California, as we don't enjoy the same freedoms you do in China)
Edit: I should mention that the web service DeepSeek provides will unceremoniously shut down conversations on topics deemed too sensitive. Self-hosted models do not appear to be as aggressive.
I'm not really familiar with Chinese AI research, but my passing familiarity with the CCP makes me seriously question how free those ML models actually are, or will remain (if they currently are). I can't imagine the CCP wants AI enabling dissent or critical discussion of the CCP.
Well this conversation just got very hard to have.
A free model that appears to be "unaligned" would be a huge win for china.
Think of it this way: a model that frames Taiwan's independence as open for debate, while our models won't even call it a country, is a massive tip of the scales.
Now pick another hot-button politically divisive POV and have it be truly neutral (merits of the arguments notwithstanding).
What does the US do at that point? Tell us we can't use it? How does the EU react?
>> (Edit, reminds me a bit of the silicon valley joke where Erlich tells Jin Yang he can't smoke in California as we don't enjoy the same freedoms you do in China)
This reminds me of Chinese kids writing letters in the 90s to the US embassy to free Leonard Peltier.
Maybe we need a meta website similar to Dogpile from the 90s which would search all the search engines.
When Gemini refuses to draw you a white family having a picnic, China-GPT can help out. Ask a question about Taiwan and maybe something else can answer, and so on. Also, a wokeness benchmark would be great.
That’s a shame—we cannot allow a handful of companies and VCs to capture most of the value of AI, especially if it actually starts replacing human jobs, it’ll just accelerate wealth inequality and social unrest.
The economic prospects are grim. Couple that with creating a world where humanity is both at the whim of these systems' tuning and fundamentally unable to observe and learn about this golem, with IP keeping it as magic in our world: it feels like the most infernal of machines.