I think people expected him to "open" the algorithm so that you could tell how the recommendations are determined, and instead what people got was an Underpants Gnomes' Plan with a neural network step in the middle and no weights.
While I agree there is a common understanding of what open source is, there most definitely does not exist any "binding definition"! The term is not trademarked, not copyrighted (and it never could have been, being two common words), and not a legally protected term in any country. It is really grating to see such nonsense repeated way too often.
Not only that, it's not been updated in 8 months. It's extremely unlikely that Twitter hasn't updated anything about the home feed since then. They effectively dumped part of the code on GitHub for some headlines, but never intended to keep developing it in the open.
Not explicitly. But since they gave it an AGPL licence, you can request the updated source from them, so the intention was definitely there. (Then again, as owners they can relicense any time they want, so it's not really binding for them)
They also invited contributions to that repo... but none of the serious PRs ever got merged, as far as I can tell. Basically, they were never serious about doing this.
"Open source" or "open weight"? Because there is a distinction. Many have previously provided open weights (or what they call "open model" now): Mistral, LLaMA, Falcon, etc. There are not many open "source" LLMs out there that bring true value to business and academia.
How does Grok even compare to the rest of the LLMs? It seems like it was just Elon throwing up shit because he wants Twitter to be as big and bad as Google and Facebook. Even Google has been really fumbling trying to compete with Microsoft and OpenAI, FB has been surprising with its more open approach to models, and Mistral seemingly came out of nowhere with some great tech.
Is grok really noteworthy or is it just a nothing burger?
Has anyone benchmarked Grok against other models? The LMSYS benchmarks, which I trust most, don't have it. And their own reported results are good but nothing amazing, since it doesn't seem to surpass GPT4 or Claude 3.
The general consensus is it's in the "GPT3.5" class along with llama 2 and co, but it has a very annoying attitude. I don't know anybody routinely using it.
People seem very concerned about licenses for LLM weights.
Why shouldn't we treat LLM weights like LLM creators treat ebooks and open source code? Namely, that it is not subject to copyright?
To say that the Llama training process bypasses the copyright of all the training data creators, and yet the output is copyrighted by Facebook, seems a uniquely pro-corporation stance.
This is a really interesting framing that I hadn’t thought about before.
You’re absolutely right. It’s very one sided at the moment.
If we follow their ebook usage practice, it’s not even required that they declare it to be open source. Just need someone to publish their copyrighted work online [0] without their agreement and then - per their rules - it’s totally acceptable to download and use those weights with abandon.
Maybe it could be called “weights3”
[0] I’m not actually suggesting anyone should do this.
You really can't have an anon internet and copyrights simultaneously.
Take Wikipedia's content, licensed under Creative Commons - by who? Donald Duck? Then when Pikachu and Tony Stark edit the article it becomes a derived work?
> Creative Commons licenses give everyone from individual creators to large institutions a standardized way to grant the public permission to use their creative work under copyright law.
>....so long as attribution is given to the creator.
Who is the creator I must attribute to?
I don't think any of WP is CC? Without at least a full name and a claim of authorship I can't satisfy the requirements of the license? Or can I? And if I can satisfy attribution, I will have to disclose who I am in order to allow further sharing.
When Scratch[0] took off lots of kids re-uploaded things made by others replacing the description with "I MADE THIS"
I'd say we, the grown ups of this world should know we've messed up when kids mock our ways.
My comment was specifically about the use of tens of thousands of copyrighted books which had been pirated and distributed illegally, for which the authors - most of whom receive quite a small amount of money for their work as it is - received nothing and weren’t asked for permission.
There is a very big difference between knowingly downloading and using illegally distributed copyrighted works vs scraping the internet in general.
And if we can’t have copyright any more then we need to work out how to allow authors to make a living (and musicians, and artists, and indie software developers in fact…).
I agree it’s less clear cut about content that has been willingly posted to the internet but that’s not really what I’m most concerned about.
> My comment was specifically about the use of tens of thousands of copyrighted books which had been pirated and distributed illegally, for which the authors - most of whom receive quite a small amount of money for their work as it is - received nothing and weren’t asked for permission.
I'm sorry, while true, you've made too much of a heartwarming story out of it. It is an ongoing conflict between sharing and not sharing books, published papers, video and audio, and perhaps patents should also be part of the scope. On both sides we have both small and large efforts that range from deserving to not deserving our sympathy.
The main beneficiaries of not sharing the content of books are the publishers. For the most part they have proven not to care about authors. Much like the recording industry. They will not stop pushing for more and more control if it benefits them.
They really want (and have) my government adopting/creating/preserving/copying(lol?) laws that can't realistically be implemented. They won't be satisfied even if they can get a scheme like the TV license circus, with random assholes searching people's homes looking for a radio or TV (while even the police have no such rights). You already can't play music in public places without paying various kinds of protection money.
They are already scanning your uploads in various places looking for anything that vaguely resembles something else. When they think they've found it you will be punished. They don't care what life will be like after losing your proverbial google account over a false positive. Oh and Google has to pay for it which means you ultimately have to pay for your own investigation and persecution.
People got enormous fines for tiny offenses. There are efforts to filter out websites at the ISP level. Bittorrent is portrayed as a tool for pirates while it is simply a much superior sharing technology.
Many enormous data centers had to be built just so that we can use inferior means of distribution. You ultimately have to pay for that. You got asymmetric internet connections because hey, you don't need to be uploading anything now, do you? We are retooling the entire civilization to protect Harry Potter, Shakin that Ass and Plan 9 from Outer Space, and you get to pay for it.
The industries want to sell new works. That agenda also opposes the distribution of existing works. We have a rich history of book burning so that the old may make room for the new.
Personally, the most worrying part is the desire/agenda to breed a population of illiterate consumers who can barely tie their own shoes but should somehow run a democracy.
It should be that if anyone shows an ever so slight interest in a topic we ram all the relevant books, published papers, patents, documentaries and tools in their hands and shout: HERE, READ THIS, WATCH THIS AND HERE IS YOUR FISHING ROD.
This is worth twice the military budget. We can find a way to pay authors. It doesn't seem a very hard problem. I'm not sure there really is a need, but plenty of people want this, so let's make it.
> There is a very big difference between knowingly downloading and using illegally distributed copyrighted works vs scraping the internet in general.
Not really, you can't look inside people's heads. If I buy something knowing it was stolen, or pretending not to know, it is still a crime.
Full self driving is a misleading name, Level 2 is not Level 5, Tesla is overpromising a solution, and I wouldn't consider something safety critical that is still in "beta" to be shipping as GA.
Its name doesn't tell me how good it is. Until I can buy a Tesla, give it a Lyft account, and have it go make me money driving for Lyft, it's not worth much to me.
Successful in being irresponsible and marketing 'Level 5' as 'Full Self Driving' while not actually being a Level 5 autonomous vehicle with mounting complaints about safety.
There's an old saying in Tennessee — I know it's in Texas, probably in Tennessee — that says, fool me once, shame on — shame on you. Fool me — you can't get fooled again.
I think this is simply a confusion over the meaning of words.
You can indeed buy Full Self Driving™ (FSD), but even then your Tesla is not capable of self driving, fully (eg, there are many scenarios where a human is still required)
People take "Full" as meaning L5, but Tesla uses "full" as in ODD (Operational Design Domain). It can go anywhere: city streets, highways, parking lots, unmarked roads. In that sense it is indeed "full". This is clear when you look at the history of Tesla autonomy products. First there was Autopilot, which is only for highways. Then they released Full Self Driving Beta, which includes every type of driving.
LLaMA is pseudo open source. There's a huge difference.
Mistral, for example, is real open source.
Remains to be seen which one he picks (better be the latter), but Musk haters are worse than the fanboys, that much is clear by the bias clouding even the most basic of assessments.
I had the wrong assumption that Mistral was built "on top of" Llama. Then again, I find sentences like "Mistral's models are based off on Meta's Llama".
I used to interact with an ex-coworker who is normally an intelligent fellow. But when it came to discussing Elon Musk, his IQ would drop a few points. He believed in bizarre conspiracies, like thinking Twitter was already going down and would be dead soon. He would cite various legacy media propaganda articles on it to me. Sad to see smart folks letting their intelligence be compromised like this.