Hacker News
This week, xAI will open source Grok (twitter.com/elonmusk)
103 points by 0xedb on March 11, 2024 | 52 comments


Good luck to the xAI employees who just learned that they are open sourcing their product this week.


Meaning?


Read the biography Elon Musk by Walter Isaacson.

The engineers likely learned of this news via the tweet.


Alex Heath from The Verge alleged that Grok is just a tuned LLaMA [0]. I wonder what will be revealed!

[0]: https://www.threads.net/@alexheath/post/C0pEidVp-1U


Would certainly not be surprised!


Wasn’t the tweet recommendation system “open sourced” as well? Does this guy know the difference between open source and “open source”?


> Wasn’t the tweet recommendation system “open sourced” as well? Does this guy know the difference between open source and “open source”?

What do you mean? There exists only one binding definition of open source

> https://opensource.org/osd

and either some product does satisfy it, or it doesn't. As far as I am aware

> https://github.com/twitter/the-algorithm

does satisfy the open source definition, so your sarcasm looks demagogical to me, but I am very willing to learn something new.


I think people expected him to "open" the algorithm so that you could tell how the recommendations are determined, and instead what people got was an Underpants Gnomes' Plan with a neural network step in the middle and no weights.


While I agree there is a common understanding of what open source is, there most definitely does not exist any "binding definition"! It is not trademarked or copyrighted (and it never could have been, being two common words), nor is it a legally protected term in any country, or anything else. It is really grating to see such nonsense repeated way too often.


> There exists only one binding definition of open source

>> https://opensource.org/osd

Insert Obama awarding himself meme. Who said that this is the "only binding definition"?


So it is not open source, thank you for the info.


Yes, and it's here: https://github.com/twitter/the-algorithm

If e.g. Amazon open sources some part of its software infrastructure should they also open source the data it uses or their configuration files?



If I recall correctly this repo is missing data so it's functionally impossible to verify or replicate the behaviour they're using on live.


Not only that, it's not been updated in 8 months. It's extremely unlikely that Twitter hasn't updated anything about the home feed since then. They effectively dumped part of the code on GitHub for some headlines, but never intended to keep developing it in the open.


> but never intended to keep developing it in the open

Did Elon Musk promise this?


Not explicitly. But since they gave it an AGPL licence, you can request the updated source from them, so the intention was definitely there. (Then again, as owners they can relicense any time they want, so it's not really binding for them)

They also invited contributions to that repo... but none of the serious PRs ever got merged, as far as I can tell. Basically, they were never serious about doing this.


"Open source" or "open weight"? Because there is a distinction. Many have previously provided open weights (or what they call "open model" now): Mistral, LLaMA, Falcon, etc. There are not many open "source" LLMs out there that bring true value to business and academia.


How does Grok even compare to the rest of the LLMs? It seems like it was just Elon throwing up shit because he wants Twitter to be as big and bad as Google and Facebook. Even Google has been really fumbling trying to compete with Microsoft and OpenAI, FB has been surprising with their more open approach to models, and Mistral seemingly came out of nowhere with some great tech.

Is Grok really noteworthy or is it just a nothing burger?


I prefer the term "model available".


Has anyone benchmarked Grok against other models? The LMSYS benchmarks, which I trust most, don't have it. And their own reported results are good but nothing amazing, since it doesn't seem to surpass GPT-4 or Claude 3.


The general consensus is it's in the "GPT-3.5" class along with Llama 2 and co, but it has a very annoying attitude. I don't know anybody routinely using it.


The project ignores the fundamental GIGO nature of the written word so completely that I’m assuming someone’s running a con on Musk.


God, the replies to that tweet are deranged.


Yeah. So the remaining concern is license. I hope it won’t be similar to Llama.


People seem very concerned about licenses for LLM weights.

Why shouldn't we treat LLM weights like LLM creators treat ebooks and open source code? Namely, that it is not subject to copyright?

To say that the Llama training process bypasses the copyright of all the training data creators, and yet the output is copyrighted by Facebook, seems a uniquely pro-corporation stance.


This is a really interesting framing that I hadn’t thought about before.

You’re absolutely right. It’s very one sided at the moment.

If we follow their ebook usage practice, it’s not even required that they declare it to be open source. Just need someone to publish their copyrighted work online [0] without their agreement and then - per their rules - it’s totally acceptable to download and use those weights with abandon.

Maybe it could be called “weights3”

[0] I’m not actually suggesting anyone should do this.


You really can't have an anon internet and copyrights simultaneously.

Take Wikipedia's content, licensed under Creative Commons - by who? Donald Duck? Then when Pikachu and Tony Stark edit the article it becomes a derived work?

> Creative Commons licenses give everyone from individual creators to large institutions a standardized way to grant the public permission to use their creative work under copyright law.

>....so long as attribution is given to the creator.

Who is the creator I must attribute to?

I don't think any of WP is CC? Without at least a full name and claim of authorship I can't satisfy the requirements of the license? Or can I? Then if I can satisfy attribution I will have to disclose who I am in order to allow further sharing.

When Scratch[0] took off lots of kids re-uploaded things made by others replacing the description with "I MADE THIS"

I'd say we, the grown ups of this world should know we've messed up when kids mock our ways.

[0] - https://scratch.mit.edu


My comment was specifically about the use of tens of thousands of copyrighted books which had been pirated and distributed illegally, for which the authors - most of whom receive quite a small amount of money for their work as it is - received nothing and weren’t asked for permission.

There is a very big difference between knowingly downloading and using illegally distributed copyrighted works vs scraping the internet in general.

And if we can’t have copyright any more then we need to work out how to allow authors to make a living, (and musicians, and artists, and indie software developers in fact…)

I agree it’s less clear cut about content that has been willingly posted to the internet but that’s not really what I’m most concerned about.


> My comment was specifically about the use of tens of thousands of copyrighted books which had been pirated and distributed illegally, for which the authors - most of whom receive quite a small amount of money for their work as it is - received nothing and weren’t asked for permission.

I'm sorry, while true, you've made too much of a heartwarming story out of it. It is an ongoing conflict between sharing and not sharing books, published papers, video, and audio, and perhaps patents should also be part of the scope. On both sides we have both small and large efforts that range from deserving to not deserving our sympathy.

The main beneficiary of not sharing the content of books are the publishers. For the most part they have proven not to care about authors. Much like the recording industry. They will not stop pushing for more and more control if it benefits them.

They really want (and have) my government adopting/creating/preserving/copying(lol?) laws that can't realistically be implemented. They won't be satisfied even if they can get a scheme like the TV license circus with random assholes searching people's homes looking for a radio or TV (while even the police have no such rights). You already can't play music in public places without paying various kinds of protection money.

They are already scanning your uploads in various places looking for anything that vaguely resembles something else. When they think they've found it you will be punished. They don't care what life will be like after losing your proverbial google account over a false positive. Oh and Google has to pay for it which means you ultimately have to pay for your own investigation and persecution.

People got enormous fines for tiny offenses. There are efforts to filter out websites at the ISP level. BitTorrent is portrayed as a tool for pirates while it is simply a much superior sharing technology.

Many enormous data centers had to be built just so that we can use inferior means of distribution. You ultimately have to pay for that. You got asymmetric internet connections because hey, you don't need to be uploading anything now, do you? We are retooling the entire civilization to protect Harry Potter, Shakin that Ass and Plan 9 from Outer Space, and you get to pay for it.

The industries want to sell new works. That agenda also opposes the distribution of existing works. We have a rich history of book burning so that the old may make room for the new.

Personally, the most worrying part is the desire/agenda to breed a population of illiterate consumers who can barely tie their own shoes but should somehow run a democracy.

It should be that if anyone shows an ever so slight interest in a topic we ram all the relevant books, published papers, patents, documentaries and tools in their hands and shout: HERE, READ THIS, WATCH THIS AND HERE IS YOUR FISHING ROD.

This is worth twice the military budget. We can find a way to pay authors. It doesn't seem a very hard problem. I'm not sure there really is a need, but plenty of people want this, so let's make it.

> There is a very big difference between knowingly downloading and using illegally distributed copyrighted works vs scraping the internet in general.

Not really, you can't look inside people's heads. If I buy something knowing it was stolen, or pretending not to know, it is still a crime.


Every artist does the same.

Get inspired by and train on previous work, then create something new.


tidbit: Oracle have an OpenGrok project under active development:

https://en.wikipedia.org/wiki/OpenGrok



...and nobody will care.


This week, @xAI will open source Grok

It's like that glorious week in 2018 when we got full self driving.


You can get FSD today. Thousands of people use it every day https://www.teslarati.com/tesla-fsd-beta-program-half-a-bill...


Full self driving is a misleading name, Level 2 is not Level 5, Tesla is overpromising a solution, and I wouldn't consider something in "beta" that is safety critical to be considered shipping as GA.


Its name doesn't tell me how good it is. Until I can buy a Tesla, give it a Lyft account, and have it go make me money driving for Lyft, it's not worth much to me.


That version will be released by the end of the year (2019).


It's still a huge success.


Successful in being irresponsible and marketing 'Level 5' as 'Full Self Driving' while not actually being a Level 5 autonomous vehicle with mounting complaints about safety.

https://www.theverge.com/2023/5/25/23737972/tesla-whistleblo...


If it’s wildly overdue and only a fraction of what was promised, it’s a significant failure as a product.

It might be a great achievement if it hadn’t been hyped up well beyond what has been released


Whenever I see what it can do on some people's YouTube channels, it's impressive as f.

And the backend structure with their AI model as well.

I'm not talking about marketing shit


This.

Instead, what you have is Fool Self Driving: Tesla knows that it isn't fully autonomous yet, but still markets it as such.

Intentionally misleading and irresponsible.


There's an old saying in Tennessee — I know it's in Texas, probably in Tennessee — that says, fool me once, shame on — shame on you. Fool me — you can't get fooled again.

George W. Bush


I think this is simply a confusion over the meaning of words.

You can indeed buy Full Self Driving™ (FSD), but even then your Tesla is not capable of self driving, fully (e.g., there are many scenarios where a human is still required).


People take "Full" as meaning L5, but Tesla uses "full" as in ODD (Operational Design Domain). It can go anywhere: city streets, highways, parking lots, unmarked roads. In that sense it is indeed "full". This is clear when you look at the history of Tesla autonomy products. First there was Autopilot, which is only for highways. Then they released Full Self Driving Beta, which includes every type of driving.



I think I reflexively agree, but I will try to suspend my opinion until it is actually released.


[flagged]


LLaMA is pseudo open source. There's a huge difference.

Mistral, for example, is real open source.

Remains to be seen which one he picks (better be the latter), but Musk haters are worse than the fanboys, that much is clear by the bias clouding even the most basic of assessments.


I had the wrong assumption that Mistral was built "on top of" Llama. Then again, I find sentences like "Mistral's models are based off on Meta's Llama".


> Musk haters are worse than the fanboys

I used to interact with an ex-coworker who is normally an intelligent fellow. But when it came to discussing Elon Musk, his IQ would drop a few points. He believed in bizarre conspiracies; for example, he thought Twitter was already going down and would be dead soon. He would cite various legacy media propaganda articles on it. Sad to see smart folks letting their intelligence be compromised like this.



