I think it is interesting that Llama 3 still does not use Mixture of Experts; Command R+ doesn't either. On paper you get roughly 4x lower training and inference compute cost. But maybe there is a good reason not to use it, and this result kind of supports that. It might be wasteful to create these huge MoE models if they can't outperform smaller dense models.
Many MoE models like Mixtral were not created from scratch. Somebody could still upcycle that 8B into an 8x8B. It should ideally be done by Meta since they have access to the original training data.
This entire difference can be explained by their double mixture of experts architecture, where only 1/4 of the MLP and attention blocks are used at any time. Maybe this should be the headline: MoE reduces compute by a factor of 4 without losing accuracy. But this is already known. Still interesting to see a smaller MoE model; this could be the ideal size for many local applications.
It’s absolutely beneficial when training because the forward pass and backpropagation still only run on the neurons that were activated.
The Mistral guys specifically mention that training speed (due to not needing as much compute) was one of the reasons Mixtral was released so soon after Mistral 7B.
With an MoE you only need to train a smaller model which you can then combine into an x8 and finetune/train the router. Mistral used their 7B base to make Mixtral, Qwen's new MoE uses their 1.8B model upscaled to 2.7B, pretty sure Grok also trained a smaller model first.
Very incorrect! The "8x7b" in the name regularly confuses people into similar conclusions, but there are not eight 7B "experts" in Mixtral 8x7B. It's more apt to think of all 256 FFNs as the "experts," since each expert FFN on a given layer has no relation to the expert FFNs on other layers. You need to train them all within the MoE architecture; combining existing models ("clown car MoE") works, but isn't gaining anything from the architecture/sparsity.
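To make that concrete, here is a minimal sketch of a single sparse-MoE FFN stage in PyTorch. This is not Mixtral's actual code, and the sizes and names are illustrative; the point is that each layer owns its own bank of expert FFNs and its own router, so there is no standalone "7B expert" spanning the whole network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    """One layer's FFN stage with top-k routing (illustrative sizes)."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        # n_experts independent FFNs that belong only to this layer
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # per-token gate
        self.top_k = top_k

    def forward(self, x):                            # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)         # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):               # only top_k experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

A full model stacks many such blocks, so with 32 layers and 8 experts per layer you get the 256 expert FFNs mentioned above, each trained jointly with its layer's router rather than as a separate model.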
Sorry, could you expand on this a bit further? Are you saying that for a MoE, you want to train the exact same model, and then just finetune the feed forward networks differently for each of them? And you're saying that separately training 8 different models would not be efficient - do we have evidence for that?
You're only correct about Qwen's MoE. I presume that Chinese model builders feel more pressure to be efficient about using their GPU time because of sanctions.
If MoEs are that good (and we know GPT-4 is one), then why not train very specific MoEs? One part of the MoE could be a perfect math model which can actually calculate 2+2. Wouldn't models like these be better in general?
Keeping it short: "Not even wrong", in the Pauli sense.
- People hear "mixture of experts" and think "N specialists" - but, for example, think about how much you need to know just to autocomplete "Two plus two is "
- The fundamental thing in ML is that you define functions and give them data, and the more data you give them, the better. Once you're at "I will simply give it the training data needed to be good enough at the task and wall off that part of the implementation," you're outside ML and have a chicken-and-egg problem
- We don't know GPT-4 is MoE
- MoE in practice is fundamentally about trading off runtime vs. static size properties to gain inference speed. I.e. storing 8x7 and picking 2x7 at runtime means you're somewhere between 2x7 and 3x7 in quality, you get inference at 2x7 speed, and you have to train, store, and load the full 8x7. You don't reach for it to increase quality; you reach for it to increase inference speed at the expense of inference RAM and total model size (rough numbers below).
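To put rough numbers on that trade-off, here is a back-of-envelope sketch using Mistral's published figures for Mixtral 8x7B (rounded, not an exact accounting):

```python
# Back-of-envelope for the MoE trade-off above, using Mistral's published
# Mixtral 8x7B figures (rounded): storage/RAM scales with total parameters,
# per-token compute scales only with the active ones.
total_params  = 46.7e9   # all weights must be trained, stored, and loaded
active_params = 12.9e9   # used per token: shared attention + 2 of 8 expert FFNs per layer

print(f"memory/storage cost ~ {total_params / 1e9:.1f}B params")
print(f"per-token compute   ~ {active_params / 1e9:.1f}B params "
      f"(~{total_params / active_params:.1f}x less than running everything)")
```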
It's just a dude retweeting a substack. I wouldn't bet against it* but I wouldn't bet on it either. His tweet would have just linked to the article in the top comment.
* I used to crusade against this rumor because the only source is that article, and people repeating that source. But I imagine it's a no-brainer given they have enough users that they essentially get a throughput bump 'for free' even if the model weights are huge: better to utilize as much GPU RAM as you can muster, since the cost of needing more GPU RAM is offset by being able to run multiple inferences against the model all the time anyway.
I feel like this is more interesting than the original post, though I am not sure what I am supposed to make of it: is it 1. a shady operation using fake identities with their signatures, or 2. an early example of OAI trying to use AI to make money through investments? I lean towards the first one. Maybe there is an explanation which makes this seem more reasonable. Do people create boards with fake directors and owners for companies?
The Substack was badly written, and it took me a long time to figure out what their point was. And I might be wrong! I got lost in the purple prose.
Regardless of how shady OpenAI's aims may have been, I genuinely think the root cause of this is that someone at OpenAI foolishly used ChatGPT to automate boring tax paperwork. ChatGPT decided "John Q Vesper" or whatever was a statistically plausible name for the CEO, and this dumb mistake wasn't caught by a human because nobody wants to read tax paperwork if they think a magical talking robot is capable of what seems like a routine task. I am assuming OpenAI didn't intend to tell a ridiculous and easily falsifiable lie in its tax filings (especially if that lie contradicted their public explanations about Altman's management of the fund!). OpenAI probably wanted the paperwork to say "Sam Altman."
FWIW the IRS is generally forgiving of good-faith tax errors, but I would suspect "we spent so long lying about our chatbot that we ourselves forgot that it's dumber than a pigeon and doesn't actually understand human language" doesn't count. Considering how many two-bit lawyers got in significant legal trouble for relying on ChatGPT hallucinations, it would be outrageous if OpenAI manages to get off scot-free here.
Fun fact: I consider myself a pretty bad writer and initially started the newsletter to try and get better at it. The results have been mixed on that front so apologies if you got confused by the "purple prose". I'm trying.
Anyway, my point was:
1. The disclosures point to "OpenAI Startup Fund I" being under the control of a fake company and CEO.
2. The fake company and CEO in question look like the creation of an AI hallucination.
3. If a fake company and CEO were intentionally used by OpenAI, this isn't a minor infraction and there are highly consequential ramifications to consider given Altman's prior "ownership" of the fund, reported concerns it was being used to circumvent OpenAI Board oversight/governance, and participation in the fund was allegedly a means to invest in OpenAI.
Much of the piece is admittedly a helter-skelter collection of disclosures and explanations focused on supporting #1 and I don't really get into my rationale for #3.
Overall, it makes for bad long-form reading, but it did compel BI to do the needed journalistic follow-up work, made OAI acknowledge the filing was "illegitimate", and pushed the story forward.
We now know, according to OpenAI:
1. The company and CEO don't exist, to their knowledge.
2. The document indicating this fake company and CEO controlled "Fund I" isn't legitimate and was completely fabricated.
3. They're not willing to elaborate/explain how or why these "fabricated" entities got filed by them.
It's ugly spaghetti prose, but I'd say the write-up worked as intended.
This humanoid form plus the voice really gives off a different feeling than the pure chat version. I think it will feel even more profound if they can add eyes and eye contact. Imagine demoing this to a random person.
It sounds so human; a person would also stutter at an introspective question like this. I wonder if their text-to-speech was trained on human data and produces these artifacts of human speech, or if it is intentional.
I believe it's the ElevenLabs API with the Stability setting turned down a little bit. It is definitely trained on human speech, and when you use a somewhat lower setting than the default, it will insert those kinds of natural imperfections and pauses and sounds very realistic.
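For anyone curious what "turning Stability down" looks like in practice, here is a minimal sketch against the ElevenLabs text-to-speech REST endpoint; the voice ID, API key, and values are placeholders, and field names are as I recall from their v1 API, so check the current docs before relying on this:

```python
# Sketch of an ElevenLabs TTS request with a lowered "stability" voice setting.
# voice_id, API key, and values are placeholders; field names follow their v1 API.
import requests

voice_id = "YOUR_VOICE_ID"
resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={"xi-api-key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json={
        "text": "Hmm... that's a hard one to answer.",
        "voice_settings": {
            "stability": 0.3,         # lower than default -> more varied, "human" delivery
            "similarity_boost": 0.75,
        },
    },
)
resp.raise_for_status()
with open("out.mp3", "wb") as f:      # response body is the generated audio
    f.write(resp.content)
```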
I'm not sure when OpenAI added them, but you can hear similar things when using the ChatGPT voice mode on iOS. Sometimes it feels almost like a latency stutter and other times it feels intentionally human.
I use ChatGPT voice a lot, and it is prone to this exact type of stutter. I don’t think it’s intentional. I think there are certain phonetic/tonal linkages that are naturally “weird” (uncommon in the training corpus) and that AI models struggle with. Why this struggle manifests as a very human-like stutter is a fascinating question.
This is kind of a general response, but I disagree. AI is not just like any other past technology. Just two ways in which it is totally different from the steam engine: AI is able to make art; it can learn from people and then produce similar results, imitating art that is the result of our deepest internal processes. Also, AI is about replacing all human jobs and capabilities, not just a super narrow subset.
> AI is able to make art, it can learn from people and then produce similar results.
In the jargon of fine arts, "craft" refers to "making a thing well / in a particular style," while "art" has something to say, most often challenging a widespread idea.
E.g., installing a toilet in a room of an art gallery with a plaque stating "this is art" was a bold work of art, but was not an impressive display of craft.
I really appreciate this distinction, because separating those concepts immediately clarifies a lot of the discussion around "AI Art".
AI can imitate _craft_ exceptionally well, but it cannot create _art_ (that is, it can't independently devise a display which challenges a widespread idea).
Working artists are losing out because nearly all of the economic demand for art was actually demand for craft, and they're doubly angry because their past work was needed to create the machine that's now undercutting their income.
Agree. A magnificent example of craft will make you curious about how it was done, while a correspondingly magnificent example of art will make you curious about why it was done.
An AI-generated work is mainly if not categorically a prompt for "how" questions, as its can-kicking answer to "why" is built right into the first letter of the initialism: because a human wanted it.
The real answer to why is pursued from that point up into its usual unnavigable headwaters, with a few extra twists concerning the question of whether the word artificial itself will turn out to be merely another disguise of the anthropic principle.
Yap yap yap. The camera will ruin painters, movies will cause psychosis, dancing is god's sin, comics will ruin our nation's youth, Photoshop will end the work of photography. Blah blah blah.
This answer sounds like it was taken from one of the AI startups' investor pitches.
Don't believe in these "too-good-to-be-true" promises that hustlers put into their slide decks. :-)
To me it seemed like an amplification of the parent's point: AI is poised to disrupt the reason behind a great deal of human endeavor. For music, only live performance will count. Sculpture, printmaking, painting, possibly even jewelry-making? Do it by machine, 10x the speed for 0.1x the labor, you're not the actor, the machine is the actor and you're sort of the director.
"I’m 16 years old, I’m planning to go to college to become an animator. It has been my life’s dream to become an animator and to share my artwork with people, to create, to show love in every piece I make. And now, I don’t think I’m ever going to have that future."
Is there still something we can realistically tell kids that they should invest time and effort into learning? It seems the general idea of getting good at something you are passionate about is going away. In the arts in particular, the idea that humans practice for a long time to create art and then show it to other people who enjoy it will disappear. I don't think people have really thought about the deeper consequences of this. It seems a lot of people like to immediately deflect and deny that this is any different from changes in the past, as you can see in the responses to her post.
As a career, I think it's difficult. While I believe most jobs will remain longer than we are thinking right now, many will feel threatened.
I think for now it would be good to develop physical skills in addition to pretending nothing is happening. Robotics seems to lag behind for various reasons.
I don't really care about jobs, but what is actually scary for me is power/wealth inequality with high unemployment and low social mobility.
I used to think that by becoming a software engineer I would have a good life. Now I'm no longer sure if I will still have anything in a decade or two. What about debt? What about opportunities for younger people? What about poorer countries?
> Is there still something we can realistically tell kids that they should invest time and effort into learning?
That is an extremely good question. So I asked Bing:
As a high school student, considering career paths that are less likely to be automated by artificial intelligence (AI) is a wise approach. While no job is entirely immune to technological advancements, some occupations rely heavily on uniquely human skills and are less susceptible to automation. Here are several career options to explore:
Healthcare Professionals:
- Nurses, doctors, therapists, and counselors play critical roles in patient care and emotional support. The human touch and empathy required in healthcare make these professions less likely to be fully automated.
Education:
- Teachers, instructors, and school administrators engage with students, personalize learning experiences, and provide mentorship. The interactive nature of education remains essential, even as technology evolves.
Software Development and IT:
- While AI can assist in coding and software development, human creativity, problem-solving, and understanding user needs are irreplaceable. Careers in software engineering, data science, and cybersecurity continue to thrive.
Creative Industries:
- Musicians, artists, writers, and journalists rely on imagination, emotional expression, and unique perspectives. Creativity is a distinctly human trait that AI struggles to replicate.
Human Resources (HR):
- HR professionals handle employee relations, conflict resolution, and organizational culture. Their ability to understand human dynamics and make informed decisions is crucial.
Law and Legal Services:
- Legal professionals, including lawyers and paralegals, analyze complex information, argue cases, and interpret laws. Legal reasoning and ethical judgment remain human strengths.
Public Relations (PR):
- PR specialists manage communication, reputation, and crisis response. Building relationships, understanding public sentiment, and strategic thinking are key.
Health and Fitness Trainers:
- Personal trainers, coaches, and nutritionists work directly with clients, providing personalized guidance and motivation. Human interaction is central to their roles.
Social Work and Counseling:
- Social workers and counselors support individuals and families facing challenges. Empathy, active listening, and emotional intelligence are essential skills.
Environmental Science and Conservation:
- Careers related to environmental protection, wildlife conservation, and sustainability involve complex decision-making and a deep understanding of ecological systems.
Remember that adaptability and continuous learning are crucial regardless of the career path you choose. As technology evolves, staying informed and developing a growth mindset will help you thrive in any field.