Note that Microsoft has been using "custom chips" for years; they've just been on FPGAs, not ASICs. They've developed IP to accelerate a whole bunch of processes, so it's not like they've suddenly discovered some magic sauce that Amazon and Google have had this whole time. It wouldn't surprise me if over half of the new chip is based on RTL from their FPGA designs.
The only thing that's changed is that they're scaling like crazy now and can justify overhead that comes with designing ASICs versus using off the shelf parts.
It was only a matter of time. Google announced theirs years ago, Amazon announced theirs last year.
Right now NVIDIA has the lead because they have the better software, but they can't make the chips fast enough. Will be interesting to see if their better software continues to keep them in the lead or if people are more interested in getting the capacity in any form.
If they “can’t make the chips fast enough” while being TSMC’s second-highest-volume customer behind Apple, and probably second in priority, what chance does Microsoft have of getting enough of TSMC’s capacity?
It blows my mind that not only are Apple, Nvidia, and all other enormous "chip makers" fabless... but there's ONE single company actually making the majority of all their chips.
Why aren't any other companies entering this space? TSMC's growth and profits are immense, it's not like the market couldn't bear more competitors.
Also, the current situation is geopolitical insanity. It feels like China and the U.S. are on a path to war in the next decade or so. China is itching to retake Taiwan... and if you thought the U.S. fought a lot of wars over oil, there's NO WAY we wouldn't go to war to prevent all the world's most important semiconductors falling exclusively under PRC control. It's the 21st century, we would have no choice. I know that TSMC is diversifying and building fabs in the U.S. and Germany, and hopefully that will reduce the risk of war. But it's just nuts that this is even a risk. How does one company control so much of the global market?
TSMC is so dominant because they have very, very, deep institutional knowledge. Even if you have billions to invest it's not easy to create that, and definitely impossible to do it short term.
There are only 3 companies left chasing smaller nodes. Everyone else has given up and is focusing on revenue from high-volume parts on older processes.
The only three are Samsung, Intel, and TSMC. Idk what's going on with Intel, but Samsung should be doing better; that institutional knowledge TSMC has is crazy. They all technically invest in ASML, which makes the machines. If I had to redo my education I’d get into electrical engineering just for this.
Eh, the process engineers I've talked to don't have a great life, at least here in the US. It seems inevitable that you end up perpetually on call for some machine, and I don't mean some type of machine, I mean unit #23 over in fab #4, and then that's your baby for years. The pay is considerably worse than software too.
I almost went to get a CE degree because I find it all so fascinating but ultimately I'm glad I didn't.
the pay is worse than software because you're competing with Asian wages in a manufacturing context, where the margins are slim.
the serious system architects, designers, and fab managers are doing just fine, software or not. the dudes in Phoenix who are getting pushed through 2-year degrees to help run a clean room are probably making 65k.
Yeah the salaries for EEs are really depressing in most of the industry. It's really hard work. Most EEs end up leaving for software or research (I did!).
I took EECS, did a master's in analog design, and I've only worked professionally making shitty SaaS websites. I make way more money with way less stress, while still feeling accomplished that I learned how to do the complex stuff I wanted to know. It's very hard to make nearly as much otherwise.
I went to school for CS while one of my best friends went for EE. It definitely seemed like the far more complex discipline.
In fact, for a small project in an elective, he programmed a computer vision system that was far ahead of anything my peers or I had made at the time in our CS program. He had pretty much zero programming experience.
I must admit he's a particularly smart person but it was crazy that I went on to make more money than him. His work is way more complicated than mine as far as I can tell.
Presumably only for their own and other Chinese brands, however, as I can't see anyone letting them do fabrication for major players outside of China. They've also got a huge hurdle given they can't buy the standard equipment every chip fab needs, owing to it all being produced by a single company.
If there's one company that has the capability to become a Samsung/Intel (IDM) on steroids, it is Huawei.
They have the money, talent, determination, and state backing.
TSMC is solving some of the hardest physics, electrical engineering, mechanical engineering, and other kinds of engineering I don't even know about. It's unlikely anyone catches up to them in terms of lithography ability for a long, long time given how well they've executed 45nm and lower nodes.
More like ASML is solving the physics problems and TSMC is the one with the most capital and experience to run their EUV machines. It's not that other competitors don't exist; it's more that they've given up or can't keep up. GlobalFoundries, I believe, isn't investing past 16nm. Samsung's latest node doesn't perform as well as TSMC's, and Intel is being Intel.
No. I don’t know where this oft-repeated semiconductor meme came from, but it’s a really poor take.
ASML are solving a small subset of physics problems: how to project extremely small features. TSMC are solving many more physics problems: how to structure layers of doped silicon into transistors, how to structure those transistors into logic gates, and those logic gates into functional blocks. This is why silicon process is not just a matter of capital investment, and why nobody is going to show up overnight with 10 billion dollars and change things. It’s not that TSMC are the only ones with a big bag of cash to give ASML.
Yeah, I love how people think it's easy to mass-produce designs that take hundreds of individual steps and weeks for a single wafer to go through the process. 50+ layers, aligned at the (sub?) nanometer level. Trying to tweak the process so that your electrons stop quantum-tunneling across layers in all directions.
Why doesn't anyone enter? It takes billions and billions of dollars and tons of knowledge, and if you get it wrong... even Intel screwed it up, and they are one of the only companies who could even try to do it.
Right now Intel is hamstrung because there is no separation between the fab and design sides. Nobody is going to trust Intel to fab their latest and greatest while the two sides are joined at the hip.
The sooner Intel spins out the fab side into its own entity the better positioned it will be to pick up the business that TSMC doesn't have capacity for.
But the whole problem at Intel is that they're not at the level of TSMC; they're years and generations behind. Splitting them up won't just magically make the fab business better, and without the profit margins from processor sales it's unlikely they'll have the resources to invest to get there. Case in point: AMD is doing very well after spinning off its fabs, but those fabs are not doing very well (relatively, of course; they're stuck roughly where they were at the split and haven't had meaningful advances, but they're still a profitable business because those older chips are needed too).
Apple and Microsoft are trillion-dollar companies (both nearing $3 trillion market caps). They can afford it. Heck, they have the balance sheet to acquire both TSMC and Samsung if either were up for sale.
I thought Apple was moving towards being a full-stack company, where all the hardware and software is developed in-house. If anyone has the resources, it is Apple.
> I thought Apple was moving towards being a full-stack company, where all the hardware and software is developed in-house. If anyone has the resources, it is Apple.
Having a modern chip fab ties up an insane amount of capital and makes you much more vulnerable in case of market turbulence. This is exactly why some people argue that Intel should spin off their fabs.
Building and operating a fab is an extremely difficult and expensive task that will take at least 5-10 years to bear fruit. That makes it a high-risk venture which is hard to sell to the board and investors.
More importantly, the new chip production lines are dependent on hyper-specialized EUV lithography systems that only one company is able to manufacture (ASML). ASML has its own production limits too.
That's a bit hyperbolic, don't you think? Sure, they're the only game in town when it comes to leading-edge nodes (e.g. 3nm), but there are plenty of options at 14nm and above. That corresponds to AMD Zen (released in 2017) and Intel Skylake (released in 2015). They're significantly worse than the latest and greatest CPUs, but they're still in use today. I'd hardly call that "going back to the stone age".
The Chinese aren't stupid enough to do that. Any launch of ICBMs at a NATO country would be interpreted as a possible nuclear attack and draw an immediate US counterforce strike.
> Why aren't any other companies entering this space?
Because it would take over a decade to build a comparable chip fab, and that's assuming you had unlimited cash to throw at it.
You'd be spending billions with zero revenue for a decade, with no guarantee it'll be anywhere close to as good as TSMC's setup. Try finding someone willing to invest in that and it becomes clear why there aren't more chip fabs out there.
It's worse. It's one company in a 'contested' territory. So much so that the US military is rumored to have considered the 'Samson option' of bombing the TSMC facility were the PLA to occupy the island.
Taiwan (unlike Ukraine) is far too critical to the world economy.
TSMC makes a lot of money from fabricating chips on old technology using fully depreciated assets. No new competitor is going to spend billions today to fab chips on older technology.
Just a few years ago there were four companies doing the bleeding edge: Intel, GlobalFoundries, Samsung, and TSMC. Intel was the best. First GlobalFoundries couldn't keep up. Then Intel fumbled and TSMC became the best. Samsung is the only one keeping up with TSMC, but they come in slightly behind. Intel is trying to catch up to the two. No companies have "exited"; they just can't compete.
Bleeding-edge semiconductor node design is like doing the Apollo moon program every 4-5 years. Only one can be the best.
> It feels like China and the U.S. are on a path to war in the next decade or so
IMO, MAD (mutually assured destruction - if a nuclear power starts a war with another nuclear power, both will probably end up destroyed by nukes) makes the prospects of that highly unlikely.
> It feels like China and the U.S. are on a path to war in the next decade or so
they are literally having discussions in San Fran right now to reduce the likelihood of that happening. And it sounds like China is quite open to discussion, and the US as well.
And TSMC uses fabrication machines made by… one company, ASML. Basically two places are responsible for all of those chips. Intel really needs to get back in the game. Hopefully that’s still on the table.
Even if Intel could spin up a bunch of fabs competitive with TSMC, they'd still be competing for ASML's capacity. We just can't massively scale up leading-edge fab capacity.
By internal and external claims, Intel is on track to be competitive with TSMC by 2025. And Nvidia and others will use the capacity as soon as it is available.
> what chance does Microsoft have getting enough of TSMCs capacity?
Looking at their balance sheet: plenty.
TSMC is all about pay to play. Apple is first in line because they’re willing to pay to be first in line. I have no doubt Microsoft can justify spending the money they save not buying nvidia into getting some priority access from TSMC.
Also keep in mind Apple is now on 3nm so there’s likely spare 5nm.
My guess is that Microsoft booked capacity years ago. Either by strategic decision, with the belief they could yield the capacity to someone else, or through their other hardware divisions (see: Xbox).
It’s very likely they knew something like this was coming, as they’ve been doing FPGAs for more than a decade now.
In exchange for paying to build out TSMC's 3nm capacity, Apple gets exclusive rights to use it for the next N years. Who knows what other exclusivity deals or other shenanigans are in place that would prevent Microsoft from acquiring capacity?
The latest Stratechery interview had Pat Gelsinger on it and he was talking about how Intel is happy to package up TSMC dies in their foundries. Relevant quote:
> So it really brings together many of the things that Intel is doing as an IDM, now bringing it together in a heterogeneous environment where we’re taking TSMC dies. We’re going to be using other foundries in the industry, we’re standardizing that with UCIe. So I really see ourself as the front end of this multi-chip chiplet world doing so in the Intel way, standardizing it for the industry’s participation with UCIE, and then just winning a better technology.
Considering that Jensen is on stage with Satya at the moment sharing the keynote of Microsoft Ignite, I suspect NVIDIA won't be going anywhere anytime soon.
A bit surprised to see Jensen appear on stage, since Microsoft's success with its own AI chips clearly means less business for Nvidia's chips.
Because unlike the ARM chip also announced at the same Ignite event, Microsoft doesn't exactly "need" an AI chip, nor can it fully utilize one. Google trains its foundational models (e.g. Gemini) on its own TPU hardware, but Microsoft is heavily reliant on OpenAI for its generative AI serving needs.
Unless Microsoft is planning to acquire OpenAI fully and switch over from Nvidia hardware...
Microsoft absolutely runs their own models on their own hardware, at scale, and they have done so for years just like every other hyperscaler -- Project Brainwave was first publicly talked about as far back as 2018. The generative LLM craze is a recent phenomenon in comparison. They are absolutely going to go all in on putting AI functionality in Bing, in Excel, in Windows, etc etc. To do that, you need hardware.
None of this is really strange. It also wasn't strange when Google announced H100 systems while also pushing TPUs they developed. Microsoft has Jensen on stage because customers of Microsoft Azure demand Nvidia products. Customers of Google Cloud demand Nvidia products. So, they provide them those products, because not providing them loses those customers. It's that simple. Everyone involved in these deals acknowledges this.
They’ve been doing custom accelerators for Bing for a few years. See: Project Catapult (2012). It’s all on FPGA, but similar.
In some ways, Microsoft was 10 years ahead, but they are terrible as an organization at proliferating research projects to production across multiple orgs.
Yeah! Did/do you work on it? The original publications were good timing; I was working as a consultant on an FPGA-based ML accelerator at the time the original stuff was talked about, and I really enjoyed reading everything I could about Brainwave! Really neat project from both a system design perspective (e.g. heterogeneous compiler) to the choice of using and interconnecting FPGAs and integrating the network/software/ML stack (IIRC, there was a good paper on the overlay network they used to make those custom functions available on the global network fabric.)
I'm guessing at this point the ASICs make a lot more economic sense, though. :)
Sorta! Tangentially, on the data science side - Brainwave predates me being at MS, though. It's a really cool project and I'm glad someone else also thinks so :) I was enamored with FPGAs from undergrad; one of the professors I did research with was a consultant on the original Catapult project, and I thought it was the coolest idea. It was pretty awesome to be able to come full circle a few years later.
...they're going to overbid on a studio that was actively falling apart, after being rebuffed from buying one of the biggest giants in the business[0], all as part of an ill-advised attempt to muscle into a game business they didn't understand?
Inferentia (inf1) was GA'ed in December 2019, so it's actually almost 4 years old now. The Trainium (trn1) chips and the Inferentia 2 (inf2) refresh are indeed 1 year old, though.
Why? Inferentia => inference, Trainium => training. Given the usual naming of AWS products, having one where the name roughly matches what it does is pretty good?
TPU is pretty good but is associated with Google. MTIA is an acronym but still maps to what the chip does. ~~"Cobalt" is worse as it does not mean anything.~~ Cobalt is the CPU chip, Maia is the accelerator, so this matches Meta's naming.
Funny, that's precisely why I think the names are bad. It's like if Google had chosen "Search-ola" as their name. Way too on the nose and/or lazy. Having said that, I don't really care all that much and I imagine that may have been the spirit of those who chose the names.
heh, as someone who has to deal with this nonsense all day <https://aws.amazon.com/products/> I would for sure welcome some straightforward naming. $(echo "AWS Fargate" | sed s/Fargate/ServerlessContainerium/)
My previous role was a lot of AWS, and I became convinced that the value of an AWS cert was mostly learning how to map all of the product names to their actual functions.
I don’t really know what an active directory is, but I assume that the default type of directory is a passive one, in that it just holds files or subdirectories (it doesn’t act). An active directory sounds like a directory that is going to play tricks on me.
Entra ID sounds like a type of ID.
I’m not sure how something could legitimately have each of these names. I assume the functionality changed pretty dramatically over the lifespan of the product?
That’s kind of a nice take actually. For people like me who did Windows systems engineering at the beginning of the millennium, Active Directory is a very household name. But there’s not much actual meaning behind it. Microsoft put “Active” on everything in that era - Active Directory, Active Desktop, ActiveX, Active Server Pages (lives on as ASP.NET), and probably some more I forgot.
I should admit that I was somewhat lying, I’ve encountered Active Directory before and know it is something vaguely identity/login related.
I was pretty confused the first time I encountered it, though. Could tell from context that it had something to do with accounts, but thought maybe it could be for syncing a user’s home directory or something!
It actually isn’t a terrible name, in isolation, since a directory (like, the non-digital version) was for keeping identities. But “directory” in tech has a pretty strong association with file systems.
The term "directory" was actually already quite common for these kinds of systems (directory services) when AD came about, which is probably why it was called Active Directory. The Active bit was some kind of "en vogue" term for Microsoft back then, as mentioned ;)
It technically builds upon (amongst other systems) LDAP, the Lightweight Directory Access Protocol, and X.500.
I guess you could think of it more like a telephone directory - which again is also where the file system metaphor has its roots I guess. So the two are not so different in the end.
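If it helps make the directory-service idea concrete, here's a minimal sketch of querying a directory over LDAP using the Python ldap3 library (the hostname, credentials, and base DN here are all made up):

    from ldap3 import Server, Connection, ALL

    # Hypothetical domain controller and credentials, for illustration only.
    server = Server("ldap://dc.example.com", get_info=ALL)
    conn = Connection(server, user="EXAMPLE\\jdoe", password="hunter2",
                      auto_bind=True)

    # Look up a user the way countless AD-integrated apps do:
    conn.search(search_base="dc=example,dc=com",
                search_filter="(sAMAccountName=jdoe)",
                attributes=["cn", "memberOf"])
    print(conn.entries)

Same metaphor as the telephone directory: you ask "who is jdoe?" and get back the entry, not a file.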
I manage our small company’s Microsoft 365 / Azure tenant and only rarely need to use the portal.
A month or so ago my laptop was requiring a BitLocker recovery code before it would boot. I spent ages looking for Azure AD, before eventually discovering it had been renamed to Entra. They should have had a transition period where it had “(formerly called Azure Active Directory)” in its name.
That article shows that it takes about 50x as long to train GPT-3 with Intel's offering vs Nvidia's. At least in the current environment, if you are training LLMs, I think almost no amount of cost savings can justify that.
That 50x is only if you can afford one thousand NVIDIA H100s.
There cannot be more than a handful of companies in the entire world that could afford such a huge price (tens of millions of $).
In comparison with a still extremely expensive cluster of 64 NVIDIA H100, the difference in speed would reduce to only two to three times, and paying several times less for the entire training becomes very attractive.
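A back-of-envelope sketch of that point (per-chip numbers invented, near-linear scaling assumed for both systems), just to show how the headline ratio collapses once you compare similarly sized clusters:

    # Relative training time of system B vs. reference system A,
    # assuming near-linear scaling on both (a simplification).
    def time_ratio(n_a, perf_a, n_b, perf_b):
        return (n_a * perf_a) / (n_b * perf_b)

    # 1000 H100s vs. a hypothetical 64-chip system at half the per-chip speed:
    print(time_ratio(1000, 1.0, 64, 0.5))  # ~31x slower
    # 64 H100s vs. that same 64-chip system:
    print(time_ratio(64, 1.0, 64, 0.5))    # 2x slower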
The problem is not having so much money available.
Such a big expense only makes sense for a company where spending that amount would bring hundreds of millions of $ of additional revenue.
I doubt that any of the companies that have already spent such amounts have recovered even a small part of their expenses. It is more likely that they bet on future revenues, but it remains to be seen who will succeed in achieving that.
Kinda; companies of that scale regularly spend more than that on (often random) R&D.
Sure, if there is a plausible ROI, they'd have no issue dropping that much money (actually far more). Revenues for Fortune 500s are going to be in the tens of billions anyway, and it wouldn't be hard to argue that some AI project could increase that by a couple percent or decrease costs by a couple percent, which would more than provide that ROI.
Their biggest issue is usually having anyone in leadership with enough of a clue to even propose something plausible, let alone get a team together to give it a plausible go.
Upfront, training costs 1000x more per token than inference - about $0.01/token vs $0.01/1000 tokens. But considering the user base size and the size of the training set - 15T tokens for GPT-4 - I estimate the total inference cost becomes equal to training at around 10K tokens/user/month and 100M users.
I would say software is very important. We already have tons of different models, standards, libraries, etc. I usually have a smooth experience if I am using Nvidia, but with any other variation I have to spend some time getting things started.
Support for the common libraries that I use is very important for me when choosing a cloud platform.
"Manufactured on a 5-nanometer TSMC process, Maia has 105 billion transistors — around 30 percent fewer than the 153 billion found on AMD’s own Nvidia competitor, the MI300X AI GPU. “Maia supports our first implementation of the sub 8-bit data types, MX data types, in order to co-design hardware and software,” says Borkar. “This helps us support faster model training and inference times.”"
> Microsoft said it does not plan to sell the chips
Add it to the list of things you can't buy at any price, and can only rent. That list is getting pretty long, especially if you count "any electronic device you can't fully control or modify".
Google does the same thing with their TPUs. The masses will be left with the NVidia monopoly, while large companies will be able to free themselves from that.
> The masses will be left with the NVidia monopoly, while large companies will be able to free themselves from that.
My bet: if it really becomes clear what capabilities an AI accelerator chip needs, and lots of people want to run (or even train) AIs on their own computers, AI accelerators will appear on the market. This is how capitalism typically works.
My further bet: these AI accelerators will initially come from China.
Just look at the history of Bitcoin: initially the blocks were mined on CPUs, but then the miners switched to GPUs, and "everybody" was complaining about increasing GPU prices because of all the Bitcoin mining. At some point, Bitcoin mining ASICs appeared from China, and after those spread, GPUs were no longer attractive for Bitcoin mining (of course, the cryptocurrency fans who had bought GPUs for mining tried to use their investment to mine other cryptocurrencies).
The capital costs are enormous, not even counting the CUDA moat. It takes years to start producing a big AI processor.
Yet many startups and existing designers anticipated this demand correctly, years in advance, and they are all still kinda struggling. Nvidia is massively supply constrained. AI customers would be buying up MI250s, CS-2s, IPUs, Tenstorrent accelerators, Gaudi 2s and so on en masse if they wanted to... But they are not, and it's not going to get any easier once the supply catches up.
Unless there's a big one in stealth mode, I think we are stuck with the hardware companies we have.
Is there not distributed computing potential here, like there was for crypto mining? Some sort of seti@home/BOINC-like setup where home users can donate or sell compute time?
Yes, see projects like the AI Horde and Petals. I highly recommend the Horde in particular.
There's also some kind of actual AI crypto project that I wouldn't touch with a 10-foot pole.
But ultimately, even if true distribution like Petals figures out the inefficiency (and that's hard), it has the same issue as non-Nvidia hardware: it's not turnkey.
You can set up a computer and sell time on it on a couple of SaaS platforms, but only for inference. For training, the slowness of the interconnect between nodes becomes a bottleneck.
> Yet many startups and existing designers anticipated this demand correctly, years in advance, and they are all still kinda struggling. Nvidia is massively supply constrained. AI customers would be buying up MI250s, CS-2s, IPUs, Tenstorrent accelerators, Gaudi 2s and so on en masse if they wanted to... But they are not, and it's not going to get any easier once the supply catches up.
Can you order any of these devices online as a regular person? Anybody can order a $300 Nvidia GPU and program it. That's why deep learning originated on GPUs. Forget those other AI accelerators; even if you bought something like a consumer-grade AMD GPU, you couldn't program it because it's restricted. The reason Nvidia's competitors are struggling is that their hardware is either too expensive or too hard to buy.
> Yet many startups and existing designers anticipated this demand correctly, years in advance, and they are all still kinda struggling.
As I already hinted in my post: I see a huge problem in the fact that it is still not completely clear to this day which capabilities an AI accelerator really needs - too much is, in my opinion, still in a state of flux.
The answer is kinda "whatever Nvidia implements." Research papers literally build around their hardware capabilities.
A good example of this is Intel canceling, and AMD sidelining, their unified-memory CPU/GPU chips for AI. They are super useful!.. In theory. But in practice they're totally useless, because no one is programming frameworks with unified-memory SoCs in mind, as Nvidia does not make something like that.
> My bet: if it really becomes clear what capabilities an AI accelerator chip needs, and lots of people want to run (or even train) AIs on their own computers, AI accelerators will appear on the market.
My bet: in 6 months jart will have models running on local or server, with support for all platforms and using only 88K of RAM ;)
Nvidia is a $1.2 trillion company (the 6th largest by market cap), and at this point AI is a huge component of that wealth. It has appreciated 3.3x since just the beginning of this year.
If any of these companies truly made competitive silicon they absolutely would commercialize it.
I suspect they aren't as competitive as the press releases make them out to be, and this Microsoft entrant is likely to follow the same path. Like Google, Tesla, Amazon, and others, it seems mostly an initiative to negotiate discounts from Nvidia.
It would be great if there were real competition. When Google was hyped about their Tensor chips, they did have a period where they were looking to commercialize them, and there are some pretty crappy USB products they sell.
They are commercializing the silicon, by selling access to it on their clouds.
Now, I know that what you actually mean is selling the chips themselves to third parties :) But it's not obvious that there's any point to it given their already existing model of commercializing the chips.
First, literally everyone is already supply-constrained due to limits on high end foundry capacity. Nvidia has a ton of capacity because they're one of TSMC's top two customers. The big tech companies will have much smaller allocations which are used up just supplying their own clouds. Even if the demand for buying these chips rather than renting were there, they just don't have the chips to sell without losing out on the customers who want to rent capacity.
Second, the chips by themselves are probably not all that useful. A lot of the benefit is coming from the silicon/system/software co-design. (E.g. the TPUv4 papers spent as much attention on the optical interconnect as the chips). Selling just chips or accelerator cards wouldn't do much good to any customers. Nor can they just trust that systems integrators could buy the cards and build good systems to house them in. They need to sell and support massive large scale custom systems to third parties. That's not a core competency for any of them, it'll take years to build up that org if you start now. And it means they need to ship the software to the customers, it can't continue being the secret sauce any more.
Nvidia on the other hand has been building up an ecosystem and organization for exactly this for the last decade.
> Nvidia has a ton of capacity because they're one of TSMC's top two customers. The big tech companies will have much smaller allocations which are used up just supplying their own clouds.
And TSMCs top customer is not even playing in the cloud space.
If the software still sucks compared with what CUDA can do - in graphics tooling, polyglot support, and optimized libraries, across at the very least Windows and GNU/Linux - then whatever the card is capable of doing isn't very relevant.
Hardware startups aren’t going to stand a chance because they have to fight for the scraps of capacity that is left over after Apple and NVidia and the cloud providers use what they can.
This is a custom chip that they are making. I don't think they should be required to sell it, but if others find this sort of thing valuable, you could expect to see hardware startups making their own RISC-V AI chips that you could buy.
I’m curious to know what everyone thinks of this trend. Do you view it as a good thing, bringing efficiency and economy of scale, competition and so forth? Or do you consider it a bad thing, another salvo in the War on General Purpose Computing [1] so vividly described by Cory Doctorow?
I, personally, am interested in retrocomputing, amateur/hobbyist electronics, and hobbyist computing (including semiconductors [2]). While these techniques and devices may be light years away from anything resembling a computer that can compete with SotA commercial offerings, they do offer the promise of “keeping the candle lit” as it were. I will note that if you follow Sam Zeloof’s chronicles, he progressed through the earliest phases of semiconductor development far faster than the industry did back when it was pioneering the technology. Of course, he had the benefits of knowing it was already possible and access to the written knowledge of the experts who went before him.
And this is why, even if Windows is gone tomorrow, it will be a Pyrrhic victory for the usual anti-Microsoft crowd; Microsoft will continue to matter in domains that are nowadays much more relevant to society.
Every reasonable company in that situation needs to play all sides of this. Even the comically bad management at Google understands this: Google has their own LLMs, their own hardware for ML, and they also buy a ton of nvidia chips for cloud customers. It's a no brainer.
Not a lot of information about the chips yet. About 100B transistors in the AI chip. For comparison, an RTX 4090 has 76B, and an H100 has about 80B. So the Maia chip is pretty massive.
GPUs (and AI chips) are highly parallel, containing thousands upon thousands of the same compute units. The performance of these chips is very much dependent on having a sheer number of transistors to form into as many compute units as possible.
If we assume that Microsoft is roughly able to architect compute units with a similar performance-to-transistor-count ratio as Nvidia, then having twice the number of transistors should roughly result in twice the performance.
That is very different from typical software. If you give a programmer who needs 100 lines of code to solve a given problem another 100 lines to fill, he won't simply copy-paste his 100 lines a second time and thereby solve the problem twice as fast. With GPU compute units, such copy-pasting is exactly what's being done (at least until you hit the limits of other resources such as management units, memory bandwidth, etc.).
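To make that concrete, a toy roofline-style sketch (all numbers invented): throughput scales linearly with unit count until another resource, memory bandwidth here, becomes the cap:

    # Doubling compute units roughly doubles throughput until memory
    # bandwidth caps it. Every number below is made up.
    def throughput(units, flops_per_unit=1e9, mem_bw=2e11, flops_per_byte=10):
        compute_limit = units * flops_per_unit   # what the units can do
        memory_limit = mem_bw * flops_per_byte   # what memory can feed them
        return min(compute_limit, memory_limit)

    for units in (1000, 2000, 4000, 8000):
        print(units, "units ->", f"{throughput(units):.1e}", "FLOP/s")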
In a way, it is the opposite. Code is what you execute. The transistors are the engines that do the executing. They are going to expect a chip they designed with 105B transistors to perform (speed/efficiency/whatever) in the same ballpark as a high-end GPU for their AI workloads.
It is like knowing the kind of engine a car has. Not all V8 gas engines produce the same power, but knowing that it is a V8 instead of an inline three cylinder does give you an idea of the expected performance characteristics.
I thought an interesting point was the liquid cooling -- unclear how important this is to them, but I'm guessing it means that they designed it with a TDP that requires liquid cooling.
This (wanting higher density) is the opposite of the trade-off that I was expecting. In my (limited and out of date) experience, power was the limiting factor before space, and I believe AI racks have very high power draws already.
I would have guessed this would be because larger nodes would be better for AI's tight communication patterns, but they specifically call out datacenter space as the constraint. Curious if anyone knows more about this.
My understanding is that if you are renting your data center space right now, power still is the limiting factor and you will need to leave some rack space empty to be able to run GPUs.
On the other hand, if you are building your own data center, which is the case for Microsoft, presumably you can arrange high power zone to run GPUs.
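Some rough rack arithmetic (every number invented) for why power, not floor space, is usually what bites in rented space:

    # A fixed per-rack power budget caps GPU server density long before
    # the rack physically fills up. All numbers are made up.
    rack_budget_kw = 17.0    # assumed colo per-rack power budget
    server_base_kw = 0.8     # assumed CPUs/fans/NICs per server
    gpus_per_server = 8
    gpu_tdp_kw = 0.7         # ~700 W-class accelerator (assumed)

    per_server_kw = server_base_kw + gpus_per_server * gpu_tdp_kw
    servers_by_power = int(rack_budget_kw // per_server_kw)
    print(f"{per_server_kw:.1f} kW per server -> {servers_by_power} "
          "servers per rack by power, vs. ~6 that would fit physically")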
I am a little confused by all this.
I thought conventional wisdom was that creating "super good / top performant" computer processors was hard, and that unless you could produce them at scale, prohibitively expensive.
Is this done as a bridge until/if Nvidia is able to deliver their processors fast enough?
I would think that Nvidia would have serious competitors on the market already if competing can be done by Microsoft, for whom producing hardware is not the main business focus.
Cobalt is based on the Arm CSS model, so it's Neoverse series, probably N2 (based on the 128 core count). AWS Graviton 2/3 is N1/V1, and Nvidia Grace is V2. Ampere Altra is N1, and the new AmpereOne is publicly unknown.
Several Chinese companies also develop chips based on N2, like Alibaba T-Head and ZTE Sanechips. I worked on software tuning for both of them. It's good to see more and more Arm chips.
I genuinely don't see how x86 architecture will continue to survive the next 10 years. It will of course take longer to change home desktop users to new architectures; they will be the last segment to switch, but it seems all but inevitable.
BTW, I'm not even speaking to whether x86 can compete on performance per watt... I think it just won't make sense financially to be out of sync with the industry.
I care vastly more about raw performance than energy usage for my home systems. I also have good reasons to care about the best single core performance. I don't see x86 going away that fast.
FWIW, the current king of single core Geekbench is the M3 chips. Even the base M3 scores as high as the i9-14900K and higher than the Ryzen 9 7950X3D, at less than half their TDPs.
The top consists of what appears to be an Intel i3-10100 overclocked past 13GHz(!), a Ryzen 7 5800H at 2.8GHz, and then an i9-14900K at just below 800MHz.
That page ranks individual test runs, so the top is filled with outliers from internal systems or crazily overclocked liquid-cooled behemoths. When a CPU has appeared in sufficiently many test runs, the aggregate result, which is more representative of real performance, appears on https://browser.geekbench.com/processor-benchmarks.
The i9-14900K and M3 actually haven't appeared on the official chart yet, but you can search for them as they already have thousands of test runs[0][1]. Both of them score around 3,100 in single-core, and around 21,000 in multi-core (for the M3 Max).
Mobile, desktop, laptop, edge, server. These are the domains of compute. 4 out of the 5 value power efficiency. Laptops that were once x86 are now coming round to Arm because it really does make a better product, i.e. battery life and thermals. For servers, there are savings in energy and chip manufacturing cost; datacenters and users both benefit.
> are now coming round to Arm because it really does make a better product, i.e. battery life and thermals.
ISA doesn't imply performance characteristics.
It's like saying that a programming language's syntax has performance implications.
No, it doesn't. Everything is up to the compiler, runtime, and standard library.
Of course there may be some features that make the compiler's life easier, but still, things are way, way more complicated than "just take the ARM ISA and you'll be king".
It's more the fact that power efficiency is something Arm values quite a lot in their philosophy, since their products are used in mobile and edge devices.
Until the Windows developer community actually cares about ARM, these will continue to be nice-to-have laptops that most consumers won't care about.
Microsoft isn't Apple or Google in this regard, dragging developers into new worlds, and it is quite telling that they now had to put up some kind of ARM advocacy action.
> It's like saying that a programming language's syntax has performance implications.
> No, it doesn't. Everything is up to the compiler, runtime, and standard library.
> Of course there may be some features that make the compiler's life easier, but still, things are way, way more complicated than "just take the ARM ISA and you'll be king".
Even if it should be possible to design Arm CPUs competitive with the x86 CPUs, there are a lot of application domains for which no vendor of Arm CPUs has ever attempted to make competitive Arm CPUs.
For example, for scientific computation and computer-aided design, Fujitsu is the only company that has designed Arm CPUs that can compete with the x86 CPUs, but they do not sell their CPUs on the free market.
For a huge company, the floating-point performance of the CPUs is less important, because they can use datacenter GPUs with even greater throughput, so the existing Arm server CPUs could be good enough even for a supercomputer, as they only have to move data to and from the GPUs. However, small businesses and individuals cannot use datacenter GPUs, which have huge prices, so they can use only x86 CPUs, and there is not the slightest chance of any alternative appearing soon.
Another application domain in which no Arm vendor has ever made competitive devices is cheap personal computers.
Nothing Apple does matters, because they do not sell computers; they only lend computers that remain under their control, and which are much more expensive than the alternatives anyway.
Besides Apple, only Qualcomm, Mediatek and NVIDIA are able to make Arm CPUs with a performance similar to the cheapest of the Intel and AMD CPUs, but all these 3 companies demand for their CPUs prices that are several times higher than the prices of comparable x86 CPUs.
As with CPUs with high floating-point or big-integer performance, there is not the slightest chance of any company appearing that would be willing to sell Arm CPUs that are both cheap and fast.
Also for server CPUs: the companies that have attempted to design Arm-based server CPUs have never designed models suitable for small businesses or individuals, only models that can be bought by very big companies.
I would not mind switching from x86 to Arm, but there is absolutely no prospect of that.
If x86 CPUs were to disappear, it would be a catastrophe for the people who do not want to depend on the mercy of the big companies. That would be a return to the times before personal computers, when all computing had to be done remotely, in the computing centers of big companies, which have now been renamed "clouds".
Most of the nodes I see every day are still x86. But I’m in an academic environment, maybe things are slower over here. Does ARM actually seem to have legs outside? (Other than, like, nodes subsidized by Amazon’s wish to in-house everything they can).
It's going to take time, but momentum is seriously starting to build now. The laptop market is going to pick up with Snapdragon X, and cloud providers are going to continue with more powerful designs.
But will these run Linux, run AI stuff the way the Apple Silicon seems to be able to do?
Because right now I'm looking to save up for a majorly spec'd Apple MacbookPro just to be able to do this stuff on a *nix operating system. I have no great love for Apple but the abilities of their chips and the vast software offerings are tempting this Linux guy in that direction.
Something that Microsoft cannot seem to do any more. I used Windows from 3.x-WinME; NT3.51-WinXP, getting off before Vista. What I've seen since then has done nothing to tempt me back to their side. Since I unfortunately must deal with Windows 10 at work, it definitely reinforces my distaste for their systems....
So despite thinking OSX has been rendered ugly for the past ten years now, I'm still leaning heavily in that direction, even with the high costs. Snapdragon X sounds nice enough, but based on past behavior I have zero expectation of those getting decent Linux support any time soon. And no one else seems to even be trying, that one ThinkPad aside.
Microsoft has taken a couple swings at making ARM laptops (which, we should note, doesn’t appear to be what they are announcing here).
I’d expect a future hypothetical Microsoft ARM laptop to be like a Surface RT: some Windows dropped on a third-party ARM chip. Microsoft is a software company, after all. So it is more a matter of: do they happen to have bought a chip that supports Linux (probably yes, because what hardware manufacturer wants to be dependent on one company for OS support?), and can you get past Secure Boot (probably yes, after a couple of years at least, when the jailbreak happens).
It's a hell of a cloud too. Geforce Now performs several times better than Microsoft's crappy offering, xCloud (or whatever it's called now).
Nvidia made some really amazing strides in the past few years, taking over cloud gaming where Onlive and Stadia utterly failed, making DLSS, etc.
I just hope they don't abandon us gamers for their AI stuff :( Probably the entire gaming market is way smaller than the potential AI market, just hopefully not too small to matter.
This is likely to be cyclical though. Fast cars[1], large amounts of processing power[2], access to cryptographic algorithms, are all things that started out expensive to be on the cutting edge, some still are expensive on the cutting edge, but then became more affordable for the consumer over time. AI has already had explorations on training models with limited resources. It's feasible, just has tradeoffs that will hopefully get better over time.
The article is actually incorrect. There is no 1970 427. There was a 350 and a 454. Considering the amount of torque it put down, it would likely be a good bit faster with modern tires. You do have to also realize that the 1970 vette was right when emissions controls started destroying performance - the 69 427 put down a good bit more power than the 70 454.
That's like complaining that all books are made by Penguin Press or something, ignoring the effort individual authors make.
Most of the value of chips is in their design, which is owned by different entities. Manufacturing is important too (only TSMC can make these advanced designs at scale and at lower costs than the competition).
The question I have is whether Cobalt has any innovations in its design, or if it's just bog-standard ARM Neoverse cores. It's not too big of a deal to download ARM's latest designs and slap them into... erm... your designs. But hopefully Microsoft added value somewhere along the road (the uncore remains important: cache sizes, communications, and the like).
It's not a matter of giving due credit, but supply constraints. Books aren't limited by the availability of printing presses, are they? (maybe they are and I just didn't know?)
But if TSMC is the only company that can do this, they're a bottleneck for the entire world. Not to mention a strategic and geopolitical risk for the West.
It'd be nice if some domestic companies invested in fabs again...
Microsoft claims that Cobalt has a much lower power consumption than any other Arm CPUs that they have used.
Presumably this means that Cobalt has a much lower power consumption than the current Ampere CPUs used by Azure.
Most of the power consumption reduction for a given performance may have come from using a more recent TSMC process, together with a more recent Arm Neoverse core, but perhaps there might be also some other innovation in the MS design.
> I wish I could work in some team designing these chips. Maia is probably my dream product to work on. Super new, super cool and one of it’s kind.
You likely became bewitched by their glamorous marketing side. I'd bet that the real work that the team does is very similar to the work that basically every ASIC design team does.
I mean, that looks cool and exciting if it is a really small co-located team, or if one is a lead engineer or director of a large team of engineers, so that they can work on the things that interest them and assign the boring/routine work to others. Otherwise it would be just another job where people work on assigned JIRA stories and go home in the evening.
In my experience working on anything that's exciting and full of marketing buzz is a sure road to burnout, with hype-dictated unrealistic deadlines and all.
Chip manufacturers (including Nvidia) really missed where the market was going if customers like Microsoft, Amazon, etc. feel the need to make their own chips.
Microsoft, Amazon, etc. feel the need to make their own chips so that they don't let NVIDIA take all the profits, not because they think NVIDIA is incompetent.
There are real profits in the chip space, and considering that there are 3 fabs and one clear leader who will make anything for anyone, this is a sign that NVIDIA is doing a great job.
I think they got the direction right but the price wrong. They are used to dealing with supercomputers as the main server clients, who aren't big enough to fight back if the prices creep too high.
Or cloud vendors have decided that at their scale owning their own chips represents a valuable differentiation opportunity, and they don't think of them as commodities.
Nvidia rode a gaming high from RTX straight into a crypto high and then straight into the AI high. Their products just print money right now and nobody else is close yet. They can always lower prices later, but for now they're getting filthy rich...
My guess is that it's more about the wish of cloud vendors to control everything from the hw to sw: it's called vertical integration, and it's common in a lot of businesses.
It makes a lot of sense from the point of view of cloud giants.
The demand for chips has increased so much that it's profitable for these customers to start producing their own chips. This doesn't mean Nvidia's chips are bad or that they missed anything.
So does NVIDIA; it just turns out NVIDIA can profit more because their software, and the software ecosystem around it, add so much value that nobody can compete.
It's gonna take a lot of work and many years to approach that. Even by leveraging AI. And by diverging with their own chips, they're gonna miss out on the mainstream.
I don't know much about the software side of Nvidia/GPUs + LLMs.
Can you catch me up on what the software they've created means as a differentiator? Is that CUDA? How does this relate to things like TensorFlow with Google's chips?