> The software titan is rather late to the custom silicon party. While Amazon and Google have been building custom CPUs and AI accelerators for years, Microsoft only revealed its Maia AI accelerators in late 2023.
They're too late for now. Realistically, hardware takes a couple of generations to become a serious contender, and by the time Microsoft has a chance to learn from its hardware mistakes the "AI" bubble will have popped.
But, there will probably be some little LLM tools that do end up having practical value; maybe there will be a happy line-crossing point for MS and they’ll have cheap in-house compute when the models actually need to be able to turn a profit.
At this point it will take a lot of investment to catch up. Google relies heavily on specialized interconnects to build massive TPU clusters. It's more than just designing a chip these days, and engineers who work on interconnects are a lot rarer than engineers who can design chips.
> hardware takes a couple generations to become a serious contender
Not really, and for the same reason Chinese players like Biren are leapfrogging: much of the workload profile in AI/ML is "embarrassingly parallel", which reduces the need for individual ASICs to be bleeding-edge performers.
If you are able to negotiate competitive fabrication and energy supply deals, you can mass produce your way into providing "good enough" performance.
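The "good enough at scale" argument can be sketched with a toy throughput calculation. This is just back-of-envelope arithmetic with made-up numbers, assuming an embarrassingly parallel (data-parallel) workload where scaling efficiency stays high:

```python
# Toy comparison: aggregate throughput of many "good enough" chips
# vs. fewer bleeding-edge ones. All numbers are illustrative, not real specs.

def cluster_throughput(chips: int, tflops_per_chip: float, scaling_efficiency: float) -> float:
    """Effective cluster TFLOPs for an embarrassingly parallel workload."""
    return chips * tflops_per_chip * scaling_efficiency

# Fewer, faster chips vs. more, slower chips at the same scaling efficiency.
bleeding_edge = cluster_throughput(chips=1000, tflops_per_chip=1000, scaling_efficiency=0.9)
good_enough   = cluster_throughput(chips=2500, tflops_per_chip=400,  scaling_efficiency=0.9)

# If fabrication and energy deals make the extra chips cheap enough,
# mass production closes the per-chip performance gap.
assert good_enough >= bleeding_edge
```

The catch, of course, is that "scaling efficiency stays high" is exactly where interconnects come in.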
Finally, the kind of customer who cares about raw hardware performance in training isn't in the market for cloud-offered services.
As I understood it, the main bottleneck is interconnects anyhow. It's more difficult to keep the ALUs fed than it is to make them fast enough, especially once your model can't fit on one die/PCB. And that's in principle a much trickier part of the design, so I don't really know how that shakes out (is there a good-enough design you can just buy as a block?)
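The "keeping the ALUs fed" point is essentially the roofline model: attainable throughput is capped by whichever is lower, the compute roof or the bandwidth roof. A minimal sketch, with illustrative (not real) hardware numbers:

```python
# Back-of-envelope roofline check: can memory/interconnect keep the ALUs fed?
# Numbers below are assumptions for illustration only.

def attainable_tflops(peak_tflops: float, bandwidth_tbs: float, flops_per_byte: float) -> float:
    """Roofline model: min of the compute roof and the bandwidth roof."""
    return min(peak_tflops, bandwidth_tbs * flops_per_byte)

# A large matmul reuses data heavily (high arithmetic intensity): compute-bound,
# so faster ALUs actually help.
print(attainable_tflops(peak_tflops=500, bandwidth_tbs=3.0, flops_per_byte=300))  # 500.0

# Streaming weights with little reuse (e.g. token-by-token decoding): bandwidth-bound,
# so faster ALUs sit idle -- the "feeding" problem described above.
print(attainable_tflops(peak_tflops=500, bandwidth_tbs=3.0, flops_per_byte=2))    # 6.0
```

Once the model spans multiple dies, the interconnect bandwidth replaces memory bandwidth as the binding roof, which is why that part of the design matters so much.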
And current LLM architectures map onto hardware differently than the DNNs of even a decade ago. If you have the money and technical expertise (both of which I assume MS has access to), then a late start might actually be beneficial.
Most of the big players started working on hardware for this stuff in 2018/2019. I worked in MSFT's silicon org during that time; Meta was also hiring my coworkers for similar projects. I left a few years ago and don't know the current state, but they already have some generations under their belt.