
Impressions from last week’s CVPR, a computer vision conference with 12k attendees: pretty much everyone is using NVIDIA GPUs, pretty much nobody is happy with the prices, and everyone would like some competition in the space:

NVIDIA was there with 57 papers, a website dedicated to their research presented at the conference, a full day tutorial on accelerating deep learning, and ever present with shirts and backpacks in the corridors and at poster presentations.

AMD had a booth at the expo part, where they were raffling off some GPUs. I went up to them to ask what framework I should look into, when writing kernels (ideally from Python) for GPGPU. They referred me to the “technical guy”, who it turns out had a demo on inference on an LLM. Which he couldn’t show me, as the laptop with the APU had crashed and wouldn’t reboot. He didn’t know about writing kernels, but told me there was a compiler guy who might be able to help, but he wasn’t to be found at that moment, and I couldn’t find him when returning to the booth later.

I’m not at all happy with this situation. As long as AMD's investment into software and evangelism remains at ~$0, I don't see how any hardware they put out will make a difference. And you'll continue to hear people walking away from their booth saying "oh, when I win it I'm going to sell it to buy myself an NVIDIA GPU".




> I'm not at all happy with this situation. As long as AMD's investment into software and evangelism remains at ~$0, I don't see how any hardware they put out will make a difference.

It appears AMD's initial strategy is courting the HPC crowd and hyperscalers: they have big budgets, lower support overhead, and are willing and able to write code that papers over AMD's not-great software while appreciating the lower-than-Nvidia TCO. I think this incremental strategy is sensible, considering where most of the money is.

As a first mover, Nvidia had to start from the bottom up; CUDA used to run only/mostly on consumer GPUs. AMD is going top-down, starting with high-margin DC hardware, then trickling down to rack-level users, and eventually to APUs as revenue growth allows more re-investment.


They’re making the wrong strategic play.

They will fail if they go after the highest margin customers. Nvidia has every advantage and every motivation to keep those customers. They would need a trillion dollars in capital to have a chance imho.

It would be like trying to go after Intel in the early 2000s by targeting server CPUs, or going after the desktop operating system market in the 90s against Microsoft. It's aiming at your competition where they are strongest and you are weakest.

Their only chance to disrupt is to try to get some of the customers that Nvidia doesn't care about, like consumer-level inference and academic or hobbyist models. Intel failed when they got beaten in a market they didn't care about, i.e. mobile / low-power devices.


This is a common sentiment, no doubt also driven by the wish that AMD would cater to us.

But I see no evidence that the strategy is wrong or failing. AMD is already powering a massive and rapidly growing share of Top 500 HPC:

https://www.top500.org/statistics/treemaps/

AMD compute growth isn't in places where people see it, and I think that gives a wrong impression. (Or it means people have missed the big shifts over the last two years.)


It would be interesting to see how much these "supercomputers" are actually used, and what parts of them are used.

I use my university's "supercomputer" every now and then when I need lots of VRAM, and there are rarely many other users. E.g. I've never had to queue for a GPU even though I use only the top model, which probably should be the most utilized.

Also, I'd guess there can be nvidia cards in the grid even if "the computer" is AMD.

Of course it doesn't matter for AMD whether the compute is actually used or not as long as it's bought, but lots of theoretical AMD flops standing somewhere doesn't necessarily mean AMD is used much for compute.


It is a pretty safe bet that if someone builds a supercomputer, there is a business case for it. Spending big on compute and then leaving it idle is terrible economics. I agree with Certhas that although this is not a consumer-first strategy, it might be working. AMD's management is not incapable, for all that they've been outmanoeuvred convincingly by Nvidia.

That being said, there is a certain irony and schadenfreude in the AMD laptop from the thread root being bricked. The AMD engineers are at least aware that running a compute demo is an uncomfortable experience on their products. The consumer situation is not acceptable even if, strategically, AMD is doing OK.


I find it a safer bet that there are terrible economics all over. Especially when the buyers are not the users, as is usually the case with supercomputers (just like with all "enterprise" stuff).

In the cluster I'm using there are 36 nodes, of which 13 are currently not idling (which doesn't mean they are computing). There are 8 V100 GPUs and 7 A100 GPUs, and all are idling. Admittedly it's holiday season and 3 AM here, but it's similar at other times too.

This is of course great for me, but I think the safer bet is that the typical load average of a "supercomputer" is under 0.10. And the less useful the hardware, the lower its load will be.


It is not reasonable to compare your local cluster to the largest clusters within the DOE or their equivalents in Europe/Japan. These machines regularly run at >90% utilization, and you will not be given an allocation if you can't prove that you'll actually use the machine.

I do see the phenomenon you describe on smaller university clusters, but these are not power users who know how to leverage HPC to the highest capacity. People in the DOE spend their careers working to use as much of these machines as efficiently as possible.


In Europe, at least, supercomputers are organised in tiers. Tier 0 is the highest grade; Tier 3 is small local university clusters like the one you describe. Tier 1 and Tier 2 machines and upward usually require you to apply for time, and they are definitely highly utilised. At Tier 3 the situation will be very different from one university to the next. But you can be sure that funding bodies will look at utilisation before deciding on upgrades.

Also, this number of GPUs is not sufficient for competitive pure-ML research groups, from what I have seen. The point of these small, decentralised, underutilized resources is to have slack for experimentation. Want to explore an ML application with a master's student in your (non-ML) field? Go for it.

Edit: No idea how much of the total HPC market is in the many small installs vs. the fewer large ones. My instinct is that funders prefer to fund large centralised infrastructure, and getting smaller decentralised stuff done is always a battle. But that's all based on very local experience, and I couldn't guess how well this generalises.


    > It is a pretty safe bet that if someone builds a supercomputer there is a business case for it.
As I understand, most (95%+) of the market for supercomputers is gov't. If wrong, please correct. Else, what do you mean by "business case"?


When you ask your funding agency for an HPC upgrade or a new machine, the first thing they will want from you are utilisation numbers of current infrastructure. The second thing they will ask is why you don't just apply for time on a bigger machine.

Despite the clichés, spending taxpayer money is really hard. In fact my impression is always that the fear that resources get misused is a major driver of the inefficient bureaucracies in government. If we were more tolerant of taxpayer money being wasted we could spend it more efficiently. But any individual instance of misuse can be weaponized by those who prefer for power to stay in the hands of the rich...


At least where I'm from, new HPC clusters aren't really asked for by the users, but they are "infrastructure projects" of their own.

With the difficulty of spending taxpayer money, I fully agree. I even think HPC clusters are a bit of a symptom of this. It's often really hard to buy a beefy enough workstation of your own that would fit the bill, or to just buy time from cloud services. Instead you have to faff about with an HPC cluster and its bureaucracy, because it doesn't mean extra spending. And especially not doing a tender, which is the epitome of the inefficiency caused by the paranoia about wasted spending.

I've worked for large businesses, and it's a lot easier to spend in those for all sorts of useless stuff, at least when the times are good. When the times get bad, the (pointless) bureaucracy and red tape gets easily worse than in gov organizations.


> At least where I'm from, new HPC clusters aren't really asked for by the users, but they are "infrastructure projects" of their own.

Because the users expect them to be renewed and improved. Otherwise the research can’t be done. None of our users tell us to buy new systems. But they cite us like mad, so we can buy systems every year.

The dynamics of this ecosystem are different.


> It would be interesting to see how much these "supercomputers" are actually used, and what parts of them are used.

I’m in that ecosystem. Access is limited, demand is huge. There’s literal queues and breakneck competition to get time slots. Same for CPU and GPU partitions.

They generally run at ~95% utilization. Even our small cluster runs at 98%.


Did your university not have a bioinformatics department?


It does. And meteorology, climatology and cosmology for example.


Well then I'm really unsure what's happening. Any serious researcher in either of those fields should be both able to, and trying to, expand into all available supercompute.


Maybe they just don't need them? At least a bioinformatics/computational science professor I know runs most of his analyses on a laptop.


I see a lot of evidence, in the form of a rising moat for NVidia.


Supercomputers are in 95% of cases government funded, and I recommend you look at the conditions in government tenders and the requirements governments impose on purchases. That isn't a normal business partner who only looks at performance; there are many other criteria in the decision making.

Or let me ask you directly: can you name one enterprise which would buy a supercomputer, wait 5+ years for it, and fund the development of HW for it that doesn't exist yet, at the same time as the competition can deliver a supercomputer within the year with an existing product?

No sane CEO would have done Frontier or El Capitan. Such things work only with government funding, where the government decides to wait and fund an alternative. But AMD is indeed a bit lucky that it happened, because otherwise they wouldn't have been forced to push the Instinct line.

In the commercial world, things work differently. There is always a TCO calculation. But one critical aspect since the 90s is SW. No matter how good the HW is, the opportunity costs in SW can force enterprises to use the inferior HW because of SW deployment. If industrial vision-computing SW supports and is optimized for CUDA, or even runs only with CUDA, then any competitor has a very hard time penetrating that market. They first have to invest a lot of money to make their products equally appealing.

AMD is making a huge mistake and is by far not paranoid enough to see it. For two decades, AMD and Intel have been in a nice spot, with PC and HPC computing requiring x86; to this date that has basically guaranteed steady demand. But in that timeframe mobile computing has been lost to ARM. ML/AI doesn't require x86, as Nvidia demonstrates by adding their own ARM CPUs into the mix, and ARM themselves want more and more of the PC and HPC computing cake. And MS is eager to help with OS support for ARM solutions.

What that means is that if some day x86 isn't as dominant anymore and ARM becomes equally good, then AMD/Intel will suddenly have more competition in CPUs and might even have to offer non-x86 solutions as well. Their position will therefore drop to yet another commodity CPU offering.

In the AI accelerator space we will witness something similar. Nvidia has created a platform and earns tons of money with it by combining and optimizing SW+HW. Big Tech is great at SW but not yet at HW, so the only logical thing to do is to get better at HW. All the large tech companies are working on their own accelerators, and they will build their own platforms around them to compete with Nvidia and lock in customers in the same way. The primary losers in all of this will be HW-only vendors without a platform, hoping that Big Tech will support them on their platforms. Amazon and Google have already shown that they have no intention of supporting anything besides their own platforms and Nvidia (which they support only because of customer demand).


I am that crazy CEO building a supercomputer, for rent by anyone who wants it. We are starting small and growing with demand.

Our first deployment has 3x the FLOPS of Cheyenne at a fraction of the cost.

https://en.wikipedia.org/wiki/Cheyenne_(supercomputer)


The savings are an order of magnitude different. Switching from Intel to AMD in a data center might have saved millions if you were lucky. Switching from NVidia to AMD might save the big LLM vendors billions.


Nvidia has less of a moat for inference workloads, since inference is modular. AMD would be mistaken to go after training workloads, but that's not what they're going after.


I only observe this market from the sidelines... but

They're able to get the high-end customers, and this strategy works because they can sell those customers high-end parts in volume without having to have a good software stack; at the high end, customers are willing to put in the effort to make their code work on hardware that is better in dollars/watts/availability, or whatever it is that's giving AMD inroads into the supercomputing market. They can't sell low-end customers on GPU compute without a stack that works, and somebody with a small GPU compute workload may not be willing or able to adapt their software to make it work on an AMD card, even if the AMD card would be a better choice if they could make it work.


They’re going to sell a billion dollars of GPUs to a handful of customers while NVIDIA sells a trillion dollars of their products to everyone.

Every framework, library, demo, tool, and app is going to use CUDA forever and ever while some “account manager” at AMD takes a government procurement officer to lunch to sell one more supercomputer that year.


I'd guess that the majority of ML software is written in PyTorch, not in CUDA, and PyTorch has support for multiple backends including AMD. torch.compile also supports AMD (generating Triton kernels, same as it does for NVIDIA), so for most people there's no need to go lower level.
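For example, something like this (a minimal sketch, assuming a CUDA or ROCm build of PyTorch; on ROCm builds the torch.cuda namespace is backed by HIP, so the same lines run on AMD or NVIDIA):

    import torch

    # "cuda" maps to HIP on ROCm builds, so this picks up an AMD GPU
    # just as it would an NVIDIA one.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = torch.nn.Linear(1024, 1024).to(device)
    model = torch.compile(model)  # lowers to Triton kernels on both backends

    x = torch.randn(8, 1024, device=device)
    print(model(x).shape)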


GPUs are used for more than only ML workloads.

CUDA's relevance in the industry is now so big that NVidia holds several WG21 seats and helps drive the heterogeneous programming roadmap for C++.


You can use PyTorch for more than ML. No need to use backprop. Think of it as GPU-accelerated NumPy.
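A quick sketch of what I mean - plain array math on the GPU, no autograd anywhere (assuming a CUDA or ROCm build of PyTorch):

    import torch

    # Plain array math on the GPU: no gradients, no training loop.
    a = torch.rand(4096, 4096, device="cuda")
    b = torch.rand(4096, 4096, device="cuda")

    c = (a @ b).sin().mean()  # matmul plus elementwise ops, NumPy-style
    print(c.item())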


I would like to see OctaneRender done in Pytorch. /s


Sure, but if the OctaneRender folk wanted to support AMD, then I highly doubt they'd be interested in a CUDA compatibility layer either - they'd want to be using the lowest-level API possible (Vulkan?) to get close to the metal and optimize performance.


See, that is where you've got it all wrong: they dropped Vulkan for CUDA, and even gave a talk about it at GTC.

https://www.cgchannel.com/2023/11/otoy-releases-first-public...

https://www.cgchannel.com/2023/11/otoy-unveils-the-octane-20...

And then again, there are plenty of other cases where PyTorch makes absolutely no sense on the GPU, which was the whole starting point.


> See, that is where you've got it all wrong

I said that if they wanted to support AMD they would use the closest-to-metal API possible, and your links prove that this is exactly their mindset - preferring a lower level more performant API to a higher level more portable one.

For many people the tradeoffs are different, and the ability to write code quickly and iterate on a design matters more.


Nvidia's 2024 data center revenue was $46B. They've got a long fucking way to go to get to a trillion dollars of product.


Take a look at this chart going back ~3Y: https://ycharts.com/indicators/nvidia_corp_nvda_data_center_...

Their quarterly data centre revenue is now $22.6B! Even assuming that it immediately levels off, that's $90B over the next 12 months.

If it merely doubles, then they'll hit a total of $1T in revenue in about 6 years.
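(Rough arithmetic behind that, reading "merely doubles" as quarterly revenue doubling once to ~$45B and then holding flat, with no discounting - a sketch, not a forecast:)

    # Back-of-the-envelope only; all numbers are assumptions from this thread.
    quarterly_revenue_b = 22.6 * 2       # doubles once to ~$45.2B per quarter
    quarters = 6 * 4                     # six years
    total_t = quarterly_revenue_b * quarters / 1000
    print(f"~${total_t:.2f}T over 6 years")  # ~$1.08T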

I'm an AI pessimist. The current crop of generative LLMs are cute, but not a direct replacement for humans in all but a few menial tasks.

However, there's a very wide range of algorithmic improvements available which wouldn't have been explored three years ago. Nobody had the funding, motivation, or hardware. Suddenly, everyone believes that it is possible, and everyone is throwing money at the problem. Even if the fruits of all of this investment are just a ~10% improvement in business productivity, that's easily worth $1T to the world economy over the next decade or so.

AMD is absolutely leaving trillions of dollars on the table because they're too comfortable selling one supercomputer at a time to government customers.

Those customers will stop buying their kit very soon, because all of the useful software is being written for CUDA only.


Did you look at your own chart? There's no trend of 200% growth. Rather, the last few quarters were a huge jump from relatively modest gains in the years prior. Expecting 6 years of "merely doubling" is absolutely bonkers lol

Who can even afford to buy that much product? Are you expecting Apple, Microsoft, Alphabet, Amazon, etc to all dump 100% of their cash on Nvidia GPUs? Even then that doesn't get you to a trillion dollars


Once AI becomes a political spending topic like green energy, I think we’ll see nation level spending. Just need one medical breakthrough and you won’t be able to run a political campaign without AI in your platform.


Meta alone bought 350,000 H100 GPUs, which cost them $10.5 billion: https://www.pcmag.com/news/zuckerbergs-meta-is-spending-bill...

This kind of AI capital investment seems to have helped them improve the feed recommendations, doubling their market cap over the last few years. In other words, they got their money back many times over! Chances are that they're going to invest this capital into B100 GPUs next year.

Apple is about to revamp Siri with generative AI for hundreds of millions of their customers. I don't know how many GPUs that'll require, but I assume... many.

There's a gold rush, and NVIDIA is the only shovel manufacturer in the world right now.


> Meta alone bought 350,000 H100 GPUs, which cost them $10.5 billion

Right, which means you need about a trillion dollars more to get to a trillion dollars. There's not another 100 Metas floating around.

> Apple is about to revamp Siri with generative AI for hundreds of millions of their customers. I don't know how many GPUs that'll require, but I assume... many.

Apple also said they were doing it with their silicon. Apple in particular is all but guaranteed to refuse to buy from Nvidia even.

> There's a gold rush, and NVIDIA is the only shovel manufacturer in the world right now.

lol no they aren't. This is literally a post about AMD's AI product even. But Apple and Google both have in-house chips as well.

Nvidia is the big general party player, for sure, but they aren't the only. And more to the point, exponential growth of the already largest player for 6 years is still fucking absurd.


The GDP of the US alone over the next five years is $135T. Throw in other modern economies that use cloud services like Office 365 and you’re over $200T.

If AI can improve productivity by just 1% then that is $2T more. If it costs $1T in NVIDIA hardware then this is well worth it.


(note to conversation participants - I think jiggawatts might be arguing about $50B/qtr x 24 qtr = $1 trillion and kllrnohj is arguing $20 billion * 2^6 years = $1 trillion - although neither approach seems to be accounting for NPV).

That is assuming Nvidia can capture the value and doesn't get crushed by commodity economics, which I can see happening and I can also see not happening. Their margins are going to be under tremendous pressure. Plus, I doubt Meta are going to be cycling all their GPUs quarterly; there is likely to be a rush and then a settling of capital expenses.


Another implicit assumption is that LLMs will be SoTA throughout that period, or the successor architecture will have an equally insatiable appetite for lots of compute, memory and memory bandwidth; I'd like to believe that Nvidia is one research paper away from a steep drop in revenue.


Agreed with @roenxi and I’d like to propose a variant of your comment:

All evidence is that “more is better”. Everyone involved professionally is of the mind that scaling up is the key.

However, like you said, just a single invention could cause the AI winds to blow the other way and instantly crash NVIDIA’s stock price.

Something I’ve been thinking about is that the current systems rely on global communications which requires expensive networking and high bandwidth memory. What if someone invents an algorithm that can be trained on a “Beowulf cluster” of nodes with low communication requirements?

For example the human brain uses local connectivity between neurons. There is no global update during “training”. If someone could emulate that in code, NVIDIA would be in trouble.


> They will fail if they go after the highest margin customers.

They are already powering the most powerful supercomputers, so I guess you’re right.

Oh, by coincidence, the academic crowd is the primary user of these supercomputers.

Pure luck.


AMD did go after Intel's server CPUs in the 2000s, with quite a bit of success.


And it worked mainly because they were a drop-in replacement for Intel processors, which was and is an amazing feat. I, and most people, could and can run anything compiled (except maybe AVX-512 stuff back then on Zen 1 and 2?) without a hitch. And it was still a huge uphill battle, and Intel let it happen, what with their bungling of the 10nm process.

I don't see how the same can work here. HIP isn't it right now (every time I try, anyway).


> They would need a trillion dollars in capital to have a chance imho.

All AMD would really need is for Nvidia's innovation to stall. Which, with many of their engineers coasting on $10M annual compensation, seems not too far-fetched.


AMD can go toe to toe with Nvidia on hardware innovation. What AMD has realised (correctly, IMO) is that all they need is for hyperscalers to match, or come close to, Nvidia's software innovation on AMD hardware - Amazon/Meta/Microsoft engineers can get their foundation models running on MI300X well enough for their needs. CUDA is not much of a moat in that market segment, where there are dedicated AI-infrastructure teams. If the price is right, they may shift some of those CapEx dollars from Nvidia to AMD. Few AI practitioners - and even fewer LLM consumers - care about the libraries underpinning torch/numpy/high-level-python-framework/$LLM-service, as long as it works.


That is the wrong move. Personally, I would start with the local-LLM / llama folks who crave more memory and build up from there.


Seeing that they don't have a mature software stack, I think for now AMD would prefer one customer who brings in $10m revenue over 10'000 customers at $1000 a pop.


It doesn't make sense, because they can market to both at the same time.


> It appears AMD's initial strategy is courting the HPC crowd and hyperscaler...

I don't agree with this at all! Give me something that I can easily prototype at home and then quickly scale up at work!


> As long as AMD's investment into software and evangelism remains at ~$0

Last time I checked, they have been trying to hire a ton of software engineers to improve the applied stacks (CV, ML, DSP, compute, etc.) at their location near me.

It seems like there's a big push to improve the stacks, but given that less than 10 years ago they were practically at death's door, it's not terribly surprising that their software is in the state it is. It's been getting better gradually, but quality software doesn't just show up overnight, especially when things are as complex and arcane as they are in the GPU world.


With margins that high?

There is always financing, there are always people willing to go to the competitor at some wage, there is always a way if the leadership wants to.

If it was just a straight up fab bottleneck? Yeah maybe you buy that for a year or two.

“During Q1, Nvidia reported $5.6 billion in cost of goods sold (COGS). This resulted in a gross profit of $20.4 billion, or a margin profile of 78.4%.”

That’s called an “induced market failure”.


They literally bought Xilinx for their software engineering team. That's at least a thousand firmware engineers and software engineers focused on software stack improvements. That was two years ago. And on top of Xilinx they've been hiring staff like crazy for years now.

The issue was that they basically let go of everyone who wasn't building hardware for their essential product lines (CPU & GPU), other than a skeleton crew to keep the software at least mostly functioning. And as much as this seems like it was a bad decision, AMD was probably weeks from bankruptcy by the time they got Zen out the door even despite doing this. Had they not done so, they'd almost certainly have closed up entirely.

So for the last ~5 years minimum they've been building back their software teams and trying to recover the institutional knowledge they lost. That all takes time even if you hire back twice as many engineers as you lost.

And so now we are here. Things are clearly improving but nowhere near acceptable yet. But there's a trend of improvement.


> Things are clearly improving

How long am I supposed to wait, as my still-modern AMD GPU sits still-unsupported?

The anecdote above doesn't even sound like there's any improvement at all, let alone "clear" improvement.

And with Zen in 2017 and Zen+ in 2018 the counter is past six years at this point since the money gates opened wide.


> How long am I supposed to wait, as my still-modern AMD GPU sits still-unsupported?

Which GPU do you have? At least according to these docs, on Linux the upper chunk of RDNA3 is supported officially, but from experience basically all 6xxx and 7xxx cards are unofficially supported if you build it for your target arch. 5xxx cards get the short end of the stick and got skipped (they were a rough launch), but Radeon VII cards should also still be officially supported (with support shifting to unofficial status in the next release).

https://rocm.docs.amd.com/en/latest/compatibility/compatibil...

And given that ROCm is pretty core to AMD's support for the Windows AI stack (via ONNX), you can assume any new GPUs released from here on out will be supported.


It's 5xxx. And "rough launch" is not an excuse. They've had plenty of time. Is it that different from the other RDNA cards?

The unofficial support for so many cards is not a good situation either.

Edit: Actually, no, I know it's not that different, because some versions of ROCm largely work on RDNA1 if you trick them. They are just refusing to do the extra bit of work to patch over the differences.
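(For the curious: the trick I've seen referenced is overriding the reported GPU architecture via the HSA_OVERRIDE_GFX_VERSION environment variable - a sketch, assuming a Linux ROCm build of PyTorch and an RDNA1/gfx1010 card; this is unsupported and may still misbehave:)

    import os

    # Unsupported workaround: tell the ROCm runtime to treat the card as
    # gfx1030 (RDNA2) so the shipped kernels load on an RDNA1 GPU.
    # Must be set before the HIP runtime initializes, i.e. before importing torch.
    os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

    import torch
    print(torch.cuda.is_available())  # True if the override let ROCm pick up the card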


I mean, it apparently works on RDNA1 now after some effort, but they never really attempted to support it: they initially only supported workstation RDNA cards, and there was no workstation RDNA1 release.

https://www.reddit.com/r/ROCm/comments/1bd8vde/psa_rdna1_gfx...

I wish they had comprehensive support for basically all recent GPU releases but tbh I'd rather they focus on perfecting support for the current and upcoming generations than spread their efforts too thin.

And ideally backports to the older cards will come with time, but it's really not a priority over issues on the current generation, because those RDNA1 cards were never actually supported in the first place.


Every post I see about trying it has the person run into issues, but maybe Soon it will finally be true.


Have you ever organized anything of size?

Financing is not the bottleneck. Organizational capacity might well be, though. As an organization, AMD's survival depended not on competing with nVidia but on competing with Intel. Now they are established, in what must be one of the greatest come-from-behind successes in tech history. Eight years ago, Intel was worth 80 times as much as AMD; today AMD has surpassed them:

https://www.financecharts.com/compare/AMD,INTC/summary/marke...

Stock isn't reality, but I wouldn't so easily assume that the team that led AMD to overtake Intel are idiots.


> With margins that high? There is always financing, there are always people willing to go to the competitor at some wage, there is always a way if the leadership wants to.

People love to pop off on stuff they don't really know anything about. Let me ask you: what financing do you imagine is available? Like, literally what financing do you propose for a publicly traded company? Do you realize they can't actually issue new shares without putting it to a shareholder vote? Should they issue bonds? No, I know, they should run an ICO!!!

And then what margins exactly? Do you know what the margin is on MI300? No. Do you know whether they're currently selling at a loss to win marketshare? No.

I would be the happiest boy if HN, in addition to policing jokes and memes, could police arrogance.


Are you saying that companies lose the ability to secure financing once they go public?


Of course not - I mentioned 3 routes to securing further financing. Did you read about those 3 routes in my comment?


You mentioned them all mockingly. If you weren't trying to suggest none are viable, you need to reword.


This isn't hard: financing routes exist but they aren't as simple or easy or straightforward as the person to whom I was responding makes it seem.


They didn't imply it was notably easy. Your reply there only makes sense if you were trying to say it's nearly impossible. If you're just saying it's kinda hard then your post is weirdly hostile for no reason, reading theirs in an extreme way just so you can correct it harder.


> They didn't imply it was notably easy

Really? I must be reading a different language than English here

> There is always financing, there are always people willing to go to the competitor at some wage, there is always a way if the leadership wants to.


If "always a way" implies anything about difficulty, it implies that there are challenges to overcome, not ease.


I guess there's always a way to play devil's advocate <shrug>


Have you looked into TinyCorp [0]/tinygrad [1], one of the latest endeavors by George Hotz? I've been pretty impressed by the performance. [2]

[0] https://tinygrad.org/ [1] https://github.com/tinygrad/tinygrad [2] https://x.com/realGeorgeHotz/status/1800932122569343043?t=Y6...


I have not been impressed by the perf. It's slower than PyTorch for LLMs, and PyTorch is actually stable on AMD (I've trained 7B/13B models), so the stability issues seem to be more of a tinygrad problem and less of an AMD problem, despite George's ramblings [0][1]

[0] https://github.com/tinygrad/tinygrad/issues/4301 [1] https://x.com/realAnthonix/status/1800993761696284676


He also shakes his fist at the software stack, but loudly enough that it gets AMD to react.


As more of a business person than an engineer, help me understand why AMD isn't getting this - what's the counterargument? Is CUDA just too far ahead, or are they lacking the right people in senior leadership roles to see this through?


CUDA is very far ahead, not only technically but in mindshare. Developers trust CUDA and know that investing in CUDA is a future-proof investment. AMD has had so many API changes over the years that no one trusts them any more. If you go all in on AMD, you might have to re-write all your code in 3-5 years. AMD can promise that this won't happen, but it's happened so many times already that no one really believes them.

Another problem is simply that hiring (and keeping) top talent is really, really hard. If you're smart enough to be a lead developer of AMD's core machine learning libraries, you can probably get hired at any number of other places, so why choose AMD?

I think the leadership gets it and understand the importance, I just don't think they (or really anybody) knows how to come up with a good plan to turn things around quickly. They're going to have to commit to at least a 5 year plan and lose money each of those 5 years, and I'm not sure they can or even want to fight that battle.


> Another problem is simply that hiring (and keeping) top talent is really really hard.

Absolutely. And when your mandate for this top talent is going to be "go and build something that basically copies what those other guys have already built", it is even harder to attract them, when they can go any place they like and work on something new.

> I think the leadership gets it and understand the importance, I just don't think they (or really anybody) knows how to come up with a good plan to turn things around quickly.

Yes, it always puzzles me when people think nobody at AMD actually sees the problem. Of course they see it. Turning a large company is incredibly hard. Leadership can give direction, but there is so much baked in momentum, power structures, existing projects and interests, that it is really tough to change things.


CUDA is one area that Nvidia really nailed. When it was first announced I saw it as something neat, but could never have envisioned just how ingrained it would become. This was long before AI training/execution was really on most people's radars.

But for years I have heard the same things from so many people working in the field. "We hate Nvidia because they got it so right but are the only option."


As another commenter points out, their strategy appears to be to focus on HPC clients, where AMD can concentrate after-sale software support on a relatively small number of customer requests. This gets them some sales while avoiding the level of organizational investment necessary to build a software platform with NVIDIA-style broad compatibility and a good out-of-the-box experience.


Yes, to add to the other comments, what many don't realize is that CUDA is an ecosystem - C, C++ and Fortran foremost - and NVidia quickly realized that supporting any programming language community that wanted to target PTX was a very good idea.

Their GPUs were re-designed to follow the C++ memory model, and many NVidia engineers hold seats at ISO C++, making CUDA the best way to run heterogeneous C++. Intel also realized this, acquiring Codeplay, key players in SYCL, and also employing ISO C++ contributors.

Then there are the Visual Studio and Eclipse plugins, and graphical debuggers that even let you single-step shaders if you so wish.


> are they lacking the right people in senior leadership roles to see this through?

Just like Intel, they have an outdated culture. IMHO they should start a software Skunk Works isolated from the company and have the software guys guide the hardware features. Not the other way around.

I wouldn't bet money on either of them doing this. Hopefully some other smaller, modern, and flexible companies can try it.


CUDA is a software moat. If you want to use any GPU other than Nvidia's, you need to double your engineering budget, because there are no easy-to-bootstrap projects at any level. The hardware prices are meaningless if you need a $200k engineer - if they even exist - just to bootstrap a product.


Depending on your hardware budget, the engineering one can look like a rounding error.


Sure, but then you're still on the side of NVIDIA because you have the budget.


Why give any additional money to Nvidia when you can announce more profits (or get more compute if you're a government agency) by hiring more engineers to enable AMD hardware for less than a few million per year? It's not like Microsoft loves the idea of handing over money to Nvidia if there is a cheaper alternative that can make $MSFT go up.


Say your success rate for replicating CUDA+Nvidia hardware on AMD is 60%, but it will take 2 years. That's not going to be compelling for any large org, especially when the MI300X is cheaper, but not crazy cheaper, than an H100.

Especially since CUDA is still rolling out new functionality and optimizations, so the goal posts will keep moving.


> Say your success rate for replicating CUDA+Nvidia[...]

Rational hyperscalers would just stop as soon as their tooling/workloads/models are functional on AMD hardware within an acceptable perf envelope - just like they already do with their custom silicon. Replicating CUDA is just unnecessary, expensive and time-consuming completionism; if some workloads require CUDA, they will be executed on Nvidia clusters that are part of the fleet.


It depends on how much cheaper the total solution is and how available the hardware is. If I can't get Nvidia hardware until six months after I can get AMD hardware, I have a couple of months to port my software to AMD and still beat my competitor that's waiting for Nvidia. It's always a matter of how many problems you can solve for a given amount of money x time.


Sure, but the "it depends" is carrying a lot of weight. NVIDIA's moat will get you testable software straight out of the gate; any other stack is currently a game of "how long will it take to get this going".

Corporations simply aren't interested in long-term gains unless there's a straightforward path.


It depends on the problems you have. If you need CUDA, then you've married yourself to Nvidia. If you can use libraries that work equally well on both, then you would benefit.

When you are a government agency, it's more palatable to spend the budget in a way that results in the employment of nationals and the development of indigenous technologies.


Because if you don't join NVIDIA, your likelihood of success goes down. So the "more profits" you speak of is gambling money. Most corporations aren't going to gamble.


Depends on whether you need CUDA or not. If you don't, you can use anything.

It was the same game with x86, and ARM is eroding the former king's place in the datacenter.


Leadership lacking vision + being almost bankrupt until relatively recently.


MIVisionX is probably the library you want for computer vision. As for kernels, you would generally write HIP, which is very similar to CUDA. To my knowledge, there's no equivalent to cupy for writing kernels in Python.

For what it's worth, your post has cemented my decision to submit a few conference talks. I've felt too busy writing code to go out and speak, but I really should make time.



Oh cool! It appears that I've already packaged cupy's required dependencies for AMD GPU support in the Debian 13 'main' and Ubuntu 24.04 'universe' repos. I also extended the enabled architectures to cover all discrete AMD GPUs from Vega onwards (aside from MI300, ironically). It might be nice to get python3-cupy-rocm added to Debian 13 if this is a library that people find useful.
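For anyone curious, the kind of thing this enables looks roughly like the sketch below (assuming a ROCm-enabled CuPy build, where the RawKernel source is compiled as HIP rather than CUDA C):

    import cupy as cp

    # An inline GPU kernel, written and launched from Python.
    vec_add = cp.RawKernel(r'''
    extern "C" __global__
    void vec_add(const float* x, const float* y, float* out) {
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        out[i] = x[i] + y[i];
    }
    ''', 'vec_add')

    n = 1 << 20
    x = cp.arange(n, dtype=cp.float32)
    y = cp.arange(n, dtype=cp.float32)
    out = cp.empty_like(x)

    vec_add((n // 256,), (256,), (x, y, out))  # grid dims, block dims, args
    print(cp.allclose(out, x + y))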


HIP isn't comparable to CUDA in the set of available languages that target PTX, the existing library ecosystem, the IDE plugins, or the graphical debuggers.

This is the kind of stuff AMD keeps missing out on; even oneAPI from Intel looks better in that regard.


If you are looking for attention from an evangelist, I'm sorry, but you are not the target customer for MI300. They are courting the hyperscalers for heavy-duty production inference workloads.


I also stopped by their booth and talked about trial access, and right away asked for easy access a la Google Colab, specifically without bureaucracy. And they were like "yeah, we are making it, but nah man, you can't just log in and use it, you gotta fill in a form and wait for us to approve it". I was very disappointed at that point.

That was a marketing guy BTW. I don't think they realize their marketing strategies suck.


Completely agree. It's been 18 years since Nvidia released CUDA. AMD has had a long time to figure this out so I'm amazed at how they continue to fumble this.


10 years ago AMD was selling its own headquarters so that it could stave off bankruptcy for another few weeks (https://arstechnica.com/information-technology/2013/03/amd-s...).

AMD's software investments began in earnest only a few years ago, but AMD really did progress more than pretty much everyone else aside from NVidia, IMO.

AMD further made a few bad decisions where they "split the bet", relying upon Microsoft and others to push software forward. (I did like C++ AMP, for what it's worth.) The underpinnings of C++ AMP led to Boltzmann, which led to ROCm, which then needed to be ported away from C++ AMP and into the CUDA-like HIP.

So it's a bit of a misstep there for sure. But it's not like AMD has been dilly-dallying. And for what it's worth, I would have personally preferred C++ AMP (a C++11-standardized way to represent GPU functions as []-lambdas rather than CUDA-specific <<<extensions>>>). Obviously everyone else disagrees with me, but there's some elegance to parallel_for_each([](param1, param2){ /* magically a GPU function executing in parallel */ }), where the compiler figures out the details of how to get param1 and param2 from CPU RAM onto the GPU (or you use GPU-specific allocators so that param1/param2 already live in GPU memory and bypass the automagic).


Nowadays you can write regular C++ in CUDA if you so wish, and unlike AMD, NVidia employs several WG21 contributors.


CUDA of 18 years ago is very different to CUDA of today.

Back then AMD/ATI were actually at the forefront on the GPGPU side - things like the early Brook language and CTM led pretty quickly to things like OpenCL. Lots of work went into using the Xbox 360 GPU for GPGPU tasks in real games.

But CUDA steadily improved iteratively, and AMD kinda just... stopped developing their equivalents? Considering that for a good part of that time they were near bankruptcy, it might not have been surprising though.

But saying Nvidia solely kicked off everything with CUDA is rather ahistorical.


> AMD kinda just... stopped developing their equivalents?

It wasn't so much that they stopped developing; rather, they kept throwing everything out and coming out with new, non-backwards-compatible replacements. I knew people working in the GPU compute field back in those days who were trying to support both AMD/ATI and NVidia. While their CUDA code just worked from release to release, and every new release of CUDA just got better and better, AMD kept coming up with new breaking APIs, forcing rewrite after rewrite, until they just gave up and dropped AMD.


> CUDA of 18 years ago is very different to CUDA of today.

I've been writing CUDA since 2008 and it doesn't seem that different to me. They even still use some of the same graphics in the user guide.


Yep! I used BrookGPU for my GPGPU master's thesis, before CUDA was a thing. AMD lacked follow-through on the software side, as you said, but a big factor was also NV handing out GPUs to researchers.


10 years ago they were basically broke and bet the farm on Zen. That bet paid off. I doubt a bet on a CUDA competitor would have paid off in time to save the company, and they definitely didn't have the resources to split that bet.


It's not like the specific push for AI on GPUs came out of nowhere either; Nvidia first shipped cuDNN in 2014.


Did you talk to anyone from Intel? It seems they were also present: https://community.intel.com/t5/Blogs/Tech-Innovation/Artific...


Well, if Mojo and Modular's MAX Platform take off, I guess there will be a path for AMD.


Well,

"Modular to bring NVIDIA Accelerated Computing to the MAX Platform"

https://www.modular.com/blog/modular-partners-with-nvidia-to...


The whole point of MAX is that you can compile the same code to multiple targets without manually optimizing for a given target. They are obviously going to support NVIDIA as a target.


Yet you haven't seen any AMD or Intel deal from them.


Because they start with the target with the largest user base?


99%+ of people aren't writing kernels, man; this doesn't mean anything, this is just silly.



