AMD Instinct MI325X to Feature 256GB HBM3E Memory, CDNA4-Based MI355X with 288GB (videocardz.com)
40 points by kristianp 42 days ago | 18 comments



The part here I find funny is that AMD's market cap currently sits at roughly 2x Intel's, so the analysts of the world obviously think they're sitting on something. And yet they are still in headlines as the market underdog.

For anyone who doesn't follow AMD at all (good move, their consumer support for compute leaves scars), they appear to have a strategy of targeting the server market in hopes of scooping up the high-profit part of the GPGPU world. Hopefully that works out for them, but based on my years of regret as an AMD customer watching the AI revolution zoom by, I'd be hesitant about that translating into good compute experiences on consumer hardware. I assume the situation is much improved from what I was used to, but I don't trust them to see supporting small users as a priority.


Similar experience here. We got burnt betting on ROCm being released for consumer GPUs a few years ago, but it never happened. I think you have to win the consumer market to get the enterprise market, not the other way around.


And yet Meta is using MI300X exclusively for all live inference on Llama 405B.

Clearly there are workloads AMD wins at, and just going Nvidia by default for everything without considering AMD is suboptimal.


The difference is that Meta and the FAANG companies make hundreds of billions of dollars in annual revenue and can hire top talent to solve the problem of making their AI run well on whatever GPU they choose for their data centers.

Consumers, open-source projects and smaller companies unfortunately can't afford this, so they are fully dependent on AMD and other providers to close this implementation gap. Ironically, that means smaller companies may prefer Nvidia just so they don't have to worry about odd GPU driver issues in their workloads.


But Meta is the main company behind Pytorch development. If they make it work and upstream it, this will cascade to all Pytorch users.

We don't have to imagine far, it's slowly happening. Pytorch for ROCm is getting better and better!
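To make that concrete, here's a minimal sketch of why upstreamed ROCm work cascades to everyone, assuming a ROCm build of PyTorch and a supported AMD GPU: the ROCm builds expose AMD GPUs through the same torch.cuda API (HIP underneath), so ordinary device-agnostic code runs unchanged.

  # Minimal sketch: on a ROCm build of PyTorch, AMD GPUs show up through
  # the regular torch.cuda namespace, so the same device-agnostic code
  # runs on either vendor's hardware.
  import torch

  device = "cuda" if torch.cuda.is_available() else "cpu"
  x = torch.randn(4096, 4096, device=device)
  y = x @ x  # dispatched to a rocBLAS/hipBLAS backend on AMD, cuBLAS on Nvidia
  print(device, y.shape)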

Then they will have to fix the split between data-center and consumer GPUs, for sure. From what I understand, this is on the roadmap with the convergence of both GPU lines on the UDNA architecture.


If Meta/FAANG can make it work for them, it's not unreasonable to assume those improvements will trickle down to consumers/smaller companies.


When it comes to comparing market caps, the more apt comparison for AMD's MI line is Nvidia's data center division, and investors are probably right to assess that AMD won't dent Nvidia's market position any time soon. That said, AMD's data center GPU business is growing at an extremely healthy pace and enjoys high profit margins, so they have proven, to a degree, that they can execute in this space, and as a business it shows a promising future.

When looking at market cap, there are three main pillars of valuation: revenue, profit margin, and net income. If all three are growing, you are an industry darling. If two are growing, you are still likely to be valued highly. If only one is, you are much riskier. If none are, it's a red flag.

As of the latest earnings reports, AMD's revenue, profit margin and net income are all increasing. At Intel, all three are decreasing. If analysts assume the trends hold, AMD can grow into its valuation, while Intel is headed towards being worth nothing unless it changes its business. Simply put, a business that is losing on all three of revenue, profit margin, and net income is headed down the wrong path for investors, and will be punished in an outsized way when it comes to predicting its future value (i.e., market cap).


You know AMD primarily sells CPUs, right?

For datacenter GPUs, they went from roughly $500M-750M for full-year 2023 (I can't find exact numbers) to $4.5B+ for full-year 2024. In GPUs, it's almost like they're entering a new market.

The current Instinct line of products is relatively new too; I found this article [1] on the MI100 launch from Nov 2020. That's basically the start of 2021.

To go from the MI100 in 2021 to $4.5B+ of MI300X + MI250X sales in 2024 is great. They are doing just fine.

On the MI355X, I can't find endnotes for the slides they showed, but it's not clear whether the 9.2 PF of FP6 and FP4 is sparse or not (all the other numbers on that slide were non-sparse). If it isn't, they're exceeding GB200's sparse FP6/FP4 numbers with non-sparse flops (!). They both have the same memory bandwidth, though. AMD is doing just fine.

[1] https://www.servethehome.com/amd-radeon-instinct-mi100-32gb-...
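On the sparse vs. non-sparse question: Nvidia's "sparse" peak figures conventionally assume 2:4 structured sparsity, which doubles the nominal rate over dense math, so a rough comparison is just a factor-of-two conversion. A quick sketch; the competitor figure below is a placeholder for illustration, not an official spec.

  # Rough sparse-vs-dense comparison. "Sparse" peak figures typically
  # assume 2:4 structured sparsity, i.e. 2x the dense rate.
  # The competitor number below is a placeholder, not an official spec.

  def dense_equivalent(sparse_pflops: float) -> float:
      """Convert a 2:4-sparse peak figure to its dense equivalent."""
      return sparse_pflops / 2.0

  mi355x_fp4 = 9.2               # PF, from the slide (assumed non-sparse)
  competitor_fp4_sparse = 18.0   # PF, hypothetical sparse figure

  print(f"MI355X (dense, assumed):      {mi355x_fp4:.1f} PF")
  print(f"Competitor dense-equivalent:  {dense_equivalent(competitor_fp4_sparse):.1f} PF")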


The MI100 is hardly the first AMD datacenter GPU. The first Instinct-branded card, the MI25, is from 2017 [1]. But ATI/AMD had FirePro/FireStream-branded GPGPU cards going back to the mid 2000s [2,3]. They just never caught on because AMD's software, support and marketing were not competitive with Nvidia's.

[1] https://www.techpowerup.com/gpu-specs/radeon-instinct-mi25.c...

[2] https://en.wikipedia.org/wiki/AMD_FireStream

[3] https://en.wikipedia.org/wiki/AMD_FirePro


Those are Vega, not CDNA. It wouldn't surprise me if those are rebranded consumer chips, though I haven't checked.


The MI25 is Vega 10 (same as the Vega 56/64) and the MI50 is Vega 20 (same as the Radeon VII). CDNA is just a Vega variant too; you'll see that MI25/Vega 56/Vega 64 are all gfx9* generation hardware just like CDNA1 and CDNA2 [1], while later RDNA cards are gfx10* and gfx11*.
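If you want to see which gfx target your own card reports, something like the sketch below works under a ROCm build of PyTorch; gcnArchName is assumed to be how recent ROCm builds expose it and may vary by version, hence the fallback.

  # Print the gfx target (e.g. gfx90a for CDNA2, gfx1100 for RDNA3) that a
  # ROCm build of PyTorch reports for the first GPU. gcnArchName is assumed
  # to exist on recent ROCm builds; fall back gracefully if it doesn't.
  import torch

  if torch.cuda.is_available():
      props = torch.cuda.get_device_properties(0)
      print(getattr(props, "gcnArchName", "gcnArchName not exposed by this build"))
  else:
      print("no ROCm/CUDA device visible")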

But what difference does it make? Nvidia also shipped the same _architecture_ for their datacenter and consumer cards for quite a few generations back then (e.g. Pascal), though typically not the same die. Whether they reuse the same architecture or not, they had a product that they marketed as enterprise/datacenter cards. The buyers don't care if it's a rebranded consumer card or not as long as it works well - see the Nvidia L40S (uses AD102, same as the RTX 4090 [2]), which is very popular for inference.

Not to mention, with GCN, AMD made an explicit bet on unifying their architecture for compute & graphics. They bet on being able to supply both the consumer and datacenter markets with the same silicon by coming up with graphics hardware that was quite compute-heavy (hence why AMD consumer cards were stronger against their Nvidia counterparts until the Ampere generation or so).

[1] https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html

[2] https://www.techpowerup.com/gpu-specs/l40s.c4173


I don't think having a common ancestry for the ISA means much, or even having the same ISA.

Anyway, I don't understand what you want from me or what you're arguing about. They were trying to win the datacenter CPU market, not the GPU market, and they did well at that. They've recently started trying to win the GPU market as well, because now they can afford to. They seem to be doing well now.


I'm saying that the "not trying to win the datacenter GPU market" bit is not quite correct, as they had a lot of products trying to address that market. Agree that their offerings today are a marked improvement over the previous ones though.


Wendell over at Level1Techs seems to think that AMD cards are more popular in pro applications.

https://youtu.be/aKV0FiuVJ0E?t=147


Everyone knows that AMD primarily sells CPUs. That is why all the interest is in Nvidia, and it's a contributing factor to why I don't own an AMD graphics card any more.


This isn't so directly related, and I know that performance figures are highly workload-dependent and real results always come in under the headline numbers, but I want to take a moment to point out the multi-petaflop figures. Yes, they're not full precision, but still. How long ago would this have felt like an outrageous supercomputer?

Quick thing to show the sheer scale of these figures: a petaflop is 10^15 operations per second, and if you sit a foot from your screen, it takes light about a nanosecond to reach you. That means that between the light leaving your screen and it hitting your eyeballs, these things can have done another million calculations.
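For anyone who wants to check that back-of-envelope number, a quick sketch:

  # Back-of-envelope check: operations completed while light crosses one foot,
  # at a rate of 10^15 operations per second (one petaflop).
  C = 299_792_458        # speed of light, m/s
  FOOT = 0.3048          # metres
  RATE = 1e15            # operations per second

  travel_time = FOOT / C              # ~1.02 ns
  ops = RATE * travel_time

  print(f"light travel time: {travel_time * 1e9:.2f} ns")
  print(f"operations in that window: {ops:,.0f}")   # ~1 million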

I know this isn't particularly constructive, but I'm hit with waves of nostalgia and older performance figures seeing this.


At around $15k, the price is good compared to an Nvidia H100 or B100.


Supposedly the last of the CDNA line? AMD said they are switching to a unified UDNA architecture in the future, merging the Radeon/consumer and Compute/data-center lines. https://www.tomshardware.com/pc-components/cpus/amd-announce...



