Super impressive, and awesome to see that you were able to use Framework Laptop hinges. Let me know if you need more. We have a ton of remaining 3.3kg ones!
Hey Nirav, super super honored that you saw this! I've always looked up to you guys for inspiration and guidance. Thank you for the offer! Although I probably won't be mass-producing open-source laptops like you (I have a Framework 16!), I would love to meet you. Would that be possible?
This is the best of the internet. Connection based on interest, appreciation, and mutual respect facilitated with a high degree of good faith. Hope you folks connect fruitfully, and also appreciate that you kept some of the "sent an email" and "thanks" public. Getting to see that this happened has given me a real boost.
We’ve taken to recommending Rufus in our setup guides rather than the Windows installation media tool due to the lack of recent Wi-Fi drivers in vanilla Windows.
In this case it’s because Dylan Patel of Semianalysis interviews Lisa Su regularly and presumably has a direct line to her, and because Lisa and the rest of AMD leadership are absolutely reading the article. It’s unclear if Pat would have (e.g. I don’t think Pat ever sat down for a chat with Semianalysis like Lisa has).
> Give AMD Engineers more compute and engineering resources to fix and improve the AMD ecosystem, they have very few internal gpu boxes relative to what Nvidia provides to their engineers.
This is real. We’ve found ourselves having to give hardware to engineers at AMD because they’re unable to get allocation of it internally.
This is baffling. I’m sure there are many technical reasons I don’t grok for why AMD’s job is challenging, but it’s wild that they are dropping the ball on such obvious stuff as this.
The prize is trillions of dollars, and they can print hundreds of millions if they can convince the market that they are closing the gap.
It’s embarrassing that anyone who actually tries to use their product hits these crass bugs (same with geohot, who was really invested in making AMD’s cards work; I think he just ran their demo script in a loop and produced crashes).
It seems they really don’t understand/value the developer flywheel.
There's a recent interview with Lisa Su where she basically says she's never been interested in software because hardware is harder, that she doesn't believe AMD has any problems in the software department anyway, and that AMD is doing great in AI. So make of that what you will. Suffice it to say, clearly the AMD board doesn't care either, because otherwise they'd replace her.
Sadly common at hardware companies. The most extreme case I've heard of is ASML, who supposedly doesn't keep any machines of their own. They test against "almost-ready" machines right before they go out the door to customers.
Maybe for last-gen process nodes, and from a second- or third-hand supplier, if you could even find one. ASML makes very few fully working machines each year, and the cost and throughput of those machines are astronomical.
They have spare parts, you can bet, and I'd bet they have some SLA agreement with each customer where an engineer is basically on call nearby in case a single thing doesn't work or a random part breaks or needs servicing.
Asianometry did a great video on the cost of downtime when it comes to ASML devices in any fab. While I am not directly in this field and can't speak to the accuracy of the numbers John gives, he does not seem like one to just make stuff up, as the quality of his video production for niche topics is quite good.
Almost a decade ago KFAB had a fire, power was cut, and everything in process was dumped. They planned to restart, but it ended up being cheaper to close the whole facility.
You joke, but there is almost a genuine investment opportunity here for a large player.
Spend a billion on AMD shares, spend another billion on an out-of-house software team to solve the software problem, and more than double the share price.
Taking into account that there are players that already own billions in AMD shares, they could probably do that as well. On the other hand perhaps it would be better for them, as major shareholders, to have a word with AMD management.
I don't have the inside baseball but I have seen those weird as hell interviews with Lisa Su where she gets asked point blank about the software problems and instead of "working on it, stay tuned" -- an answer that costs nothing to give -- she deflects into "performance is what matters," which is the kind of denial that rhymes exactly with the problems they are having. No, the horsepower of your F1 racecar doesn't matter if the engine doesn't start and there's a wheel missing! You need to fix those problems before the horsepower can matter! Please tell me you are fixing the starter and the wheel!
Hopefully I am reading too much into this. Hopefully she doesn't have any weird hangups over investing in software and it all just takes time to Do It Right after GPGPU got starved in the AMD winter. But if it is a weird hangup then yeah, 100%, ownership needs to get management in line because whiffing a matmul benchmark years into a world where matmul is worth trillions just ain't it.
> she deflects into "performance is what matters," which is the kind of denial that rhymes exactly with the problems they are having.
It's not a deflection, but a straightforward description of AMD's current top-down market strategy of partnering with big players instead of doubling down to have a great OOBE for consumers & others who don't order GPUs by the pallet. It's an honest reflection of their current core competencies, and the opportunity presented by Nvidia's margins.
They are going for bang-for-buck right now, aiming at data center workloads, and the hyperscalers care a lot more about perf/$ than raw performance. Hyperscalers are also more self-sufficient at software: they have entire teams working on PyTorch, Jax, and writing kernels.
Engineers at hyperscalers are struggling through all the bugs too. It's coming at a notable opportunity cost for them, at a time when they also want an end to the monopoly. Do they buy AMD and wade through bug after bug, regression after regression, or do they shell out slightly more money for Nvidia GPUs and have it "just work"?
AMD has to get on top of their software quality issues if they're ever going to succeed in this segment, or they need to be producing chips so much faster than Nvidia that it's worth the extra time investment and pain.
It's in the article. Meta don't use AMD for training and write their own kernels for inference. You can't train with AMD, full stop, because their software stack is so buggy.
The same article also states that AMD provided custom bug-fixes written by Principal Engineers to address bugs in a benchmark - this is software that will only become part of the public release in 2 quarters. I ask again, do you think AMD will not expedite non-public bug-fixes for hyperscalers?
> You can't train with AMD, full stop, because their software stack is so buggy.
Point 7 from the article:
>> The MI300X has a lower total cost of ownership (TCO) compared to the H100/H200, but training performance per TCO is worse on the MI300X on public stable releases of AMD software. This changes if one uses custom development builds of AMD software.
AI labs don't want to train models using a build some guy hacked up on his desktop last night that's been through no proper QA process. The cost of a training run that fails or results in a garbage model due to numerical errors is huge.
Which is why, as they say clearly, nobody is training models on AMD. Only inference, at most. I'm not sure why you keep claiming they are training using private drivers. They clearly aren't.
Now I see how we're talking past each other. I 100% agree that none of the hyperscalers are currently (publicly) training on AMD silicon. I disagree with forward-looking statements like it "can't" happen, because I can guarantee you several of them are actively working on making it possible to train on AMD chips - that's just too juicy a target for bonus packets all the way up to directors: "Our team saved the org $x0 million in TCO by enabling training on MI300/MI400X in our new clusters"
That's the excuse used by every big company shitting out software so broken that it needs intensive professional babysitting.
I've been on both sides of this shitshow, I've even said those lines before! But I've also been in the trenches making the broken shit work and I know that it's fundamentally an excuse. There's a reason why people pay 80% margin to Nvidia and there's a reason why AMD is worth less than the rounding error when people call NVDA a 3 trillion dollar company.
It's not because people can't read a spec sheet, it's because people want their expensive engineers training models not changing diapers on incontinent equipment.
I hope AMD pulls through but denial is _not_ the move.
What exactly are they in denial about? They are aware that software is not a strength of theirs, so they partner with those who are great at it.
Would you say AMD is "shitting the bed" by not building its own consoles too? You know AMD could build a kick-ass console since they are doing the heavy lifting for the PlayStation and the Xbox[1], but AMD knows as much as anybody that they don't have the skills to wrangle studio relationships or figure out which games to finance. Instead, they lean hard on their HW skills and let Sony Entertainment/the Xbox division do what they do best.
[1] and the Steam Deck, plus half a dozen Deck clones.
There is probably one employee - either a direct report of Su's or maybe one of her grandchildren in the org chart - who needs to "get it". If they replaced that one manager with someone who sees graphics cards as a tool to accelerate linear algebra then AMD would be participating more effectively in a multi-trillion dollar market. They are so breathtakingly close to the minimum standards of competence on this one. We know from the specs that the cards they produce should be able to perform.
This is a case-specific example of failure; it doesn't generalise very well to other markets. AMD is really well positioned for this very specific opportunity of historic proportions, and the only thing holding them back is a somewhat continuous stream of unforced failures when writing a high-quality compute driver. It seems to be pretty close to one single team of people holding the company back, although organisational issues tend to stem from a level or two higher than the team. This could be the most visible case of value destruction by a public company we'll see in our lifetimes.
Optimistically speaking maybe they've already found and sacked the individual responsible and we're just waiting for improvement. I'm buying Nvidia until that proves to be so.
All the games that matter are Windows games running via Proton, as Valve has failed to actually build a GNU/Linux-native games ecosystem. In spite of the UNIX/POSIX underpinnings of the Android NDK and PlayStation, the studios hardly bother.
The day Microsoft actually decides to challenge Proton, or do a netbooks move on handhelds with Xbox OS/Windows, the Steam Deck will lose, just like the netbooks did.
Additionally, it is anyone's guess what will happen to Valve when Gabe steps down.
They are literally fostering Linux games by selling and endorsing a platform where those games run natively, as well as having native releases of their own games.
They aren't gonna force any third-party devs to do the same, but they're showing that there is a market while also growing it.
>Hyperscalers are also more self-sufficient at software: they have entire teams working on PyTorch, Jax, and writing kernels.
None of this matters because AMD drivers are broken. No one is asking AMD to write a PyTorch backend. The idea that AMD will have twice the silicon performance of Nvidia to make up the performance loss from bad software is a pipe dream.
> None of this matters because AMD drivers are broken
How do you know that the problems arise from broken drivers rather than broken hardware? Real world GPU drivers are full of workarounds for hardware bugs.
It does seem like a good idea. If the obvious major Nvidia customers see it as too risky (or we could speculate about other possible reasons why it has not happened yet), maybe some hedge fund who is struggling with where to put their cash could initiate and fund the project.
I enjoy this train of thought a lot. Capturing shareholder value by just creating it yourself. The destruction of a moat, I fear, is worth less than the existence of a moat a competitor has successfully built. So don’t forget to buy puts on Nvidia.
Honestly, probably NVIDIA itself, since they contribute significantly to many open-source projects (MLIR), and also make their SoTA GEMM/Conv implementations open-source and available for study (Cutlass).
> also make their SoTA GEMM/Conv implementations open-source and available for study (Cutlass)
Cutlass is a fine piece of engineering, but it is not quite as good as their closed source libraries in real world workloads. There is secret sauce that is not open sourced.
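For anyone curious what "available for study" looks like in practice, here's a minimal sketch of the device-level single-precision GEMM that the Cutlass docs walk through (roughly their basic_gemm example; assumes CUTLASS 2.x headers on the include path and device pointers, illustrative only, not the closed-source fast path):

    #include <cutlass/gemm/device/gemm.h>

    // Column-major SGEMM: C = alpha * A * B + beta * C, A/B/C are device pointers.
    cudaError_t cutlass_sgemm(int M, int N, int K,
                              float alpha, float const *A, int lda,
                              float const *B, int ldb,
                              float beta, float *C, int ldc) {
      using ColumnMajor = cutlass::layout::ColumnMajor;
      using Gemm = cutlass::gemm::device::Gemm<float, ColumnMajor,   // A
                                               float, ColumnMajor,   // B
                                               float, ColumnMajor>;  // C

      Gemm gemm_op;
      Gemm::Arguments args({M, N, K},      // problem size
                           {A, lda},       // A and its leading dimension
                           {B, ldb},       // B and its leading dimension
                           {C, ldc},       // C (source for beta scaling)
                           {C, ldc},       // C (destination)
                           {alpha, beta}); // epilogue scalars

      cutlass::Status status = gemm_op(args);  // launches the kernel
      return (status == cutlass::Status::kSuccess) ? cudaSuccess : cudaErrorUnknown;
    }

The template picks reasonable default tile shapes for the architecture; the tuned, fused variants that cuBLAS/cuDNN ship are the part that stays closed.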
I was surprised to hear recently that the same happens at NVIDIA! Hopefully less frequently, but I can understand why it's hard to keep many units on hand given the level of external demand.
This is designed to fit the Expansion Bay Shell, but not the Graphics Module. To fit the dGPU, we had to make the mounting scheme different within the Graphics Module.
We expect most Framework Laptop 16 users won't need more than the two M.2 storage slots that are on the Mainboard itself, so we don't want to end up with a wasted module.
We have that on the most recent generation of Framework Laptop. When the hardware privacy switch is engaged, the image sensor is electrically powered off and the camera controller feeds a dummy frame with an illustration of the switch.
And adding 2+2, the man being interviewed (Nirav Patel) is the same man who replied to my comment (HN user nrp), i.e. the man who actually did the overengineering.
If you rewind to 17:03, he talks about the change in what the switch does (previously: USB disconnection; now: as he described in the grandparent comment).