An H100 has a TDP of 700 watts (for the SXM5 version). With a die size of 814 mm^2 that's 0.86 W/mm^2. If the Cerebras chip has the same power density, that means a Cerebras TDP of 39.8 kW.
That's a lot. Let's say you cover the whole die area of the chip with water 1 cm deep. How long would it take to boil the water starting from room temperature (20 degrees C)?
amount of water = (die area of 46225 mm^2) * (1 cm deep) * (density of water) = 462 grams
energy needed = (specific heat of water) * (80 kelvin difference) * (462 grams) = 154 kJ
time = 154 kJ / 39.8 kW = 3.9 seconds
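For anyone who wants to poke at the numbers, here's the same back-of-envelope calculation as a small Python sketch (all inputs are the assumed values above):

    # Back-of-envelope check of the figures above, using the assumed values.
    H100_TDP_W   = 700.0      # SXM5 TDP
    H100_DIE_MM2 = 814.0
    WSE_DIE_MM2  = 46225.0    # Cerebras wafer-scale die area

    power_density_w_mm2 = H100_TDP_W / H100_DIE_MM2    # ~0.86 W/mm^2
    wse_power_w = power_density_w_mm2 * WSE_DIE_MM2    # ~39.8 kW at the same density

    water_g = WSE_DIE_MM2 * 10.0 / 1000.0              # 1 cm deep at 1 g/cm^3 -> ~462 g
    heat_j  = 4.186 * 80.0 * water_g                   # specific heat * delta T * mass -> ~155 kJ
    seconds = heat_j / wse_power_w                     # ~3.9 s to reach 100 C

    print(f"{wse_power_w / 1000:.1f} kW, {water_g:.0f} g, {heat_j / 1000:.0f} kJ, {seconds:.1f} s")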
This thing will boil (!) a centimeter of water in 4 seconds. A typical consumer water-cooler radiator only brings the coolant to within 10-15 C of ambient, and wouldn't like it (I presume) if you passed in boiling water. To use water cooling you'd need some extreme flow rate and a big rack of radiators, right? I don't really know. I'm not even sure if that would work. How do you cool a chip at this power density?
The enthalpy of vaporization of water (at standard pressure) is listed by Wikipedia[1] as 2.257 kJ/g, so boiling 462 grams would require an additional 1.04 MJ, adding 26 seconds. Cerebras claims a "peak sustained system power of 23kW" for the CS-3 16 Rack Unit system[2], so clearly the power density is lower than for an H100.
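Extending the sketch above with that vaporization term (same assumed 39.8 kW):

    # Energy to then boil away the ~462 g, using the cited 2.257 kJ/g.
    vaporization_j = 2.257e3 * 462.25      # ~1.04 MJ
    extra_seconds  = vaporization_j / 39.8e3
    print(f"{vaporization_j / 1e6:.2f} MJ, about {extra_seconds:.0f} more seconds")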
On a tangent: has anyone built an active cooling system which operates in a partial vacuum? At half atmospheric pressure, water boils at around 80 C, which I believe is roughly the operating temperature for a hard-working chip. You could pump water onto the chip, have it vapourise, taking away all that heat, then take the vapour away and condense it at the fan end.
This is how heat pipes work, I believe, but heat pipes aren't pumped; they rely entirely on heat-driven flow. I would have thought there were pumped heat pipes. Are they called something else?
It's also not a refrigerator, because those use a pump to pressurise the coolant in its gas phase, whereas here you would only be pumping the water.
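As a rough check on the "around 80 C at half atmospheric pressure" figure above, the Antoine vapor-pressure equation for water puts the boiling point at 0.5 atm near 82 C. A quick sketch (the constants are the commonly tabulated ones for roughly 1-100 C, pressure in mmHg):

    from math import log10

    # Antoine constants for water, valid roughly 1-100 C, pressure in mmHg.
    A, B, C = 8.07131, 1730.63, 233.426

    def boiling_point_c(pressure_mmhg: float) -> float:
        """Temperature at which water's vapor pressure equals the given pressure."""
        return B / (A - log10(pressure_mmhg)) - C

    print(boiling_point_c(760.0))   # ~100 C at 1 atm
    print(boiling_point_c(380.0))   # ~82 C at 0.5 atm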
No need to bother with a partial vacuum when ethanol boils at around 80 C as well and doesn't destroy electronics. I'm not aware of any active cooling systems utilizing this though.
I could argue that ethanol has 1/3 the latent heat of vaporization of water, and would boil off 3 times quicker. However, what ultimately matters is the rate of heat transfer, so my nitpick may be irrelevant.
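For reference, plugging in commonly cited figures (these numbers are my own assumptions, not from the thread): water ~2.26 kJ/g, ethanol ~0.84 kJ/g at ~0.79 g/mL, so per unit volume of coolant the gap is indeed roughly 3x:

    # Approximate latent heats of vaporization (commonly cited values).
    water_j_per_g   = 2257.0
    ethanol_j_per_g = 841.0
    ethanol_density = 0.789   # g/mL, vs ~1.0 g/mL for water

    print(water_j_per_g / ethanol_j_per_g)                       # ~2.7x per gram
    print(water_j_per_g / (ethanol_j_per_g * ethanol_density))   # ~3.4x per mL of coolant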
> This is how heat pipes work, I believe, but heat pipes aren't pumped; they rely entirely on heat-driven flow. I would have thought there were pumped heat pipes.
Do you have a particular benefit in mind that a pump would help with?
The machine that actually holds one of their wafers is almost as impressive as the chip itself. Tons of water cooling channels and other interesting hardware for cooling.
If you let the chip actually boil enough water to run a turbine, you're going to have a hard time keeping the magic smoke inside. Much better to run at reasonable temps and try to recover energy from the waste heat.
That's basically the principle of binary cycle[1] generators. However, for data center waste heat recovery, I'd think you'd want to use a more stable fluid for cooling, and then pump it to a separate closed-loop binary-cycle generator. There's no reason to make your datacenter cooling system also deal with high-pressure fluids, or to move high-pressure working fluid from thousands of chips to a turbine of sufficient size, etc.
There's a bunch of places in Europe that use waste heat from datacenters in district heating systems. Same thing with waste heat from various industrial processes. It's relatively common practice.
If my very stale physics is accurate then even with perfect thermodynamic efficiency you would only recover about a third of the energy that you put into the chips.
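The ceiling here is the Carnot efficiency, 1 - T_cold/T_hot (temperatures in kelvin). A quick sketch with hypothetical temperatures (neither figure comes from the thread): recovering roughly a third needs a hot side around 450 K, while a chip-like ~80 C hot side gives much less:

    # Carnot limit on recovering work from waste heat.
    def carnot_efficiency(t_hot_k: float, t_cold_k: float = 293.0) -> float:
        return 1.0 - t_cold_k / t_hot_k

    print(carnot_efficiency(450.0))   # ~0.35, roughly "a third", but needs ~177 C heat
    print(carnot_efficiency(353.0))   # ~0.17 at a more chip-like ~80 C hot side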
I'm an American living in Germany. When I first arrived, the way Germans write the digit 1 surprised me. They write it with the upper hook thing very long, almost like a capital lambda (Λ), which sometimes makes 1 and A visually ambiguous. This isn't really a problem, just something funny about moving to a new country.
I use 1 with a long hook except when I write binary numbers where I use just a | for 1.
I have some other context dependent characters/letters.
I write small z like that in normal writing, but as a mathematical variable I write it as ƶ. (To disambiguate from 2.)
I write small t like † in normal writing, but as a mathematical variable I write it as t. (To disambiguate from + (plus).)
I write q like that in normal writing, but as a mathematical variable I write it with a stroke (a ꝗ, which does not display properly on the iPhone), a bit similar to a ɋ. (To disambiguate from a (ɑ).)
It’s all about disambiguation, and sometimes having different letter shapes for isolated characters.
Possibly not a typo. It's a long-running joke based on the NFL's extremely aggressive defense of their trademark, threatening anyone who uses the phrase Super Bowl to advertise anything. That's why euphemisms like "The Big Game" show up a lot.
I clicked their link to Knative so I could read about it. On the Knative webpage, the cookie banner pops up. I don't want to accept, so I click "learn more". That expands the cookie banner to include a button that says "I understand how to opt out, hide this notice" as well as a link to a lengthy explanation of cookies. Well, I don't want to click the button, so I click the link to the cookie explanation. At the bottom of that page are more links to browser documentation that might eventually explain how to opt out, but I can't click those links because everything on the page is disabled: the same cookie popup is on this page too. It blocks interaction, including clicking on their links about how to opt out, until you opt in. This stuff is getting worse.
There's a recent Tom Scott video about an office inside an elevator. It has a desk and chairs, the whole thing. It's on the corner of the building too. It seems, from his video, that nobody is really sure why it was built and what it was used for.
Well, it is clear from the video what it was used for: nothing. The boss who had it built because of complex WWII history never actually got to use it. What the vision for it was, though, is unclear.
My question here is about underlying fab capacity. This chip is made on TSMC 4N, along with the H100 and 40xx series consumer GPUs. I assume Nvidia has purchased their entire production capacity. I also assume that Nvidia is using that capacity to produce the products with the highest margins, which probably means the H100 and this new GH200. So when they release this new chip, does it mean effectively fewer H100s and 4090s? Or is that not how fabrication capacity works?
I'm asking because whenever I look at ML training in the cloud, I never see any availability, either for this architecture or the A100s. AWS and GCP have quotas set to 0, Lambda Labs is usually sold out, Paperspace has no capacity, etc. What we need isn't faster or bigger GPUs, it's _more_ GPUs.
It sounds to me like the GH200 achieves more FLOPS per transistor. So compute demand will be satisfied more quickly via the GH200 than via "smaller" chips such as the H100.
Having said that, I don't think we're anywhere near some kind of equilibrium for AI compute. If chip supply magically doubled tomorrow, the large companies would buy it all for their datacenters and hit 100% utilization within a few weeks. They all want to train larger models and scale inference to more users.
In addition to training larger models, I'm sure there are many use cases that AI could serve that are currently cost prohibitive due to the cost of running inference.
I'd like bigger GPUs. A trillion-parameter model at 16 bits needs 2,000 GB+ for inference, more for training. All kinds of things can be done to spread it across multiple GPUs, quantize to fewer bits, etc., but it's a lot easier to just shove a model onto one GPU.
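The arithmetic behind that, as a quick sketch (the 16 bytes/parameter training figure is a commonly used rule of thumb for mixed-precision Adam, an assumption on my part rather than something stated above):

    # Rough memory footprint for a 1-trillion-parameter model.
    params = 1e12
    fp16_bytes_per_param = 2

    weights_gb = params * fp16_bytes_per_param / 1e9
    print(weights_gb)          # ~2,000 GB just to hold the weights for inference

    # Training with mixed-precision Adam is often estimated at ~16 bytes/param
    # (fp16 weights + gradients + fp32 master copy + optimizer moments).
    print(params * 16 / 1e9)   # ~16,000 GB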
We'll likely see more efficiency from bigger GPUs and hopefully more availability as a result.
My question on the very slow growth of available memory: are there technical reasons they cannot trivially build a card with 100GB of RAM (even with lower performance) or has it been a business decision to milk the market for every penny?
High speed I/O pins cost a lot, and GDDR generally has 32 data pins per chip and no way to attach multiple chips to the same pins. So 256 bits and 16GB is hard to exceed by much on that tech. The high end is 384 bits and 24GB.
There is a mode to attach 16 data pins to each GDDR chip, so with some extra effort you could probably double that to 48GB. Or at least 32GB. Maybe this is a valid niche, or maybe there isn't enough demand.
The alternative to this is HBM, which can stack up big amounts, but it's a lot more expensive.
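Put as arithmetic (assuming 2 GB per GDDR6/6X device, which is my assumption rather than something stated above):

    # GDDR capacity math as described above.
    def max_capacity_gb(bus_width_bits: int, pins_per_chip: int, gb_per_chip: int = 2) -> int:
        chips = bus_width_bits // pins_per_chip
        return chips * gb_per_chip

    print(max_capacity_gb(256, 32))   # 16 GB: 8 chips in the normal x32 mode
    print(max_capacity_gb(384, 32))   # 24 GB: 12 chips, the current high end
    print(max_capacity_gb(384, 16))   # 48 GB: 24 chips in the x16 "clamshell" mode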
I don't disagree with Dylan, but I'm more than willing to bet that the only reason Nvidia's cards (and that's who we're talking about; CUDA is a hell of a moat) are RAM-starved is that they haven't felt the pressure to do otherwise. AMD has an institutional aversion to good software. Intel isn't even an also-ran yet.
Apple and their unified memory architecture may be the prod needed to get larger amounts of RAM available in single-card solutions. We'll see.
Fabs can run multiple complex designs on the same line simultaneously by sharing common tools. For example, photolithography tools can have their reticles swapped out automatically. Obviously, there is a cost to the context switching and most designs cannot be run on the same line as others.
Ultimately, the smallest unit of fabrication capacity is probably best measured along the grain of the lot/FOUP (<100 wafers).
The basics of supply chains and of supply and demand, as you all should have witnessed during COVID with toilet rolls, are the same here.
Fab capacity is not that different from any other manufacturing. You just need to book that capacity way ahead of time (6-9 months). That is also why I said 99% of news or rumours about TSMC capacity are pure BS.
So to answer your question: yes, Nvidia will likely go for the higher-margin products. That is one of the reasons you see Nvidia working with Samsung and Intel.
It's my understanding from friends in the business that the actual chips do not represent any capacity issue or bottleneck, it's actually manufacturing the devices that the chips are in (e.g. the finished graphics card).
Why would this be the case? I would naively think that since the chips can only be made in a fab, and the rest can be made basically anywhere, it would be the other way around.
They cannot be made "anywhere"; when you can't get that PMIC from the original manufacturer, good luck getting it from someone else. And replacing an IC in a QA-tested, EMC-verified, FCC- and CE-certified (etc.) device will often trigger redoing all of that, possibly requiring additional iterations. If there is a similar part available at all.
Take a look at a recent GPU and count the auxiliary components. All of them can cause supply chain difficulties.
For example, my company hit manufacturing issues (production capacity) with flash memory, clock oscillators, and auxiliary FPGAs. But production of the main chips was fine the whole time during the chip crisis, as far as I know. So yeah, small critical components totally can be a blocker. Some specific voltage controller is unavailable and suddenly your whole design is paralyzed.
I think that's it. The PCB itself is rather trivial; it's the RAM, but also things like switching regulators (there are alternatives, but then it's a redesign), maybe even stuff like connectors (which don't burn...).
For a science project, we need to manufacture magnets. It's not easy to find a company who has the right iron right now, and it's hard to get, with long lead times. The supply crisis is real.
You know, I was wondering this the other day when NVDA's insane run-up happened. I went down the road of trying to figure out if there were even enough silicon wafers, or if there even would be enough wafers in the next five years, to justify that price.
Unless all the planet does is make silicon wafers, no.
Well, you figured wrong. NVDA's AI GPUs are a very small % of global foundry supply, and even if volume tripled, they would still be a small % of global foundry supply. NVDA's revenue is high because their gross margins are extreme, not because their volume is high.
Can you go into more detail? So you're saying that at a 200 P/E ratio there isn't even enough wafer supply for NVDA to grow into that valuation, even over 5 years?
I mean, you've got the gist of it. I pulled some reports on silicon production, silicon wafer prices and price trends, current fab capacity, etc.
My back-of-the-napkin math basically suggested that silicon production would need to 4x and fab capacity would need to 4x (neither of which is happening), and that NVDA would have to capture all of that to justify their current price. I didn't bother writing it up; I just looked at it mostly because I was on the wrong side of that play. It's something worth considering for sure.
Wouldn't NVDA just focus more on high-margin datacenter products in order to grow into those higher earnings despite the wafer limitation? Datacenter-focused products are already starting to surpass gaming, which is their second-largest revenue source: http://www.nextplatform.com/wp-content/uploads/2022/05/nvidi...
It seems to me that, while a 200 P/E may be high, they certainly could keep increasing prices on the already high-margin datacenter products, which get quickly gobbled up by companies no matter the price because of the immense demand.
We're probably ~3 years out from all of those government-funded fabs coming online, right?
(n.b. that's really good work on your end and I agree with your conclusion; I'm just idly musing about the thing that bugs me: what the heck are all these non-leading-edge fabs going to do?)
See also the International Obfuscated C Code Contest. [0]
This program [1], for example. It just accepts some input on stdin and returns the same input, mirrored along the diagonal: the first row of the input becomes the first column of the output, the second row becomes the second column, and so on. But the program is also invariant under its own transformation: flip the source code along its diagonal and the result is a program with the same functionality: it transposes stdin.
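For flavor, here's what the program's observable behavior amounts to, written plainly in Python (this is just the transpose, not the contest entry's trick of surviving its own transposition):

    import sys

    # Transpose stdin along its diagonal: row i of the input becomes column i of the output.
    lines = [line.rstrip("\n") for line in sys.stdin]
    width = max((len(line) for line in lines), default=0)
    padded = [line.ljust(width) for line in lines]

    for column in zip(*padded):
        print("".join(column).rstrip())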
Or this one [2], which parodies Java in C. It's functional C code that looks like Java, including the
class LameJavaApp
{
/** The infamous Long-Winded Signature From Hell. */
public static void main(String[] args)
...
Or this one [3] that calculates pi by estimating its own area.
Or this one [4]. It's a lovers quarrel, written simultaneously in C and English. It's incredible, seriously, read it.
He shoots the Prince Rupert's drop with a bullet. The bullet wins. In the high-speed footage, you can see the bullet splat against the glass and break into shards long before the glass breaks. Really neat.
I've read about this. Studying the psychology of group dynamics and conflict in a confined environment was a part of the experiment. Sure enough, the crew of 8 quickly split into two factions. It still seems like a success to me though, because even though some of them were barely on speaking terms despite being friends before, they continued to work together.
> I don't like some of them, but we were a hell of a team. That was the nature of the factionalism... but despite that, we ran the damn thing and we cooperated totally.