Apple M1 Ultra (apple.com)
1168 points by davidbarker on March 8, 2022 | 828 comments



I think the GPU claims are interesting. According to the graph's footer, the M1 Ultra was compared to an RTX 3090. If the performance/wattage claims are correct, I'm wondering if the Mac Studio could become an "affordable" personal machine learning workstation (which also won't make the electricity bill skyrocket).

If Pytorch becomes stable and easy to use on Apple Silicon [0][1], it could be an appealing choice.

[0]: https://github.com/pytorch/pytorch/issues/47702#issuecomment... [1]: https://nod.ai/pytorch-m1-max-gpu/
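
For a rough idea of what that would look like day to day, here is a minimal sketch assuming the "mps" device name proposed in the linked issue [0] actually ships; nothing like this was released at the time of writing, so treat the device name and the availability check as assumptions rather than a real API.

    import torch

    # Hypothetical usage of the Metal-backed device discussed in [0]; the
    # "mps" name and availability check are assumptions from that proposal.
    use_mps = getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available()
    device = torch.device("mps" if use_mps else "cpu")

    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(64, 1024, device=device)
    with torch.no_grad():
        y = model(x)  # would run on the Apple GPU only if such a backend exists
    print(y.device)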


The GPU claims on the M1 Pro & Max were, let's say, cherry-picked to put it nicely. The M1 Ultra claims already look suspicious since the GPU graph tops out at ~120W & the CPU graph tops out at ~60W, yet the Mac Studio is rated for 370W continuous power draw.

Since you mention ML specifically, looking at some benchmarks out there (like https://tlkh.dev/benchmarking-the-apple-m1-max#heading-gpu & https://wandb.ai/tcapelle/apple_m1_pro/reports/Deep-Learning... ), even if the M1 Ultra is 2x the performance of the M1 Max (so perfect scaling), it would still be far behind the 3090. Like completely different ballpark behind. There is of course the price & power gap, but the primary strength of the M1 GPUs really seems to be the very large effective VRAM. So if your working set doesn't fit in an RTX GPU of your desired budget, then the M1 is a good option. If, however, you're not VRAM limited, then Nvidia still offers far more performance.

Well, assuming you can actually buy any of these, anyway. The M1 Ultra might win "by default" by simply being purchasable at all unlike pretty much every other GPU :/


The 3090 can also do fp16 while the M1 series only supports fp32, so the M1 chips basically need more RAM for the same batch sizes. So it isn't an oranges-to-oranges comparison.

Back when that M1 MAX vs 3090 blog post was released, I ran those same tests on the M1 Pro (16GB), Google Colab Pro, and free GPUs (RTX4000, RTX5000) on the Paperspace Pro plan.

To make a long story short, I don't think buying any M1 chip makes sense if your primary purpose is Deep Learning. If you are just learning or playing around with DL, Colab Pro and the M1 Max provide similar performance. But Colab Pro is ~$10/month, and upgrading any laptop to the M1 Max is at least $600.

The "free" RTX5000 on Paperspace Pro (~$8/month) is much faster (especially with fp16 and XLA) than the M1 Max and Colab Pro, albeit the RTX5000 isn't always available. The free RTX4000 is also faster than the M1 Max, though you need to use smaller batch sizes due to its 8GB of VRAM.

If you assume that M1-Ultra doubles the performance of M1-Max in similar fashion to how the M1-Max seems to double the gpu performance of the M1-Pro, it still doesn't make sense from a cost perspective. If you are a serious DL practitioner, putting that money towards cloud resources or a 3090 makes a lot more sense than buying the M1-Ultra.


> The 3090 also can do fp16 and the M1 series only supports fp32

Apple Silicon (including base M1) actually has great FP16 support at the hardware level, including conversions. So it is wrong to say it only supports FP32.


I'm not sure if he was talking about the ML engine, the ARM cores, the microcode, the library or the OS. But it does indeed have FP16 in the Arm cores.


FP16 is supported in M1 GPU's and Neural Engines through the CoreML framework. From https://coremltools.readme.io/docs/typed-execution :

> The Core ML runtime dynamically partitions the network graph into sections for the Apple Neural Engine (ANE), GPU, and CPU, and each unit executes its section of the network using its native type to maximize its performance and the model’s overall performance. The GPU and ANE use float 16 precision, and the CPU uses float 32.

Also, this exploration (https://tlkh.dev/benchmarking-the-apple-m1-max#heading-neura...) reports FP16 performance in the 5.1-5.3 TFLOPS ballpark.
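
For a concrete sense of how that typed execution is reached in practice, here is a minimal coremltools sketch that converts a model with float16 compute precision so the GPU/ANE run it natively in FP16; the tiny torch model and file name are just placeholders.

    import coremltools as ct
    import torch

    # Placeholder model; any traced torch module converts the same way.
    model = torch.jit.trace(torch.nn.Linear(128, 64).eval(), torch.randn(1, 128))

    # Convert to an ML Program with FP16 compute precision, matching the
    # typed-execution behaviour quoted above (GPU/ANE execute in float16).
    mlmodel = ct.convert(
        model,
        inputs=[ct.TensorType(shape=(1, 128))],
        convert_to="mlprogram",
        compute_precision=ct.precision.FLOAT16,
    )
    mlmodel.save("linear_fp16.mlpackage")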


I should have been more clear. I didn't mean the hardware, but the speedup you get from using mixed precision in something like Tensorflow with an NVIDIA GPU.


Thanks. At least when I ran the benchmarks with Tensorflow, using mixed precision resulted in the CPU being used for training instead of the GPU on the M1 Pro. So if the hardware is there for fp16 and they implement the software support for DL frameworks, that will be great.
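
Roughly what was being tested, for anyone who wants to reproduce it; whether the ops land on the Metal GPU or fall back to the CPU with this policy enabled is exactly the issue described, so treat it as a sketch.

    import tensorflow as tf

    # Mixed precision: compute in float16, keep variables in float32.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    # Log op placement; on the M1 Pro run described above, training fell
    # back to the CPU instead of the GPU once this policy was enabled.
    tf.debugging.set_log_device_placement(True)
    print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),  # keep the softmax in float32
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    x = tf.random.normal((256, 784))
    y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
    model.fit(x, y, epochs=1, batch_size=64)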


Yes, unfortunately the software is to blame for the time being; I ran into issues myself. :\ Hope the software catches up to what the hardware can deliver, for both the GPU and the Neural Engine.


At some point can we finally admit that Apple's GPU claims just aren't....true? Like, every Apple keynote they put up incredible performance claims, and every time people actually get their hands on the product, it doesn't even come close to holding water in any domain where GPU performance matters (Game performance, ML training perf)


> GPU performance matters (Game performance, ML training perf)

No one plays games on a Mac.

And it has nothing to do with GPU performance but rather the fact that the audience simply isn't interested in gaming on it and so there is no economic incentive to target them.

So the GPU performance that matters to Mac users and is relevant to Apple is not games but rather content creation, production etc.


> No one plays games on a Mac.

My wife does (I play games on Linux).

I have friends who own Macs who reluctantly dual boot to Windows just to play some games -- they would completely ditch Windows if they could just play every game on Mac.

I see there are Mac games on Steam.

All of this points to the situation being more nuanced than "no one plays games on Mac".


I interpret these sorts of statements to be short for "a statistically insignificant number of people play games on macos (or linux)" rather than the literal case where as long as a single person does, it is false.

According to the steam hardware survey, Windows is 95%, Macos is 4%, and Linux is 1%. And to dig deeper you'd need to see what games that 5% of the non-Windows is playing - are they simpler games that don't need graphics acceleration (e.g. puzzle games, roguelikes, etc) or ones that do?

My desktop is an Intel NUC running Ubuntu, and yeah I play games on it. Slay the Spire, Spacechem, even some older MMOs like DDO or LoTRO (which run but at 15-20 fps since that system just has Intel Iris). I'm unable to even start many others (e.g. Grim Dawn) due to not having dedicated graphics.

So yeah it's nuanced but lots of games that need a graphics card don't run or even display on that system.

That's why I have a windows gaming system too. I'm realistic, the market just isn't there. I used to have a Mac (dropped it in 2018) and if I still did I'd subscribe to Geforce Now or just do console gaming.


You're right, but I think there's an immensely valuable lesson here for people who communicate with engineers as part of their job: be mindful of casual precision mistakes. "No one" and "a statistically insignificant number of" are colloquial synonyms to civilians, but they are black and white to engineers, especially software people. They are the difference, for example, between a safe code path and an edge-case bug. In my experience, people who use casual imprecision in technical conversations are sometimes seen as inexperienced or not fully understanding a problem, which may actually not be the case. Learning to speak with precision without being pedantic is an excellent soft skill that can be developed over time.


I understood it was about statistical significance, but my point remains:

Who says Mac users aren't gamers? It's a self-perpetuating vicious cycle: gamers use Windows because that's where the majority of games are, so developers keep targeting Windows, so gamers use Windows, and so on ad infinitum.

But people who use Macs enjoy games as well. They would rather not dual boot to Windows.

It's not true that Mac users don't play games. Rather, it's that most games are on Windows, which is a shame.


If you want to get a bit more into significant numbers, Apple has the largest number of games for any platform and it has the largest number of gamers. They're mobile games, but on the M1 people can play them on their desktop.

What I think you mean is competitive gamers and AAA gamers do not play on Apple hardware. This is mostly true today ofc, but keep in mind that's actually not the majority of the gaming market. Apple is raking it in from its gaming market.


Grim Dawn is just poorly implemented. I have a Ryzen 3700x with a 2070 super and Grim Dawn runs like crap.


The interesting question would be: out of all Mac users, how many of them play games on their Macs?

I have no idea what the answer is. Personally I have run a number of games on macOS, via Steam, and via Boot Camp and virtualization. Some popular MMOs like Final Fantasy XIV, World of Warcraft, and Eve Online have macOS clients, though Guild Wars 2 discontinued theirs.

Apple Arcade apparently has enough Mac users that Apple has a reason to support it on macOS as well as iOS.

And Apple apparently has some reason to support iOS/iPadOS games on Apple Silicon Macs as well (though it could just be a side effect of a future iOS-macOS merger or hybrid device.)


> The interesting question would be: out of all Mac users, how many of them play games on their Macs?

...and how many would play more games on their Macs if they were available?

I refuse to believe Mac users are less fond of videogames. Based on my personal observation of the Mac users I know, they enjoy games as much as anyone.

At least casual games for the iPad have a vast library, with many genuinely good games (source: me, my iPad 2 was mainly a gaming platform for me, never found other uses for it).

I do realize more complex games are a different beast. But casual? I'd say Apple fans love them.


The M1's Blender rendering performance also wasn't very good, though. Assuming the scene was small enough to fit in the smaller VRAM of competitive consumer GPUs, anyway.

Video content creation specifically is where it mostly achieves what the graphs indicate, and that's mostly a "well yeah, video decoder ASICs are really efficient. I'll take things we already knew for $100, Alex"


Blender didn't even run on Metal until a few months ago. It's in beta now and the performance has increased substantially.

If the software is optimized, the graphics hold up fine.


The other elephant in the room is OptiX. Are Apple seriously indicating that they can outperform that with an iGPU?


Unless you specifically configure Blender for GPU rendering, it renders on the CPU(s).


Huh? I know lots of people who play games on a mac.


I thought that would be a given by now.

I won't even get into the fact that it can't run most things you'd want that kind of hardware for.


Even their graphs don't look legit; they look like what you see on boxes of vitamins from random websites. Got to wait for independent benchmarks with well-explained scenarios. Anything can run twice as fast as anything with some specific tweaks to the program being run.


100W of that is probably for the USB ports; afaik TB4 ports are required to support 15W, and I don't think there's been a Mac that didn't support full power simultaneously across all ports. (if that's even allowed?)


I suppose given that this is two M1 Maxes glued together, assuming cooling is a solved problem, the max SoC power consumption is just twice as high as usual, plus interconnect overhead. Right? Based on the thermal and power consumption characteristics of previous chips I would not be surprised if, say, ~120W is the max power draw of this thing.

edit: Of course the M1 max only shipped in laptops, so... who knows.


The M1 Max hits 100W in a laptop form factor with 'real' workloads (or at least not parasitic ones like Prime95 & FurMark) when hitting the CPU and GPU simultaneously. So this is probably >200W, unless it's been power limited and thus performs worse than 2x M1 Maxes do anyway.


Does the M1 Max throttle when hitting the CPU & GPU at the same time? The 14-inch would be most interesting to me.


> assuming cooling is a solved problem

I’d assume that’s what most of the chonk is about, no?

> Based on the thermal and power consumption characteristics of previous chips I would not be surprised if say ~120W is the max power draw of this thing.

The Max could be brought up to 90W or so.


I wouldn't assume heat is solved, as it's been Apple's weak point in the past. The Cube would crack, G5 iMacs would melt capacitors, MacBooks would burn users' laps.


Those issues were all almost 20 years ago.

If you think this is still a problem, you haven't used any recent Macs. The current MB Air and MB Pro both run very cool even under prolonged heavy loads.

Apple's management of any and all heat issues has been far better than any competitor's for a while now.


> Apple's management of any and all heat issues has been far better than any competitors for a while now.

Only if you define "for a while" as "since a year ago with the introduction of the M1".

Apple refused to make a thicker laptop or one with better ventilation to adequately cool the CPUs & GPUs they were sticking in them. They were among the worst, if not the worst, of them all at handling the heat of the components they were using. Until the M1 Pro & Max rolled around, anyway, and suddenly they got thicker, with feet that raise them farther off the desk, and an absolutely massive amount of vents all over 3 sides of the machine. Curious timing on that...


The last Intel Macs still run way too hot. They get up over 80°C. The 2014 ones would hard reboot when building large Java projects.


To be fair the current Dells do the same.


Haha, what are you smoking? My 2018 MacBook Pro was constantly throttling under heavy load; its thermal management was terrible.


I didn't say that particular model was perfect. I said Apple has been doing far better than competitors. Which is true. I set up and used hundreds of Macs and hundreds of PCs in 2018 as part of my job. I am pretty confident as to which performed better overall thermally.

And of course, Apple has made huge progress since then (M1, better thermal designs, new fan designs which are quieter and more efficient) whereas PC makers have made basically zero progress.


You said "Those issues were all almost 20 years ago."

The "huge progress since then" was a year ago.

Your timeline is a little bit made up.


Two thirds of the volume seems to be dedicated to cooling. Assuming they’re not complete idiots, they must be doing something!



I'm confident the marketing oversells it, but it's likely very good in comparison. 3090 is about 28B transistors on Samsung 8nm, with some budgeted for raytracing. This is on TSMC 5nm, a process with 3-4x the density, and the 114B transistor count could potentially allow for similar GPU size - although I'd wait for Locuza or someone to analyze it. It should be very competitive in performance, and the winner by far in perf/watt, at least until RDNA3 and Lovelace GPUs release towards the end of the year.


You are comparing the power source (370W) to the CPU/SOC (120W). The power supply provides power for USB-C/Thunderbolt ports and it’s never a good idea to spec a power supply too low and run it too close to capacity.


I am sure it has great GPU performance for what it is, but comparing it to a top Nvidia chip just seems ridiculous on Apple's part. Apple, I think, is going to have trouble winning back the semi-pro workstation market they abandoned not that many years ago if they do not start offering M1 chips along with Nvidia GPUs.

Once again we get a Mac Semi-Pro Mini (seems like the Studio is more like a replacement for the Trashcan) that their marketing implies is maybe as good as a Mac Pro but is obviously not. It does look a lot better this time around - at least it has more ports :-D


Just a note: they explicitly said the Mac Pro is coming.


I'm interested to see if they shoot themselves in the foot and try to make it all M1 architecture or partner with AMD or Nvidia


Watching the keynote I was almost thinking that Nvidia missed the boat when they chose not to sign whatever they had to in order to make macOS drivers.

Thank you for recalibrating me to actual reality and not Apple Reality (tm)


nVidia missed the boat in releasing a bunch of "replace the whole laptop logic board" chips that died in the 2008-2012 timeframe and annoyed a whole host of OEMs:

https://www.techpowerup.com/64683/nvidia-admits-to-selling-f...

Apple specifically: https://support.apple.com/en-us/HT203254


Nvidia switched to lead free solder while retaining the same potting material. This led to a mismatch in thermal expansion coefficients which caused strain with repeated thermal cycles and eventually some of the solder bumps just broke. You could use a toaster to melt the solder again and reconnect them but that didn't fix the underlying problem.


My 2011 15" MBP suffered from the Nvidia GPU issue. It was so bad my computer wouldn't boot properly and the Apple Geniuses kept denying the claim because it couldn't finish a test.

Anyway, it was still my longest lived Laptop. My Sony VAIOs were great but I liked that Mac better.


I pulled the logic board out and baked it in the oven and it lasted another 7 months before needing another bake. By then I moved onto a newer macbook.


By 2011 it was AMD(ATI) not Nvidia. I had one that failed but Apple did the replacement for free.


I am certain AMD, like most responsible vendors (e.g. Seagate), also worked with Apple to correct the issue. Nvidia's issue was that it told all of the laptop vendors to just deal with it. It's why they are hated by other vendors and why AMD/Intel worked hard to keep them from creating x86 CPUs.


> Well, assuming you can actually buy any of these, anyway. The M1 Ultra might win "by default" by simply being purchasable at all unlike pretty much every other GPU :/

Can we stop it with the meme that these GPUs are unobtainable? Yes, they are still overpriced compared to their supposed original prices and they'll likely never return to that price given that the base prices of manufacturing, materials and such have increased for multiple reasons.

But stock has been generally available for many months now and it's possible to get them as long as you can afford them.


Where are these available? I just checked all the links on Nvidia's webpage for a 3060 and they are all out of stock...


> 370W continuous power draw.

Don't know if it's the same, these days, but when I was designing electronic stuff, we were always told to spec the power supply at twice the maximum draw.


> The M1 Ultra claims already look suspicious since the GPU graph tops out at ~120W & the CPU graph tops out at ~60W yet the M1 Studio is rated for 370W continuous power draw.

And that is expected, a lot has to be reserved for USB devices.


Yeah can’t say I’ve ever seen a computer rated for the exact amount it’ll be drawn at. They have to leave room.


The SSD and those two large fans use a fair amount of power too.


Those fans are unlikely to break 5w combined.


Guess we will see once one is available to be tested, but I doubt that’s the case. I was guessing 5-8W each, but I could be wrong of course.


Most PC fans are 1-2w. Given the claims of "near silent", I think it's plenty safe to say these are not 5,000+ RPM rippers that are going to be in the 5-8W each range.


Given the M1 Ultra has a 2lb heavier heat sink just to dissipate the extra heat, not sure that’s a safe bet. There are other ways to reduce noise than just using lower RPM fans.


> There are other ways to reduce noise than just using lower RPM fans

No, not really. You can try adding sound dampening, like BeQuiet does, but it's not as effective as just having more lower RPM fans (although it does help with coil whine). But Apple has historically never used sound dampening, and this doesn't look like it changes that. With how "open" it is anyway (the entire back being just a bunch of holes), sound dampening wouldn't be all that effective.

I'm not really sure what you think the heavier heat sink has to do with either the fan RPM or the noise profile. The bigger heatsink is if anything evidence of larger, lower-RPM fans. They're using more surface area, so they can spread the air movement out over larger fan blades. Which means they don't need to use as high an RPM fan.


A massive (by weight) heatsink can be used as thermal mass. This delays the need to increase fan speed and allows spreading that increase over time. One thing that is more noticeable than fan speed is rapid fan speed changes -- avoiding those makes the entire system seem quieter.


Yeah, there are a lot more parameters here than just 'there are two chips and one is the bestestst', like the availability you pointed out.

There is raw performance, but there is also performance per watt, availability and scalability (which is both good and bad - M1 is available, but there is no M1 Ultra cloud available). If you want a multi-use setup, an RTX makes more sense than most other options, if you can get one and at a reasonable price. If you want a Mac, the M1U is going to give you the best GPU. In pretty much all other setups there are so many other variables it's hard to recommend anything.


For the market this is aimed at, performance per watt is really irrelevant. Performance per dollar or just outright performance are far more important the vast majority of the time. That's how we ended up with 125w+ CPUs and 300w GPUs in the first place.


There are dedicated ML cards from Nvidia for that, far more powerful than a 3090, so that is indeed true. But PPW is never irrelevant when someone is doing things at scale, so the question becomes: who is doing this for money but somehow not at scale?


These aren't rack mount products aimed at cloud providers, they are essentially mini workstations. What are you calling "at scale" for this? You're basically always pairing one of these machines to one physical person sitting at it whose time is being paid for as well (even for a solo creator, time is still money). It's a terrible tradeoff to save pennies per hour on power to turn around and pay the person dollars more per hour waiting on results.


That seems like a really bad way to spend money. Why limit a person to a single workstation if the workstation is the limiting factor? This is where we get clouds or if it must be done locally, rack mounted systems with many cards.

If you are doing it solo, with "just the hardware you happen to have", it matters a bit less. If you are doing it constantly to make money, and you need a lot of power, buying a one-person desk-machine makes no sense.


The cost of powering my 3090 for a year is now more than the cost (RRP) of a 3090.


Where do you live that power is anywhere close to that expensive? And are you overclocking your 3090?

Even assuming literal 24hr/day usage at a higher, "factory overclocked" 450w sustained, at a fairly high $0.30/kWh that's $1200/yr. Less than half the retail price of a 3090. And you can easily drop the power limit slider on a 3090 to take it down to 300-350w, likely without significantly impacting your workload performance. Not to mention in most countries the power cost is much less than $0.30/kWh.

At a more "realistic" 8 hours a day with local power pricing I'd have to run my 3090 for nearly 10 years just to reach the $2000 upgrade price an M1 Ultra costs over the base model M1 Max.
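
The arithmetic behind those numbers, for anyone who wants to plug in their own wattage and tariff:

    # Back-of-envelope yearly GPU running cost, using the figures above.
    def yearly_cost_usd(watts, hours_per_day, usd_per_kwh):
        return watts / 1000 * hours_per_day * 365 * usd_per_kwh

    print(yearly_cost_usd(450, 24, 0.30))  # ~1183: 24/7 at $0.30/kWh
    print(yearly_cost_usd(450, 8, 0.30))   # ~394: a more typical 8h day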


UK electricity is $0.28/kWh but will be $0.36/kWh from the end of the month - my business is quoted at $0.60/kWh fixed for the next 12 months.

At $0.36/kWh, the running cost of the card alone at 450W for a year roughly equals the 3090's $1,499 RRP.

Yes, can power it down to be more efficient, however, that effectively agrees with the previous comment that PPW matters.


Wholesale electricity price in some EU states is €550/MWh today. Most EU states are above €250/MWh.


prices in Europe right now are considerably higher than $0.30/kWh


It seems safe to say prices right now are not the norm due to, you know, that whole war thing going on that is impacting one of the EU's primary power supplies. The 2021 EU average was otherwise $0.22/kWh.


Yeah, but let's see how fast we get those "normal" prices again; maybe this is the new normal, who knows.


The war is a factor going forward, but we got notified of the price increases before the war started, they relate more to the piss poor planning of our rulers than any external factors. Unless the people in charge are going to suddenly start planning for the future this will be the new normal.


But do you really run it all year round?


> performance per watt is really irrelevant.

Watts are dollars that you'll continue spending over the system life. It matters because you can only draw so many amps per rack and there will be a point when, in order to get more capacity, you'll need to build another datacenter.


The market is not data center use.


You'll still spend another Mac Studio's worth on energy to run a comparable PC for the next five years. To say nothing of not wanting to be in the same room as its fans.


What are you talking about? This is not for cloud datacenters trying to squeeze every bit of compute per resource.

These machines are commonly used by professionals in industries like movie and music production. They don't care what the power bill is, it's insignificant compared to the cost of the employee using the hardware.


> "They don't care what the power bill is"

Oh... They do. At least, they should. If a similar PC costs $500 less but you spend $700 more in electricity per year because of it, at the end of the year, your profits will show the difference.


As I said, these numbers are so insignificant that they don't matter. The cost of the employee and their productivity is several orders of magnitude more.

I ran a visual effects company a decade ago. We bought the fastest machines we could because saving time for production was important. The power draw was never a factor; a few catered lunches alone would dwarf the power bill.


Note that Geforce RTX on cloud is prohibited by Nvidia.


Yep, that's true. You have to use the DC SKUs which (IIRC) aren't the same silicon either. Worse: some of the server SKUs are restricted for market segmentation where your ML and hashing performance is bad but video is good (and the other way around).

The silly thing about it is that most of the special engines can now be flashed into an FPGA which is becoming more common in the big clouds so special offload engines aren't that big of a deal when they are missing. So in some cases you can have your cake and eat it too; massive parallel processing and specialised processing in the same server box without resorting to special tricks (as long as it's not suddenly getting blocked in future software updates).


The part of the EULA which is supposed to enforce that is not enforceable in Germany. It is complicated. There might be other ways you can circumvent agreeing to the EULA based on your location.


How do they functionally do that? I googled and found this?: https://www.nvidia.com/en-us/data-center/rtx-server-gaming/

Honestly asking because I’m kind of out of the nvidia loop at the moment.


Technically: the driver detects if it is run in a virtualization environment, it is at least able to detect KVM and VMware. On the upside, it's relatively easy to bypass the check.

Legally: I assume no cloud provider will assume the legal risk of telling their customers "and here you have to break the EULA of the NVIDIA driver in that way to use the service". In Europe where the legal environment is more focused on interoperability, this might not be as much of a problem, but still it may be too much risk.


They disallow such usage for "GeForce" cards via the proprietary driver's EULA, and they limit open-source driver performance (IIRC they require a signed binary blob).


An M1 Ultra is $2000 incrementally over a M1 Max, so there is no price gap, even with the inflated prices 3090s actually go for today.


To be fair that $2000 also gets you +32GB RAM, +512GB storage, and +10 CPU cores. It's not just the GPU. Although yeah you can definitely fit a 3090-equipped PC into a $4k budget even with ebay pricing if pure GPU performance is all you really want.


> To be fair that $2000 also gets you +32GB RAM, +512GB storage, and +10 CPU cores.

Even though it's not an apples-to-apples comparison, keep in mind that a 1x32GB DIMM sells for less than $150, and you can buy 1TB SSDs for less than $100.


M1 Ultra is 64GB extra, not 32.

> keep in mind that a 1x32GB DIMM sells for less than 150$

Keep in mind that M1 Pro/Max/Ultra is LPDDR5 6400 (https://www.anandtech.com/show/17024/apple-m1-max-performanc...) connected by a 512 bit memory controller.

Whereas a kit of 2x 32GB DDR5-4800 (I could not easily locate a quote for a 1x 64GB DDR5-4800 DIMM, let alone 6400) retails for USD 548 (https://www.newegg.com/crucial-64gb-288-pin-ddr5-sdram/p/N82...).

I could not locate a reliable source on the type of the SSD employed in M1 Pro/Max/Ultra, so I will refrain from remarking on the comparison.


I remember a slide mentioning, I think, "up to 7.4GB/sec SSD read/write speed" or similar, which drastically reduces the pool of comparison SSDs. Intel Optane meets those specs, as do a few other brands I hadn't heard of previously. In the latter case, a 2TB version seemed to be $350-400. Take this for what it's worth, but the SSD is going to be more than $100 extra cost imho.


That's pretty much the speed that Samsung advertises for the 980 Pro: https://www.samsung.com/us/computing/memory-storage/solid-st...

And what WD advertises for the Black SN850: https://www.westerndigital.com/products/internal-drives/wd-b...

And what Seagate advertises for the FireCuda 530: https://www.seagate.com/products/gaming-drives/pc-gaming/fir...

And what Gigabyte advertises for the Gen4 7000s: https://www.gigabyte.com/Solid-State-Drive/AORUS-Gen4-7000s-...

etc...

They aren't $100 for 1TB, no, but a lot of them are around $150. Which would be a lot less than +$100 to go from 512GB to 1TB, too. It's $40 to go from the 500GB SN850 to the 1TB SN850, for example.


I checked the first two and it's 7,000 read / 5,000 write. Pretty sure Apple said read AND write, which would be a lot faster than those. I might go back and rewatch the keynote, but I still think we are arguing pointlessly, as Apple has always overcharged for SSD and RAM upgrades vs the price you'd pay elsewhere. Thanks for the DV though, even when what I stated was right! ;)


It achieves this speed by not insisting on flushing to disk when requested: https://nitter.net/marcan42/status/1494213855387734019

When configured to ensure data integrity in the case of power loss (more important in this new Mac Studio machine, unless it comes with an integrated battery), it's a lot worse.
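
For context, the distinction being made is roughly fsync() versus a full cache flush: on macOS a plain fsync() doesn't force data through the drive's write cache, and you have to ask for F_FULLFSYNC explicitly. A minimal sketch (Darwin-only, since that's where the flag exists):

    import fcntl
    import os

    # On macOS, fsync() pushes data to the drive but not necessarily through
    # its volatile write cache; F_FULLFSYNC asks for the full (slow) flush.
    fd = os.open("example.dat", os.O_WRONLY | os.O_CREAT, 0o644)
    os.write(fd, b"important bytes")

    os.fsync(fd)                            # fast, may still sit in the drive cache
    if hasattr(fcntl, "F_FULLFSYNC"):       # only defined on Darwin
        fcntl.fcntl(fd, fcntl.F_FULLFSYNC)  # durable across power loss, much slower
    os.close(fd)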


I have a 8TB M1 Max. It pretty consistently has a max write speed of ~7.3 GB/s and max read of ~5.4 GB/s. No idea why or how write is faster, but that's not a typo.


Their website just specifies read speeds with no mention of write speeds. It's also "up to" and tested on the 8TB models. Assuming it's like other SSDs smaller capacities are usually slower.


HBM2 is a tad more expensive, but yes.


Apple doesn't use HBM2, so not really relevant


Correct, my bad. GDDR6X is still more expensive than the GDDR4 you get with Threadripper.


Apple isn't using GDDR6X, either, nor does threadripper use GDDR4.


Wait, you can get a 3090?


The 3090 is literally the easiest card to get at RRP, at least here in the UK. Set up Discord alerts; the FE cards stay in stock for hours. The last drop in February was in stock for the entire day, so anyone who wanted to buy one at RRP (£1,399) could do so without any issue at all.


This.. The FE is in stock every month for RRP. The 3080Ti too, I got mine that way.


It's still 2,400 EUR in Germany, for example.


I'm not sure who sells the FE cards in Germany actually. I know it's LDLC in Netherlands, France and Spain.

edit: just found out - it's NBB:

https://www.notebooksbilliger.de/

There's probably a German Discord somewhere to get alerts for drops.


Yeah. They've been in stock for months here (though the retailers are charging the inflated prices too), e.g. https://www.computeruniverse.net/en/c/hardware-components/nv...


I mean at that price…yeah it’s “in stock” but it sure as hell ain’t available


All of those listed are WAY above MSRP! MSRP is $1,499 in the U.S. and £1,399 in the U.K.


The question was if they were available, not if they were available at MSRP (thanks, downvoters, the comment even called out that the price was inflated...). My understanding is that until recently in the US you basically had to buy from scalpers, as stores had none at any price unless you stalked restocks.

They're overpriced for sure, and that's the only reason the M1U pricing looks equivalent rather than exorbitant


> The question was if they were available, not if they were available at MSRP (thanks downvoters, the comment even called out the price was inflated...).

I wasn't a DVer, but they've always been available if you were willing to pay a scalper. The only thing that has changed is that more retailers find it appropriate to rip off their customers. It's kinda like a liquor store I've done business with for years now wanting $3,800 for a bottle of 23yr PVW. MSRP is $299.99. Should the owner be able to make extra profit on it? Of course they should. But >12x MSRP is just predatory imho.


What does "overpriced" mean, other than "more than you are willing to pay?"


* Higher than historical norms due to very unusual market conditions

* Much higher than MSRP, which was reduced compared to the previous generation because that attempt at raising prices killed market demand for said previous generation.

* No longer affordable by the traditional customer base but sustained by a new market with questionable longevity in its demand

Take your pick.

Sure, in a pure rational economic sense the market price has risen because supply has fallen at the same time a new market of buyers became very interested in the product, but we're talking consumer expectations and historical trends here, not the current price in a vacuum.


A 16" macbook with an M1 only uses around 100w and that's when maxing all the things. It runs at about 40w for CPU and 60w for GPU. Based on those numbers, 120w seems totally expected for two chips at the same frequency.


> look suspicious since the GPU graph tops out at ~120W & the CPU graph tops out at ~60W yet the M1 Studio is rated for 370W continuous power draw.

Interested to know what you think a reasonable PSU would be for a machine that was consuming close to 200W for processing...


Gigabyte and Strix 3090s are routinely in stock at Newegg, at MSRP. The shortage is over.


3090 is still 2400 EUR in Germany, pretty sure that's not MSRP


Cursory look gives you a ~$3500 price tag for a gaming PC with a 3090 [1], vs. at least $4k for a Mac Studio with an M1 Ultra. Roughly the same ballpark, but I wouldn't call the M1 Ultra more affordable given those numbers.

1. https://techguided.com/best-rtx-3090-gaming-pc/#:~:text=With....


> Cursory look gives you a ~$3500 price tag for a gaming PC with a 3090

That 3500 is for a DIY build. So, sure, you can always save on labor and hassle, but prebuilt 3090 rigs commonly cost over 4k. And if you don't want to buy from Amazon because of their notorious history of mixing components from different suppliers and reselling used returns, oof, good luck even getting one.


You mean I get to save AND have fun building my own PC?


Not to mention if you build your own PC you can upgrade the parts as and when, unlike with the new Mac where you'll eventually just be replacing the whole thing.


I believed that until I realized I couldn't individually upgrade my CPU or RAM because I have a mobo with an LGA1150 socket that only supports DDR3 (and it's only 6 years old).

So eventually you still have to "replace everything" to upgrade a PC.


You were unlucky to buy DDR3 near its end of life then (like someone buying DDR4 now), but you could still upgrade stuff like your GPU or drives independently. My first SSD (a 240GB Samsung 840) is still in service after 9 years, with its SMART metrics indicating only 50% of its expected lifetime cycles have been used, for example.

You could also put a 4790K, 16GB of DDR3 and a modern GPU in that system to get a perfectly functional gaming system that will do most titles on 1080p high. Though admittedly we've passed the point where that's financially sensible vs upgrading to a 12400 or something, as both Devil's Canyon CPUs and DDR3 are climbing back up in price as supplies diminish.


Right now there aren't many DDR5 boards. In fact, none for AMD.


> I believed that until I realized I couldn't individually upgrade my CPU or RAM because I have a mobo with LGA1150 socket and only supports DDR3 (and it's only 6 years old).

DDR4 was released in 2014, which would suggest you purchased your mobo two full years after DDR3 was already deemed legacy technology and being phased out.

Also LGA1150 was succeeded by LGA1151 in 2015, which means you bought your mobo one full year after it was already legacy hardware.


Yes, they entered the market around those years, but what does that change? DDR3 and LGA1150 were not deemed "legacy" the day DDR4 and LGA1151 motherboards entered the market. They were 2-3x the price, and DDR3 dominated RAM sales until at least 2017. In fact, the reason DDR4 took so long to enter the market was incompatibility with existing hardware, and higher costs to upgrade. [1] I didn't go out of my way to buy "legacy hardware" because they weren't, at the time.

Point being, PC-building makes it easier to replace and repair individual components, but in time, upgrading to newer generations means spending over 50% of the original cost on motherboard, CPU, PSU, RAM. Not too different than dropping $3K on a new Mac.

[1] https://web.archive.org/web/20101219085440/http://www.xbitla...


> Yes, they entered the market around those years, but what does that change?

It means the hardware was purchased after it started to be discontinued.

It's hardly a reasonable take, and makes little sense, to complain that you can't upgrade hardware that was already being discontinued before you bought it.

> DDR3 and LGA1150 were not deemed "legacy" the day DDR4 and LGA1151 motherboards entered the market.

I googled for LGA1150 before I posted the message, and one of the first search results is a post on Linus Tech Tips dating way back to 2015 asking whether LGA1150 was already dead.

And you purchased the Mobo one year after that.


I think you are forgetting the context of my replies. I'm not saying it's unreasonable to have to upgrade discontinued hardware, even if you have to do it all at once. My take is that it's not too different from having to replace a Mac when the new generation comes in (which is usually every ~5 years for Apple, not too far from my own system's lifetime). Being able to upgrade individual parts through generations is a pipe dream.

Also, we must have a different interpretation of "discontinued", because DDR3 and LGA1150 were still produced, sold, and dominated sales for a long time after I bought that system. At the time (and for the next 1-2 years), consumer DDR4 was a luxury component that almost no existing hardware supported.


You can still buy DDR3 new for not that much? 16GB is about $50 from numerous brands on Amazon at the moment. I bought some for an old laptop a couple months ago.

To do CPU upgrades you eventually have to replace the motherboard, but you can keep using whatever GPU/storage/other parts you have. Sometimes that also means a RAM upgrade, but it's still better than the literal nothing of modern Macs.


AMD has never disappointed me in this regard.


Zen 4 will be using a new socket; I wouldn't go buying a Zen 3 with plans to upgrade the CPU down the road.


Well, you should still get one last upgrade out of an AM4 socket in the form of the upcoming 5800X3D ( https://www.amd.com/en/products/cpu/amd-ryzen-7-5800x3d )


This has been known for a long time. You would have to actively choose not to follow AMD news to not know.


I understand; just making sure no one jumps on Zen 3 now with a promise of forward compatibility.


8 years isn't a bad run for a CPU socket.


Since the context here is using these machines for work, a mid-level engineer will easily cost an extra $1000* in his own time to put that together :)

EDIT: I’m quite confident this is not at all an exaggeration. Unless you have put together PCs for a living. $100/h (total employment cost, not just salary), 1-2 hours of actual build & setup, 8 more hours of speccing out parts, buying, taking delivery, installing stuff and messing around with windows/Linux (I’ve probably spent 40 hours+ in the past couple years just fixing stuff in my windows gaming pc. At least 1 of those looking for a cabled keyboard so I could boot it up the first time, ended up having a friend drive over with his :D)


1000 bucks for 45 mins work? Maybe 1.5hrs tops? I didn't realise their wage was >500 an hour?


To be fair here, there is more to it than just assembly.

You have to spec out the parts, ensuring compatibility. Manage multiple orders and deliveries. Assemble it. Install drivers/configuration specific packages.

All of these things are easier today than ten or twenty years ago - but assign it to a random mid-level engineer and I'd set my project-management gamble on half a day for the busiest, most focused engineers least likely to take the time to fuss over specs, or one day for the majority.

ofc. to get to $1000 for that they'd still have to be on $230k to $460k.


Given that the last time I put together a PC computer was 2006, it'd probably take me DAYS to spec out a machine because of all the rabbit holes I'd be exploring, esp with all the advances in computer tech.


PC part picker will do the heavy lifting for you. There are also management tools that will let you install software bundles easily, no real extra time investment.


Just knowing about services like PC Part Picker and the management tools you mention requires time and expertise that people generally do not have before they build a computer, so "no real extra time investment" may only be true for someone who can amortize those upfront costs across many builds.

In my case I have built a couple PCs before, but it was so long ago that I'd have to re-learn which retailers are trustworthy, what the new connection standards are these days, etc. It's just not worth it to me to spend a dozen hours learning, specing, ordering, assembling, installing, configuring, etc to save a few hundred bucks.


It's a lot closer than you might think.

A senior engineer in the Bay can easily pull down $400k/year in total comp, which is $200/hour. The rule of thumb I've always heard is that a fully-loaded engineer costs roughly 2x their comp in taxes/insurance/facilities/etc.

When someone costs the company north of $3k/day, it's cheaper all round to just plonk a brand new $6k MacBook Pro on their desk if they have a hardware issue.


It would take me over 1.5 hours just to figure out what parts I need to buy.


FSVO fun if you use Newegg


FSVO: For Some Value Of

(I've been accused of overuse of acronyms, but that one's rare!)


At Micro Center you would be hard pressed to pay more than $250 for their PC building service; you'll even get water cooling installed and tested for this price. https://www.microcenter.com/site/service/instore-custom-pc-b...


The 3090 claims are overstated. There are multiple competitors in that space, and all of them need the TDP.

Performance per watt? I could see that being disrupted, but an iGPU in 2022 will be orders of magnitude less powerful than a dGPU, if wattage is ignored.


They are still a year+ ahead of 3090 on process node. Max was about equivalent to a 2080, so 2X max does line up with a 3090. A big difference is no ray tracing hardware, which takes up a lot of die space. Same process node and no ray tracing hardware and nvidia would come in at far less die space (3090 is 628.4 mm ^2, M1 ultra is 850mm^2).

If Nvidia were on the same node and increased die space to match M1 (ignoring the CPU portion of the die size), they would then be able to run at a lower clock with more compute units and probably match the TDP discrepancy.

An iGPU isn't necessarily slower if the system ram is fast, and M1 was one of the first consumer CPUs to move to DDR5. 3090 has 936.2 GB/s with GDDR6X, M1 Ultra with DDR5 memory controllers on both dies gets 800GB/s.
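
The rough arithmetic behind those two bandwidth figures, taking the commonly reported bus widths and transfer rates at face value:

    # Peak theoretical bandwidth = bus width (bytes) * transfer rate (GT/s).
    def gbps(bus_bits, gtps):
        return bus_bits / 8 * gtps

    print(gbps(512, 6.4))    # ~409.6 GB/s: one M1 Max die, 512-bit LPDDR5-6400
    print(gbps(1024, 6.4))   # ~819.2 GB/s: M1 Ultra, both dies (the ~800 GB/s figure)
    print(gbps(384, 19.5))   # ~936 GB/s: RTX 3090, 384-bit GDDR6X at 19.5 Gbps/pin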


Having had my M1 MB pro from work freeze and stutter, I'm just not buying it. Your theory is great, never once expected this BS in practice.

For the record: I was the first M1 recipient (temporary 16gb MB, stock issues). I needed an Intel MBP because Rosetta ain't all that. I opted for, and was upgraded to, the 32gb M1 MBP. I chose M1 over Intel because it was unbelievably faster for the form-factor. My original comment does not concern laptops. My PC is orders of magnitude more powerful.

TDP is physics. You all might perceive Apple as perfect and infallible and all so lovely, but physics is physics.

I use AMD, not NVIDIA. And "what if" is irrelevant. It's like intentionally neutering Zen2 by comparing it to Intel single-core (as was done all the time). The reality is absolute, not relative. Comparing effective performance, not per-TDP, is what matters to the user. And my network/gpu/audio drops on both my 16gb M1 MB and 32gb M1 MBP under load.

Seriously not buying that "Apple can do nothing wrong" bias.

Take those #Ff6600-colored glasses off. The M1 has unbeatable value proposition in a pretty wide market, but Apple couldn't be further from a universally good machine.


Prebuilt 3090 builds can often be found for less than the cost of the corresponding parts.


And there is usually a premium for small form factor prebuilts


I'm not being snarky, but I don't believe Mac people would know how to build a PC given their history of non-modifiable hardware and no way to repair it.


Hahaha good luck getting your hands on a 30xx series card though.

Here in Australia, 3090’s go for close to 3k on their own.


And the cheapest Mac Studio with an M1 Ultra is A$6,000, so yes....

    20-Core CPU, 48-Core GPU, 32-Core Neural Engine

    64GB unified memory
    1TB SSD storage¹
    Front: Two Thunderbolt 4 ports, one SDXC card slot
    Back: Four Thunderbolt 4 ports, two USB-A ports, one HDMI port, one 10Gb Ethernet port, one 3.5-mm headphone jack
A$6,099.00


Here in the UK it's not a big deal. Subscribe to Discord alerts for FE-series drops; in the last 3090 FE drop in February the cards were in stock for a full day, at RRP (£1,399). I got a 3080 drop at RRP this way too (£649).

But even ignoring the FE series, the prices have already crashed massively, you can get a 3080 AIB for less than £1000, and 3090s frequently appear around £1500-1600.


I can right now (ok, in the morning actually) walk into a computer store across the road here and buy a 3090 off the shelf for €2,299-2,499 (different makes and models). Those are in stock and physically on the shelf. Same for lesser cards of the same series or AMD RX 6000.


Those are scalper prices. Anyone can get a 3090 tomorrow for that price.


Yeah, they can fuck right off with those prices. Ethereum’s proof of stake switch can’t come too soon.


I'm seeing 3080s, in stock in stores I might consider buying from, sub-1800 AUD. It is heading back towards RRP (still about 50% over I guess). 3090s are twice that, yep.


Don't know about Australia but in my area(Asia) the prices are now going back near MSRP.


I bought two on ebay no problem


We've been buying tons of 3090s at work for about 1.6k-2k USD without too much trouble.


> tons


Hey, at 5lbs each a ton is only 400 cards!


You also need to compare the right CPU. The M1 Ultra CPU is the equivalent of the fastest Threadripper, which costs $3,990. So a PC with similar performance would be $7,500.


Not the top-of-the-line Threadripper (which can go up to 32c/64t), but probably similar to the 5950X (16c/32t), which costs like $1,000.

But you’re comparing apples to oranges, because the real advantage of M1 chips is the unified memory - almost no CPU-GPU communication overhead, and that the GPU can use ginormous amounts of memory.


A threadripper also has many more PCIe lanes than a Ryzen. It's a bit of a different usecase I think although there's overlap.


I can't put a 3090 into a 3.5 liter case. Even a 10 liter case is really pushing it. That's before mentioning power savings. 3090 real world power when in use is something like 4x as high.


They are also absolutely massive and probably much more expensive long-term because of the massively increased electricity usage.


Unless you're running a farm of these, the power cost difference is going to be largely unnoticeable. Even in a country with very expensive power, you're talking a ~$0.10 per hour premium to have a 3090 at full bore. And that's assuming the M1 Ultra manages to achieve the same performance as the 3090, which is going to be extremely workload-dependent going off of the existing M1 GPU results.


It shocks me how much payroll and cap-ex is spent on the M1 and how little is invested in getting TensorFlow/Pytorch to work on it. I could 10x my M1 purchases for our business if we could reliably run TensorFlow on it. Seems pretty shortsighted.

The GPU claims wouldn't even need to be at parity with Nvidia; it would just need to offer a vertically integrated alternative to having to use EC2.


Having beaten my head on this for a while (and shipped the first reasonably complete ML framework that runs on Metal) Apple's opinion as expressed by their priorities is that it's just not important.


> reliably run TensorFlow

What reliability issues are you having with TensorFlow on M1 Macs?


We've followed five different instructional and documentation pages to make it happen and none seem to consistently install. Throw in a corporate system where you need IT for root access to make changes and it is game over. So I've got a fully loaded M1 Max and can't get TF running on it.

Now I've got a team of data scientists in an all-MBP shop and we're holding off on M1 upgrades until this all gets resolved.

On my personal M1, I managed to make it work, but it's hard to know the layers of changes made and what exactly allowed it to work.
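
For what it's worth, once Apple's packages do install (tensorflow-macos plus the tensorflow-metal plugin; exact versions have churned, which is part of the problem described above), a quick sanity check looks roughly like this:

    # Assumes: pip install tensorflow-macos tensorflow-metal
    import tensorflow as tf

    print("TF version:", tf.__version__)
    print("GPUs:", tf.config.list_physical_devices("GPU"))  # expect one GPU device

    # Tiny smoke test that exercises the Metal plugin if it loaded.
    a = tf.random.normal((1024, 1024))
    b = tf.random.normal((1024, 1024))
    print(tf.reduce_sum(tf.matmul(a, b)).numpy())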


You can get off this GPU circus and simply go with purpose-built AI solutions.

You can buy single tensor accelerators from Google: https://www.coral.ai/products/

You can buy a bunch of those integrated into a single PCI-E card. https://iot.asus.com/products/AI-accelerator/AI-Accelerator-...

Cheap too. Some of these work with Mac. More of them work for PC, because the hardware interface is outside of Apple's thin vertical slice/garden.
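
To give a sense of what using one of these looks like, here is a minimal tflite-runtime sketch with the Edge TPU delegate; the model path is a placeholder, and the delegate library name differs per OS.

    import numpy as np
    import tflite_runtime.interpreter as tflite

    # Load a model compiled for the Edge TPU and attach the delegate.
    # "libedgetpu.so.1" is the Linux name; macOS uses "libedgetpu.1.dylib".
    interpreter = tflite.Interpreter(
        model_path="model_edgetpu.tflite",  # placeholder path
        experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
    )
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"]).shape)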


These are devices for TensorFlow Lite, which is more appropriate for IoT etc., not for the intensive initial training of a complex model.


Could be worth tracking what you did, making a new set of instructions, and trying to reproduce with a fresh install.


This is something Apple should pay people for.


Deep learning support for Mac is not going to happen at a level of quality you can rely on for research & dev work (like PyTorch + TensorFlow). The underlying problem is that no big company cares about the Mac platform, and the work to maintain framework support for a specific piece of hardware is way beyond a hobby project. If you want your own on-prem hardware, just buy Nvidia.


Neural Engine cores are not accessible to third-party developers, so it'll be severely constrained for practical purposes. Currently the M1 Max is no match for even a last-generation mid-tier Nvidia GPU.


They are accessible to third party developers, only they have to use CoreML.


xD


Huh? Neural engine is certainly usable by developers. You just use the CoreML framework.


Apple loves to compare incomparable stuff. The G5 as the "world's fastest personal computer", etc. It's easy to claim GPU performance when you don't support modern OpenGL or Vulkan, so nobody can just run modern games and verify, and you end up with a "relative performance" graph, whatever that means.


I'm very skeptical of this because until the CUDA stranglehold is gone, it will be a pain to develop on. Even if the frameworks themselves support the M1's GPU, there are still lots and lots of CUDA kernels that won't run.

I really hope I'm wrong (as someone who owns an M1 Pro chip) but I find it hard to imagine things changing significantly in the next ~2 years unless someone is able (legally and technically) to release a CUDA compatibility layer.


Here's AMD's attempt: https://github.com/ROCm-Developer-Tools/HIPIFY

Naturally, the HIP tooling doesn't support M1 GPUs at this time. We'll see if anyone else tries.


The most important detail here is 128GB of RAM available for GPU computation! This allows training monster models, e.g. 1B-parameter GPT-series models, on a single M1 Ultra. This is quite unprecedented. Unfortunately, it is also about 3.5x slower than a 3090.
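
Back-of-envelope for why the unified memory matters here (fp32 Adam training, activations ignored; purely an estimate):

    # Rough training footprint per parameter with Adam in fp32:
    # weights (4 B) + gradients (4 B) + Adam moment estimates (8 B) = 16 bytes.
    params = 1_000_000_000
    bytes_per_param = 4 + 4 + 8
    print(params * bytes_per_param / 2**30, "GiB before activations")  # ~14.9 GiB
    # Activations and batch size push this well past a 24GB 3090, which is
    # where 128GB of GPU-addressable memory becomes the selling point.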


The GPU claims for M1 Pro and M1 Max were wildly above their actual performance in real life (as opposed to CPU performance) so maybe don't put all that much faith in Apple marketing here either.


Note the label on the y-axis. "relative performance" from "0-200" seems like marketing bullshit to me.

"M1 Ultra has a 64-core GPU, delivering faster performance than the highest-end PC GPU available, while using 200 fewer watts of power."

Note that they say "faster performance" not "more performance". What does "faster" mean? Who knows!


I have heard this argument before, if it’s identical workloads you get faster output but the same total work. Thus “faster performance” seems correct for fixed workloads and “more performance” is correct on games or benchmarks where you get more FPS.

I still think "faster performance" sounds odd, but I understand their point.


Anyway, any first-party benchmark should be taken with a mountain of salt.


I always take such claims with a grain of salt anyway. It's usually based on one specific benchmark. I always wait for better benchmarks instead of trusting the marketing.


Even if their claims are accurate, it usually has the asterisk of *Only with Apple Metal 2. I honestly cannot understand why Apple decided they needed to write their own graphics API when the rest of the world is working hard to get away from the biggest proprietary graphics API.


Because of vendor lock-in and full control over their APIs, which has always been an Apple staple, but especially now.

5-10 years ago they were still serious about open standards, like OpenCL. Now it's all locked in.


I'd like to note that Microsoft is doing no better with DirectX. If it weren't for the drivers on Windows being distributed and supported by the GPU manufacturers themselves, Vulkan would only be a thing for Linux (incl. Android) and custom niche devices now.


Because when they wrote the API no real standard existed to suit their needs.


Yes, and instead of writing an open standard or adapting Vulkan when it came around, they instead decided to double down on proprietary and moved to Metal 2.


I mean, Metal is a good API and they designed it first. They could've decided to abandon it when Vulkan came out, but I guess they just didn't want to make their API worse. Not ideal, yes, but I don't really blame them for the decision.


I know Nvidia has never really cared much about TDP, but this still seems unbelievable to me. How could a relatively new design beat a 3090 with 200W less power, while having to share a die with a CPU? It just doesn't seem possible.


Unless the M1 Ultra is actually magic I don't think it is possible.

My guess is they're putting a lot of weight on the phrase "relative power" and hoping you assume it means "relative to each other" and not "relative to their previous generation" (i.e. M1 Ultra -> M1 Max and RTX 3090 -> RTX 2080Ti) or "relative to the stock power profile".

Put bluntly, if the M1 Ultra was capable of achieving performance parity with an RTX 3090 for any GPU-style benchmark then Nvidia (who are experts in making GPUs) would have captured this additional performance. Bear in mind the claim seems to be (on the surface) that the M1 Ultra is achieving with 64 GPU cores and 800GB/s memory bandwidth what the RTX 3090 is achieving with 10,496 GPU cores and 936.2GB/s memory bandwidth.


It's actually kinda magic but in the opposite way. The M1 Ultra is absolutely massive: 114 billion transistors. The 3090? A measly 28 billion. Now granted the M1 also has some CPU cores on there, but even still, it seems safe to say that the M1 Ultra has more transistors spent on the GPU than a 3090 does. More transistors often does mean more performance when it comes to GPUs.

But you'll all but certainly see the 3090 win more benchmarks (and by a landslide) than the M1 Ultra does. Because Nvidia is really, really fucking good at this, and they spend an absurd amount of money working with external projects to fix their stuff. Like contributing a CUDA backend for TensorFlow. Or tons of optimizations in the driver to handle game-specific issues.

Meanwhile Apple is mostly in the camp of "well we built Metal 2, what's taking y'all so long to port it to our tiny-marketshare platform that historically had terrible GPU drivers?"


Given that chips like AMD's Epyc sit at around 40 billion transistors, I would assume a (roughly) even split.

It is also worth noting that the M1 Ultra is an SoC so it'll have more than just CPU/GPU on it, by the looks of things it has some hefty amounts of cache, it'll also have a few IP blocks like a PCIe controller, memory controller, SSD controller (the current "SSDs" look to just be raw storage modules).

All told it likely still has somewhere in the region of 30-40 billion transistors for the GPU. Each GPU core being physically bigger than the 3090's is probably pretty good for some workflows and not so good for others. Generally GPUs benefit from having a huge number of tiny cores for processing in parallel, rather than a small number of massive cores.

Current benchmarks put it at roughly the performance of an RTX 3070, which is good for its power consumption, but not even close to the 3090. As I mentioned in the previous post, it just doesn't have the cores or memory bandwidth needed for the types of workloads that GPUs are built for (although unified memory being physically closer can help here ofc.), certainly not enough to make it a competitor for something like a 3090.

Edit: Oh also, for massively parallel workloads (like what GPUs do), more cores and better bandwidth to feed those cores will be one of the biggest performance drivers. You can get more performance by making those cores bigger (and therefore faster) but you need to crank the transistor count up a _lot_ to match the kinds of throughput that many tiny cores can do.


For some definitions of "affordable."


And hopefully not make you deaf with their buzzing fans


Doesn't "electricity bill" dominate server/DL/mining datacenter workload costs these days - so perf/W is what really counts? The long-prophesied ARM apocalypse may finally be at hand.


GPU claims at carefully chosen tasks that don't need GDDR6 to get top performance. It wouldn't game like a 3090 for instance, but then not many people are gaming on macs anyway.


This is insane. They claim that its GPU performance tops the RTX 3090, while using 200W less power. I happen to have this GPU in my PC, and not only does it cost over $3,000, but it's also very power-hungry and loud.

Currently, you need this kind of GPU performance for high-resolution VR gaming at 90 fps, and it's just barely enough. This means that the GPU runs very loudly and heats up the room, and running games like HL Alyx on max settings is still not possible.

It seems that Apple might be the only company who can deliver a proper VR experience. I can't wait to see what they've been cooking up.


Ehh, I wouldn't put too much stock in graphs of "relative performance" on a 0-200 scale like these. Marketing can cook up whatever they like when they want to make a product look good. Wait for actual benchmarks before trying to judge the product. Their base claim is >2x the m1 max, which is still nowhere close to the performance of a 3090.

Apple's footnotes don't even pretend to explain what these charts are.

> Performance was measured using select industry‑standard benchmarks.


Remember last time Apple put out these weird relative performance charts, and we all thought they were hiding something?

The M1 announcement. They turned out to be pretty accurate.

So I’ll wait and see real benchmarks, but it wouldn’t surprise me if this does have incredible performance.


Their GPU claims weren't accurate though. The M1 Max doesn't have real-world performance anything like the 3070, as claimed.


On raw Metal performance, yes it does get the same performance. The benchmarks showed this.

The problem was the implication that you'd get 3070 gaming performance. That was never going to be true because of the un-optimisation tax for games on Mac.

There doesn't exist an AAA game built for Metal and the Mac. The closest are games like World of Warcraft and Divinity Original Sin 2 – and even they are just "good ports", not originally designed for the Mac (and are far from AAA graphics). This is why on Intel Macs, games under Boot Camp always ran 30%-50% faster, even though the hardware was the same.

Games on M1 Max run as you'd expect – about 30% slower than a 3070 for the same old reasons (and some new ones, like not being compiled for Apple Silicon at all). The GPU is about the same speed as a 3070 and it's doing what you'd expect, given the 30% unoptimization-tax workload.


Pushing all this onto an "un-optimization tax" is an easy pass for Apple.

- Nvidia really is a software company; it's the running joke in the industry. When you buy an Nvidia GPU, you pay for the drivers & the frameworks (CUDA, DLSS, OptiX, ...). Apple does close to nothing there; they support Metal and CoreML and call it a day, so you can decently lay some of the blame at their feet.

- The workloads in games can vary a lot: vertex/fragment shader imbalance, parallel compute pipelines, mixed precision (which the M1 GPU does not do), etc. So another explanation is that you can get 3070 parity on a cherry-picked game, like a broken clock being right twice a day, but that does not make it generally true. Objective benchmarks have put the M1 GPUs way slower than a 3070 on average, and software support seems like an easy but false distraction given the Proton tax on Linux (which is not 30-50%).

- The M1 GPUs are lacking a ton of hardware: matrix multiply, fp16 again, ray tracing, probably VRR (not sure about this last one). These are used by modern games and applications; you may find a benchmark which skips them, but in the grand scheme of things the M1 GPU will have to emulate them more often than not, and that has a cost.

Waving all that away as "the GPU is about the same speed" is technically wrong, or not really backed by facts at the very least.


I know that Baldur's Gate 3 looks pretty frigging awesome on my M1 Max MBP; it has a native ARM binary, so no need for Rosetta, and it uses Metal 2.


For what it's worth, they were comparing the M1 Max to Nvidia's mobile 3070, which is a completely different card than the desktop 3070.

My understanding was that it was a reasonable comparison in some benchmarks, though when plugged in to power the mobile 3070 still had more headroom.


_Some_ of the claims were pretty accurate.

A lot of them, particularly the GPU benchmarks, were misleading because they only looked at performance that they had dedicated silicon for.


What we're seeing right now with Apple's M1 chips for desktop computing is on the same level of revolutionary as what the original iPhone did for mobile phones.

The advancements in such a short period of time in the computing power, low power usage, size, and heat output of these chips are unbelievable and game-changing.


Don't get me wrong, the benefits of using ARM architectures for general purpose vs. x86 are really compelling.

The performance of x86 has been a leader for a while mostly because of the sheer amount of optimization work that has gone into it, but the cruft of the x86 instruction set and the architectural gymnastics you have to do to make the instruction set work are really showing their age.

That being said, the GPU performance claims are incredibly misleading. The previous "relative performance" benchmarks that were done on the M1 Max for GPU performance were misleading as well, they definitely cannot keep up with a mid-tier modern discrete GPU.

The GPU claim isn't an ARM/x86 comparison like the CPU performance would be. This is comparing a 64 core 800GB/s GPU with a 10k core 900GB/s GPU and trying to make them look equivalent through misleading marketing.

None of this is to say that the M1 Ultra is bad necessarily, even if it performs roughly the same as a mobile GPU or powerful iGPU it would still be a very good chip, and I'd love to use one if I could use it in my environment properly. I'm just saying don't put too much faith in the GPU performance measurements provided here.


> The performance of x86 has been a leader for a while mostly because of the sheer amount of optimization work that has gone into them

Without denying the good work and engineering that have gone into some x86 chips, they are not the reason x86 became the leader. The duopoly of Intel and Microsoft – coupled with Intel's aggressive strategy of undermining competitors on pricing and the sheer production volume they could quickly ramp up – squeezed every other viable competitor out of the market, relegated the very few survivors to niche status (e.g. POWER), and entrenched the duopoly as an unfortunate leader. And then complacency and arrogance set in for years to come, until recently.


GP was talking about performance not marketshare.


It really feels like this is all in the name of their AR/VR efforts. The killer device as far as I can think would be a simple headset that packs the capabilities of full-blown workstations. Apple Silicon seems like it could totally be on that track in some way.


I don't think so. I think it stemmed from Apple's desire to make their SoC fully in-house for their iPhones. They realized they were onto something, and took it to a whole new level by focusing on desktop-class performance.

This was probably further fueled by their soured relationship with Intel which was responsible for thermal issues on MBP's for years, poor performance increases across their entire Mac lineup, and poor cellular radio performance on some iPhone models -- forcing them to settle with Qualcomm and ditch Intel for mobile radios.


True, but I can only imagine the cost of that headset. It will surely be for the 1% of the 1% :'(


Why would you believe that it tops the RTX 3090? It's just 2 of their prior chips in the same package? This is purely marketing nonsense with nothing to back it up.

At least in the US you can get a 3090 for $2,200 even with markup, and an almost-as-good 3080 for $1,150. If you wait until it's in stock at a big-box store you can get one for less. A machine could be built with a 3080 for $2,000.

Meanwhile a system based on the 64-core GPU will run you $5,000, and as such is affordable to nearly nobody, so few will get a chance to see it drastically underperform in the gaming arena on any of the games that don't support Mac on ARM.

With an essentially invisible share of the gaming desktop market, there will never be any incentive for anyone to change this with direct support, leaving you reliant on translation from x86 and from Windows executables, paying doubly in compatibility and performance from an already very expensive and lackluster starting point.


Too bad that historically Apple has not given any attention to Mac gaming.


Part of Apple's historic MO has been to not invest in areas they don't see themselves having a competitive advantage in. Now that they can make gaming happen with a very low wattage budget they may well try to enter that space in earnest.


The difference between an Apple TV and a Mac Mini is essentially how powerful of Apple silicon it has, whether it runs tvOS or macOS, and whether it has HDMI out or not.

The Studio is a more compact form factor than any modern 4K gaming console. If they chose to ship something in that form factor with tvOS, HDMI, and an M1 Max/Ultra, it would be a very competitive console on the market — if game developers could be persuaded to implement for it.

How would it compare to the Xbox Series X and PS5? That’s a comparison I expect to see someday at WWDC, once they’re ready. And once a game is ported to Metal on any Apple silicon OS, it’s a simple exercise to port it to all the rest; macOS, tvOS, ipadOS, and (someday, presumably) vrOS.

Is today's announcement enough to compel large developers like EA and Bungie to port their games to Metal? I don't know. But Apple has advantages with their hardware that Windows can't counter: the ability to boot into a signed/sealed OS (including macOS!), load a signed/sealed app, attest this cryptographically to a server, and lock out other programs from reading a game's memory or display. This would end software-only online cheating in a way that PCs can't compete with today. It would also reduce the number of GPU targets to support to one, Apple Metal 2, which drastically decreases the complexity of testing and deploying game code.

I look forward to Apple deciding to play ball with gaming someday.


This all makes sense, and in that context it’s unfortunate that Apple’s relationship with the largest game tools company, Epic, is... strained, to say the least.

They could always choose to remedy that with a generous buyout offer.


Won't there be a Pluton-based anti-cheating solution? Seems like a natural opportunity for Microsoft.

Edit: https://arstechnica.com/information-technology/2022/01/pluto... says

> Microsoft already used Pluton to secure Xbox Ones and Azure Sphere microcontrollers against attacks that involve people with physical access opening device cases and performing hardware hacks that bypass security protections. Such hacks are usually carried out by device owners who want to run unauthorized games or programs for cheating.

So initially you could have Pluton-only servers and down the line non-Pluton hardware will simply be obsolete.


Yep. On the plus side, anything with Apple silicon or a T2 chip has this available today already in macOS, so that's every shipping Mac starting in what looks like 2018: https://support.apple.com/en-us/HT208862

They won't have the Ultra GPU, but Apple's been shipping this for years and Microsoft is just now bringing Pluton to market. I do wish them luck, but that's a lot of PC gamer hardware to deprecate.


Seems unlikely that the general population would spend 2-4k on a console.


If you could spend 2-4k on a special playstation or xbox with double/triple/quadruple graphics capability, it would sell. The games will work fine on the cheapest m1 mac mini, not everyone and every game will need max settings on 4k at 144hz to be a great experience.


Well, only the macOS users would be spending a thousand dollars or more for their console-capable Macs, which are general purpose computers with absurd amounts of memory. TV users could spend a lot less for an Apple TV 4K with M1 inside, assuming Apple released it with less of this or that.


The Apple TV (A12 chip) is currently $199. The new iPad Air with the M1 chip costs $599.

They also don't need the display, camera, microphone. And could sell it at a loss and make the margins with TV+ and game sales.

But they would need their own bundled controller/accessories and get serious about AAA gaming.


Apple has never sold a hardware product at a loss. Never.


Don’t gamers spend tons of money on gaming PCs?

Also, might be cheaper a couple years down the line.


PC gamers do. Console gamers don't. And there are a lot more of the latter than the former.


> I look forward to Apple deciding to play ball with gaming someday.

I wish.

But playing ball is more than hardware. It means spending billions to buy Activision or Bungie. And I can't honestly imagine Apple having the cultural DNA or the leadership aspiration to make a game like The Last of Us, where the player brutally beats zombies into bloody clumps.

In video games the business side demands having exclusives, or timed exclusives, sponsoring Twitch streamers to play your game, and cutting special deals with studios. This is very different from the App Store, where Apple emphasizes their role as a neutral arbiter with every dev getting the same deal as any other dev. Can you imagine the complaints here on HN if Epic Games got a special deal just because they are a bigger fish and Fortnite is popular?


A recent keynote compared an Apple series chip to an Xbox One S, can’t remember which though.


Pippin atmark again!


The gaming performance of the existing M1 GPUs is, well, crap (like far behind the other laptop competition, to say nothing of desktop GPUs). The Ultra probably isn't changing that, since it's very unlikely to be a hardware problem and instead a software ecosystem & incentives problem.


I don’t think the performance is that bad. On a whim I tried running WoW on a 16” M1 Pro MBP and it consistently got FPS higher than the refresh rate (120hz) with the game rendering at 1x scale and most effects maxed out. Granted, that’s not as good as what you’d get with a mobile RTX 3080 or something, but it’s nothing to sneeze at for a laptop that doesn’t get scorching hot and doesn’t sound like a leaf blower when being pushed.

I could definitely see the Max and Ultra with a beefier cooling system (like the Studio’s) having pretty respectable performance.


https://www.anandtech.com/show/17024/apple-m1-max-performanc...

M1 Max struggles to keep up with an RTX 3060 mobile.

Now that's with the overhead of Rosetta 2 and all that, so it's of course "not fair" for the M1. But that's also the current reality of the market, so ya know.


WoW isn't exactly the pinnacle of demanding performance. It's like saying you can run counter strike at 200FPS: congrats, so can everyone else, without paying $2000


Yes, it's "crap" not because the GPU is slow but because most AAA titles are optimized for Windows. So it's difficult to expect good performance from a game that runs through API conversion layer(s) (Windows -> Mac) and then CPU emulation (x86 -> ARM).

So publishers/developers need to make more native games. Even though every Mac port will probably make 1/10th the revenue of a Windows title, I guess Mac users would be happy to pay more for better games. I certainly would.


It is a video memory problem. Unified memory is nice but you need GDDR6 to feed a powerful GPU. But you don't want GDDR6 for the cpu.


$4000-$6000 is a toy for a tiny number of rich people or a work machine for a well paid professional.

In the PC space the average spend is $800 and a PS5 is $500.


This has always been very strange to me. Apple chips used in smartphones were very consistently near the top of the pack even in terms of GPU performance, and unlike their Android counterparts, they rarely throttled and could deliver said perf with decent battery life.

Yet the iOS 'gaming' scene, despite being one of the major revenue drivers, consists mostly of low-quality F2P games.


Every god damn child has an iPad already to keep them entertained and the hardware is plenty good enough. Going one step further in the age ranges to get big game revenue like the switch will help them capture more.


Or maybe Jobs was just salty about the Halo buyout?

In any case, it's a moot point. Apple clearly doesn't care about desktop gaming and it shows in both their hardware and software.


Apple has now too much money and is running out of core business areas. Expect more investing in non-Apple areas like gaming, cars, etc.

Though every video game company on the planet hates them because of App Store terms.


"every video game company" = Epic Games, and mostly because they don't like to pay overhead for their Fortnite loot boxes that most parents wish they could have some control over. I don't have allegiance to either company mostly because they don't care about me, just my money.


> I don't have allegiance to either company mostly because they don't care about me, just my money.

This criticism is something I actually see as a positive. Competing companies are often dependent on advertising money, and the things that leads to are a whole lot worse in my view.


Epic doesn’t do lootboxes, next.


In the spirit of his argument, they do use time-gated content and other methods to psychologically exploit their users into buying things.


They used to? At least Rocket League had them for a long time. Unsure how long after Epic purchased it they had them though.


> Expect more investing in non-Apple areas like gaming, cars, etc.

I remember people saying this about phones in 2006.


The first half of the event was old wine in new bottles - I reckon that’s the main growth area they are squeezing.


Or was it last seasons wine in old bottles?


The "new" display is an 8 year monitor whose updates were a better webcam & integrated speakers, and they're charging more for it than ever before. More like rotten wine in old bottles.


Except when Bungie was going to release Halo on Mac[0] and Microsoft swooped in, bought them, and made it an Xbox thing.

[0] https://www.youtube.com/watch?v=Tzrme9yWens


The Marathon days were really great. That jump was a huge loss to me, even if it wasn’t to Apple.


Once they are confident of their graphics advantage I think they will enter the console market against Microsoft and Sony with Apple tv console


Apple is the world's biggest and most profitable gaming company. For every AAA gamer, there are a hundred casual gamers (one reason why Nintendo consoles run circles around Sony and Microsoft).

Apple has invested billions into their gaming division. The big thing they need right now is a new version of Metal that gets feature parity with Vulkan or DX.

Also of note, there are very persistent rumors of an upcoming VR headset. Their M1 alone would blow away competition like the Quest. A Pro or max chip with some disabled CPU cores wouldn't cost a ton due to being scavenged cores and would positively stomp the competition.


As far as developer productivity is concerned, it is Vulkan that needs to get feature parity with Metal.


HL Alyx is actually quite well optimized, and you can definitely run it on Ultra with super-sampling on a 3090.

[1] https://www.youtube.com/watch?v=kjNaC0-hiPE


I didn't see the display resolution (or the model of HMD) mentioned in the video. I'm using a Varjo Aero, which has 2880x2720 per eye, almost quadruple (per eye) compared to the Valve Index. I think this resolution is enough for a good VR experience; pixels are almost invisible, and even small text is readable. However, HL Alyx doesn't run at 90 fps at full resolution.


Uh, Varjo Aero is a clear outlier in terms of resolution and definitely does not represent the current typical VR headset experience (which in early 2022 would be around Quest 2 or Valve Index). If you went out of your way to get an expensive niche high-end headset, it struggling on a 3090 is your problem, not HL:A's.


I'd say that it's just a couple of years ahead of the mainstream curve. My thinking is that Apple probably wants to make a very high resolution VR headset (similar or close to the Varjo Aero), and it seems that with their new chips they might be able to pull it off.


How is the foveated rendering? Is it noticeable?


There's no software support for foveated rendering in any games yet, afaik. It works in desktop mode and it's fast enough to be unnoticeable.


The author of that video is using the valve index, which is 1440×1600 per eye


That HMD looks pretty badass. Didn't realize hardware support for eye tracking/foveation already existed. Does any software support it? That would fix your perf issue.


Yes, that would probably fix it, but there's no software support yet in any games, afaik. It works only in desktop mode.


But it's still impossible to replace RTX 3090 with this new Mac Studio because games just will not run on MacOS.


Maybe Valve can port proton to mac


That would not help. The reason Proton works so well is that Linux on x86 and Windows on x86... are both on x86.

Proton on ARM Macs would involve Rosetta, and while that does a surprisingly good job of running x86 on ARM, I'm not sure it's up to the job of running games at high speed...


Games seem playable running through Microsoft’s inferior x86 compatibility layer in a Windows ARM VM on M1 Pro/Max, so I don’t see how running games through Rosetta would be any worse.


If you have some monster like a 3090, you expect it to run games smoothly on "ultra" settings. In reality, some games push its limits even on "medium" settings (Cyberpunk 2077 @ 2K, RDR2 @ 4K). Any emulation layer will hurt performance, so if you want top-of-the-line performance you still need a Windows PC with an RTX 3090. And it will not even be more expensive than a Mac Studio ;)


I want my game to vsync with my monitor. That is all I care about.


Your 4k monitor?


I have a 5k monitor, but yes. That is literally all I care about.


Congratulations, no video card in the world is able to keep up a 5k res at 60FPS while doing even slightly demanding things. Nor will the M1 Ultra be.


My 3090 can drive 4k100hz for most demanding games at their highest settings.

Some games are a stable 4k120 and others are more like 4k75.

I feel the 3090 could feasibly drive 5k60 as a result.


They are not the most demanding games, that's why :)


Real world example M1 Max;

GTAV on Win11 Arm VM - okay, but not great.

GTAV on Crossover - much much better, lack of joystick support (but that's a crossover issue)


They used to have a Proton port for Mac, and discontinued it.


The 3090 should have no problem playing all but the most poorly optimized VR games at max or near-max settings. Often times the difference between "very high" and "ultra" is indistinguishable btw, with "ultra" just being shit that wasn't well optimized to begin with.


The relative graphics claims are particularly dubious. M1 Ultra has 10% less memory throughput than a 3090. I somehow doubt that nvidia has left performance on the table that Apple is picking up, even with the power savings of being on a smaller node. On the balance of things this just seems wrong. The intuitive range of relative performance for a 3090 in flat-out single precision vector workloads should top out at 90% but more likely 75% or less.


The problem is, last time I checked, you're not able to really make use of it for gaming. Obviously macOS doesn't run all that many games compared to Windows and now Linux (with Proton, or honestly quite possibly even without), and something like Parallels can't take advantage of the M1's power.

Not to say it'll never happen, but it's not a done deal basically, and to my knowledge the process hasn't yet started.


> you need this kind of GPU performance for high resolution VR gaming at 90 fps, but its just barely enough

I run VR games on the index at 144hz with high settings without issue on a 3060 Ti.

I've been on the market for a 3070-3090, but only because I want a card for which a water block is available, not because I need more power for any extant game.


I play Rust (the game, not the language) with decent graphics on an M1 Air with no issues other than heat, which an external fan quickly mitigates.

Really looking forward to Apple's VR offering after seeing the performance of their compact SoC


But Rust runs even on intel integrated graphics without any problem (HD4000)


You're right, it is insane, but this is Apple.

So expect that what that graph actually means is some extremely specific, cherry-picked benchmarks.


After their comparison claims on the M1 Pro/Max and Nvidia GPUs, I would take these comparisons with a huge grain of salt.


I wonder how it compares for crypto mining, and if that group would be buying these up


Poorly compared to an ASIC which has the SHA-256 algorithm encoded as hardware circuits. Maybe better than GPUs when it comes to mining Ethereum or others where ASICs aren't as prevalent.


If you're mining crypto you want 95% of your capex to go to graphics cards/ASICs, not to RAM+NICs+CPU+case+etc.


if it isn't marketing straight from the Apple HQ I will be wrong


unless it has its own gddr6 it won’t be anything like a 3090 for games


Is it actually more powerful than a top-of-the-line Threadripper[0], or is that not a "personal computer" CPU by this definition? I feel like 64 cores would beat 20 on some workloads even if the 20 were way faster in single-core performance.

[0]https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-3...


My workstation has a 3990x.

Our "world" build is slightly faster on my M1 Max.

https://twitter.com/kiratpandya/status/1457438725680480257

The 3990x runs a bit faster on the initial compile stage but the linking is single threaded and the M1 Max catches up at that point. I expect the M1 Ultra to crush the 3990x on compile time.


Are both cases compiling to the same target architecture? If not, you may well be comparing the relative performance of different compiler backends instead of comparing the performance of your CPUs.

(+ now I see it's rust: how parallel is your build, really?)


> now I see it's rust: how parallel is your build, really?

Not the OP but I install a lot of Rust projects with Cargo and recently did some benchmarking on DigitalOcean's compute-optimized VMs. Going from 8 cores to 32 cores was a little disappointing:

Bat (~40 crates): 68s -> 61s

Nushell (486 crates): 157s -> 106s

Compilation starts out highly parallel and then quickly drops down to a small number of cores.
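That's roughly the Amdahl's law pattern. A minimal sketch, assuming an illustrative ~15% effectively-serial fraction (my guess, which happens to land near the Nushell numbers above, not something I measured):

    # Amdahl's law: speedup(n) = 1 / (s + (1 - s) / n), with s = serial fraction
    def speedup(serial_fraction, cores):
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

    s = 0.15  # illustrative guess for the effectively-serial part (codegen tail + linking)
    for cores in (8, 32):
        print(cores, round(speedup(s, cores), 2))
    # 8 cores  -> ~3.9x over a single core
    # 32 cores -> ~5.7x, i.e. only ~1.45x faster than 8 cores,
    # which is in the same ballpark as the Nushell result above (157s -> 106s).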


The target arch doesn't matter for the change-build-test loop devs do. All that matters is how fast you can compile your code to test it.

If the final x86 production build takes longer it doesn’t matter - that happens on the cloud anyway.

Edit: Rust builds are very parallel until linking. No different than any other LLVM build.


> The target arch doesn't matter for the change-build-test loop devs do.

It matters when comparing CPU performance, which is what this benchmark is being used for.


That the ARM backend is faster to compile for is a legit advantage for the developer though, even if it's distinct from CPU perf.


> The 3990x runs a bit faster on the initial compile stage but the linking is single threaded and the M1 Max catches up at that point.

Isn't linking IO-bound?


https://github.com/rui314/mold would suggest otherwise. Massive speedups by multithreading the linker. I think traditional linkers just aren't highly optimised.


> https://github.com/rui314/mold would suggest otherwise.

Does it, though?

I mean, if you read that link you'll notice it boasts the linker's performance by comparing it with cp and how it's "so fast that it is only 2x slower than cp on the same machine."

Is cp supposed to be CPU-bound?


Yes, but that's for mold, which is multithreaded. The original context of this thread being the question of whether a linker would see speedups from multithreading. Most people are using traditional single-threaded linkers which are an order of magnitude slower than mold. The fact that mold is so much faster suggests that a linker does indeed see big speedups from multithreading.


The data the linker is running on will generally largely be in memory (in their posted example, which is a warm compile, completely in memory).


Exposing my limited understanding of that level of the computing stack - it is but Apple seems to have very very good caching strategies - filesystem and L1/2/3.

https://llvm.org/devmtg/2017-10/slides/Ueyama-lld.pdf

There is a breakdown in those slides discussing what parts of lld are single threaded and hard to parallelize so I suspect single thread performance plays a big role too. I generally observe one core pegged during linking.


> Exposing my limited understanding of that level of the computing stack - it is but Apple seems to have very very good caching strategies - filesystem and L1/2/3.

That would mean that these comparisons between Threadripper and the M1 Ultra do not reflect CPU performance but instead showcase whatever choice of SSD they've been using.


L1/2/3 are CPU caches, not SSD. Though there is a good chance these are mostly firmware optimizations, not hardware. So still not an apples-to-apples comparison of cpu design.


By firmware you mean microcode, but I don't think either of those actually use microcode to control this.


> L1/2/3 are CPU caches, not SSD.

Why did you omit the reference to "file system"?

Are we supposed to ignore the fact that a linker's main job is reading object files and writing the output to a file?

I find this sort of argument particularly comical given a very old school technique to speed up compilation is to use a RAM drive to store the build's output.


For a clean build and a reasonably specced machine, all the intermediate artifacts will still be in the cache during linking.


Do you have any benchmarks of the two to share, measured against a baseline to ensure it's built optimally? Maybe something like https://opendata.blender.org ? It's painful hearing random anecdotes only to learn the person didn't apply thermal paste to their CPU.


From the tweet reply you used sccache with hot cache, which would probably be mostly single-threaded since it's just fetching and copying things from cache.


I’m not trying to compare the details of SOC performance.

Just that with the same hot caches, the average change-build-test loop that developers do 100+ times a day is just faster on the M1 Max.


Try mold.


>Try mold

Curiosity got the better of me:

https://github.com/rui314/mold


We plan to move to it once MacOS support lands (for the laptops).


Intel was the single-threaded king until Zen3, so that's no real surprise.

Try the same thing with mold.


My 1st gen 16 core Threadripper is barely faster than an M1 Pro/Max at kernel builds, so a 64 core TR3 should handily double the M1 Ultra performance.

But you know, I'm still happy to double my current build perf in a small box I can stick in my closet. Ordered one :-)


How many threads are actually getting utilized in those kernel builds? I don't work on the kernel enough to have intuition in mind but people make wildly optimistic assumptions about how compilation stresses processors.

Also 1st gen threadrippers are getting on a bit now, surely. It's a ~6 year old microarchitecture.


Kernel has thousands of compilation units. Each of them is compiled by a separate compiler process. Only the linking at the end doesn't parallelise, however it should take a much smaller part of the time. The proportions change of course, if you develop kernel and do incremental builds lots of times. Then the linking stage might become a bottleneck.

The above statement should also relate to most other C/C++ projects.


The kernel is C code though, which doesn't require as much CPU to compile as, say, C++. It could be more IO-bound. There should be a lot more legitimate data available on this than my armchair speculation.


C++ adds some time-consuming features, like templates; however, my belief is that most of the time goes into optimization passes, which won't be much faster for C. Actual benchmarks should be done, however.


Yes, it'd be interesting to see this comparison made with current AMD CPUs and a full build that has approximately the same price.

I am curious whether there is a real performance difference?

I do lots of computing on high-end workstations. Intel builds used to be extremely expensive if you required ECC. They used that to discriminate prices. Recent AMD offerings helped enormously. I wonder whether these M1 offerings are a significant improvement in terms of performance, making it worthwhile to cope with the hassle of switching architectures?


but do the m1s have ecc?


All of them. The CPU graph is pegged for most of the compilation.


Kernel compilation can be heavily parallelized.


I wouldn’t automatically expect a linear decrease in compile time with growing core count. That would have to be tried.


Seems like for things that are (1) perfectly parallel and (2) not accelerated by some of the other stuff on the Apple Silicon SoCs, it will be a toss-up.

Threadripper 3990X get about 25k in Geekbench Multicore [1]

M1 Max gets about 12.5k in Geekbench Multicore, so pretty much exactly half [2]

Obviously different tasks will have _vastly_ different performance profiles. For example it's likely that the M1 Ultra will blow the Threadripper out of the water for video stuff, whereas Threadripper is likely to win certain types of compiling.

There's also the upcoming 5995WX which will be even faster: [3]

[1] https://browser.geekbench.com/processors/amd-ryzen-threadrip...

[2] https://browser.geekbench.com/v5/cpu/search?utf8=%E2%9C%93&q...

[3] https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-p...


Also of note is that half of the Mac Studio's case is dedicated to cooling. Up to this point, all M1 Max benchmarks are within laptops while all Threadripper benchmarks are in desktops. The M1 Max in the Mac Studio will probably perform better than expected.


This is sound logic and will probably be the case, but I wonder if this effect will be smaller than what we have seen in the past because of the reduced TDP of the M1 processors in general.

Maybe the cooling and power delivery difference between laptop form factors and PC form factors will matter less with these new ARM-based chips.


Maybe; every chip hits a point where feeding in more power doesn’t make it go any faster.

If I was to guess, the increased cooling probably helps the Studio sustain similar boost clocks as the laptops, but for longer.

Although it’s possible these are on N4x, which might increase the attainable boost.


Geekbench is extremely sensitive to the OS. Like the same CPU on Windows & Linux score wildly different on Geekbench. For example the 3990X regularly hits 35k multicore geekbench when run on Linux: https://browser.geekbench.com/v5/cpu/11237183


Something is seriously fishy about those geekbench results.

24-core scores 20k, 32-core scores 22.3k, and 64-core score 25k. Something isn't scaling there.


Many GB5 (and real world) tasks are memory bandwidth bottlenecked, which greatly favors M1 Max because it has over double a Threadripper's memory bandwidth.


Sort of. The CPU complex of the M1 Max can achieve ~200 GB/s, you can only hit the 400 GB/s mark by getting the GPU involved.

At the same time the Threadrippers also have a gargantuan amount of cache that can be accessed at several hundred gigabytes per second per core. Obviously not as nice as being able to hit DRAM at that speed.


That cache is not uniform time access. It costs over 100ns to cross the IO die to access another die's L3, almost as much as going to main memory. In practice you have to treat it as 8 separate 32 MB L3 caches.

Also, not everything fits into cache.


Yeah, it's the real-world tasks that Geekbench tries to simulate that don't tend to scale linearly with processor count. A lot of software does not take good advantage of multiple cores.


That's certainly true. But if that's your workload you shouldn't be buying a 64-core CPU...

I use a few 32 and 64 core machines for build servers and file servers, and while the 64-core EPYCs are not twice as fast as the 32-core ones due to lower overall frequency, they're 70% or so faster in most of the things I throw at them.


Does Geekbench actually attempt to simulate that in their multi-core score? And how?

I was under the impression that all of their multi-core tests were "run N independent copies of the single-threaded test", just like SPECrate does.


> A lot of software does not take good advantage of multiple cores.

It sounds pointless to come up with synthetic benchmarks which emulate software that is not able to handle hardware, and then use said synthetic benchmarks to evaluate the hardware performance.


It has a very specific point: communicating performance to people who don't know hardware.

Most consumers are software aware, not hardware aware. They care what they will use the hardware for, not what they can use it for. To that end, benchmarks that correlate with their experience are more useful than a tuned BLAS implementation.


Probably it's the thermals that don't scale. The more cores, the lower the peak performance per core.


Having not seen benchmarks, I would imagine that claimed memory bandwidth of ~800 GB/s vs Threadripper's claimed ~166 GB/s would make a significant difference for a number of real-world workloads.


Someone will probably chime in and correct me (such is the way of the internet - Cunningham's Law in action) but I don't think the CPU itself can access all 800 GB/s? I think someone in one of the previous M1 Pro/Max threads mentioned that several of the memory channels on Pro/Max are dedicated for the GPU. So you can't just get a 800 GB/s postgres server here.

You could still write OpenCL kernels of course. Doesn't mean you can't use it, but not sure if it's all just accessible to CPU-side code.

(or maybe it is? it's still a damn fast piece of hardware either way)


On an M1 Max MacBook Pro the CPU cores (8P+2E) peak at a combined ~240GB/s; the rest of the advertised 400GB/s memory bandwidth is only usable by the other bus masters, e.g. GPU, NPU, video encoding/decoding, etc.
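If you want to sanity-check the CPU-side number yourself, here's a crude single-threaded probe (assumes numpy is installed; it will report well under the combined multi-core figure, and real tools like STREAM are far more rigorous):

    # Crude single-threaded copy-bandwidth probe with numpy; a single core
    # cannot saturate the full CPU-complex bandwidth, so treat this as a floor.
    import time
    import numpy as np

    a = np.ones(1 << 28, dtype=np.float32)   # ~1 GiB
    b = np.empty_like(a)
    reps = 10
    start = time.perf_counter()
    for _ in range(reps):
        np.copyto(b, a)                      # ~1 GiB read + ~1 GiB write per pass
    elapsed = time.perf_counter() - start
    print(f"~{2 * a.nbytes * reps / elapsed / 1e9:.0f} GB/s effective copy bandwidth")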


So now the follow-on question I really wanted to ask: if the CPU can't access all the memory channels does that mean it can only address a fraction of the total memory as CPU memory? Or is it a situation where all the channels go into a controller/bus, but the CPU link out of the controller is only wide enough to handle a fraction of the bandwidth?


It's more akin to how on Intel, each core's L2 has some maximum bandwidth to LLC, and can't individually saturate the total bandwidth available on the ring bus. But Intel doesn't have the LLC <-> RAM bandwidth for that to be generally noticeable.


Fascinating!

Linking this[1] because TIL that the memory bandwidth number is more about the SoC as a whole. The discussion in the article is interesting because they are actively trying to saturate the memory bandwidth. Maybe the huge bandwidth is a relevant factor for the real-world uses of a machine called "Studio" that retails for over $3,000, but not as much for people running postgres?

1 - https://www.anandtech.com/show/17024/apple-m1-max-performanc...


I'm obviously going to reserve judgement until people can get their hands on them. Apple makes good stuff but their keynote slides are typically heavily cherrypicked (e.g. our video performance numbers compare our dedicated ASIC to software encoding on a different architecture even though competing ASICs exist kinds of things).


Next-gen Threadripper Pro was also announced today: https://www.tomshardware.com/news/amd-details-ryzen-threadri...


Bit of a wet fart though, even Charlie D thinks it's too little too late. OEM-only (and only on WRX80 socket), no V-cache, worse product support.

https://semiaccurate.com/2022/03/08/amd-finally-launches-thr...

The niche for high clocks was arguable with the 2nd-gen products but now you are foregoing v-cache which also improves per-thread performance, so Epyc is relatively speaking even more attractive. And if you take Threadripper you have artificial memory limits, half the memory channels, half the PCIe lanes, etc, plus in some cases it's more expensive than the Epyc chips. It is a lot to pay (not just in cash) just for higher clocks that your 64C workloads probably don't even care about.

AMD moved into rent-seeking mode even before Zen3 came out. Zen2 threadripper clearly beats anything Intel can muster in the segment (unless they wanted to do W-3175X seriously and not as a limited-release thing with $2000 motherboards) and thus AMD had no reason to actually update this segment when they could just coast. Even with this release, they are not refreshing the "mainstream" TRX40 platform but only a limited release for the OEM-only WRX80 platform.

It was obvious when they forced a socket change, and then cranked all the Threadripper 3000 prices (some even to higher-levels than single-socket Epyc "P" skus) what direction things were headed. They have to stay competitive in server, so those prices are aggressive, but Intel doesn't have anything to compete with Threadripper so AMD will coast and raise prices.

And while Milan-X isn't cheap - I doubt these WRX80 chips are going to be cheap either, it would be unsurprising if they're back in the position of Threadripper being more expensive for a chip that's locked-down and cut-down. And being OEM-only you can't shop around or build it yourself, it's take it or leave it.


The PRO Threadrippers are not cut down, they have 8 memory channels and 128 PCIE 4.0 lanes. I think the only limitation compared to Epyc is that you can have 1 socket only.


And WRX80 has dedicated chipset lanes (https://www.amd.com/en/chipsets/wrx80), so effectively more PCI-e than EPYC, and bootable NVME RAID support on top.


It's bad for competition that only Apple gets to use TSMC's 5nm process. Though what's really bad is that Intel and Samsung haven't been able to compete with TSMC.


AMD will be on TSMC N5P next year, which will give them node parity with Apple (who will be releasing A15 on N5P this year), and actually a small node lead over the current N5-based A14 products. So we will get to test the "it's all just node lead guys, nothing wrong with x86!!!" theory.

Don't worry though there will still be room to move the goalposts with "uhhh, but, Apple is designing for high IPC and low clocks, it's totally different and x86 could do it if they wanted to but, uhhh, they don't!".

(I'm personally of the somewhat-controversial opinion that x86 can't really be scaled in the same super-wide-core/super-deep-reorder-buffer fashion that ARM opens up and the IPC gap will persist as a result. The gap is very wide, higher than 3x in floating-point benchmarks, it isn't something that's going to be easy to close.)


We've already seen x86 draw even with Intel 12th gen: https://www.youtube.com/watch?v=X0bsjUMz3EM


So in that one, you've got a 20-thread Intel part (6+8C/20T) at probably 3.5 GHz going against a 10-thread Apple part (8+2C/10T), at probably 3 GHz, and the Apple part still comes out on top by ~5% in Cinebench R23 MT. And that's with Intel having 50% more high-performance threads available.

Work out the IPC there - the Intel has a 2x thread count advantage, a 17% clock advantage, and Apple comes out 5% ahead. So the IPC gap there is about 2.46x.
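For anyone who wants that arithmetic spelled out, a rough sketch (the clocks and the 5% margin are the estimates above, not measured values, and it naively treats SMT/E-core threads as full threads):

    # Implied per-thread, per-clock ("IPC-ish") gap from the Cinebench MT result
    intel_threads, apple_threads = 20, 10
    intel_clock, apple_clock = 3.5, 3.0      # GHz, rough sustained estimates
    apple_vs_intel_score = 1.05              # Apple ~5% ahead overall

    per_thread = apple_vs_intel_score * (intel_threads / apple_threads)  # 2.10
    per_clock = per_thread * (intel_clock / apple_clock)                 # ~2.45
    print(round(per_clock, 2))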

It's not a perfect comparison of course, since we're mixing SMT and big/little cores, but in basically every area Intel should (on paper) have more resources available and Apple is coming out on top anyway by sheer IPC.

That's what I'm saying - you can't really do that approach with x86. It's not power-advantageous or transistor-advantageous to go super wide on the decode or reorder buffer like that on x86. And regardless of the tricks x86 uses to mitigate it, you've still got a 2.5x IPC gap at the end of the day. A 2.5x IPC gap will not be closed up by just a single node shrink.

And that's looking at MT, where your task scales perfectly. See where I'm going with this? Intel is using 2x the number of threads, and 3x the number of efficiency cores to get there. Apple can deliver that punch across a much lower number of threads - meaning ST-bottlenecked tasks will scale much much better on Apple.

With a single-threaded test, the M1 is pulling 7W vs 33W for the Alder Lake intel. Obviously that tells us nothing about efficiency, since we'd need to know the scores, but that's the downside, is for normal, poorly-threaded tasks, like surfing the web or editing code, the 12900HK is going to be boosting high to reach the same performance levels the M1 does at 3 GHz. And that's exactly what you see in the power figures there.

In short: you will likely see x86 able to keep up in one metric or another. You can win on performance if you just go nuclear on power. You can match on power on perfectly-threadable tasks that allow the x86 to deploy twice the threads (sharing instruction cache/etc). You can match on single-threaded battery life if you accept lesser performance. But the overall performance of the M1 derives from the massive IPC it generates, and that's something that x86 can't match nearly as easily.

Going ham on a single metric just to claim victory isn't nearly the same thing as the level of all-round performance and efficiency that Apple has achieved there.

(see also, putting a 128-thread Threadripper 3990WX workstation up against a 10-thread M1 Max laptop just to win at rendering... and people here thought that disproved that Apple was great hardware lol)


I don't think there's anything specifically different about the reorder buffer between x86 and ARM.

The reorder buffer size is just a logical consequence of the frontend width.

And yes, scaling an Aarch64 frontend is dead simple compared to x86 due to the fixed instruction width. The disadvantage of x86 is serious, but I don't know if we can count it out quite yet. This is the first time Intel and AMD got any serious pressure on that front. I'm sure they're taking the challenge seriously, and it'll take some years before we'll see the results.


x86 and ARM have different memory ordering assumptions, which affects the reorder buffer in material ways. Apple added a mode to their chips to use the x86 memory model when flagged by Rosetta 2.


How much do the little cores and HT contribute to this workload though? If I turned them off, could I claim the opposite: losing maybe 20% but using 40% fewer threads?


There is a third variable: Apple is putting RAM much closer to the CPU than AMD. This has the advantage that you get lower latency (and slightly higher bandwidth), but the downside that you're currently limited to 128GB of RAM, compared to 2TB for Threadripper (4TB for Epyc). AMD's 3D V-Cache, launching in a few months, will be interesting since it lets the L3 grow a ton.


Latency to RAM is not any better. Bandwidth is very much better, but not for this reason.


why so smug? who cares whichever company makes a better processor?


Well, compare that to a $400 CPU like a 5900X: the first M1 is slower than it and costs 2x the price.


Apple's ARM chips can process a metric ton of ops per cycle due to the architecture of the chip: https://news.ycombinator.com/item?id=25257932


But the answer to the question is still "no".


Doesn't have to though. A Threadripper 3990X uses barrels of electricity, generates plenty of heat, comes with no GPU, has worse single-threaded performance, and still costs $4000 by itself without any of the parts needed to make it actually work.


The question is in relation to Apple's claim that it's "the world’s most powerful and capable chip for a personal computer".


It might also be reasonable to say that the threadripper is a workstation chip, not a chip for personal computers.

Edit: even AMD themselves call their threadripper lineup workstation chips, not personal.

https://www.amd.com/en/processors/workstation


Threadripper Pro is the workstation chip. Regular Threadripper (non-Pro) was not aimed at workstations, it was aimed at the "HEDT" market. Strictly speaking it's considered a consumer market (albeit for the enthusiasts of enthusiasts)


I'd call them personal chips. When I think of non-personal chips I think IBM POWER or Ampere Altra.


Why do you think of Altera as non-personal chips?


Not Altera, Ampere Altra. This: https://amperecomputing.com/processors/ampere-altra/

If the purchase page says to "contact sales" and doesn't list a price then it is not for consumers.


Click where to buy tab, then there is a list of distributors. Including ones which sell workstation versions with a configurator and pricing. https://store.avantek.co.uk/ampere-altra-64bit-arm-workstati...


Depends on what you define "capable" as. Remember, they specify that it is the most powerful and capable chip, not necessarily complete system.

There's no other chip that has the power of an RTX 3090 and more power than an i9-12900K in it - after all, Threadripper doesn't have a lick of graphics power at all. This chip can do 18 8K video streams at once, which Threadripper would get demolished at.

I'm content with giving them the chip crown. Full system? Debatable.


It's a fantastic chip but that wasn't the question. I love my M1 Max and I love my Threadripper workstation, each has their own strengths and that's alright.


Though you would need to compare it to the coming Threadripper 5000WX(?) or, better, the soon-coming Ryzen 7000 CPUs (which seem to have integrated graphics).

I mean, they are all CPUs coming out this year as far as I know.


Only if the only thing you compare is CPU performance - adding a big GPU on die adds a certain amount of ‘power’ by any measure.


It doesn't matter. Speaking as an Apple cult member imo Threadripper is better value if you're not using the machine for personal use.


It doesn’t support your argument when we’re talking about a massive processor like a threadripper vs. a M1 Ultra.

The performance per watt isn’t in the same universe and that matters.


The article claims that the chip is "the world’s most powerful and capable chip for a personal computer". It's reasonable to ask whether it genuinely is faster than another available chip, it's not an implicit argument that it's not powerful.


The M1 Ultra is by a very wide margin the bigger of the two. According to Tom's Hardware [1], top-of-the-line Epycs have 39.54 billion transistors. That is about a third of the 114 billion in the M1 Ultra. Apple builds bigger than anyone else, thanks largely to their access to TSMC's best process.

The M1 Ultra is a workstation part. It goes in machines that start at $4,000. The competition is Xeons, Epycs, and Threadrippers.


That's not really a fair comparison. Apple's chip spends most of that on the GPU, and the Neural Engine takes a chunk too. Threadripper is only a CPU.


> The performance per watt isn’t in the same universe and that matters.

I couldn’t give less of a shit about performance-per-watt. The ONLY metric I care about is performance-per-dollar.

A Mac Studio and Threadripper are both boxes that sit on/under my desk. I don’t work from a laptop. I don’t care about energy usage. I even don’t really care about noise. My Threadripper is fine. I would not trade less power for less noise.


This is what some folks miss.

One hour of my time is more expensive than an entire month of a computer electricity bill.

Some people just want tasks to perform as fast as possible regardless of power consumption or portability.

Life's short and time is finite.

Every second adds up for repetitive tasks.
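For scale, a rough back-of-the-envelope (the wattage, hours, and electricity price are all assumptions, not anyone's measured figures):

    # Rough monthly electricity cost of a workstation under load (all assumptions)
    watts = 400            # assumed sustained draw
    hours_per_day = 8
    days = 30
    usd_per_kwh = 0.15     # assumed rate
    print(f"${watts / 1000 * hours_per_day * days * usd_per_kwh:.2f} per month")  # ~$14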


The only reason I've ever cared about watts is that generally speaking 120 watt and 180 watt processors require more complicated cooling solutions. That's less true today than it ever was. Cases are designed for things like liquid cooling, and they tend to be pretty silent. The processors stay cool, and are pretty reliable.

I personally stick to the lower wattage ones because I don't generally need high end stuff, so I think Apple is going the right direction here, but it should be noted that Intel has also started down the path of high performance and efficiency cores already. AMD will find itself there too if it turns out that for home use, we just don't need a ton of cores, but instead a small group of fast cores surrounded by a bunch of specialist cores.


wattage doesn't really tell you how difficult it is to cool a part anymore. 11th-gen Intel is really easy to cool despite readily going to 200W+. Zen3 is hard to cool even at 60W.

Thermal density plays a huge role, the size of the chips is going down faster than the wattage, so thermal density is going up every generation even if you keep the same number of transistors. And everyone is still putting more transistors on their chips as they shrink.

Going forward this is only going to get more complicated - I am very interested to see how the 5800X3D does in terms of thermals with a cache die over the top of the CCD (compute die). But anyway that style of thing seem to be the future - NVIDIA is also rumored to be using a cache die over the top of their Ada/Lovelace architecture. And obviously 60W direct to the IHS is easier to cool than 60W that has to be pulled through a cache die in the middle.


The Zen 3 stock coolers work pretty well. I've never had a problem.

Looking it up though I do see a lot of concerns with the heat they generate. I can only conclude I don't push my chip very hard (which, honestly, I probably don't)

I've been happy with the AMDs I purchased over the past 4 years, we'll see how they hold up and how this next gen comes out. I did see that the recent Intels are quite competitive which is good for everybody.


yup, the correct answer here is people need to stop being worried about the thermals as a metric in themselves, and look at the performance their chip is generating. If your chip is running at 90C, but you're hitting 100 ScoreMarks, and attaching a 5hp chiller to it lets you hit 105 ScoreMarks, that's not really worth it.

Yeah, longevity, blah blah, but laptop chips are designed to sit above 90C under load, it's fine.

Just saying that "how hard it is to cool" doesn't solely depend on power consumption anymore. Heat density is making that harder and harder, even if power consumption stays the same.

What does improve though is how much heat it pumps into your room. Yeah, a Rocket Lake at 200W might be roughly as hard to cool as an AMD at 90W or whatever... but one is still putting 200W into your room and the other is still putting 90W. Temperatures are not the same thing as power dissipation either. I don't like having my gaming PC running in my room during the summer, and I'm actually looking at maybe running cables through the walls to have it in the basement instead. I also have a 5700G and some NUCs that are much lower power that I prefer to use for surfing and shitposting.


> wattage doesn't really tell you how difficult it is to cool a part anymore

Sure it does. Reading the rest of your post I think you're more talking about temperature than cooling requirements, but a 200W CPU needs 200W of heat dissipation, while a 60W CPU only needs 60W of heat dissipation. It's literally a 1:1 relationship since CPUs don't do any mechanical work, so power in == heat out.

Keeping temperatures below some arbitrary number does then include things like density, IHS design, etc... But that only matters for something like Intel's "Thermal Velocity Boost" where it's really important to stay under 70C specifically instead of just avoiding thermal throttling.


Air coolers can handle 300 watts without any complexity. Just a big block of fins on heat pipes.


I mean, yeah, but then you've got case issues and whatnot. I appreciate your point though.


The power is only relevant because it makes the machine quiet in a compact form. If you've got a bit of space, then a water-cooled system accomplishes a lot of the same thing. For some people there is an aesthetic element.

Power does make a big difference in data centers though - it's often the case that you run out of power before you run out of rack space.

Where power for a computer might make a difference could be in power-constrained (solar/off grid) scenarios.

I don't know if I've ever heard anyone make an argument based on $$$.


Did you know noise pollution causes dementia?

https://www.theguardian.com/society/2021/sep/09/transport-no...


Ouch. I wonder what Apple thinks about that after selling millions of noisy intel macbooks.

As for desktops, watercooling makes computers dead silent.


Yep, it's bad that those products were like that.

I personally bought a Ryzen 5950(?) instead of a Threadripper because I figured I'd accidentally spill water all over it or however it works. There are not many watercooled OEM products as far as I know.


The vast majority of developers today have a laptop as their main machine. Performance-per-watt is absolutely crucial there.


I'll just never understand this. Chances are you are at the same desk day in and day out. You probably have monitors and an external keyboard/mouse hooked up, because hunching over a laptop and using a touchpad is unnecessary torture for a fixed workspace. Given that, why would you hamstring yourself with a thermally constrained, overpriced-because-miniaturization-isn't-free setup?

Until maybe these M1's (and I'm not entirely convinced) I've not in the 20 years I've been computing seen a reasonably configured desktop (eg not just a laptop on a stick ala iMac but an ACTUAL desktop) ever not smoke the pants off of every single laptop you could put up against it. It's hard to beat the one-two punch of lots of power and room to cool it. If you are sitting at at desk why the heck wouldn't you leverage that?


I was a desktop diehard until performance reached the point where laptops are good enough. Being able to take the same computer around with you and use it seamlessly in a different place is a big improvement.

I still have a proper desk-based working environment hooked up to a docking station though. I really wouldn't want to use a laptop that doesn't have a first-party dock as my primary machine.


Meetings, working from home, travel, changing desks every few months. Not having a jungle of cables. Retina screen (still hard to find on external monitors). The vast majority of devs also don’t need an insanely powerful machine. If Docker on Mac wasn’t dog slow, I could easily get by with a $999 MacBook Air.


The only reason I also went up to 32GB RAM(almost bought the 64GB one) was because of Docker.


That’s cool. I am not the vast majority of developers. I am me. I use a desktop for high-end game development. I don’t give a shit about web development or laptop development. I care about compiling large C++ projects as fast as possible.

I agree that most developers are web/mobile developers who use a laptop. That’s great. I am an increasingly niche developer.

The root comment was a comparison against Threadripper. Normal developers should not waste money on a Threadripper. If someone is a niche developer that warrants a Threadripper then pointing out that most developers don’t need a Threadripper is a waste of time.


I care about power only because it means louder fans, and I like quiet.


Same. That's why I use watercooling to keep the room silent.


It is worth noting that, at least according to Apple's graphs, it has slightly more graphics performance than an RTX 3090.

So, even if it doesn't quite beat Threadripper in the CPU department - it will absolutely annihilate Threadripper in anything graphics-related.

For this reason, I don't actually have a problem with Apple calling it the fastest. Yes, Threadripper might be marginally faster in real-world work that uses the CPU, but other tasks like video editing, graphics, it won't be anywhere near close.


It won't be even close to an RTX 3090; looking at the M1 Max and assuming the same scaling, at best it can come close to 3070 performance.

We all need to take Apple's claims with a grain of salt, as they are always cherry-picked, so I won't be surprised if it doesn't even reach 3070 performance in real usage.


Looks like all the people saying "just start fusing those M1 CPU's into bigger ones" were right, that's basically what they did here (fused two M1 Max'es together).

And since the presenter mentioned the Mac Pro would come on another day, I wonder if they'll just do 4x M1 Max for that.


> I wonder if they'll just do 4x M1 Max for that.

Unlikely, M1 Ultra is the last chip in the M1 family according to Apple [1].

"M1 Ultra completes the M1 family as the world’s most powerful and capable chip for a personal computer.”"

[1] https://www.apple.com/newsroom/2022/03/apple-unveils-m1-ultr...


I've been saying 4x M1 Max is not a thing and never will be a thing ever since the week I got my M1 Max and saw that the IRQ controller was only instantiated to support 2 dies, but everyone kept parroting that nonsense the Bloomberg reporter said about a 4-die version regardless...

Turns out I was right.

The Mac Pro chip will be a different thing/die.


Plus they are running out of M1 superlatives. They’ll have to go to M2 to avoid launching M1 Plaid.


They could do M1 More Thing.


m1 hyper turbo deluxe

or they could take a page out of microsofts book and just call the next one "m one"


M1 One: The Second One!


M1 Series One


M1 Ultra²


M1 Mark II ala Sony


Bloomberg brought us the Supermicro hit pieces. I personally can't take them seriously anymore. Not after the second article with zero fact checking and a sad attempt at an irrelevant die shot. And their word is certainly irrelevant against that of the people who are actually working (and succeeding) at running Linux on the M1.


Could they do a multi-socket board for the Mac Pro?


I expect this for the CPU side. Multiple SoCs introduce NUMA, but that's already been done on the dual-Xeon Macs. I wonder how their GPU would work in that configuration.


In a world where latency:storage size tradeoffs need to be made for practical reasons (and will, at some point, be required for fundamental physical reasons), we should just embrace NUMA anyway. Death to the lie of uniform access! NUMA is the future!

Ehrm, anyway.

It actually isn't clear to me whether designing a two-socket motherboard is fundamentally an easier task than jamming more of the things into a single package (given that they have already embraced some sort of chiplet paradigm).


I believe designing/manufacturing a 5nm chip used only for the Mac Pro can't be profitable. That's why they designed the M1 Ultra as an MCM of M1 Max dies.


They would never do that


The Mac Pro (and the high-end Powermacs that preceded it) were always available in a dual socket incarnation, right up to the trashcan Mac.


The trashcan Mac Pro was announced almost 9 years ago. Apple today is a very different company. They've moved on to chiplet-designed SoCs with little to no care about upgradability.


They have done this previously for dual-socket Xeons. Historical precedent doesn't necessarily hold here, but it has in fact been done on the "cheese graters" previously.


They've moved on to chiplet designs. Don't disregard their clear direction toward SoCs with non-upgradeable RAM, GPU, and CPU.


They also water-cooled them but I’d bet a lot of money we never see that again either.


The Mac Pro chip will be a 4x M2 Max


In a previous Apple press release[1] they said:

> The Mac is now one year into its two-year transition to Apple silicon, and M1 Pro and M1 Max represent another huge step forward. These are the most powerful and capable chips Apple has ever created, and together with M1, they form a family of chips that lead the industry in performance, custom technologies, and power efficiency.

I think it is just as likely that they mean "completes the family [as it stands today]" as they do "completes the family [permanently]."

[1] https://www.apple.com/newsroom/2021/10/introducing-m1-pro-an...

edit: This comment around SoC code names is worth a look too: https://news.ycombinator.com/item?id=30605713


What is M2 really going to be difference wise?


I think M2 is going to improve on single thread performance.

Judging from the geekbench scores[0], M1, M1 Pro, and M1 Max perform identically in single threaded tasks. And the newly leaked Mac Studio benchmark[1] shows essentially identical single thread performance.

[0]: https://browser.geekbench.com/mac-benchmarks [1]: https://browser.geekbench.com/v5/cpu/13330272


~15-20% faster if releases start this year, plus whatever optimizations were learned from the M1 in wide release, such as perhaps tuning the silicon allocation across the various systems. If next year, the M2 or M3 (get it) will use Taiwan Semi's so-called 3nm, which should be a significant jump, just like 7nm to 5nm was several years ago for the phones and iPads.


Hopefully one of the changes in the M2 design will be a better decoupling of RAM and core count.

They'd need that anyway for a Mac Pro replacement (128GB wouldn't cut it for everyone), but even for smaller configs it's frustrating being limited to 16GB on the M1 and 32GB on the Pro. Just because I need more RAM doesn't mean I want the extra size and heat or whatever.


For my purposes, the biggest drawback of using an SoC is being constrained to just the unified memory.

Since I run a lot of memory intensive tasks but few CPU or GPU bound tasks, a regular m1 with way more memory would be ideal.


I doubt there will be much learned after actually shipping M1. Developing silicon takes a long time. I wouldn’t be surprised if the design was more or less fixed by the time the M1 released.


They also said that the Mac Pro is still yet to transition. So they'll have to come up with something for that. My suspicion is that it won't be M branded. Perhaps P1 for pro?


I have been thinking about this, and one theory I came up with was a switch chip between M1 Max chiplets... you have 4 (or more?) M1 Maxes connected to the same switch chip... it might add latency for some tasks but it would be one way to scale without going to a new M2 processor... Then again, M2 could come out in June at the WWDC, doubling everything and adding 3 more ultra connect things to each M2 Max, allowing unlimited (ish) upgradability... but, probably not...


It may be M branded, just not M1. It could be based on an M2 Ultra or M2 Mega or whatever and what they said would still hold true.


That doesn't necessarily rule out more powerful iterations that also launch under the M1 Ultra branding though.

(edit: per a sibling comment, if the internals like IRQ only really scale to 2 chiplets that pretty much would rule it out though.)


Probably not on the same design as the current M1 series, at least not for the Mac Pro. The current x86 pro supports up to 1.5TB of RAM. I don’t think they will be able to match that using a SoC with integrated RAM. There will probably be a different CPU design for the Pro with an external memory bus.


M2 family Mac Pro upcoming…


>Looks like all the people saying "just start fusing those M1 CPU's into bigger ones" were right,

Well, they were only correct because Apple managed to hide a whole section of the die image. (Which is actually genius.) Otherwise it wouldn't have made any sense.

Likely to be using CoWoS from TSMC [1], since the bandwidth numbers fit. But it needs further confirmation.

[1] https://en.wikichip.org/wiki/tsmc/cowos


The interconnect area was already known from third-party die photos of the M1 Max https://twitter.com/vadimyuryev/status/1466526403331952644


They were not known when people kept rambling about the Mark Gurman report on "just start fusing those M1 CPU's into bigger ones".

I wrote about it three months ago.

https://news.ycombinator.com/item?id=29430817


I've been using a Vega-M for some time which I think follows this model. It's really great.



Right. Still riding gains from the node shrink and on package memory.

Could AMD/Intel follow suit and package memory as an additional layer of cache? I worry that we are being dazzled by the performance at the cost of more integration and less freedom.


The next CPU coming from AMD will be the 5800X3D with 96MB cache. They stack 64MB L3 on top. Rumours say it comes out 20th of April.

edit: typo + stacking + rumoured date


They might have to make the unified memory more dense to get to a 1.5 TB maximum of RAM on the machine (especially since it is also shared with the GPU). Maybe they could stack the RAM on the SoC or just fab the RAM at a lower process node.


The M1 Max/Ultra is already an extremely dense design for that approach; it's really almost as dense as you can make it. There are packages stacked on top, and around, etc. I guess you could put more memory on the backside, but that's not going to do more than double it, assuming it even has the pinout for that (let's say you could run it in clamshell mode like GDDR; no idea if that's actually possible, but just hypothetically).

The thing is they're at 128GB which is way way far from 1.5TB. You're not going to find a way to get 12x the memory while still doing the embedded memory packages.

Maybe I'll be pleasantly surprised but it seems like they're either going to switch to (R/LR)DIMMs for the Mac Pro or else it's going to be a "down" generation. And to be fair that's fine, they'll be making Intel Mac Pros for a while longer (just like with the other product segments), they don't have to have every single metric be better, they can put out something that only does 256GB or 512GB or whatever and that would be fine for a lot of people.


> You're not going to find a way to get 12x the memory while still doing the embedded memory packages.

https://www.anandtech.com/show/17058/samsung-announces-lpddr...

> It’s also possible to allow for 64GB memory modules of a single package, which would correspond to 32 dies.

It is possible, and I guess that NVIDIA’s Grace server CPU will use those massive capacity LPDDR5X modules too.

The M1 Ultra has 8 memory packages today, and Apple could also use 32-bit wide ones (instead of 64-bit) if they want more chips.


Do people really need 1.5 TB of unified memory? If you had 128 GB of unified memory and another pool of 1.5 TB or whatever of “slower” memory (more like normal speed on the Intel/AMD side) would that work?

You (or the OS or the chip) could page things in and out of the unified memory. Treat unified memory as a MEGA L3 cache.

Depending on how it’s done it may not be transparent if you want the best performance. But would it work?


>I wonder if they'll just do 4x M1 Max for that.

They'll be running out of names for that thing. M1 Ultra II would be lame, so M1 Extreme? M1 Steve?


Just need to increment the 1 instead... Eventually moving into using letters instead of numbers, until we end up with the MK-ULTRA chip.


I suspect they will have a different naming convention after they get to M4.

There might be some hesitance installing an M5. You should stay out of the way if the machine learning core needs more power.

I guess by the time they get to M5, anyone old enough to get the reference will have retired.


That would really be a trip!


> M1 Steve

That would be the funniest thing Apple has done in years. I totally support the idea.


Pro < Max < Ultra < Ne Plus Ultra < Steve


And Steve < Woz, perhaps?


I'm pretty sure all their messaging is preparing us for "M1 Outrageous"


Super M1 Turbo HD Remix


M1 Ludicrous the IV


Maximum Plaid


"M1 More", would show Apple is fun again!


It seems kind of strange to have the "A" line go from smartphones to... iPads, and then have the "M" line go all the way from thin-and-lights to proper workstations. Maybe they need a new letter. Call it the C1 -- "C" for compute, but also for Cupertino.


M1 Max Pro... :P


"iPhone 14 Pro Max, powered by the M1 Max Pro".


M1 Hyper or M1 Ludicrous:)


M1 Houndstooth


M1 God


Just get the guys who came up with the new name for the iPhone SE working on it. Oh, wait.


I like "X1" way more than "M2 Extreme".


Plus Ultra would at least have historical precedent.


I like M1 Steve, as it can honor two people.


M1 Plaid


lol, good one! I hope marketing sees this.


Steve would be awesome, but a deal with Tesla to use "Plaid" would be perfection.


I would think they could just go to Mel Brooks instead of dealing with Tesla.


Never happen.

Elon is somewhat toxic these days...


iM1 Pro.


M1 Magnum XL


Epic M1 fit good


M1 Greta in 2030 (when it is "carbon neutral")


You don't just "fuse" two chips together willy-nilly. That was designed into the architecture from the beginning for future implementation.


My guess would be an M1 Ultra with additional modular expansions for specific purposes. I.e. instead of having a GPU, you could add a tensor processing module if you need to do machine learning, or a graphics processing chip for video production, and so on.


it's a chiplet design. Whenever people ask what we're going to do after 1nm... well, we can combine two chips into one.


The CPU trend nowadays seems to be combining chiplets and having performance and efficiency cores.


Throwing more silicon at it, like this, sounds extremely expensive or price-inefficient.

It is, at least, two separate chips combined together, which makes more sense and mitigates the problem.


The power leveling in this chip's naming scheme can rival Dragon Ball Z.


How many CUDA cores? It's over ninethousaaaaa .. oh wait nevermind!


My first thought was "does it include an LSD subscription?".


I for one am holding out for the ULTRA GIGA chips.


the M1-O9000


I wish HN was like this more often


I really hope it won't. Let's cherish the high quality comments of HN. Once this comment section becomes a karma-fed race to the bottom driven by who can make the most memeable jokes, it will never recover. Case in point: Reddit.


I appreciate that it's infrequent. Sure, it's fun to blow off some steam and have a laugh, but that's fundamentally not what this place is about. Confining it to Apple release threads makes it more of a purge scenario.


I don't.

There's already Reddit if you want to crack puns and farm karma. Let's try to keep the signal:noise ratio higher here.


Be the comments you want to see in the HN


Would be interesting to get more info on the neural engine. On one hand, I find it fascinating that major manufacturers are now putting neural architectures into mainstream hardware.

On the other hand, I wonder what exactly it can do. To what degree are you tied to a specific neural architecture (e.g. recurrent vs convolutional), what APIs are available for training it, and is it even meant to be used that way (not just by Apple-provided features like FaceID)?


It's a general-purpose accelerator. You can use coremltools[1] to convert your trained model into a compatible format, or you can build your own model using CreateML[2].

[1] https://coremltools.readme.io/docs

[2] https://developer.apple.com/machine-learning/create-ml/
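
For example, here is a minimal sketch of the coremltools route (assuming coremltools 5+ with PyTorch/torchvision installed; the model and input name are just stand-ins):

    # Hypothetical example: convert a traced PyTorch model to Core ML.
    import torch
    import torchvision
    import coremltools as ct

    # Any traceable PyTorch model works; resnet18 is just a placeholder.
    model = torchvision.models.resnet18().eval()
    example = torch.rand(1, 3, 224, 224)
    traced = torch.jit.trace(model, example)

    # compute_units=ALL lets macOS schedule the converted model on the
    # CPU, GPU, or Neural Engine as it sees fit.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="input", shape=example.shape)],
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.ALL,
    )
    mlmodel.save("ResNet18.mlpackage")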



Typical "neural engines" are intended for real-time network inference, not training. Training is highly parallel and benefits more from GPU-like vector processing.


Apple is pushing for training & fine-tuning on the devices too.

https://developer.apple.com/documentation/coreml/model_custo...


I'm cross posting a question I had from the Mac Studio thread (currently unanswered).

----

Mac Pro scale up?

How is this going to scale up to a Mac Pro, especially related to RAM?

The Ultra caps at 128 GB of RAM (which isn't much for video editing, especially given that the GPU uses the system RAM). Today's Mac Pro goes up to 1.5TB (and has dedicated video RAM above this).

If the Mac Pro is, say, 4 Ultras stacked together, that means the new Mac Pro will be capped at 512GB of RAM. Would Apple stack 12 Ultras together to get to 1.5TB of RAM? Seems unlikely.


I think some of this can be guessed from the SoC codenames

https://en.wikipedia.org/wiki/List_of_Apple_codenames

M1 Max is Jade C-Die => 64GB

M1 Ultra is Jade 2C-Die => 128GB

There is a still unreleased SoC called Jade 4C-Die =>256GB

So I think that's the most we'll see this generation, unless they somehow add (much slower) slotted RAM

If they were to double the max RAM on M2 Pro/Max (Rhodes Chop / Rhodes 1C), which doesn't seem unreasonable, that would mean 512GB RAM on the 4C-Die version, which would be enough for _most_ Mac Pro users.

Perhaps Apple is thinking that anyone who needs more than half a Terabyte of RAM should just offload the work to some other computer somewhere else for the time being.

I do think it's a shame that in some ways the absolute high-end will be worse than before, but I also wonder how many 1.5TB Mac Pros they actually sold.


How is slotted RAM slower? 6400MHz DIMMs exist. That would match the specs of the RAM on the M1 Max. Even octa-channel has been done before, so the memory bus would have the exact same width, latency, and clock frequency.


The memory bandwidth of the M1 Max is 400 GB/s with 64GB of RAM, whereas the memory bandwidth of Corsair's 6400MHz DDR5 32GB RAM module is 51GB/s per stick, or 102GB/s for the M1 Max equivalent.


51GB/s * 8 (octa-channel, not dual channel as you are calculating) is 408 GB/s. Basically the same as the M1 Max. It's not fair to use an off the shelf product since even if the RAM is slotted Apple wouldn't use an off the shelf product.

Whether they use slotted RAM or not has nothing to do with performance. It's a design choice. For the mobile processors it makes total sense to save space. But for the Mac pro they might as well use slotted RAM. Unless they go for HBM which does offer superior performance.
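
For what it's worth, the arithmetic checks out if you assume standard 64-bit DDR5 channels:

    # DDR5-6400 back-of-the-envelope: 6400 MT/s on a 64-bit (8-byte) channel.
    transfers_per_second = 6400e6
    bytes_per_transfer = 8

    per_channel_gbps = transfers_per_second * bytes_per_transfer / 1e9
    print(per_channel_gbps)        # ~51.2 GB/s per channel
    print(per_channel_gbps * 8)    # ~409.6 GB/s across 8 channels,
                                   # in the ballpark of the M1 Max's 400 GB/s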


Is 8 channel RAM doable, are there downsides? If no to both, why don't high end x86 processors have it?


High-end x86 do have it. Threadripper 3995WX for example.


Note that those are overclocked out of spec configurations today.

https://ark.intel.com/content/www/us/en/ark/products/134599/...

4800 MT/s is the actual maximum spec, anything beyond that is OC.


Agreed, I think they will use the 4C config to debut the M2 and make a splash. They said in the keynote that the M1 Ultra completes the M1 family. The timing works out well for a November launch with the 2-year Apple Silicon transition timeline they gave themselves. Not sure what they are going to call it and whether it will be A15- or A16-based.

A16 would give great performance, and I think it’s safe for them to have a two year iteration time on laptop/desktops vs one year for phone/tablet.


Hard to believe it’s already been almost 1.5 years!


I believe using Optane as fast swap could be a great way to increase RAM capacity. Not completely fair, but enough for most usages.

It is strange Apple didn't cooperate with Intel in this area.


This went by so fast I'm not sure I heard it right, but I believe the announcer for the Ultra said it was the last in the M1 lineup.

They just can't ship a Mac Pro without expansion in the normal sense, my guess is that the M2 will combine the unified memory architecture with expansion busses.

Which sounds gnarly, and I don't blame them for punting on that for the first generation of M class processors.


This is what I've been thinking as well: an M2 in a Mac Pro with 128/256GB soldered and up to 2TB of expandable 8-channel DDR5-6400, with a tiered memory cache.


A few points to make...

- the shared CPU+GPU RAM doesn't necessarily mean the GPU has to eat up system RAM when in use, because it can share addressing. So whereas the current Mac Pro would require two copies of data (CPU+GPU), the new Mac Studio can have one. Theoretically.

- they do have very significant video decoder blocks. That means you may use less RAM than you otherwise would, since you can keep frames compressed in flight.


Also, the memory model is quite different - with the ultra-fast SSD and ultra-fast on-die RAM. You can get away with significantly less RAM for the same tasks, not just because of de-duplication but because data comes in so quickly from the SSD that paging isn't nearly the hit it is on say an Intel based Mac.

I'd expect it to work more like a game console, streaming in content from the SSD to working memory on the fly, processing it with the CPU and video decode blocks, and insta-sharing it with the GPU via common address space.

All that is to say, where you needed 1.5TB of RAM on a Xeon, the architectural changes on Apple Silicon likely mean you can get away with far less and still wind up performing better.

The "GHz myth" is dead, long live the "GB myth."


> ultra-fast on-die RAM

The RAM is not on die. It’s just soldered on top of the SoC package.

> All that is to say, where you needed 1.5TB of RAM on a Xeon, the architectural changes on Apple Silicon likely mean you can get away with far less and still wind up performing better.

No, it does not. You might save a bit, but most of what you save is the transfers, because moving data from the CPU to the GPU is just sending a pointer over through the graphics API, instead of needing to actually copy the data over to the GPU’s memory. In the latter case, unless you still need it afterwards you can then drop the buffer from the CPU.

You do have some gains as you move buffer ownership back and forth instead of needing a copy in each physical memory, but if you needed 1.5TB physical before… you won’t really need much less after. You’ll probably save a fraction, possibly even a large one, but not “2/3rd” large, that’s just not sensible.


I agree. I got an 8GB Mac mini (was really just curious to see the M1 in action and the 16GB model was badly backordered) and it performs extremely well memory-wise. I often use 12GB+ and I never notice unless I check Activity Monitor.


That's what I based my assessment on too. I had an 8GB M1 MacBook Pro (swapped for the 16GB M1 Pro MacBook Pro 14 since) and I had no issues developing iOS and macOS apps with it using Xcode, and Rust development with VSCode. This brought my old 16GB Intel MacBook Pro to its knees.


Another thing to consider is memory compression. If Apple added dedicated hardware for that, it can effectively double the total memory with minimal performance hit.


Memory compression only works in certain scenarios. It requires your memory to actually have low entropy.
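
A quick illustration of the entropy point, with plain zlib standing in for whatever scheme the hardware might use:

    import os
    import zlib

    page = 16 * 1024
    zeros = bytes(page)            # low-entropy page: all zeros
    noise = os.urandom(page)       # high-entropy page: random bytes

    print(len(zlib.compress(zeros)))   # a few dozen bytes
    print(len(zlib.compress(noise)))   # ~16 KB, i.e. essentially no savings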


This is a myth.


The Mac Pro will have replaceable RAM. It will use the RAM soldered onto the CPU as cache.

You’ll most likely also be able to buy dedicated GPUs/ML booster addon Cards and the likes for it.

It’s the most likely thing to happen or they won’t release another Mac Pro.


Why would they use soldered RAM as cache? It's not like it's faster than replaceable RAM. Unless they go HBM2, but I doubt that.


The bandwidth of the soldered RAM is much higher, which makes it much faster for code that accesses a lot of RAM, like video editors.


I think they will unveil the M2, which can probably at least double the 64GB max RAM of the M1 series to 128GB.

Then, on the highest configuration, I think they could actually put six or more top-specced M2s into the Mac Pro.


Does the mac studio potentially replace the mac pro concept? It seems targeted at exactly the audience that mac pros targeted (ridiculous amounts of video simul-editing)


No this looks like a modular replacement of the iMac Pro. If it was to replace the Mac Pro they wouldn't have said at the end of the event "the Mac Pro will have to wait until next time".


To me, this seems to have killed the iMac Pro not the Mac Pro.


The presenter very explicitly said they are not done and that they will replace the Mac Pro.

But yes, I see a lot of folks replacing current Mac Pros with Studios.


The Pro is most likely going to have RAM and PCIe slots.


In some ways I wish this processor was available from a CPU chip seller. As a compute engine it gets a lot "right" (in my opinion) and would be fun to hack on.

That said, the idea that USB C/Thunderbolt is the new PCIe bus has some merit. I have yet to find someone who makes a peripheral card cage that is fed by USBC/TB but there are of course standalone GPUs.



Thanks! Of course they are a bit GPU-centric, but the idea is there.

Very interesting stuff. I wonder both if the Zynq Ultrascale RFSOC PCIe card would work in that chassis and if I could get register level access out of MacOS.


Yes, you can interface with PCIe devices using DriverKit, Apple's new user-space device driver platform.

No need to run inside the kernel for these things any more.

https://developer.apple.com/documentation/driverkit

https://developer.apple.com/documentation/pcidriverkit


> That said, the idea that USB C/Thunderbolt is the new PCIe bus has some merit. I have yet to find someone who makes a peripheral card cage that is fed by USBC/TB but there are of course standalone GPUs.

I hope we get closer to that long-standing dream over the next few years.

But right now you can see laptop manufacturers so desperate to avoid thunderbolt bottlenecks that they make their own custom PCIe ports.

For the longest time, thunderbolt ports were artificially limited to less than 3 lanes of PCIe 3.0 bandwidth, and even now the max is 4 lanes.


> USB C/Thunderbolt is the new PCIe bus

Oh please hell no.

I have to unplug and plug my USB-C camera at least once a day because it gets de-enumerated very randomly. Using the best cables I can get my hands on.

File transfers to/from USB-C hard drives suddenly stop mid-transfer and corrupt the file system.

Don't ask me why, I'm just reporting my experiences, this is the reality of my life that UX researchers don't see because they haven't sent me an e-mail and surveyed me.

Never had such problems with PCIe.


Friendly reminder that USB-C is a form factor, and thunderbolt is the actual transfer protocol.

Sounds like you're listing the common complaints with usb-3 over usb-c peripherals, which are not a suitable replacement for PCIe. Thunderbolt is something different, more powerful & more reliable.


Thunderbolt is more reliable, but still weirdly unstable on the software side sometimes. I don't get why we still have external monitor detection issues on Macs in this day and age, for instance.


Thunderbolt 3 is USB4


USB4 is the successor to USB 3.2 and TB3.


My USB-C dongle (AMD processor, so not Thunderbolt) has PD plugged into it permanently and is my "docking station" for the office. I have to cycle its power (unplug/replug the PD) to get the DisplayPort monitor connected to it to work, on top of other issues with it, especially with external drives, as you also reported.

So, I'm in total agreement.


You have a very exotic configuration if you plugged your webcam and thumb drives into PCIe slots.


Webcams are often positioned as cheapish accessories, so PCIe is exotic there, but for internal drives (the parent's comment wasn't about "thumb" drives) it's pretty mainstream: https://www.newegg.com/p/pl?d=pcie+drive

TBH, even a thumb drive would have me pissed if it disconnected at random times. That's what I hated about using the SD slot of MacBooks to host a semi-permanent drive.


An eGPU enclosure is exactly what you're describing - PCIe x16 over TB4. They're quite commonplace now


No, TB4 is PCIe x4, unchanged since TB3.


> This enables M1 Ultra to behave and be recognized by software as one chip, so developers don’t need to rewrite code to take advantage of its performance. There’s never been anything like it.

Since when did the average developer care about how many sockets a mobo has...?

Surely you still have to carefully pin processes and reason about memory access patterns if you want maximum performance.


Not sure what is required of a dev, but as an example, Adobe Premiere pro doesn't take any advantage of >1 CPU, at least on Windows. https://www.pugetsystems.com/labs/articles/Should-you-use-a-...


It’s probably not “average developer” either but some of the big box software still has per-socket licensing, or had until recently anyway.


CPU as in core or socket? These days most CPUs are "many-CPU-cores-in-1-socket", and having X CPU cores spread over 1 or 2 sockets makes a small difference, but software does not care about sockets.


Plenty of enterprise software is licensed on a per-socket basis.


And if they read this press release they will probably try to switch to per-core licensing.


Article is 5 years old.


I thought that to extract peak performance out of NUMA-based systems, you had to get down-and-dirty with memory access & locality to ensure you don't cross sockets for data that's stored in RAM attached to other CPUs.

Or am I out of date on NUMA systems?


This is what they were referring to. To get optimum performance out of NUMA systems, you need to be careful about memory allocation and usage to maximize the proportion of your accesses that are local to the NUMA domain where the code is running. Apple's answer here is essentially "we made the link between NUMA domains have such high bandwidth, you don't even have to think about this."
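
For contrast, this is the sort of thing NUMA-aware code normally has to do; here's a minimal Linux-side sketch (the core IDs are made up, and real code would query the topology, e.g. under /sys/devices/system/node/, first):

    import os

    # Assume cores 0-3 are local to NUMA node 0 (purely illustrative).
    node0_cpus = {0, 1, 2, 3}

    # Pin this process to node 0's cores so that, with the usual
    # first-touch policy, the memory it allocates lands in node 0's RAM.
    os.sched_setaffinity(0, node0_cpus)
    print(os.sched_getaffinity(0))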


The big dies these days (M1 included) have non-uniform memory access baked in because they distribute the memory caches. If you want maximum performance, you will certainly want to be aware of which "performance core" you're running in.


This line is nonsense and you can safely ignore it. There have been multi-chip-modules that act like a single socket for many years. In particular, pretty much every current AMD CPU works that way. I guarantee you that for the M1 Ultra, just like every CPU before it, the abstraction will be leaky. Programmers will still care about the interconnect when eking out the last few percent of performance.

Remember the Pentium D? Unfortunately, I used to own one.


The existing AMD CPUs aren't quite like that. Technically they are all UMA, not NUMA - the L3 cache is distributed, but they are all behind a single memory controller with consistent latencies to all cores. But the Threadripper 1st gen was absolutely like that. Straight up 2+ CPUs connected via infinity fabric pretending to be a single CPU. So is that 56 core Xeon that Intel was bragging about for a while there until the 64 core Epycs & Threadrippers embarrassed the hell out of it.


>Technically they are all UMA, not NUMA - the L3 cache is distributed, but they are all behind a single memory controller with consistent latencies to all cores.

This stuff rapidly starts to make my head spin. I have not studied interconnects and have never written any NUMA-aware software. I will just post this link (read the "Memory Latency" section):

https://www.anandtech.com/show/16529/amd-epyc-milan-review/4

As I understand it, the I/O die is partitioned into four quadrants. Each quadrant has two memory controllers and is attached to two compute dies. CPUs can access memory attached to the same quadrant with lower latency than going to another quadrant. This is a NUMA system that can be configured to appear as one logical NUMA node.

I believe their smaller parts with two or fewer compute dies will be UMA, but with the same non-uniform latency to L3.

>So is that 56 core Xeon that Intel was bragging about for a while there until the 64 core Epycs & Threadrippers embarrassed the hell out of it.

I believe the 64-core Epycs and Threadrippers came first. The 56-core Xeon was a purpose-built part for HPC, so it wasn't quite a marketing gimmick.


They are referring to the GPU part of the chip. There are two separate GPU complexes on the die but from the software point of view, it is a single large GPU.


Any application people use to justify buying a more-than-one-socket machine needs this.

E.g. simulation software often used in the industry (though the one I have on the top of my head is Windows-only).

Anyway, the point they make is this: if you claim to double performance, but only a select few programs (as you observed) are optimized to take advantage of this extra performance, then it is mostly useless to the average consumer. So their point is made exactly with your observation in mind: that all your software benefits from it.

But actually their statement is obviously wrong for people in the business: this is still NUMA, and your software should be NUMA-aware to really squeeze out the last bit of performance. It just degrades more gracefully for non-optimized code.


I think it's more the case that OSX hasn't really had SMP/NUMA for a long time.

My understanding was that the dustbin was designed with one big processor because SMP/NUMA was a massive pain in the arse for the kernel devs at the time, so it was easier to just drop it and not worry.


The price alone is turning me back into a Linux user after 20 years. I simply cannot justify 6800 Swiss Francs (neighborhood of 7000 Euros or USD) for the max CPU/RAM and a 2TB SSD, and I cannot justify getting any less because it's soldered-down and not in any way upgradeable or repairable. Not to mention, even with AppleCare+ it's only guaranteed to work for 3 years (in Europe, I know you Americans can get long-term AppleCare subscriptions, but we don't have that option here).

This is a tragedy for the future of computing. It might as well be encased in resin. Great performance, but I won't spend car money on something I can't upgrade or repair.


Curious about the naming here: terms like "pro", "pro max", "max", "ultra" (hopefully there's no "pro ultra" or "ultra max" in the future) are very confusing, and it's hard to know which one is more powerful than which, or whether it's a power-level relationship at all. Is this on purpose or is it just bad naming? Are there examples of good naming for this kind of situation?


It probably is on purpose. They all sound positive, full of goodness and speeds and synergies for today’s hip intelligentsia (which could be you!). You didn't settle for a MacBook with a harvested midrange chip, no, that’s a pro under your fingertips. What does that mean? Doesn’t matter, it makes you feel good.

It works for them because most of their products have only one or maybe two choices. It would never fly for white box sales, but Apple is not in that market.


I think the naming is based entirely around how the announcement sentence lands in the keynote. So they've optimized for "We're adding one last chip to the M1 family, and it's gonna blow your mind... (M1 Ultra appears on screen)".


As long as Asahi Linux is in good working order by the time the M3 Plus Ultra: Max Quantum Pro releases, I'll get one despite the name.


Didn't AMD do something similar with putting 2 CPU chips together with cache in-between? What's the difference here in packaging technology? (maybe there is no shared cache here)


They have been shipping multi-die CPUs for quite a while, but the interconnect is closer to a PCIe connection (slower, longer range, fewer contacts).

Intel's upcoming Sapphire Rapids server CPUs are extremely similar, with wide connections between two adjacent dies. Cross-sectional bandwidth is in the same order of magnitude there.


This doesn't seem to be two dies connected in a standard SMP configuration, or with a shared cache between them. Apple claims there were something like 10,000 connection points.

It sounds like this operates as if it was one giant physical chip, not two separate processors that can talk very fast.

I can’t wait to see benchmarks.


Modern SMP systems have NUMA behavior mostly not because of a lack of bandwidth but because of latency. At the speeds modern hardware operates at, the combination of distance, SerDes, and other transmission factors result in high latencies when you cross dies - this can't be ameliorated by massively increasing bandwidth via parallel lanes. For context, some server chips which have all the cores on a single die exhibit NUMA behavior purely because there's too many cores to all be physically close to each other geometrically (IIRC the first time I saw this was on an 18 core Xeon, with cores that themselves were a good bit smaller than these).

It's probably best to think of this chip as an extremely fast double socket SMP where the two sockets have much lower latency than normal. Software written with that in mind or multiple programs operating fully independent of each other will be able to take massive advantage of this, but most parallel code written for single socket systems will experience reduced gains or even potential losses depending on their parallelism model.


AMD is currently shipping high-end CPUs built with up to nine dies. Their ordinary desktop parts have up to three. They are not built with "cache in-between". There is one special I/O die but it does not contain any cache. Each compute die contains its own cache.


I think you're referring to AMD's 3D V-Cache, which is already out in their Epyc "Milan X" lineup and forthcoming Ryzen 5800X3D. https://www.amd.com/en/campaigns/3d-v-cache

Whereas AMD's solution is focused on increasing the cache size (hence the 3D stacking), Apple here seems to be connecting the 2 M1 Max chips more tightly. It's actually more reminiscent of AMD's Infinity Fabric interconnect architecture. https://en.wikichip.org/wiki/amd/infinity_fabric

The interesting part of this M1 Ultra is that Apple opted to connect 2 existing chips rather than design a new one altogether. Very likely the reason is cost - this M1 Ultra will be a low-volume part, as will future iterations of it. The other approach would've been to design a motherboard that sockets 2 chips, which seems like it would've been cheaper/faster than this - albeit at the expense of performance. But they've designed a new "socket" anyway due to this new chip's much bigger footprint.


Yes. AMD has had integrated CPU+GPU+cache-coherent HBM for a while. You can't buy these parts as a consumer though. And they're probably priced north of 20k$/each at volume, with the usual healthy enterprise-quality margins.


Given how low the power consumption is for the power you get, I wonder if we'll see a new push for Mac servers. In an age where reducing power consumption in the datacenter is an advantage, it seems like it would make a lot of sense.


I don’t think this will be the case until they’re more right to repair friendly


Most people don't want to repair, they want to replace. What people really want is a modular computer, the way things used to be. Right to repair is going to force legislation for schematics, at best. Even then I don't think it's going to happen. Like it or not, Apple builds integrated products now. You are the old man grumbling about the iMac not including a floppy drive.


I'm not complaining; I generally agree with you. But (I believe) for people to invest in Macs for servers they need to be able to fix and replace them faster than dealing with Apple would allow.


Honestly CPUs and memory fail very rarely. If the storage was on standard m.2 sticks and maybe the I/O was on a replaceable module in case Ethernet fails… would that be enough to assuage most server concerns?

I don’t see Apple getting back into that business. But I think they have the ability to make a good option if they want.


Given the hyperscalers' tendency to just leave failed machines in place in the rack because it's not worth the money to even swap it out, for specifically cloud type stuff the price point and density may be more relevant than the repairability.


At what point will Apple put those chips in their servers or sell server chips? It only makes sense for them to take this architecture to cloud deployments.


As more developers move to the ARM architecture by buying MacBooks (I did it last year for the first time in my life), the ARM cloud will grow very fast, and Apple needs growth, so they can't afford not to do it within a few years (probably with the M2 architecture; they are likely already thinking of it). Regarding the exact timeline: I don't know :)


They’d have to go all-in on supporting third party OSs like Linux first. Sure, there are projects to bring linux to the M1, but enterprises that buy commercial server hardware will demand 1st party support


Knowing Apple, their version of "cloud" servers would probably be some sort of SDK that lets developers build applications on top of their hardware / software stack, and charge per usage. Kind of like Firebase, but with Apple's stack.


It will be a hard business decision for them, as at this point it's extremely hard to compete with Amazon, Google and Microsoft. Maybe they will buy up some cloud services provider, we'll see.


The major Linux providers already offer 1st party supported Linux on AWS. Both RHEL and Ubuntu instances offer support contracts from their respective companies, as well as Amazon Linux from AWS themselves. It is already here and a big force there. You can provision ElastiCache and RDS Graviton instances too.


Yes but on architectures that are already supported outside of AWS. It's not about running Linux in AWS, it's about running Linux on the chosen chip.


They likely won't revisit the Xserve, IMO. No reason to. They can't sell them at a premium compared to peers, and it's outside their area of expertise.


> They can't sell them at a premium compared to peers

Intel is believed to have pretty good margins on their server CPUs

> and its outside their area of expertise.

That's what people used to say about Apple doing CPUs in-house.


FWIW, Apple's been helping define CPU specs in-house since the 90s. They were part of an alliance with Motorola and IBM to make PowerPC, and bought a substantial part of ARM and did a joint venture to make Newton's CPU. And they've done a bunch of architecture jumps, from 6502 to 68k to PPC to Intel to A-series.

Folks who said CPUs weren't their core expertise (I assume back in 2010 or before, prior to A4) missed out on just how involved they've historically been, what it takes to get involved, the role of fabs and off the shelf IP to gradually build expertise, and what benefits were possible when building silicon and software toward common purpose.


I don't know, the performance per watt has a big effect on data-centers, both in power budget and HVAC for cooling.


I doubt they'll go after the web server market. But I wonder if they might go after the sort of rendering farms that animation studios like Pixar use. Those guys are willing to pay silly money for hardware, and they're a market Apple has a long history with.


To be fair, that exact criticism has been leveled against them multiple times before and been proven wrong.


It doesn't make much sense to me. The M1 is designed to have memory in the same package as the processor. This leads to reduced latency and increased bandwidth. Moving to off-package memory might totally destroy its performance, and there is an upper limit on how much memory can go in the package.

The M1 Ultra is already a little light on memory for its price and processing power; it would have much too little memory for a cloud host.


They could be secretly working on their own cloud platform, with their data centres offering a choice between M1, Pro, Max, and Ultra instances. $$$$


For a company betting so heavily on “services,” it would be borderline incompetence if they weren’t working on this. Even just for internal use it would still be a better investment than the stupid car.


Don’t they already offer Xcode build as a service? That presumably is using Mac servers so it wouldn’t be totally out of the blue to have more Mac SaaS.


It's going to be a couple years. The guys who bought those power workstations and servers will be very peeved if it happens too quickly


It makes sense, although what I am concerned about is the cost. Apple isn't exactly known for providing services at or near cost.


I bet someone will make a bracket to hold a bunch of Mac Studios in a rack.


Unfortunately I still don't think the market has much interest in energy-efficient servers. But maybe the energy-sector crunch created by Putin's war will precipitate some change here...


Energy is probably the biggest bill for a data centre.

Lower TDP = lower electric bills and lower airconditioning bill. Win win


So then, when will common ML frameworks work on Apple Silicon? I guess compiled TensorFlow works with some plugins or whatever, where AFAIK performance is still subpar. Apple emphasizes that they have this many tensor cores... but unfortunately, to use them one has to roll one's own framework in, what, Swift or something. I am sure it will get better soon.


Apple have some serious chip design abilities. Imagine if they entered the server market, with this architecture it could be very successful.


They tried that before and flopped.

The server market is different. Companies buy servers from the low bidder. Apple has never really played in that market.


People care about performance per watt now. So they could compete. The real question is if they would support Linux. In our containerized world I can’t see their servers getting super big running macOS


The reason they dominate at PPW is that they are on TSMC's 5nm process. No one else has made a CPU on this process yet. AMD is scheduled for later this year (they are currently using 7nm).

It will be interesting to see the difference in performance and performance per watt, when both companies are on the same node.


No, it's heterogeneity, system integration, pipeline width, and instruction set as well.


Well, that’s a relief then, all any data center needs to run these days is Apple’s wide-ranging suite of in house services, such as applache, dovewin, pOSXgres, and SAPple.


In the context of the server market, of course you're right. But the idea that it's a given that AMD and Intel have similar PPW chips on the way, still using the x86 architecture, is not correct. There's going to be more to it than node size.


Apple's older 7nm chips still handily beat out Zen on the same 7nm process in performance per watt (and it's not even close).


Arm I believe helps a bit as well.


That was before people cared about performance per watt.

Besides for some use cases, these Mac Studios will be racked and in data centers as is.


This is pretty true. While people who buy racks consider vertical improvements, they tend to think laterally about how easy it is to expand (aka how cheap the +1 server is).


Companies also want to know that the server OS is going to be a long-term play. Apple would have a long way to go if they wanted to try that again.


My guess is they will launch their own cloud platform powered by Apple Silicon.

Click & deploy from within Xcode (I hate Xcode though.)


Are the neural engines actually used for anything, to anyone's knowledge?

Edit: Apparently in iPhones, they are used for FaceID.


The Neural Engine may be used by CoreML models. I don't know if it can be used with Apple's BNNS library [1]. You can use it with TensorFlow Lite via the CoreML delegate as well [2]. And some have tried to reverse engineer it and use it for model training [3].

[1] https://developer.apple.com/documentation/accelerate/bnns

[2] https://www.tensorflow.org/lite/performance/coreml_delegate

[3] https://github.com/geohot/tinygrad#ane-support-broken


Adobe Photoshop, Premiere etc make use of it for scene detection, content aware fill, "neural filters" and so on



I think the most important use of the neural engines so far is for the internal camera postprocessing. Better camera postprocessing is the reason why people buy new iPhones.


Throw in translation, on device image labeling, stuff like on body walking/biking detection, voice recognition.


Adobe uses them in Lightroom / Photoshop for some functions.

https://www.digitalcameraworld.com/news/apple-m1-chip-makes-...


Probably object tracking in videos will be the best use of them.

Or, there will be some new form of video generation (like the ones generating video from Deep Dream etc, but something aimed at studio production) using ML that wasn't practically usable before.

It opens many doors, but it will take at least many months, if not years, to see some new "kind" of software to emerge that efficiently makes use of them.


CoreML models may run on them at macOS's discretion. If you manage to get your neural network into CoreML's format, you may use it.
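
From Python, for instance, you can at least state a preference via coremltools (a sketch assuming coremltools 5+; the model path and input name are placeholders, and the actual placement remains up to the OS):

    import numpy as np
    import coremltools as ct

    # "MyModel.mlpackage" and the "input" feature name are placeholders
    # for whatever model you converted earlier.
    model = ct.models.MLModel(
        "MyModel.mlpackage",
        compute_units=ct.ComputeUnit.ALL,  # allow CPU, GPU, and Neural Engine
    )
    out = model.predict({"input": np.random.rand(1, 3, 224, 224).astype(np.float32)})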


Would something like huggingface transformers ever be able to support this? Or is it best fit to just use the GPU.


It's to build a giant distributed Aleph in which a preserved digitized Steve Jobs can live once again.


I thought it's also used to activate Siri by voice without any CPU usage


Or do any popular ML libraries support it?


I’m wondering what this all means for the upcoming Mac Pro.

Apple mentioned that the M1 Ultra is the last member of the M1 family. So how is the Mac Pro going to scale?

Will Apple enable “traditional” scaling by allowing multiple M1 Ultra chips to be combined in one system? Or what?

Further, how will an Apple Silicon based Mac Pro be made expandable; something that has been a corner stone feature of Mac Pros in the past?


Pretty God damn impressive. I have a MacBook Air w/ the M1 and 16GB and it's more computer than I need for my work flows. Work gave me a MacBook Pro 16" M1 Max w/ a 10-Core CPU, 32-Core GPU, 64GB Unified Memory... it is a monster.

Intel's got a lot of work to do to catch up. I think the only way Intel will catch up is to completely embrace RISC-V


The very same dual-chiplet design marcan predicted - nice!


I read “Apple unveils MK ULTRA”


Low key what if this was planned to change Google results to "Did you mean M1 Ultra?" when searching for the experiment? The CIA is using all that money for something consumers can use now!

/takes off foil hat


They say this is the M1 Ultra Benchmark https://browser.geekbench.com/v5/cpu/13330272 Wow.


Single-core does not appear any better than the M1 Mac mini.


That’s not too surprising, it’s the same base chip. The core designs are identical, there are just more of them and they may run at a slightly higher clock.

I wouldn’t expect single-threaded improvement until the M2.


What are people doing with these CPUs on a desktop? I'm just watching videos, surfing, and doing some programming -- on a 5 year old Ryzen with 32GB and it seems perfectly fine. For my productivity needs, performance is mostly about I/O speed and so switching to SSD and then NVME was the biggest boost for me (Linux at work).

When I get home, it's all about the GPU on my gaming PC (Windows). It's just that CPU just doesn't seem to be a huge bottleneck for me on the desktop anymore. Are Mac's different somehow where they need more CPU?


The Mac Studio seems targeted at pro users doing things like video editing and CAD. The regular M1 is good, but it's insufficient in a lot of ways. It can only power one video output, it's capped at 16GB of memory, and if you try to do anything sufficiently demanding like gaming, it really shows its limits.

If you are doing CAD, things like fluid/particle physics simulations can really slam the CPU. The M1 Ultra isn't marketed to the normal user just doing some web browsing. It's the top-tier chip for people who find the M1 insufficient.


For compiling C++ code, this really matters. At home I have a 16-core 5950X and at the office an 8-core 2700X, and it is a night-and-day difference. Even if it's just a few minutes of compile time, it creates a different mindset. At home: yeah, let's just recompile and see the changes. At the office: oh no, I have to recompile; let's go to the kitchen and grab a coffee, I have the time.


> M1 Ultra features an extraordinarily powerful 20-core CPU with 16 high-performance cores and four high-efficiency cores. It delivers 90 percent higher multi-threaded performance than the fastest available 16-core PC desktop chip in the same power envelope.

Maybe not a huge caveat, as 16-core chips in the same power envelope probably covers most of what an average PC user is going to have, but there are 64-core Threadrippers out there available for a PC (putting aside that it's entirely possible to put a server motherboard and thus a server chip in a desktop PC case).


Is that Threadripper in anything like the "same power envelope"?


If I'm reading the graph in the press release right, M1 Ultra will have a TDP of 60W, right? A 3990X has a TDP of 280W. I know TDP != power draw, and that everyone calculates TDP differently, but looking purely at orders of magnitude, no, it's not even close.


That line is blatantly dishonest, but not for the reasons you pointed out. While the i9-12900K is a 16-core processor, it uses Intel's version of big.LITTLE. Eight of its 16 cores are relatively low performance 'E' cores. This means it has only half the performance cores of the M1 Ultra, yet it achieves 3/4 of the performance by Apple's own graphic.

Alder Lake has been repeatedly shown to outperform M1 core-per-core. The M1 Ultra is just way bigger. (And way more power efficient, which is a tremendous achievement for laptops but irrelevant for desktops.)


"in the same power envelope" is a pretty big caveat. Desktop chips aren't very optimized for power consumption.

I'd like to see the actual performance comparison.


I really wish popular companies would focus more on software optimization than hardware renovations. It's really sad to see how fast products die due to increasingly bloated and overly-complex software.


How does Apple manage to blow every other computing OEM out of the water? What's in the secret sauce of their company?

Is it great leadership? Top tier engineering talent? Lots of money? I simply don't understand.


If I had to guess, their secret sauce is that 1) they're paying lots of money to be on a chip fabrication node ahead of both AMD and Intel, 2) since their chip design is in-house, they don't have to pay the fat profit margin Intel and AMD want for their high-end processors and can therefore include what is effectively a more expensive processor in their systems for the same price, and 3) their engineering team is as good as AMD/Intel. Note that the first two have more to do with economics rather than engineering.