> in order to give the M1 Max some real competition, one has to skip laptop chips entirely and reach for not just high end desktop chips, but for server-class workstation hardware to really beat the M1 Max
Sadly this is not really interesting; it's disingenuous. He benchmarked against a ten-year-old (03/06/2012) server CPU. My two-year-old Intel laptop CPU (i7-9750H) also outperforms the Xeons he's comparing against by almost 40%.
The M1 is a great chip; it's sad that this got published with a "server chip" comparison at all. A real server-class CPU from the modern era, at a comparable price point, is the AMD EPYC 7443P, which is 400% faster on CPU benchmarks.
Hello, author here! I'm aware that the Xeon E5-2680 is ancient, which is why I also included comparisons with the Xeon W-3245 that is in the current 2019 Mac Pro, and with the Threadripper 3990X, which came out in February 2020.
Unfortunately I don't exactly have piles of fancy server chips sitting around, so I had to make do with testing against what I have access to. ¯\_(ツ)_/¯
Yeah, but you don't need to "reach for not just high end desktop chips, but for server-class workstation hardware to really beat the M1 Max". Lower-market desktop CPUs like the Ryzen 5 3600 ($240 on Newegg) surpass it on CPU benchmarks, and they likely surpass it on raytracing as well. Chips like the AMD Ryzen 7 4800H, which is a "big laptop" CPU, also beat it. The 10-core M1 Pro manages to best that one, but not by a huge margin on PassMark.
None of them beat an actual, modern, mid-market desktop CPU. And none of those compare to the likes of the Threadripper.
The test results are amazing; the exposition here is simply inaccurate. Again, you don't need a "server-class high-end workstation" CPU to best the M1 in raw performance. Nothing on the planet beats the M1 in performance per watt. That's amazing enough on its own.
He also benchmarked it against the Intel Xeon W-3245, which is what that quote referred to. It's not disingenuous at all: the old Xeon is explicitly called out as old, which is why he compared it to the 2019 Xeon in the Mac Pro as well.
> My two year old intel laptop cpu (i7-9750H) also outperforms the xeon's he's comparing against by almost 40%.
Yeah, but for how long? My laptop has a respectable CPU as well but any kind of sustained load would melt it down if it didn't automatically throttle itself back to 2 GHz and below after like 5 seconds.
These M1 chips somehow manage to have great performance while also keeping temperatures and power draw low. I think this will change everything forever.
Cooling laptops just isn't going to happen. I've never had one with a thermal management system that didn't suck. My current laptop has three ridiculously loud fans and it still can't cool the processor enough to maintain a sustained load at maximum speed. All I have to do to watch it nearly melt itself down at 93 °C is run "make all" in some project's directory.
The test results are great. But you don't need a high-end server CPU to best the M1 in raw performance. Most mid-level (around $300) desktop CPUs will do that just fine. The exposition here is simply... wrong.
The remarkable thing about the M1 is the performance per watt it gets.
The 5950X is only 55% faster than the M1 Max on multi-threaded integer benchmarks. The M1 Max is even 26% faster than the 5950X on multi-threaded FP benchmarks (maybe it has two AMX units?).
The M1 Max is really in 5900X/5950X territory... in a laptop.
I agree on the pricing (I also have a 5900X and a 5950X machine). But it is fantastic that we can have that kind of performance in a mobile device and still have many hours of battery life.
We should compliment both AMD and Apple and be happy that we finally have serious competitors to Intel. AMD has managed to compete with and outrun Intel from an initial position of an also-ran underdog. I think the M1 line is more impressive technically than Ryzen, given the very low TDPs. But Apple has a vastly bigger budget and much more opportunity for vertical integration.
Both companies have done impressive work the last few years.
In Geekbench Multi-Core, the 5950X is only 33% faster (at 16.6k) than the M1 Max (which benches around 12.5k). So the difference is actually smaller than in the SPEC suite that AnandTech uses.
For all the dissing of Geekbench, I've found it to correlate pretty closely with supposedly more elaborate benchmarks like SPEC. (The writing was on the wall for Apple Silicon performance for many years, when the iPhone/iPad Ax CPUs were catching up to and surpassing laptop CPUs in Geekbench, but lots of people dismissed it because it was just Geekbench...)
The comparison should not even be close, TBH. We're talking about a laptop chip that can run on battery for an extended period of time and that performs as well as, and sometimes better than, the high-end consumer AMD desktop chip. Kudos to Apple.
I'm not surprised. Apple has the lead in fabrication process, 5nm vs 7nm. I eagerly await a true apples-to-apples comparison when AMD uses the same 5nm process.
I'm definitely not an Apple fan, but what's available on the market at any given moment is an apples-to-apples comparison of what companies are able to do. I mean: any modern army (or part of one) would crush the Roman legions, but the legions had to fight 2000 years ago with what they had.
The article ends wondering about the impact on PC OEMs. I presume they are extrapolating the performance improvement curves, talking with Microsoft about Windows on ARM, and working on HW contingency plans in case they have to leave a sinking x86 ship. I don't think they are resting on promises of big improvements from AMD and Intel.
If you could conceivably run a chip at a much higher TDP before hitting thermal limits you could get significantly more performance. Not that you can (probably) OC these chips at all but it suggests there may be more rabbits in the bag.
If you're plugged in all the time, lower TDP is nice but not critical. And if you live in a cold country, where you have to heat the house six months per year, TDP is just heating with a computing side effect.
> Except it is still direct electrical heating which is atrociously inefficient.
Electric heating converts practically all energy into heat, making it ~100% efficient. You can make statements about cost-effectiveness compared to burning things, but not every house has that option.
CHP configurations are more common in colder climates with district heating, so their "waste" heat during generation often isn't wasted at all.
Which is why I covered them in my comment about CHP, which recoups a large portion of those "losses". Either way, other heat sources also come with logistical challenges and/or big equipment installs, so it isn't exactly a 1:1 comparison.
I feel this is a bit disingenuous, because by the same logic burning wood is thousands of % efficient, or even ∞% if the system only uses convection, which makes heat pumps seem like a poor choice even when they're perfectly valid.
I'm in Quebec (Eastern Canada). Most of the electricity is produced from hydroelectric dams up north, while the cities are in the southern part of the province. Most houses are electrically heated (especially those built after 1970). Production from water turbines is very efficient. Transmission losses are about 30% because of the distance (> 1500 km). The heating itself is 100% efficient, with no moving parts and no maintenance. In this context, heating with a baseboard or a CPU makes no difference.
Here in Sweden, roughly 50% of all small houses (i.e. one household) have a heat pump. Direct electric heating is somewhere around 15%.
In bigger houses direct electric heating just isn't a thing; most have some sort of central heating, and lots have either a combustion or a heat pump solution. The latter is gaining.
This winter I'm going to experiment with having a Raspberry Pi act as a thermostat that starts/stops containers running on a server in our basement. That, combined with the laptop we just got for gaming that has a 3070 in it, should do nicely to supplement our heating system.
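For illustration, a minimal sketch of what that thermostat loop could look like. The sensor path, container names, and set points below are hypothetical, and it shells out to the stock docker CLI (pointed at the basement server via DOCKER_HOST, or run directly on the server):

    import subprocess
    import time

    SENSOR = "/sys/bus/w1/devices/28-000005e2fdc3/w1_slave"  # e.g. a DS18B20 on the Pi's 1-Wire bus
    CONTAINERS = ["renderjob", "folding"]                     # hypothetical heat-producing workloads
    LOW, HIGH = 19.0, 21.0                                    # degrees C, hysteresis band

    def read_celsius() -> float:
        """Parse the kernel's 1-Wire output, which ends in e.g. 't=20312' (millidegrees)."""
        raw = open(SENSOR).read()
        return int(raw.rsplit("t=", 1)[1]) / 1000.0

    def set_containers(running: bool) -> None:
        for name in CONTAINERS:
            subprocess.run(["docker", "start" if running else "stop", name], check=False)

    while True:
        temp = read_celsius()
        if temp < LOW:
            set_containers(True)    # room too cold: spin up the workloads
        elif temp > HIGH:
            set_containers(False)   # warm enough: stop them
        time.sleep(60)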
Watts go in, Watts come out. So basically, find a space heater with the TDP of your server, and that’s the upper bound on what you can expect heating-wise.
I know that's a bummer for people who want to heat their house with their computer, but thermodynamics is always a bummer; I don't make the laws.
> So basically, find a space heater with the TDP of your server, and that’s the upper bound on what you can expect heating-wise.
I wonder how many CPUs I need to build a cryptocurrency miner water heater... That would be so much better than just wasting energy heating up dumb elements.
Modern heat pumps provide a lot more than 100% efficiency. I'd stick to the heating system unless those containers are doing something profitable (mining?)
Literally nothing provides 100% efficiency. You're conflating coefficient of performance with efficiency. They're not even close to the same thing: modern heat pumps reach their CoP because they don't actually generate heat, they simply move it around, which delivers more heat indoors than if you had converted an equivalent amount of electricity directly into heat.
Thermodynamics would not take kindly to you having a >99.999...% efficient anything.
> because they don't actually generate heat
Heat pumps still do use electricity (or other power), and all that electricity also ends up as heat. It's why heat pumps have a higher CoP than the same system run as a refrigeration cycle.
> Thermodynamics would not take kindly to you having a >99.999...% efficient anything
Well, the cogen gas power plants here can produce 50 kWh of electricity from burning 100 kWh of natural gas. I can use 50 kWh of electricity to put 200 kWh of heat into my house with a heat pump.
Seems like a good deal to me, and I think Carnot would be fine with that.
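Working through that arithmetic with the figures above (plant efficiency ~50%, an assumed heat-pump COP of 4, ignoring transmission losses and the plant's recovered CHP heat):

    gas_in = 100.0           # kWh of natural gas burned at the cogen plant
    plant_efficiency = 0.5   # -> 50 kWh of electricity, per the comment
    cop = 4.0                # assumed heat pump coefficient of performance

    electricity = gas_in * plant_efficiency   # 50 kWh
    heat_delivered = electricity * cop        # 200 kWh into the house
    print(heat_delivered / gas_in)            # 2.0 -> twice the heat of burning the gas directly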
As someone who lives in south-east Queensland, Australia, lower TDPs and thermal output are always welcome, at least for me. I adore my Ryzen 5600X/3060 Ti mini-ITX desktop, but mining on it makes my room annoyingly warm, and it's not even summer yet.
Was semi-useful in winter though, all 6 weeks of it...
A hot laptop can have significant impacts on fertility in men. And in the summer you have to dissipate that heat. A heat pump is more efficient for heating, to boot.
OK, let's build an equivalent PC. These are prices in Germany (incl. VAT):
A 14" MacBook Pro (M1 Max with 32 GPU cores, 32 GB RAM, 512 GB storage) is 3440€.
A PC: 5950X (750€), 32GB RAM (150€), Mainboard (130€), WD SN850 500GB (90€). Now if you build from scratch you need a PSU, CPU cooler and a case (~200€ together).
That's 1320€ without a GPU. Depending on your workload the GPU performance seems to be between a desktop 3060 and 3080. So between 700€ and 1400€.
Tl;dr: A 3440€ MacBook competes with a 2000-2700€ desktop depending on your workload. The desktop price includes no peripherals, and it has no USB4/TB4.
I wonder how much a comparable desktop monitor would add to that.
On the other hand I really want to give the keyboard a shot before I pull the trigger; after 3 years of ThinkPad I think that will be the real pain point...
Intel's Alder Lake benchmarks show that their new mobile processor is faster than the M1 Max. So, this domination doesn't seem to be a long term thing.
I would imagine if you were willing to buy a bicycle chain which had a several hundred dollar profit margin, then you would be able to find one rather quickly.
Supply chain chaos? I assume bike parts in the US come from overseas in shipping containers. One of my shifters broke in September and the local bike shop said they were out of parts and didn't expect to have any more until April 2022.
It is? Source please. I've heard, for availability, "before the end of the year" and "Q1 2022" for mobile Alder Lake, which is what we're talking about.
But did Intel use liquid nitrogen cooling to achieve this?
I am joking, but just a bit. Intel has a long history of dirty tricks when it comes to benchmarks - using liquid nitrogen cooling and not mentioning it, manipulating the compiler to generate slower code on competing CPUs, terms of service forbidding publication of benchmarks, fuzzy use (or no mention at all) of TDP/energy figures.
What they say can only be used as an upper bound of how it actually is. The odds are that it's much worse, and with some strings attached as well.
The same Intel that faked the CPU performance in 2018 by overclocking it and using some insane cooling?
Intel is playing this PR game where they release / announce or leak info, so people might try and wait for what Intel has to offer. It's super obvious, weak, and disingenuous.
The Geekbench benchmark is more CPU-focused AFAIK, and the article we are discussing is talking about just how much the astonishing M1(.+) memory access bandwidth improves their rendering time.
Apple doesn't have to score the absolute highest in performance. For better or worse, they need a credible high-end chip that they can marry with their proprietary OS and computer designs to win customers via a compelling package, not some benchmark result.
After the M1*, Intel is facing an uphill fight to keep other laptop manufacturers on x86. They may well pull together and come out stronger, but the last 10 years haven't been stellar and Apple exposed that weakness.
So, will Intel win back some of the performance benchmarks? Sure, no doubt. But will Dell win over Apple customers using the latest Intel chips? Let's see...
It'll be very interesting to see, given Alder Lake is dumping AVX-512, which was one of Intel's core performance strengths against AMD, albeit at a staggering cost in thermals and power.
Yeah, the 12900HK beats the Max by 3%, even though the M1 Max is way more power efficient and likely utterly smoked it on GPU. After all, the M1 Max's GPU is 8% faster than a mobile 3060.
The 3060 alone has a 60W TGP.
The M1 Max running Cinebench consumes 9W while the 11980HK (previous gen) consumes 45W. Perf/W is about 3x higher for the M1 Max on CPU alone, and I suspect that factoring in graphics we've got a breakaway here.
Just FYI because I've seen that a few times already: a 60W/70W 3060 is very slow. There are many laptops that supply much more power and correspondingly also get better results.
That's a very good point. Do you think running a 3060 at a higher TGP gets you the same perf/W? I'm genuinely curious where they have them set along the curve.
And it will probably devour a ridiculous amount of power and generate a ton of heat to be marginally faster overall. It also won't have a comparable GPU.
M1 Max still is the better experience and innovation.
If you don't require a laptop, then an Alder Lake chip doesn't take anything close to a ridiculous amount of power, nor is it hard to cool.
Edit: I know this discussion is partly focused on laptops, but the overall comparison includes server chips, so it's clearly not just about laptops. And 60 or 100 watts is child's play in even a tiny desktop.
When these chips hit the rackmount Mac Pro, the server competition will be very interesting. It's kind of a unique package for a server, with such a powerful GPU, but that has advantages for some compute types too of course.
The rackmount Mac Pro isn't really a server, it's just a Mac Pro in a rack-mountable enclosure. It's 4U and kind of poorly laid out for a server. It helps if you need 10 or 20 or 100 Mac Pros specifically and have to position them in an efficient manner, but a proper server would be no more than 2U with a very compact layout and little wasted space inside so you can pack a rack as densely as possible.
>The second bit of design is in the optional “Afterburner” video editing card. The discussion of it starts at around the 1hr. 26 min. point in the Keynote. It is described as a custom FPGA-based HW accelerator. It is claimed to be capable of processing 6 billion pixels per second. The silicon is not Apple’s but the design is.
I’ve certainly seen FPGAs used in low-volume specialty equipment (think things like video routers and, surprisingly, some high-end audio equipment) where an ASIC would be a better fit. I suspect, however, this comes down to a business decision: the time and money spent getting a run of an ASIC with the same logic put together may be better spent elsewhere.
The FPGA can be designed and deployed today, while an ASIC isn't going to be ready until after a few tape-outs, and it's often a race to market. While it's really cool and actually works, the reality is that reconfiguring the hardware to do something different isn't a necessary feature in just about anything. There's still a huge market for FPGAs, but it's already got a lot of active competition.
From what we’ve seen, the M1 design is quite modular. I don’t see a reason why they could not use 4 or 6 high-performance CPU clusters and a low-core GPU for a server part if they wanted. No GPU at all might be impractical because a lot of things in the OS have probably come to expect one.
Anyway, it’s academic because I don’t see them doing that. Except maybe to power their own datacenters, but then we’d never hear about it.
Well, considering how well the M1 variants are performing in the performance/watt category, and that this is one major cost factor for data centers, I would assume Apple by all means is going to use their own chips in the future. And then there is no good reason not to sell tons of them to anyone who is willing to pay Apple prices for server hardware. (Which aren't that bad compared to other server hardware prices.)
But this might be a bit in the future, as they can barely make enough of them for the MacBooks. I would even assume that the launch of the other computers in the lineup has been pushed back to keep up at least a bit with demand.
Some are actually hosted in a cloud provider like Google, Microsoft, or AWS.
But they also have a fair amount of their own physical data centers, and I would not be surprised to find that they start filling them out with Apple Silicon hardware.
And they might roll out a lot of services on AWS using Apple Silicon hardware, because AWS has recently made available actual Apple hardware in some of their regions. I think this could potentially be a precursor, but this is just my own personal hunch and I have no information outside my own brain towards this end.
Can you name those laptop Xeon CPUs that beat the pants off the M1 Max?
[Spoiler because I don't think I'll get a response -- there are none. Even when you get into the "luggable" category of workstation that is ostensibly portable but really needs to be plugged in, there is no competition right now. The upcoming Alder Lake should significantly improve Intel's entrant in this category, and hopefully bring some real competition]
The upcoming ThinkPad X1 Extreme is going to give it some stiff competition. It's wielding the insurmountable RTX 3080, and it's priced very competitively.
But I'm just going to tell it to you now so we don't make the same mistake we have for the past 10 years of computer hardware discussions: specs don't matter. You could tell 90% of the people buying PCs with dGPUs about your 5nm GPU and next-gen power efficiency, but they won't care. They're buying them as gaming devices, general-purpose machines and game development laptops. I'd argue the market for Mac users and PC users has not radically shifted, just the hardware you're using. If we're here to talk smack about hardware superiority, this website would have been insufferable for the past decade, because there was quite literally a complete lack of professional dGPU Macs. Now that the tables have shifted slightly, I don't see why Mac users feel the need to crawl out of the woodwork and declare the game as changed, now that Prometheus gave them the gift of a laptop that doesn't throttle to hell.
People will still buy all sorts of computers. Lots of options will appeal to different people. Intel is finally being forced to actually compete. It's all good. My M1 Mac (not even an M1 Pro/Max) is quite easily the best computer I've ever owned, but I have zero need to proselytize and simply do not care what you or anyone else use.
That doesn't change the fact that the above claim about "laptop Xeon chips" beating the pants off the M1 Max is delusional nonsense.
I have to comment on the RTX 3080 bit: I have used many PC laptops over my career, and currently have a Lenovo with a fat, barnburner Nvidia dGPU. The GPU is literally never used, because the moment it engages my battery life falls to cartoonish levels (somewhere in the range of 40 minutes), the laptop becomes a space heater, and the fans turn into jet engines. This is the sort of "spec chasing" the industry is addicted to, providing absurd, completely unreasonable solutions just so someone can boast. One of the things about Apple, quite contrary to your claim, is that they don't do that. When they provide something, it is meaningfully usable and useful 100% of the time.
It's using DDR4 versus DDR5 in the Mac, and a fraction of the memory bandwidth, so it will be interesting to see its performance on heavier tasks. It also looks like it has at best half the battery life, if not far less when put under heavy load.
I do love this review though:
"It is a nice laptop but extremely noisy. Even when idle the fans are on all the time."
Memory bandwidth is nothing for most workloads. Besides a scant few Geekbench figures, I have genuinely never encountered a workload that was bottlenecked by my ability to transfer assets to the GPU. Is 100gbps of PCIe bandwidth not enough for your needs?
That's not true if you're actually using the RAM, though.
I remember memory bandwidth being the bottleneck when running large(ish) datasets for game worlds. It mattered so much that we put a lot of pressure on Google Cloud, because we worried they wouldn't be able to compete with bare metal (since memory bandwidth is not usually measured or reported, and can be non-guaranteed when you have neighbours).
Xeon-branded workstation laptops aren't using Icelake-SP or similar server chips; they are using Tiger Lake-S or Ice Lake-S or Skylake-S client chips.
They are what would previously have been branded as "Xeon E3" series chips - on the desktop platform they used to share a socket and be drop-in upgrades with consumer desktop chips, because they're basically the same chips with "enterprise" features like ECC turned on.
An example would be Xeon E3-1285 v3 - which is basically the same thing as an i7 4770.
These products are nowhere, nowhere near the M1 Max. They are consumer laptop chips with ECC and vPro turned on.
Yes, your Xeon-from-2013 example would be nowhere near the 2021 CPU from Apple. The outdated Xeon Apple had in their old laptops (a year behind everyone else) is also slower. The Xeon W-11955M, however, makes the M1 look like a kid's toy. In fact, if you remove 2 cores from that Xeon, you'll have my 6-core Xeon. Which also smokes that 10-core M1 in a bong.
I'm also not sure why you're sarcastic about ECC RAM. I have 128GB of RAM in my laptop. If it wasn't ECC, I'd have crashes in my VMs and errors in my calculations. When you go 32GB+ and actually use the RAM, anything that doesn't support ECC cannot be taken seriously for professional use. Like the M1 Max.
The point is that a laptop Xeon from 2021 is also going to be the same as an 1185G7 or something similar - because they’re the same silicon. It’s not like you’re getting more silicon because it’s a Xeon, it’s not a server chip, it’s just a laptop chip with the enterprise features enabled.
So, really no need to test them specifically. Go get an 1185G7 or something and you know what “Mobile Xeon” benches will look like. Anandtech already did those benches.
Just to humor you I looked it up - your Xeon W-11955M is the same chip as a consumer 11980HK, which is one of the processors in Anandtech’s benchmark. Same cores, same cache, same clocks, slightly lower TDP limit - the consumer one has a 65W boost configuration. It is the same TGL-H die with (very slight) variations in what feature fuses are blown.
So based on Anandtech’s benchmark you can pretty much extrapolate how that is going to go - slightly lower performance and slightly higher efficiency due to TDP limiting clocks a bit, but same cache, same core configuration, etc mean that it’ll be identical to if you went into bios on a 11980HK and set a lower power limit.
By the way the 11980HK is specifically the chip they called out the M1 Max as being a factor-of-6 more power efficient than. And in fact they said exactly what I just said - that limiting TDP will reduce performance on the 11955M and the perf/W gap will close a bit, but the performance gap will get even wider.
> In multi-threaded tests, the 11980HK is clearly allowed to go to much higher power levels than the M1 Max, reaching package power levels of 80W, for 105-110W active wall power, significantly more than what the MacBook Pro here is drawing. The performance levels of the M1 Max are significantly higher than the Intel chip here, due to the much better scalability of the cores. The perf/W differences here are 4-6x in favour of the M1 Max, all whilst posting significantly better performance, meaning the perf/W at ISO-perf would be even higher than this.
> The m1 max TDP is projected to be 90W. The Xeon is 45W.
That article says that the M1 Max CPU TDP will be ~30W. The 90W figure is for the whole chip, which includes the GPU. The Xeon is just the CPU.
> here's that Xeon beating that 11090HK.
The CPUmark figure is 24092 for the Xeon, versus 23549 for the i9. That's a difference of about 2%. I suspect it's mostly down to statistical error. (Maybe it's also true that the Xeons tend to be put in systems whose other hardware is faster; I can see that going either way.)
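For what it's worth, the arithmetic:

    xeon, i9 = 24092, 23549            # CPUmark figures quoted above
    print((xeon - i9) / i9 * 100)      # ~2.3% -- small enough to plausibly be noise, as noted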
Looking at the data in your second link, the performances differ by less than 3%. Looks like paulmd's assertion that both are the same CPU core is fairly well supported by the data.
We've banned the other account, but you broke the site guidelines badly yourself in this thread, by feeding the flamewar and generally ignoring the rules. It's not ok to do that, regardless of how badly anyone else is behaving. Please read https://news.ycombinator.com/newsguidelines.html and stick to the rules in the future. Note this one:
"Don't feed egregious comments by replying; flag them instead."
And a Porsche is the same as a Kia, just with a few racing features.
>You don’t understand what you’re talking about
So you look at the graph I linked that shows the Xeon model is faster than the consumer i9 chip, and your brain says "the Xeon is slower." Then you tell me I don't know what I'm talking about.
Tell me, do you ever stand in a lake and tell people you're dry? Are you in the lake to hide the pee? Do you tell people about the wonders of horse dewormer?
See, the only way your argument works is to compare the M1 to the i9. And despite the Xeon being faster than the i9, as literally and plainly shown in the tests I linked, you keep saying it's slower. ... But then you keep saying they're the same. Reality: it's faster, literally look at the numbers. Your mental gymnastics are amazing. I am not hostile. I have been getting great laughs and relaxation out of you. Yes, that comes at your expense.
> person who tried to explain it to you
By saying wet is dry. I get your explanation. I just see what my eyes tell me. You see blue, your brain says "nah, it's orange."
Just being pedantic, DDR5's implementation of on-die ECC is still not equivalent to the implementation on CPUs. It doesn't account for errors that occur during processing, and there's still a pretty significant chance of corruption in L1-3 caches.
One of my laptops is a Dell Precision 7760. Xeon W-11855M, NVIDIA RTX, 128GB ECC RAM. I don't need too much storage, but you can get it with 14TB if you want. Note mine is a 6-core. There is an 8-core available w/ the Xeon W-11955M, which is faster than mine.
It's only a little thicker than the MacBook Pro. Its keyboard doesn't break, and the product line has had a 4K screen for five years now. It has a 120Hz refresh rate. It has a very large power brick (240W), and it gets hot and loud with a huge fan exhaust. It's thick metal and about 7-9 lb - you can run over it with a car. It only gets 9 hours of battery with regular usage, and about 3 hours of "fan on time." Keep in mind, with the large and loud fan on, it can stay at 5GHz. This is called a pro laptop - a workstation. When I travel, I bring a 65W PSU, and it runs fine on that, just without turbo boost.
No, don't point to the lower "geekbench" score for this laptop - that's not a CPU test. GPU performance is a large part of that test, and they run the test on the default GPU. The M1 only has a single GPU. The Precision's default is the low-power integrated graphics, not the discrete GPU. If you have a test where they assign the discrete GPU, please feel free to point it out.
As I've discussed before here, I have a shell script that runs in parallel with a bunch of VMs. My coworker's Air (yes, I know it's not the Max) runs it in 8-10 hours overnight. I run it over lunch. It loads, does calculations on, and creates graphs from several gigs of ASCII performance data.
What does compare in performance to the M1 Air is my Latitude with the i7 in it. The M1 "Max" is "max for Apple" but competes with mid-tier laptops from everyone else. And that's ignoring the fact that it can't run pretty much any of the useful industry tools or games without recompiling x86 to ARM on the fly.
Yes, I think my Dell cost the company $7-8k, without support. That's why it's called a pro laptop.
I feel like you have the setup necessary to produce an interesting, compelling argument here -- your benchmark seems, at least to some degree, less synthetic than some do, you have an uncommonly fast workstation which a lot of people doing comparison tests wouldn't be able to do...
... but the way you present it undermines your case a bit.
It seems like what you want to say is that if money is no object, weight is no object, heat is no object, battery life is no object, and portability is no object, but the comparison must absolutely be laptop to laptop, then there exists a laptop PC configuration that beats M1. If this is your point, then probably you want to compare an M1 Max to your PC, not your coworker's Macbook Air (which is a fanless laptop...)
I think this is a pretty unusual use case and there aren't too many people who are looking for this exact market segment. I definitely think it's fair to admit that Apple isn't intending to operate in this market segment, for better or worse.
You'd probably also want to drop the part about game support, since anyone who wants to play games can spend 1/4 of what your workstation costs and get a shitkicking fast small-form-factor PC. But also, like, recompilation isn't what you should highlight -- what you should highlight is performance. If the recompilation is fast enough for users not to notice, then it doesn't matter, and if it's not, then the reason why it matters is performance, not recompilation.
Anyway, again, I don't think you're necessarily wrong or whatever here, but you're just presenting your point in a way that I think it's extremely unlikely anyone will care or be convinced.
“A little thicker” is a bit of an understatement. At its thickest point, it’s apparently 71% thicker than the thickest point of the 16-inch MacBook Pro (2021), and 55% thicker at its thinnest point compared to the thickest point of the Mac. That’s a huge difference for portable electronics.
It's also "sale" price of $6,700 with a 2TB HD and just a 6-core CPU. Going with the 8-core xeon bumps price to Bump it up to $6,900 and moving to 8TB of SSD goes up to $9,200 ($8,000 for the 4TB version).
The 2TB M1 macbook 16" is $4,300 and the 8TB, max-specced version is $6,100.
I know Apple has a bad rap for high prices, but that machine's prices make Apple prices look bargain-basement. You could almost buy TWO M1 Macs for the price of one 4TB Dell.
Not really, because the first Xeon CPU he compares against is ten years old. That kind of system isn't worth the cost of shipping it somewhere. A current generation desktop i3 is likely faster in almost any workload.
It's not even interesting that the M1 is "slightly" beaten by a Mac Pro's Xeon, because nobody is doing raytracing on CPUs.
A Pascal Nvidia GPU will render scenes in Blender faster than a current mid-tier AMD CPU, and it's two generations old.
It takes an RTX 3080 card (not even the fastest consumer Ampere card) 11 seconds to render a scene that the fastest Pascal card (1080) took 94 seconds. That's nearly an order of magnitude faster.
I saw an article comparing the M1 Max to a high-end AMD card, but that's also meh. AMD's GPUs don't perform anywhere near as well as Ampere and they use far more power, so they're an easy target to compare against. AMD fakes their benchmarks by overclocking the cards depending on temperature, and the benchmarks they run to tout performance figures are just shy of where the card starts to throttle back for thermals. An AMD GPU that has been running for an hour will perform substantially worse compared to an Ampere card.
> It's not even interesting that the M1 is "slightly" beaten by a Mac Pro's Xeon, because _nobody is doing raytracing on CPUs._
This is backwards -
Nearly all major raytracing workloads today are done primarily on CPUs: vfx, animated film and tv, etc.
There are several reasons for this, including memory limits and the lack of coherence in both path tracing algorithms and common acceleration structures. GPU-based tracing is currently a small subset of rendering tasks (primarily those with real-time requirements).
As GPU memory increases, GPUs will become more attractive in this space - likely in the next few years.
Final renders may be done on CPU for maximum accuracy, but that's not what is being discussed. What's being discussed is desktop workstation rendering.
Mental Ray, heavily used in the industry for feature films: GPU accelerated since eleven years ago.
Pixar licensed a bunch of GPU rendering tech from NVIDIA in 2015, so that's six years ago.
Mental Ray is no longer heavily used in the industry for feature films: it hasn't been since ~2013 (when Arnold came around).
After Effects doesn't really belong on this list?
Gazebo is a preview renderer explicitly designed for GPU use, it's not designed for accurate lighting or material look.
Manuka is not GPU accelerated (it used to be when first written, but it wasn't worth it given complexity of build setup for very little benefit, it's pure CPU now and has been for > 7 years).
Until the GPU renderers can run custom shaders (rather than just the stock shaders the renderers ship with), i.e. OSL supports full GPU support, high end VFX facilities aren't really using GPU renderers that much for final lookdev and lighting.
Smaller facilities which are happy using the stock shaders are though.
> Until the GPU renderers can run custom shaders (rather than just the stock shaders the renderers ship with), i.e. OSL supports full GPU support, high end VFX facilities aren't really using GPU renderers that much for final lookdev and lighting.
Surprised no one has mentioned Redshift Renderer yet; it's GPU-based, supports out-of-core rendering for both geometry and textures, and supports OSL.
Nevertheless, I agree that "nobody renders on CPU" is just patently false - one need only look at the Corona Renderer numbers to instantly disprove that.
I think you're confusing me for the other poster - that was the first time I've responded to you...
I haven't moved any goalposts.
Arnold has only been GPU-accelerated for a few years - not since 2013. Same with RenderMan - XPU only came out this year, and still doesn't support everything on the GPU, including custom shaders (neither does Arnold GPU).
> Mental ray is no longer heavily used in the industry for feature films
Mental Ray really isn't used anywhere. It sort of died like more than a decade ago.
Honestly, I used it back in 2003 in the VFX industry, but not since.
You have to be mistaken here. No one uses Mental Ray that I have heard of in production in the last decade. Probably someone does, but not anyone serious.
Mental Ray is used by literally nobody anymore. It was discontinued in 2017. You can't even get a license for Mental Ray anymore.
Pixar didn't license any particular special GPU rendering tech from NVIDIA in 2015; they're just using OptiX and CUDA like everyone else. Pixar released initial GPU support in RenderMan this year, but currently XPU only supports a subset of the full RenderMan functionality. Before that, the only GPU ray tracing they were using internally was on an extremely limited basis for early lookdev workflows. Final lookdev and all of lighting continues to be CPU-only. I likely know more about this particular usage than you do because... I worked on the early internal GPU ray tracing for lookdev stuff at Pixar.
Autodesk Arnold has initial GPU support, but it was only added in the past few years. Most studio usage limits Arnold GPU usage to lookdev; most desktop workflows for iterating continue to be CPU because again... not enough memory on GPUs. SPI Arnold has no GPU support at all. Yes, there are two Arnolds out there.
After Effects is not really used for high-end 3D rendering? Why is it on your list? Did you just do a Google search for "GPU renderer" and throw what you found into a list without actually understanding what each thing is?
V-Ray has had extensive GPU support for many years now, and it is probably leading the pack in usage among film/vfx/etc houses. However, most film/vfx/etc places use V-Ray CPU for both iterating on the desktop and for final frames, due to GPUs not having enough memory. V-Ray GPU has seen wide adoption in the archviz world though.
Gazebo is an OpenGL viewer, used in DCC viewports. It's not even remotely close to a final frame renderer. Manuka does not have GPU support whatsoever; the Manuka development team literally published an entire ACM TOG paper on this topic which I guess you haven't checked.
I'm not entirely sure how you put your list together; everything on here is either out of date, misunderstood, or just flat out wrong, and all in ways that are immediately apparent to those of us who actually work in this industry.
Almost every Blockbuster film you watch has raytracing done on the CPU. A lot of studios have dabbled with GPU pathtracing, and indeed it works for smaller projects like commercials and TV. However big set pieces just don't jive well with GPU rendering, and most GPU path tracers are too young to feature all the bells and whistles productions want.
Nobody is doing realtime raytracing on CPUs. The author is talking about production ray tracing rendering working with scenes that won't fit in most workstation GPUs.
> There’s really no way to understate what a colossal achievement Apple’s M1 processor is; compared with almost every modern x86-64 processor in its class
On the other hand, I'm sure there's more than a few chipheads out there who are saying "it's about time"; there was a longstanding prediction that the ARM architecture would overtake x86.
It is about time, although it is worth saying that a bunch of the chips here are basically ancient, either released years ago or based on fairly old microarchitectures.
Hats off to Apple, of course, but Intel were sort of a victim of their own success in that they nearly beat AMD to extinction over the last decade.
I also note that there's no comparison with a properly modern x86 chip (i.e. an AMD 5000 series). The Apple chip will still come out on top, most likely, but that Xeon did literally come out in 2012!
I thought the observation was always that the instruction set was not such a big factor in high-performance CPUs next to manufacturing technology, which is by far the first-order effect. It would be expected for a high-performance ARM CPU to reach roughly the same performance in that case (AMD is 1 generation behind here, and I think Intel is 2).
Are there really "chipheads" who predict the ARM ISA will buck this trend and start pulling ahead at equivalent technology nodes? By what mechanism do they believe this will happen, do you know?
1. The instruction set isn't so much a performance thing as a thing that bites you on power usage (you need to have a big fat decoder on all the time in the worst case). The widest x86 cores can only dispatch about 75% as many instructions per cycle, but x86 instructions can each do more, so you'd have to check a specific benchmark.
2. I want x86 to die more because it's fucking ugly than because of performance per se. ARM is not a simple ISA, so although you won't be writing a disassembler in 10 minutes like you could for RISC-V, AArch64 is still much less insane than x86 with all the extensions.
> The instruction set isn't so much a performance thing as much as a thing that bites you with power usage (you need to have a big fat decoder on all the time in the worst case).
Eh, it's overhead, but isn't massive overhead. Power usage is much more a function of how the chip was designed.
The vast majority of developers will never even see assembly for x86 or ARM. Compiler developers and those who need to hand-optimize code may care, but for everyone else it's a black box.
The unwashed masses don't have to care and won't actually use the performance; I'm not that bothered whether they care or not. Although it's worth saying that x86 is such a mess that hiding instructions at weird alignments is a valid obfuscation technique - i.e. it's not just aesthetics/performance.
I’m not part of the category you’re part of - but after learning 6502, Z80 and 68K, when it came time to look at the 80x86 I gave it a hard pass and learned C instead. I kinda assumed x86 would die a natural death. Wrong!
Sadly I have not programmed in assembly since, and I put it down to how ugly the ISA was.
https://microcorruption.com/login might be one place to start. It's got a slightly different focus, but if you broadly want to learn lower-level stuff it's a good entry point.
We'll get a live test of this very soon - Zen4 is going to be going head-to-head against Apple A16 (Apple's next core architecture) on TSMC N5P next year.
Does anyone expect x86 to close a factor-of-6 perf/watt difference (from AnandTech's M1 Max review)? A factor-of-2-to-3 IPC difference? And that's just against the A15, not the next-gen A16.
Node makes a big difference, but it doesn't close a factor-of-3 IPC gap in a single node; that's facially ridiculous. Name a full-node shrink plus architectural step that has tripled IPC in the last 10 years. Now name one that has done it while cutting power to 1/6th.
At that point we will see the goalposts shift again and it will be "well, x86 could do it if they wanted but Apple is just more willing to spend more transistors..."
The fact of the matter is that x86 makes it very difficult to spend those transistors efficiently - otherwise it would already have been done. If it were such an obvious gain to just spend those extra transistors, then surely AMD would have done it, if nobody else.
Everyone acknowledges x86 has some problems, but the other thing is that they've already mostly played their hand trying to fix those problems; the known solutions like instruction caches have mostly been exhausted at this point. The idea that AMD and Intel could just triple IPC at a whim but have chosen not to for some reason is facially ridiculous.
I know what Jim Keller said but the math just doesn't add up on it for me. OK, full node shrink, great, even if that doubles your transistor count at iso-power, or even doubles it at a little less power, that doesn't double your IPC let alone triple it, and it doesn't close a factor-of-6 perf/watt gap.
I don't really know where to start on this. SKUs that target very different markets are necessarily going to have different performance and efficiency tradeoffs. And cores which target different cycle times are going to be able to achieve different IPC. This clearly confuses the basis of performance and different freq/IPC design points.
I also didn't suggest Apple would never have the best chips ever. Clearly all else being equal if ISA was irrelevant and you had 1 ARM competitor and 1 x86 competitor then sometimes the ARM CPU is going to be the better of the two.
I'm asking is there some continued effect by which people think ARM is going to continue to pull ahead. Is it going to remain < 5%, or is there some turning point where that will start to increase? I'm no expert on this, but there are experts who don't seem to think that there will be such an inflection point.
> I don't really know where to start on this. SKUs that target very different markets are necessarily going to have different performance and efficiency tradeoffs. And cores which target different cycle times are going to be able to achieve different IPC. This clearly confuses the basis of performance and different freq/IPC design points.
see my response elsewhere, but these aren't unrelated problems: Apple has higher IPC at a lower power-per-core. You can slide around where on the scale x86 falls - maybe you can match perf/watt but then you're getting wiped by a factor of 3 on performance, and you can match on performance but then you're getting wiped by a factor of 6 on perf-watt. You can't do both at once.
There simply isn't enough transistor gain from a single node shrink to clear that much of a gap; basically, Apple is also getting much better performance per transistor, and that's a harder gap to close.
> I'm asking is there some continued effect by which people think ARM is going to continue to pull ahead. Is it going to remain < 5%, or is there some turning point where that will start to increase? I'm no expert on this,
Where in the world are you getting that this is a <5% gap?
Again, the 3990X is an absolute best-case scenario here: that is putting a laptop Apple chip up against an HEDT-class (really, server-class) CPU with 6 times the silicon area and five times the TDP, and all it can do is match it. The Mac Pro is the Apple competitor to those chips, and you'll see it slide back into the lead again.
And again, task energy as a measurement favors getting it done faster over pure perf/watt. It's still a 280W TDP / 350W PPT chip against a 60W laptop chip, and it has way more silicon; it's the best-case scenario and all they can do is match the M1 in task energy.
That's actually still an extremely good outcome for the M1 and the 40-core Mac Pro is going to slide back over the top again.
> see my response elsewhere, but these aren't unrelated problems: Apple has higher IPC at a lower power-per-core. You can slide around where on the scale x86 falls - maybe you can match perf/watt but then you're getting wiped by a factor of 3 on performance, and you can match on performance but then you're getting wiped by a factor of 6 on perf-watt. You can't do both at once.
I'm not sure how you established that. IPC and picoseconds available per cycle are intrinsically linked. Talking about IPC in isolation is nonsense, particularly when comparing a core that makes less than half the cycle time.
> where in the world are you getting that this is <5% gap?
I just mean the rule of thumb for the "x86 tax", not any specific device. The full thread should have context here; I'm not saying Apple is or is not ahead in a particular instance, I'm asking about more general trends of device performance and ISA.
“Intrinsically linked” doesn’t mean anything; if you want to make an argument, then make it.
I’ve already made mine - Anandtech shows a factor of 6 difference in perf/watt between a 11980HK and a M1 Max at peak performance, and this likely translates into a ~factor of four-ish difference in perf/watt and IPC at iso-power. That’s a performance gap that is unlikely to be closed by a node shrink - there is a large architectural gap there. Sure, Apple is probably using tighter pitches as well, but that doesn’t add up to a factor of 4 difference either.
If you have one processor that is doing 4 times the performance at iso-clocks, and 2.2x the performance with both processors running at peak clocks, the "megahertz myth" isn't applicable; one of those processors is just faster than the other.
We will see next year, with Zen4 and A16 (Apple's next core) going head to head on N5P. I strongly doubt Zen4 will even get close.
> I just mean the rule of thumb for the "x86 tax", not any specific device.
Ah, so you are conflating “the amount of transistors spent on x86 decoding” with “the architectural impact that x86 has on performance”.
Unfortunately those are not the same thing. To make the car analogy, how much of your car’s engine bay is spent on aspiration? Probably 5%, maybe 10% right? So obviously aspiration is not important to a car’s performance output at all? And a different method of aspiration would not affect performance at all, a turbocharged car performs almost identically to a naturally aspirated car?
That’s the argument you’re making by focusing on number of transistors spent decoding instead of the impact on the rest of the design. Having a much higher “rate of feed” enables much higher-performance optimizations in the rest of the design - like a much much much deeper reorder buffer.
And just like with cars - that 5% or so of the processor is a key enabling factor that can produce gains of 2x in the rest of the processor, because it’s the only way to keep an engine that is 2x as powerful fed. It doesn’t, itself, produce all that much speedup, but you can’t design bigger engines without clearing that bottleneck. Even if it’s only 5% you can’t do those same designs naturally aspirated.
Similarly, even if the decoder is only 5% of the x86 design, it doesn't mean it's not strangling the ability to scale the rest of the design.
The biggest die shrink in recent times was probably Intel's 22nm jump to FinFET (Global Foundries made a decent jump at 14nm, but it couldn't save bulldozer). Rather than the usual 50-70%, Intel claimed transistor density DOUBLED.
Haswell got a very modes (<10%) performance improvement and an equally modest 10% lower power at load (though almost 25% at idle).
Sandy Bridge did much better in performance (up to 40% in some benches) and also did decently well in power consumption despite using the same node.
AMD saw massive increases with Zen on the same 12nm node. The also saw almost 20% IPC increases from Zen 2 to Zen 3 despite both being on the same N7 node
If anything, history shows that node shrinks are always overrated at improving power and performance.
> Does anyone expect x86 to close a factor-of-6 perf/watt difference?
The one number that surprised me in this review was the perf/W of the Threadripper in the rendering phase: it is very close to the M1 Max. I understand that the numbers are not apples to apples because of the total-laptop vs CPU-only comparison, but the power consumption of the Threadripper CPU itself is very high and probably takes the lion's share of the overall power consumption. And that's for a previous-generation Threadripper.
Task energy comparisons are usually won by the processor that gets it done fastest in absolute terms: because of the overhead of the rest of the system, it takes a really big perf/watt win to come out ahead of a system that isn't as efficient in actual watts but gets the job done in half the time, since the faster system pays that overhead for a shorter period.
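A toy model of that effect (the numbers below are illustrative, not measurements from the article or AnandTech):

    OVERHEAD_W = 60.0   # rest-of-system draw: RAM, SSD, VRMs, fans, idle GPU, ...

    def task_energy_wh(chip_w: float, seconds: float) -> float:
        """Whole-system energy to finish the task, in watt-hours."""
        return (chip_w + OVERHEAD_W) * seconds / 3600.0

    # Chip A: more efficient silicon, but takes longer to finish the job.
    # Chip B: hungrier silicon that finishes the same job ~3x sooner.
    a = task_energy_wh(chip_w=40.0, seconds=2000)    # ~55.6 Wh total
    b = task_energy_wh(chip_w=150.0, seconds=700)    # ~40.8 Wh total

    # Chip-only energy: A uses ~22.2 Wh, B uses ~29.2 Wh -- A's silicon is more
    # efficient, yet B wins the whole-system task-energy comparison because it
    # pays the fixed overhead for a much shorter time.
    print(a, b)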
It's also a 128-thread processor being put against an 8+2-core processor, and that's the closest thing to something that will outweigh Apple's IPC advantage here: a super-wide processor clocked super slow. And unlike the more realistic comparisons (laptop processors, etc.), the Threadripper has deployed over four times as much silicon just to match the M1.
This is the absolute best-case scenario for x86 - they get six times as much silicon and 16 times as many threads, and all they can do is match it.
Do the comparison again against the Mac Pro 40-core chip when it comes out and you'll see A15 pull ahead again.
The 3990X is not designed for energy efficiency, is on an older node, and is on an older architecture... and it uses 3 kWh vs Apple's 2 kWh (using a very flawed methodology).
And yet you're claiming Apple has a 3-6x power efficiency advantage.
Hmm? Perf/watt isn’t the same thing as task energy, it sounds like you’re the one confused here. As I said, task energy (which is what’s measured in the OP) heavily favors “getting it done quicker”, as well as this task being embarrassingly parallel such that the 3990X can get all 128 threads into play. It’s basically a “race to sleep” benchmark, not a perf/watt benchmark, with processors that are that disparate.
For a task energy test - which is not the same thing as a perf/watt test - this is completely loading the dice in favor of x86, and the M1 still manages to match it. That is an extremely good result for putting a laptop chip against a top-of-the-line HEDT processor in a test that is normally all about race to sleep.
> An yet you're claiming Apple has a 3-6x power efficiency advantage.
I’m not the one claiming anything, if you disagree with Anandtech’s numbers go take it up with them. The numbers don’t change just because you find them uncomfortable.
> In multi-threaded tests, the 11980HK is clearly allowed to go to much higher power levels than the M1 Max, reaching package power levels of 80W, for 105-110W active wall power, significantly more than what the MacBook Pro here is drawing. The performance levels of the M1 Max are significantly higher than the Intel chip here, due to the much better scalability of the cores. The perf/W differences here are 4-6x in favour of the M1 Max, all whilst posting significantly better performance, meaning the perf/W at ISO-perf would be even higher than this.
> In the SPECfp suite, the M1 Max is in its own category of silicon with no comparison in the market. It completely demolishes any laptop contender, showcasing 2.2x performance of the second-best laptop chip. The M1 Max even manages to outperform the 16-core 5950X – a chip whose package power is at 142W, with rest of system even quite above that. It’s an absolutely absurd comparison and a situation we haven’t seen the likes of.
And again, that’s with it running at half the clock and half the threads of its laptop peers, so IPC is something like 8x higher in those scenarios.
That is not the kind of gap you close up with a node shrink or by tightening pitches a bit. Hacker News experts know best, though.
Next year Zen4 and A16 will be on the same node, and then it’ll be another excuse for why x86 is still getting dumpstered. Just keep the goalposts on wheels; you’ll need them.
> Perf/watt isn’t the same thing as task energy, it sounds like you’re the one confused here.
How do you calculate Perf/watt if not Task/Task Energy?
> As I said, task energy (which is what’s measured in the OP) heavily favors “getting it done quicker”
Which x86 can do.
> It’s basically a “race to sleep”
Oh no, it gets the task done faster!
> I’m not the one claiming anything, if you disagree with Anandtech’s numbers go take it up with them.
I have no issue with Anandtech, since they made no such claim. I read the article; you're badly misquoting it.
> And again, that’s with it running at half the clock and half the threads of its laptop peers, so IPC is something like 8x higher in those scenarios.
None of which is ultimately important.
> That is not the kind of gap you close up with a node shrink or tightening pitches a bit. Hackernews experts know best though.
Look at the results again. For example, for SPEC2017 ST the Apple M1 Max is essentially tied with the 5950X.
Sure, the M1 Max might be more energy efficient (though by how much you'd need to measure) - but remember it's a massively larger chip, a whole year newer, and on a newer node.
For MT we see the M1 Max barely beat a 5800X for int, and do significantly better for floating point - which shows different design priorities.
Again, take a 5800X, shrink it down, add another FPU, and it beats an M1 Max hands down.
> Next year Zen4 and A16 will be on the same node, and then it’ll be another excuse for why x86 is still getting dumpstered.
Except x86 isn't getting dumpstered.
You're just cherry picking specific comparisons that make M1 look great, and then ignoring all contrary information.
I mean, I could take the Borderlands 3 1080p benchmark, and say that the M1 got 21.1 FPS to the GE76's 100, and therefore x86 is ~8 times faster.
It wouldn't be honest (I'm deliberately choosing the worst M1 chip, and picking a workload that really favors x86) - but I could do it.
But I don't, because it's not honest nor is it helpful.
    Chip    Cores  TDP     1-Core Power  1-Core Freq  All-Core Power  All-Core Freq
    3990X   64     280 W   10.4 W        4350 MHz     3.0 W           3450 MHz
    3970X   32     280 W   13.0 W        4310 MHz     7.0 W           3810 MHz
    3960X   24     280 W   13.5 W        4400 MHz     8.6 W           3950 MHz
    3950X   16     105 W   18.3 W        4450 MHz     7.1 W           3885 MHz
As you can see, going from 4.35GHz down to 3.45GHz reduces power consumption by over 3x. Further, per-core power usage of the 3990x is very low overall.
This lower clock and lower per-core performance gives higher overall performance per watt.
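Spelled out with the 3990X numbers from the table above (using frequency as a crude stand-in for per-core throughput, which ignores memory effects):

```python
# Per-core numbers for the 3990X from the table above.
one_core_power_w, one_core_freq_mhz = 10.4, 4350
all_core_power_w, all_core_freq_mhz = 3.0, 3450

# Treat MHz as a rough proxy for per-core performance.
boost_perf_per_watt = one_core_freq_mhz / one_core_power_w   # ~418 MHz/W
base_perf_per_watt  = all_core_freq_mhz / all_core_power_w   # ~1150 MHz/W

print(f"power drop:          {one_core_power_w / all_core_power_w:.1f}x")    # ~3.5x
print(f"frequency drop:      {one_core_freq_mhz / all_core_freq_mhz:.2f}x")  # ~1.26x
print(f"perf/W improvement:  {base_perf_per_watt / boost_perf_per_watt:.1f}x")  # ~2.8x
```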
I have an AMD Ryzen 5900X and it is nearly the same in single core and much faster multi-core (12 cores). So I don't think it's accurate to say that x86 needs 4x the silicon just to match the M1.
So why don't you write up an article and provide some benchmarks? You'll win lots of internet points, at least on Hacker News, as a lot of people seem pretty upset that Apple's new silicon is killing it.
It is very relevant: people keep bringing up the "x86 loses because it is clocked higher" (you did so yourself!) but the thing is, sure, clock that x86 down and then instead of matching in performance and losing heavily in perf/watt you lose heavily in performance and match in perf/watt.
The fact of the matter here is that Apple is getting much better IPC at a much better power-per-core, and that is the real architectural gap. You can slide around where on the scale that x86 falls, but there isn't enough gain from a full node shrink to close a factor-of-6 perf/watt gap and a 3x IPC gap.
CPUs are designed to hit a certain clock rate. The target frequency influences the design of literally every circuit in the CPU. Downclocking a CPU is not the same as designing it for a lower frequency. An x86 CPU is designed to run at ~5GHz in its highest turbo mode. Simply reducing the frequency in the BIOS does not undo all the tradeoffs the architects made that deliberately sacrificed IPC for frequency.
You could make a very wide CPU indeed if you decided to run it at 100MHz. That would be obviously stupid, though, because it is the product of IPC and frequency that matters.
It certainly looks like Apple has made the better tradeoff. However, you can only tell that from the benchmarks. Either going wider or going faster is a valid approach.
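As a toy illustration of the "product" point, with entirely made-up design points:

```python
# Made-up design points: what matters is the product, not either factor alone.
designs = {
    "wide & slow":   {"ipc": 6.0, "freq_ghz": 3.2},
    "narrow & fast": {"ipc": 3.8, "freq_ghz": 5.0},
}

for name, d in designs.items():
    # Perceived single-thread performance ~ instructions per second.
    giga_instr_per_s = d["ipc"] * d["freq_ghz"]
    print(f"{name:>13}: {giga_instr_per_s:.1f} G instructions/s")

# 6.0 * 3.2 = 19.2 vs 3.8 * 5.0 = 19.0 -- roughly the same throughput,
# reached by very different (and not freely interchangeable) design choices.
```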
This of course has the subtle implication that it’s equally easy for x86 to go as wide as ARM, and it’s not.
Yes, it’s true that x86 often relaxes pitch, but that doesn’t make up for a factor-of-6 perf/watt difference like Anandtech measures.
Like I said, I guess we’ll see, Zen4 and A16 will be on the same node next year. By that time the goalpost will move to something else, like this “x86 isn’t designed for power efficiency” defense.
Why do you think x86 can't go as wide as ARM? (I predict your answer will involve something about decoders and nothing about uop caches.)
What goalpost have I moved? I have said one (true) thing: that you should judge by performance rather than by implementation details. You are falling victim to the Megahertz Myth, just the other way 'round.
OK, here’s the performance. Factor of 2.2X over its peers (other laptop SKUs) in multithreaded performance, going head to head with 5950X in some (floating-point) scenarios.
Why do you think the megahertz myth is relevant here? Core for core, the A15/M1 is plainly faster than any of its peers, ignoring clocks, and it is even further ahead when you do look at clocks (i.e. IPC). It doesn’t matter which way you look at it, unless you are putting the M1 up against HEDT SKUs like the 3990X. There are a few non-peer scenarios like that where it only ties x86, such as OP looking at task energy (the 3990X gets to use its 280W TDP/375W PPT and race to sleep), but that’s still an incredibly good outcome considering the loaded test, and a Mac Pro with a 32+8 configuration will almost certainly be back on top in the “peer” comparison scenario.
It’s amazing how much breath was wasted on “IPC is what really matters” when Ryzen came out and now it’s “the other side of the megahertz myth” when Apple is on top. Ryzen was never even remotely close to being in the lead on IPC compared to where Apple currently is.
Even at iso-power you are looking at a factor-of-3-to-4 difference in performance - I was being generous with the “only 3x IPC” thing. That is what Anandtech measured in their review. And that still means a gap of 4x perf/watt - which is better than 6x for sure, but it doesn’t mean low-clocked x86 magically beats A15.
> It’s amazing how much breath was wasted on “IPC is what really matters” when Ryzen came out and now it’s “the other side of the megahertz myth” when Apple is on top. Ryzen was never even remotely close to being in the lead on IPC compared to where Apple currently is.
IPC is what "really matters" when all the chips you're evaluating are capping out at pretty similar frequencies. When there's a 30% difference in frequency then you need to use instructions per second, and evaluate it with the context of different wattages and different benchmarks.
Sigh. IPC matters when you are comparing within the same architecture, as that means the comparative performance vs other cores of the same architecture ends up (somewhat simplified) as IPC * frequency. It is utterly irrelevant when comparing across architectures, as the instruction sets are different. IPC matters when comparing Ryzen to Intel, and between A15 and A16, but it doesn't matter at all when comparing A15 to anything x86-based.
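A contrived example of why cross-ISA IPC comparisons mislead (the instruction counts here are invented purely for illustration):

```python
# The same fixed task compiled for two different ISAs.
# Invented numbers: a denser ISA might need fewer (but "bigger") instructions.
task = {
    "ISA X": {"instructions": 1.00e9, "ipc": 3.0, "freq_ghz": 5.0},
    "ISA Y": {"instructions": 1.25e9, "ipc": 6.0, "freq_ghz": 3.2},
}

for name, t in task.items():
    seconds = t["instructions"] / (t["ipc"] * t["freq_ghz"] * 1e9)
    print(f"{name}: IPC={t['ipc']}, time={seconds * 1000:.0f} ms")

# ISA Y shows 2x the IPC but also retires 25% more instructions for the same
# work, so the two land at roughly the same time. The only number that's
# comparable across ISAs is time for the task (or time at a given power),
# not IPC itself.
```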
>OK, here’s the performance. Factor of 2.2X over its peers (other laptop SKUs) in multithreaded performance, going head to head with 5950X in some (floating-point) scenarios
Finally a sensible comparison! As I said, it is quite impressive.
>it doesn’t mean low-clocked x86 magically beats A15.
Nor did I ever say it would.
What I have said is a simple truth: leading on IPC and trailing on frequency is not obviously an advantage. I don't understand what it is you think I said. Pretty much everything you are writing is a non-sequitur.
Would you please stop posting in the flamewar style to HN? I just had to warn you about this elsewhere in the thread. This is not cool.
You obviously know stuff about this topic and that's great, but the benefit of that is smaller than the damage you cause by flaming. Would you please review https://news.ycombinator.com/newsguidelines.html and stop this going forward? We'd appreciate it.
Not a chiphead, but saw this in the article that might be a reason ARM is better for this kind of thing:
"The theory goes that arm64’s fixed instruction length and relatively simple instructions make implementing extremely wide decoding and execution far more practical for Apple, compared with what Intel and AMD have to do in order to decode x86-64’s variable length, often complex compound instructions."
Not sure it's true, not an expert. But it doesn't sound wrong!
Simpler decoders have always been a commonly cited advantage of fixed-width instruction sets, and to some degree it must be true. But this is not a recent thing; the "common wisdom" that instruction format doesn't matter too much still very much applies here.
Pre-decode lengths or stop-bits and more recently micro-op caches have been techniques that x86 has used to mitigate this and improve front end widths, for example.
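To make the "harder to decode" point concrete, here's a toy sketch of the dependency structure involved (nothing like a real decoder, just the data dependency):

```python
# Toy "instruction stream" decode: where does instruction i start?

def fixed_width_starts(n_instructions, width=4):
    # Every start offset is known immediately -- trivially parallel.
    return [i * width for i in range(n_instructions)]

def variable_width_starts(lengths):
    # Each start depends on the decoded length of every prior instruction,
    # so a naive decoder walks the stream serially. Real x86 front ends
    # mitigate this with predecode bits, length prediction and uop caches.
    starts, offset = [], 0
    for length in lengths:
        starts.append(offset)
        offset += length
    return starts

print(fixed_width_starts(4))                  # [0, 4, 8, 12]
print(variable_width_starts([1, 5, 3, 15]))   # [0, 1, 6, 9]
```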
People like Jim Keller (who has actually worked on and led teams implementing these very processors at Apple, Intel, and AMD!) basically say as much (while acknowledging decode is a little harder, in the grand scheme of things on modern large cores it's not such a big deal):
> A consistent 5% win is pretty huge for certain industries.
Are you referring to Andy Glew's thread? He said perhaps 5%, but he also went on to say probably less than 5% for basically the lowest-end out of order processor that was fielded (A-9), not what you would call a high performance core (even then 10 years ago). On today's high performance cores? Not sure, extrapolating naively from what he said would suggest even less. Which is backed up by what Jim Keller says later.
So << 5%, which is significantly less than process node generational increases.
I'm not saying ARM won't leapfrog x86, I'm just asking what the basis is for that belief, and what those who believe it think they know that the likes of Jim Keller does not.
If it's an argument about something other than details of instruction set implementation (e.g., economics or process technology) then that would be something. That is exactly how Intel beat the old minicomputer companies' RISCs despite the "x86 tax": they got the revenues to fund better process technology and bigger logic design teams. Although that's harder to apply to Apple vs AMD/Intel, because x86 PC and server units and revenues are also huge, and TSMC gives at least AMD a common manufacturing base even if Apple is able to pay for some advantage there.
Most of the chip-heads' deep hatred for x86 comes from the number of pain-points for dealing with its ISA.
In terms of actual raw performance the instruction set is extremely secondary to about 10 other architectural choices.
The reason Apple is getting such performance is because it's caught up with all the cutting edge techniques Intel and AMD use, and a few more, and implemented them inside its core architecture.
The two primary mechanisms I am aware of are drastically simpler instruction decoding and the lack of backwards compatibility. Both of these save on power and area, allowing those budgets to be spent elsewhere: oftentimes on wider instruction decode, extra execution units, better branch prediction, etc.
These budgets are much tighter on a mobile chip, but they are still relevant in desktop designs.
I mean, VLIW is fantastic as long as you have very predictable workloads with high parallelism and little branching that are very easily statically scheduled.
If you want to design a DSP or something, where you know it’ll be 100% saturated all the time with an absolutely predictable workload, it’s extremely efficient. Same for AMD Terascale GPUs back in the days before async compute and other GPGPU style tasks took over.
I'm not totally convinced that'll happen in the desktop space. Embedded, I completely buy it, but I have read fairly convincing arguments that ARM might be a better ISA for really high-performance architectures. RISC-V also seems to value compressed instructions far too much, so we could see a high-performance fork of RISC-V?
Even given how much cruft x86 has, there are comments above citing world-class experts saying the gap between ARM and x86 is maybe 5% at best, and probably less.
There's realistically probably no reason RISC-V cannot work, except that the number of techniques needed to implement a leading-edge, high-performance core is enormous.
RISC-V has a few ISA advantages over ARM64 (mainly the variable length instructions, which provide instruction density that's comparable with x86 while still being very easy to decode) but meaningful competition would have to come from something more like the upcoming Mill architecture - VLIW-like power efficiency with the same flexibility and IPC as current OOO designs.
The ISA has relatively little to do with it. Sure, x86 requires power-hungry decoders, but most of the time you'll be running from the uop cache anyway. Plus you get denser code. Arm and x86 aren't all that different under the hood these days and generally RISC vs CISC is a wash. It took heroic engineering to get x86 that fast, but that work is already done.
Yeah, your quote is a bit unfair to x86 - after all the x86 architecture is hampered by having to maintain compatibility not only with the original 8086, but also with most other instruction set extensions that Intel and/or AMD came up with during the last 40 years. Compared to that, the M1 is almost a "clean sheet" design. So, while still a great achievement, its dramatically better efficiency is not that surprising. I wonder however if Apple leading the way will also enable the PC (Windows/Linux) platform to leave its x86 past behind. Microsoft has been trying to sell ARM-based tablets for years, but without a lot of success, because there's not enough native software available.
> So, while still a great achievement, its dramatically better efficiency is not that surprising.
Is it? I was under the impression modern CPUs do on-the-fly translation of x86 instructions to something else (micro-ops) anyway. I’d be very surprised if that hardware translation process was making their chips 4x less power efficient (or whatever the number is).
Also as a consumer, the reasons why x86 stayed around as long as it did sound like a bunch of excuses. If x86 is really as inefficient as you say, then it’s about time the industry started sidelining Intel and moving to a better architecture with or without them. I’m hoping AMD follows suit before long and starts selling competitive arm or risc-v chips.
Windows on ARM's failure to take off so far seems more likely to be down to the relative underperformance of Qualcomm chips than to the lack of software, especially since in recent years you've been able to run x86 software anyway.
Windows 10/11 on an M1 Mac via parallels however has been the best Windows ARM device I've ever used and I have no trouble using it daily as my main work laptop.
x86 is a much older instruction set. It has a lot of cruft, and all the chips remain backward compatible with older software and many operating systems. It's extremely flexible, at a cost to performance. The new Apple chips, while impressive, only run new ARM64 instruction set programs compiled for Apple machines. Depending on your use case that might not matter, but it's still something. And competition is good for a CPU market that was lacking the massive year-over-year improvements of 20 years ago.
I always wondered why Apple was pushing so much functionality into the iPhone that almost all users really didn't need. I was thinking it would get them ready for coming AR glasses and whatnot, but it was also preparing them to win on the Mac side as well.
I was randomly curious how long it would take to render a full movie at the quality of that forest image, which absolutely blew my mind.
At 24fps, a 2h movie has 172800 frames. 21,970,310 M1 seconds or 8.5 M1 months. Which is less than I was expecting. The rendering seems to scale linearly per core too.
Presumably my math is bad, or there's a lot more rendering complexity in what Pixar's supercomputer deployments are handling?
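For what it's worth, the arithmetic itself checks out (the per-frame time is just what those totals imply):

```python
# Numbers from the comment above.
fps, movie_hours = 24, 2
total_m1_seconds = 21_970_310

frames = movie_hours * 3600 * fps                 # 172,800 frames
seconds_per_frame = total_m1_seconds / frames     # ~127 s per frame implied
months = total_m1_seconds / (3600 * 24 * 30)      # ~8.5 "M1 months"

print(frames, round(seconds_per_frame, 1), round(months, 1))
```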
You've got a few flawed basis points in your math.
1) that forest isn't near the complexity of most major feature films. It's great, but it's got a ton of instancing, it's not got very complex shaders, it's not got very complex lighting. It's a good benchmark scene, but it's not approaching the level of scenes in most feature films.
2) You're assuming a single render happens per frame. Every scene in a movie is rendered several times. Potentially every single time a new hand touches it, and definitely multiple times as lighters iterate on it. On average I think a single shot must be rendered about 40-50 times or more in a large production. At least 10 times per each major department.
3) Pixar and other studios don't have really extraordinary computers. They're usually run of the mill HP or Dell workstation specced machines with a dual Xeon and tons of RAM.
4) static scenes with diffuse lighting are fast to render. Add in motion and things get tricky. Do you want motion blur in render (as opposed to composited in via motion vectors)? Well now you're sampling multiple frames. What about depth of field? Way more samples to resolve properly. Now add in more lights, and you need more samples per light to resolve a noiseless render.
1. Instancing doesn't reduce the complexity of rendering a scene in terms of light transport, it only reduces the memory storage and acceleration structure complexity.
Actually, a forest with occluding foliage geometry is a quite difficult light transport situation: it's very difficult for next event estimation to work efficiently, so most of the lighting ends up indirect and therefore quite noisy, as it's difficult for paths to find the environment/IBL light for the sun.
Portal guiding and NEE guiding can help a bit, but it can still be quite slow.
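The "quite noisy" part is just standard Monte Carlo behavior (generic estimator math, nothing specific to any particular renderer): error falls off as 1/sqrt(samples), so cleaning up difficult indirect lighting gets expensive fast.

```python
import random
import statistics

# Estimate a simple integral (mean of a random function) at several sample
# counts and watch the standard error fall off roughly as 1/sqrt(N).
def noisy_sample():
    # Stand-in for "radiance along one random light path".
    return random.random() ** 2

for n in (64, 256, 1024, 4096):
    estimates = [statistics.mean(noisy_sample() for _ in range(n))
                 for _ in range(200)]
    print(f"N={n:5d}  stderr~{statistics.stdev(estimates):.4f}")

# Each 4x increase in samples only halves the noise, which is why hard light
# transport (mostly-indirect lighting, heavy occlusion) blows up render times.
```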
I didn't mean to imply that instancing would reduce the light transport costs, but rather just the overall render time cost. E.g. it's complex in a way that is great for benchmarking things, but not necessarily in a way that is representative of production scenes in a feature film.
Worth keeping in mind is the similarity of frames, or rather the lack of difference. If you know that the light source isn't changing, then you can get away with just copying the last frame and superimposing < 300 pixels to reflect what's changed, i.e. there's no need to re-render the same wall for every frame.
I'm guessing they don't interpolate frames either, but maybe that would be a good time saving feature if you know the end compression format and you can just interpolate the frames at the render stage instead of re-encoding it again later.
There are AI upscalers and also AI interpolators to increase the frame rate. I haven't heard of them being used in film, but I suspect they will be as they get better.
If you're using the same software, you can assume there's no super high-level speedups like that happening on the M1, although it might specifically target speedups on common instructions used in these workloads.
Just for fun, I would run PostgreSQL on this Mac with 64GB of memory and see how it flies. Heck even install Linux on it and then put on Postgres. It might just put some baremetal servers to shame if 64GiB of memory is enough.
If only I had enough money to splurge on such curiosity experiments… :-(
For my personal machine, I’d wait for Framework with Ryzen 5000 series…
Good luck with that... it's the one thing that doesn't seem to come up in these comparisons. At least with the Ryzen/Intel CPUs you can easily install Linux - but until the same is true of the M1 chips, these benchmarks aren't all that useful.
Holy shit. It’s absolutely incredible what Apple has cooked up, seeing it in a real life workflow like this. And the bigger iMacs and Mac Pros with even beefier M1 derivates haven’t even been released yet!
The problem with Apple is, and remains, the software, which I find utterly hostile and inflexible for my workflows, and getting worse by the year. Personally, I'll only consider getting one of these after it is able to run a decent Linux distribution well.
Mint is okay. Except the time that grub randomly failed for me and I had to manually fix my boot config. But at least my webcam and brightness buttons worked on my ten year old laptop, which they didn’t when I booted into windows even after spending a few hours trying to find the correct drivers.
My wife (who while smart doesn’t like tinkering with computers) has been using Ubuntu for two years now and her only complaint is “my hard drive is full and how the heck do I uninstall applications?” Even things like Microsoft Teams and all the virtual doctors meetings apps work for Ubuntu, which surprised me.
I mean Linux distros kinda suck when something goes wrong but I’m honestly impressed how long my wife’s Ubuntu laptop has been working without major issues.
Linux distros suck the least when something goes wrong. Both Windows and MacOS fail in inscrutable ways and have required me to do complete reinstalls in the past, whereas on Linux there's usually some sort of forum discussion or Arch wiki page that will get you back on the right track.
> With these drivers, M1 Macs are actually usable as desktop Linux machines! While there is no GPU acceleration yet, the M1’s CPUs are so powerful that a software-rendered desktop is actually faster on them than on e.g. Rockchip ARM64 machines with hardware acceleration.
It's very normal. High end path tracers are still very much run on the CPU. GPU rendering is improving quickly, but the author is working at WDAS. Hyperion is still a CPU renderer, and I don't see it going to the GPU anytime soon.
Commercials and mid to low budget TV shows are moving to GPU rendering, but it's very much a factor of scene complexity and available time.
There are two major approaches to rendering: rasterization and ray tracing. The former is faster, the latter is more real. They are completely different approaches.
Historically, ray tracing was only used for movies, whereas games generally used rasterization. Rasterization is a highly coherent workload and is great for the GPU; ray tracing is an incoherent workload and generally isn't well suited to it.
Games have started using ray tracing, and there have been GPU implementations of ray tracing in the past 5 or so years, so what I said above may be stale.
Raytracing can absolutely be done on GPU even without RTX-style hardware; there are tons of implementations of it in the literature. Last I read the research, the best approaches dynamically bundle up rays hitting the same or nearby objects to squeeze coherency out of the GPU.
As far as I can tell, the biggest problem is simply that GPU raytracing requires a completely different software architecture. Giant boil-the-ocean rewrites that require not only new software but also new hardware are very difficult to justify, especially in a mature industry dominated by (relatively) short-term film production schedules. There are other technical issues too, such as VRAM limits.
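For reference, the "bundle up rays" idea is roughly this kind of regrouping (toy data model, not any real renderer's API):

```python
from collections import defaultdict

# Each ray has hit something; shading coherently means processing all rays
# that hit the same object/material together instead of in arrival order.
hits = [
    {"ray_id": 0, "object": "oak_leaf", "t": 12.3},
    {"ray_id": 1, "object": "bark",     "t": 4.1},
    {"ray_id": 2, "object": "oak_leaf", "t": 9.8},
    {"ray_id": 3, "object": "fern",     "t": 2.2},
    {"ray_id": 4, "object": "bark",     "t": 7.7},
]

bins = defaultdict(list)
for hit in hits:
    bins[hit["object"]].append(hit)

for obj, group in bins.items():
    # On a GPU this whole group runs the same shader with the same textures,
    # so the lanes stay coherent instead of diverging per ray.
    print(f"shade {len(group)} rays hitting {obj}")
```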
>Giant boil-the-ocean rewrites that require not only new software but also new hardware are very difficult to justify, especially in a mature industry dominated by (relatively) short-term film production schedules.
Anybody want to take my PhysX card off my hands? :)
Yeah it is a bit stale. 3D artists these days want their render boxes to be filled with gpus so they can use cycles or redshift or octane to render fast. Old school artists use cpus these days.
Since when was the discussion purely feature films? Our little studio can't afford racks full of Xeon CPU renderers, so we do as GP said, using redshift, etc. and just have a ton of GPUs.
Because the commenter I replied to said that only old school artists render on the CPU.
People pick the renderer that suits their needs, and it's not just a matter of old school people picking CPU pathtracers because it's what they know. You picked GPU rendering because you're, as you said, a smaller studio. That suits your needs. It's however not old school for big studios to use CPU rendering because GPU renderers just can't handle the scene complexity required yet.
The reason why most artists highly prefer gpus is it gets their cycle time down. Farms often use CPU’s still because of cost and less pressure on a single image render time.
Usually what one does is break the shots down into individual elements rather than render everything at once, and then composite them together. Rendering it all in one go can prevent touchups to individual elements. But I am much more VFX than pure animation - I think animation does more one-shot, one-render work.
Funny thing I mostly know what I am talking about.
> Funny thing I mostly know what I am talking about.
You are only however talking about your subjective experience and projecting it onto the entirety of the industry.
I've worked in both animated features and VFX at a few of the bigger studios. What sort of studio do you work at?
I'd suggest looking at the landscape of big feature films and seeing what studios are using GPU rendering. It's rare to see any of the big studios using GPU rendering, not just because of existing farm hardware, but because none of the current GPUs and GPU renderers are capable of handling the larger scenes required, as well as losing out on some shader functionality like full OSL support among others...
If you look at the ACM breakdown of production renderers, in their rendering special, there's not a single GPU focused renderer there.
Individual artists may want GPU rendering. However your statement of only old school people using CPU rendering is just ignoring the realities of production.
I wrote and sold software into the VFX industry, most notably Deadline, a fairly popular render manager. I also wrote/sold another dozen or so tools into the industry.
While CPU renderers are popular among the old crowd, no one wants to use them because they are super slow. The cycle time is brutal and it kills artist productivity. Artists want redshift, octane, GPU cycles, and hybrid Pixar Renderman, basically anything with multi-GPU support. Wherever artists can they want GPU-based renders because they are fast.
Large feature film shops are laggards in adopting new technology and often have a ton of workflows/plugins that cannot be easily changed. They have their tried and true pipeline. But up and coming studios tend to be nearly completely GPU-based, because it cuts costs, even if it limits things a bit (but much less than before).
This is also why you see Unreal Engine getting into architectural rendering -- again because it is a much nicer workflow than waiting around for a few hours for V-Ray to finish up.
Many children TV shows are GPU-rendered now right out of game engines, in part because of the cycle time and the lower visual complexity of the scenes.
So I think we are both right. Large studios are using CPU-based rendering -- you are correct. Artists and more nimble studios are using GPU-based rendering as much as they can, because it reduces cycle time at roughly equivalent cost (well, except for the last 2 years, because of crypto screwing up GPU prices).
Remember my original statement you took offence to was:
"3D artists these days want their render boxes to be filled with gpus so they can use cycles or redshift or octane to render fast."
and
"The reason why most artists highly prefer gpus is it gets their cycle time down. Farms often use CPU’s still because of cost and less pressure on a single image render time."
I was speaking about what artists want and I am absolutely correct on that. And what artists want will filter into the rest of the industry -- although a bit slower now because of stupid GPU prices.
That is not the part of your statement that I was saying is incorrect. The part that I still say you're wrong about is the statement that people who use CPU renderers do so because they're old school. To claim that is ignoring the realities of how GPU renderers handle large scenes.
You again try to say the big studios are laggards in adopting tech and have tired pipelines. That's not why they're using CPU rendering. It's because GPU renderers did not scale to meet their needs.
A lot of the big VFX studios are renderer agnostic (e.g ILM), and would have no issue adopting GPU renderers if it met their needs.
You're continuing to project your subjective (and frankly incorrect) opinion that GPU renderers have superseded CPU renderers already onto the industry, without understanding the limitations involved.
You even mention "hybrid Renderman" in which I think you mean XPU, but that's yet another example of a GPU renderer that isn't at parity with the CPU one. The same goes for Arnold GPU etc...
"I was speaking about what artists want" is fine, but you're also dismissing the very real reasons people are still on CPU renderers by saying it's a matter of legacy.
Okay, but you presented it as a whole statement, and I clarified multiple times what I was contradicting in your post. I also presented a nuanced argument based in real-world usage, whereas you went off projecting your uninformed opinions as fact.
Still, the going price for a W-3245 is more than $2,000. That's just the CPU, no memory, motherboard or anything else. A 16" MacBook with 16GB of RAM is $2,499.
The MacBook comes with a free display, input devices and a UPS :)
Apple managed to book all (or close to all) of TSMC's 5nm production last year - that's the main reason for this performance.
And by "managed", the cost is passed on to the consumer, of course.
I'll just wait for AMD's 5nm offerings, which will also come in RAM/CPU-upgradeable form factors and be Linux-compatible out of the box, for half the price.
Sure, it's a reason for the performance. But it's still not a reason for the performance to price comparison that I was making.
Surely a CPU that is on an older, more tested node (albeit an Intel node, take that as you will) should have amortized the costs of producing silicon products on that node.
An Apple CPU costs less than a worse-performing, older CPU. The economics here are surely poor? You can't tell me that Apple of all companies is taking a hit on their new CPUs? I mean, they could be for all I know, but that would be very un-Apple-like.
Hi! Author here. I don't call the E5-2680s "somewhat old", I call them ancient, because they are... well.. ancient. I included them in the comparison because up until recently, I was still using an E5-2680 workstation. So really the point was that for ME personally, the M1 Max is a huge improvement over what I had previously. I don't really keep piles of the latest Xeons sitting around my house, so I had to make do with what I had. ¯\_(ツ)_/¯
For what it's worth, I also threw in comparisons with the 2019 Mac Pro's Xeon W-3245 and the Threadripper 3990X, both of which might be described as "somewhat new".
That's the thing most people don't get about Intel: nothing has really changed perf-wise since Haswell, which is older than that.
If your old Haswell / Skylake machines still work (and if you don't need Windows 11 which doesn't support them), there's zero technical reason to upgrade to something newer, at least on the desktop.
There's been some progress in terms of power efficiency on laptops, but even there, 5+ year old chips are pretty much as good as it gets if you want to stick to Intel. Maybe 10-15% slower than the very latest stuff; you won't even notice it. That's why Apple is currently obliterating Intel and AMD - they milked their cash cow until it fell over.
And Apple knew that they didn't even need to work particularly hard on core perf - memory bandwidth improvements alone would have delivered the goods. That's another thing people don't get - the CPUs spend quite a bit of their time waiting for bytes to come in or go out. The latencies involved in that are upwards of 200 cycles, and any significant improvement in both the latency and the bandwidth has insane, immediately noticeable benefits for perf, particularly in "modern" languages with poor cache locality. Why the Intel/AMD duopoly did so little to address this deficiency is inexplicable, but here we are.
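To put those ~200-cycle latencies in perspective, here's the classic back-of-the-envelope CPI calculation with assumed numbers (the miss rate and penalty are illustrative, not measurements):

```python
# Classic estimate: effective CPI = base CPI + misses/instruction * penalty.
base_cpi         = 0.25    # i.e. a core capable of ~4 IPC on cached data
miss_penalty     = 200     # cycles per last-level cache miss (assumed)
misses_per_instr = 0.005   # 5 misses per 1000 instructions (assumed)

effective_cpi = base_cpi + misses_per_instr * miss_penalty
effective_ipc = 1 / effective_cpi

print(f"effective IPC: {effective_ipc:.2f}  (down from {1 / base_cpi:.0f})")
# Even a modest miss rate drags a nominal 4-wide core down to ~0.8 IPC, which
# is why cutting memory latency/bandwidth stalls pays off so visibly in
# pointer-heavy "modern language" workloads.
```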
> That's the thing most people don't get about Intel: nothing has really changed perf-wise since Haswell, which is older than that.
Single core performance on current i9 processors benchmarks at twice the performance of the Haswell chips. Multicore performance is ~4x.
They definitely slept for a while and AMD started taking their lunch money with Ryzen and EPYC but "nothing has really changed since Haswell" is blatantly wrong.
> If your old Haswell / Skylake machines still work (and if you don't need Windows 11 which doesn't support them), there's zero technical reason to upgrade to something newer, at least on the desktop.
Wrong again, mostly because quicksync in that old a processor won't handle current popular streaming video codecs so you end up with non-accelerated video decoding unless you've got a discrete GPU.
> memory bandwidth improvements alone would have delivered the goods.
No advantage from people upgrading from a Haswell's DDR3-1600 (13GB/sec) to Comet Lake's DDR4-2933 (94GB/sec), eh?
Either it's not worth upgrading from Haswell or it is.
I'm sure that's what Intel would like you to believe. But if you benchmark this stuff (which I have) you will see that real world single core perf hasn't moved much in a decade, and memory bandwidth is never "94GB/sec" or whatever, especially not in a laptop.
When I viewed Figure 5 [link] at full resolution, the pine tree bark felt to me like a poorly-done bump-mapping tech demo. I think it lacks sharp contrast between the raised and recessed regions (instead having a gradient), and I think the outlines/silhouettes of the branches are perfectly smooth and unaffected by the bark, and too uniform and slender to look like real trees.
Yeah, some images look very fake, which is why I wrote that only some looked incredibly real.
The one you linked looks great to me. Some parts are more convincing than others, but in general I almost feel as if I could touch those trees. Although I admit I don't have a trained eye to see the rendering tricks used.
It doesn't really matter. Nobody's doing rendering on CPUs.
I get that he's sticking to comparing CPU to CPU rendering to compare apples to apples, but that's like saying "My new buggy whip is amazing!" in a news article written in 1950.
Everyone is doing GPU based rendering. Pascal based GPUs are still (I believe) several times faster than even current mid-tier AMD desktop processors. Pascal is two generations behind current GPUs, which are several times faster than it is.
What will actually be relevant and interesting is how the M1X's full capabilities can be leveraged for rendering, and how that will compare against Ampere GPUs.
Not necessarily true. I know several studios in Los Angeles that do 3D final renders on CPU, such as Disney Animation. GPU rendering works for realtime, but the memory constraints on GPU limit its use in the high end.
Everyone in the field is looking at GPU and some significant advances have been made, but it's definitely not true that "nobody" is doing rendering on CPUs.
I wonder what the pgbench score is for these, given M1 notebooks are hitting 100K+ writes per second. These MBPs might be the best $5K server you can buy for a DB.
One of several tradeoffs is that M1-based stuff has a RAM ceiling, which until a few days ago was 16GB, now it's 64GB. If you need more than that, then you can't use M1.
Does the performance gap close if Intel starts selling similar on-package RAM to consumers?
I suspect yes, and rapidly. They have it, they just apparently don't want to sell it outside of specialized high-margin goods like Xeon Phi.
It partially is. The M1 max comes with integrated hbm, which is way higher bandwidth than ddr4. Modern Intel/AMD processors are memory bottlenecked in multicore workloads (hence why 3d cache for Zen is a 15% improvement). The combination of hbm and on package memory means the M1 has much higher bandwidth and similar latency to anything else on the market (at the cost of not being scalable).
I really thought I had managed to stamp out, here on HN, the idea that "Unified Memory" and "On Package Memory" are the reason why the M1 is faster.
>The M1 max comes with integrated hbm
It is not HBM, just a very wide LPDDR5.
>hence why 3d cache for Zen is a 15% improvement
Increasing Cache size has nothing to do with Bandwidth. It is the latency and cycle count that matters.
>Modern Intel/AMD processors are memory bottlenecked in multicore workloads
Depending on the workload. The whole reason Apple put that much bandwidth in place was the GPU, which is bandwidth sensitive. The M1 Pro has the same MT performance as the M1 Max despite only having half the memory bandwidth.
You could have much faster memory on an Intel x86 system, and it wouldn't make that much difference to performance. Your whole CPU uArch needs to be designed to take advantage of the additional bandwidth: a longer pipeline with better prefetching.
>and on package memory means the M1 has much higher bandwidth
Again. There is no relationship between on-package memory and high memory bandwidth. You could achieve the same with DDR5 in DIMM slots, at the expense of much higher energy usage.
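Rough arithmetic on that last point, taking the commonly reported M1 Max memory configuration (512-bit LPDDR5-6400, ~400 GB/s peak) as an assumption:

```python
# Peak bandwidth ~= (bus width in bytes) * (transfers per second).
def peak_gb_s(bus_bits, mt_per_s):
    return bus_bits / 8 * mt_per_s / 1000  # GB/s (decimal)

m1_max    = peak_gb_s(512, 6400)   # ~409.6 GB/s, the reported M1 Max figure
ddr5_dimm = peak_gb_s(64, 6400)    # ~51.2 GB/s per 64-bit DDR5-6400 channel

print(f"M1 Max (512-bit LPDDR5-6400): ~{m1_max:.0f} GB/s")
print(f"one DDR5-6400 channel:        ~{ddr5_dimm:.0f} GB/s")
print(f"channels needed to match:      {m1_max / ddr5_dimm:.0f}")

# i.e. you'd need ~8 conventional DDR5-6400 channels on DIMMs to hit the same
# peak -- possible (servers do it), but at a real board area and power cost,
# which is the trade-off being argued about here.
```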
Do board partners want this, though? Apple only has their own margin to worry about, but when Intel starts putting memory on-package, the system integrator no longer knows exactly how much of Intel's markup is on the CPU versus the memory, so they end up surrendering more margin to Intel, even if it's small (e.g. paying 5% more per GB with Intel versus sourcing DIMMs yourself).
Stop spreading this myth, please just look at a Vega GPU die shot and you'll know what HBM looks like. You cannot put HBM on a PCB package. HBM is connected with a silicon interposer.
Not a rendering benchmark, but Anandtech has a performance review for each. Looking at the massively multithreaded workloads, having 64 cores seems to far outweigh having 8+2.
I mean, Apple has it "easy": they control the hardware and the software. They don't need to build something that works with PCI, old RAM, thousands of different configs, different disks, etc.
They can do whatever they want with zero compatibility or backward-compatibility concerns.
Why is the test not run against some AMD CPU that costs around $400, the 5800X or 5900X for example?
You’ve correctly identified one reason Apple’s hardware is so effective: they have a hardware-accelerated standard library, something that requires cooperation between their software and silicon developers. The nearest comparison is if Dell released a line of servers that, when paired with Ubuntu LTS, provided a hardware-accelerated glibc. They could do this at any time, but it’s not easy; the capital cost is incredibly high and demand is incredibly low. Apple has managed to deliver, and now owns a niche last claimed by Amiga decades ago. The advantages of that niche are exactly as you describe them, but if it were as easy to profit from that niche as your dismissiveness implies, there wouldn’t have been a twenty year gap between Amiga and Apple Silicon.
A lot of good it does them. MacOS is an incredibly poor performing OS. For years, Windows and especially Linux have been more responsive and load faster than macOS on similar hardware, despite not being in a vertically-integrated hardware/software system.
It's difficult to imagine how anyone can think that M1 is "competing" with Intel or AMD because of this. You won't be getting macOS on any new x86 processor, and you won't be getting Windows on an M1.
For the vast majority of people, M1 provides no meaningful decision point. It's not like I'm going to stop using Linux on my servers, pull out of the cloud, and build a closet full of M1 Mac Minis. I could write my day job software from a Mac, but again, I'm not really getting much of a benefit from that because of the rest of the issues with the OS. And gaming is a no-go. So what is the point?
You are absolutely right, with one key alteration: There is no point for you in what Apple's doing. As you've identified, you're not the target market for their products. There is, however, a point for others, who have different priorities and compromises than you.
M1 is not a decision point for their consumer base, because most people don't care what CPU architecture is inside their laptop. However, M1 notably improves the decision points that do matter to their customer base: weight, noise, and battery capacity for each customer's workload.
This comes at the cost of macOS and the restrictions inherent in it. If that cost is acceptable, then the benefits are significant. If that cost is unacceptable, then the benefits are irrelevant. As you indicate, that cost isn't acceptable to you, and you do not indicate interest in any of the other benefits of their platform, which specifically exclude Windows, Linux, and PC gaming. This leaves me unable to determine why you're participating in this discussion.
So, then; why does the topic of 'M1 MAX' matter to you at all? I'm happy to continue discussing, but without that information, there's not a lot left to say.
"But the Linux desktop experience has been ... frustrating to say the least. And I say this as someone who has used linux in the terminal at my job every day for 5+ years now. It's ridiculous that the software experience still cannot match my 9 year old laptop running macOS Sierra on a 2.5GHz Core i5 and 8GB of DDR3 RAM!"
> They can do whatever they want with 0 compatibility or backward compatibility.
Apple put in quite a bit of effort to make sure x86 binaries still run on the ARM Macs. It works for 99.5% of things in my experience. They have even been observed putting in compatibility tweaks for specific applications.
Also fun fact: The x86 compatibility software, Rosetta, isn’t distributed with macOS and has to be downloaded once it’s needed. This is likely Apple hedging their bets against possible future patent claims.
Building from scratch isn't "easy". They still buy a lot of their components off the shelf so they do need to interact with vendors so they can benefit from the economies of scale on the PC side of things.
Yes, they can optimize a lot of pipelines, but this goes both ways. E.g. they had the Lightning connector, which was good. USB-C beat the pants off it and now has a much better ecosystem. Apple is stuck with their sub-par connector for phones and an unclear strategy (USB-C on Macs/iPads but not on phones and low-end iPads).
It’s slightly faster at considerably greater power usage — and, more importantly, it hasn’t shipped yet so we don’t know what the true availability or performance will be when it arrives next year. Apple soured on Intel in part due to repeated delays so while I hope that they ship on schedule at full volume, it certainly would not be unprecedented for them to be late or limited quantity.
Assuming a 100-watt battery and/or a 3.5-pound weight restriction, how many hours of battery life would you get from an Alder Lake laptop in those rendering tasks, in comparison to the M1 MAX laptop?
How many pounds of battery would an Alder Lake laptop require to match the uptime of the M1 MAX laptop performing those rendering tasks?
My mental napkin math suggests extremely unfavorable answers for an Alder Lake laptop, to the degree that the laptop's viability in the market would be severely compromised by sustaining the power and cooling required by Alder Lake for long-duration workloads such as rendering.
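Spelling that napkin math out, using the package-power figures quoted elsewhere in this thread as rough assumptions (sustained numbers for these exact rendering workloads will differ):

```python
battery_wh = 100  # roughly the airline limit and a common premium laptop size

# Package power under heavy multithreaded load, from numbers cited upthread
# (treated as assumptions, not measurements of these rendering workloads).
m1_max_package_w = 30
x86_h_package_w  = 80   # in line with the 11980HK figure quoted above

for name, watts in [("M1 Max", m1_max_package_w), ("x86 H-class", x86_h_package_w)]:
    hours = battery_wh / watts   # ignores display, SSD, conversion losses, etc.
    print(f"{name}: ~{hours:.1f} h of sustained rendering on battery")

# ~3.3 h vs ~1.3 h before counting the rest of the system -- and that assumes
# the x86 part is even allowed to sustain that power on battery.
```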
And it devours, what, 5X the power and generates far more heat and fan noise requiring a much thicker machine to house it? Also the integrated graphics will (going from Intel) be nowhere near as impressive as the M1 Max's graphics, so it's hardly competing on that front either.
Not really a victory there. Also, it hasn't even been released yet, so comparing hypothetical unreleased products isn't a good comparison to call released products you can actually buy "nothing special."
The M1 MAX is not a low-power CPU. It consumes a whole 30 watts (!!!) and has an active cooling system. H-series CPUs from Intel have a TDP of 35-45 watts, which is almost the same as the M1 MAX. According to your calculation, the Intel CPU should consume 150 watts, which is completely wrong.
By the way, Apple said the graphics in the M1 MAX have the same performance as a mobile 3080 while using much less power. But this was a lie. The M1 MAX has only ~10 TFLOPS and the 3080 has ~19 TFLOPS, so the M1 MAX is almost half as fast as the mobile 3080. So the whole superiority of the M1 MAX is greatly overstated by fake numbers. And also don't forget - the 3080 is an almost year-old GPU.
"said their graphic card in M1 MAX has the same performance as mobile 3080 using much less power"
Apple never said it compared to a 3080 in gaming performance. People made conjectures that it could be roughly in that arena. In productivity applications (work applications that use a GPU), it's very close to a 3080.
Also, TFLOPS is a terrible number for performance because it depends on the level of precision and is highly variable by workload. Good luck loading, say, a 48GB 3D model on an RTX 3090 - but an M1 Max would handle it fine. Similarly, the 10.3 TFLOPS PlayStation 5 had better performance than the 12 TFLOPS Xbox Series X for months after launch.
Finally, as for many of the games that were compared - notice how many of them use MoltenVK, which is a Vulkan -> Metal translator that impacts performance. Then look at how many were x86, needing Rosetta translation on top of that. These weren't exactly the most native of game ports.
this is really interesting