The perf/W differences here are 4-6x in favour of the M1 Max, all whilst posting significantly better performance, meaning the perf/W at ISO-perf would be even higher than this.
and
>On the GPU side, the GE76 Raider comes with an RTX 3080 mobile. On Aztec High, this uses a total of 200W power for 266fps, while the M1 Max beats it at 307fps with just 70W wall active power.
The sad thing is that what you really want to compare is how their GPU is doing against nVidia, but then they pair it with Intel's CPU which is known to have very poor power efficiency vs. AMD.
It's pretty remarkable that now we're not only comparing Apple's SoC to the best CPUs from dedicated makers, we're comparing it to the best GPUs.
Could you qualify what you mean regarding double precision, though? nvidia consumer GPUs have pretty terrible double precision (usually in the range of 1/64th single precision). And FWIW, the normal cores in the M1 (Max|Pro) have fantastic double precision performance, and comprise the bulk of the SPECfp dominance.
> It's pretty remarkable that now we're not only comparing Apple's SoC to the best CPUs from dedicated makers, we're comparing it to the best GPUs.
Is it? Apple has 5nm on lockdown right now. Process is nearly everything in performance/watt.
If you want to compare architectures, you compare it on the same process. 5nm vs 5nm is only fair. 5nm vs 7nm is going to be 2x more power efficient from a process level.
When every transistor uses 1/2 the power at the same speed, of course you're going to have a performance/watt advantage. That's almost... not a surprise at all. It is this process advantage that Intel wielded for so long over its rivals.
Now that TSMC owns the process advantage, and now that Apple is the only one rich enough to get "first dibs" on the leading node, it's no surprise to me that Apple has the most power efficient chips. If anything, it shows off how efficient the 7nm designs are that they can compete against a 5nm design.
Qualcomm has loads of 5nm chips. They're pretty solidly beaten by Apple's entrants, but they've been using them for over a year now. Huawei, Marvell, Samsung and others have 5nm products too.
This notion that Apple just bullied everyone out of 5nm is not backed by fact. For that matter, Apple's efficiency holds even at the same node.
There is this weird thing where some demand that we put an asterisk on everything Apple does. I remember the whole "sure it's faster but that's just because of a big cache" (as if that negated the whole faster / more efficient thing, or as if competing makers were somehow forbidden from using larger caches so it was all so unfair). Now it's all waved away as just a node advantage, when any analysis at all reveals that to be nonsensical.
> This notion that Apple just bullied everyone out of 5nm is not backed by fact.
In the context of laptops, it's true. Neither Intel nor AMD has chips being built on TSMC N5 or a comparable process. AMD is on TSMC N7, and Intel is currently on their own 10nm process, moving to "Intel 7" with Alder Lake, which is getting formally introduced in 2 days.
Intel wasn't in competition for TSMC's processes at all, and AMD was in absolutely no hurry to 5nm (especially given that they were targeting cost effectiveness). The fact that Apple readied a 5nm design, and decided that it was worth it for their customers, in no way indicates that they "bullied" to the front.
Quite the contrary: for years Intel made their mobile/"low power" parts on some of their older processes. It was a low-profit part for them and they saved the best for their high-end Xeons and so on, where the process benefit was entirely spent on speed. (Note that there is a lot of BS about the benefit of process nodes, where people claim ridiculous benefits when in reality you can have a small efficiency improvement, or a small performance improvement, but not both. The biggest real benefit is that you can pack more onto a given silicon area: in Apple's case loads of cores, a fat GPU, big caches, etc.) If Apple upset their business model, well, tough beans for them.
As an aside, note that the other initial customer of 5nm was HiSilicon (a subsidiary of Huawei) with the Kirin 9000. That's a pretty sad day when AMD and Intel are supposedly sad also-rans to Huawei. Or, more reality based, they simply weren't even in competition for that space, had zero 5nm designs ready, and didn't prioritize the process.
Well... Intel not having 5nm is entirely Intel's fault. They used process to their advantage and, well, when they messed up their process cadence, the advantage evaporated.
AMD could, but they seem to be very happy where they are. They also have to decide on which fronts they want to outcompete Intel and, it seems, process isn't one of them.
TSMC 5nm is not the same process as Samsung 5nm though?
All the processes are each company's secret sauce. They aren't sharing the details. Ultimately, Samsung comes out and says "5nm technology", but that doesn't mean it's necessarily competitive with TSMC 5nm.
Indeed, Intel 10nm is somewhat competitive against TSMC 7nm. The specific "nm" is largely a marketing thing at this point... and Intel is going through a rebranding effort. (Don't get me wrong: Intel is still far behind because it tripped up in 2016. But the Intel 14nm process was the best-in-the-world at that timeframe)
But you compare power-efficiency by how efficient each transistor is.
TSMC N5P is 10% more power efficient and clocks 5% higher than TSMC N5. The same 5nm node __BY THE SAME COMPANY__ can change by roughly 15.5% in just a year, as manufacturing issues are figured out.
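(To spell out where the ~15.5% comes from: it's just those two figures compounded, treating the 10% power saving and the 5% clock bump as independent gains:

    $1.10 \times 1.05 = 1.155 \approx +15.5\%$

)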
Making every transistor use 10% less power and run 5% more GHz across the entire chip, while keeping the same size, is a huge bonus that cannot be ignored. I don't know what magic these chip engineers are doing, but they're surely spending supercomputer time brute-forcing all sorts of shapes/sizes of transistors to find the best density/power/speed tradeoffs per transistor.
This is part of the reason why Intel stuck with 14nm for so long. 14nm+++++++ kept increasing clock speeds, yields, and power efficiency (but not density), so it never really was "worth it" for Intel to switch to 10nm (which Intel had some customer silicon taped out on for years, but only at low clock speeds IIRC).
It isn't until recently that Intel seems to have figured out the clock speed issue and has begun offering mainstream chips at 10nm.
> Process is nearly everything in performance/watt.
Not really. Apple's A15 and A14 phone chips are on the same process node.
>Apple A15 performance cores are extremely impressive here – usually increases in performance always come with some sort of deficit in efficiency, or at least flat efficiency. Apple here instead has managed to reduce power whilst increasing performance, meaning energy efficiency is improved by 17%
The efficiency cores of the A15 have also seen massive gains, this time around with Apple mostly investing them back into performance, with the new cores showcasing +23-28% absolute performance improvements
> Not really. Apple's A15 and A14 phone chips are on the same process node.
Yeah, you're talking about 20% performance changes on the same node.
Meanwhile, advancing from TSMC 7nm to 5nm is something like 45% better density (aka: 45% more transistors per mm^2) and 50% to 100% better power-efficiency at the same performance levels, closer to the 100% side of power-efficiency if you're focusing on the idle / near-zero-GHz end of performance. (Pushing to 3GHz shows less of a power difference, but lower idle power does make a sizable contribution in practice.)
-----
Oh right: and TSMC N5P is 10% lower power and a 5% speed improvement over TSMC N5 (aka: what TSMC figured out in a year). There's the bulk of your 17% difference between the A15 and A14.
Are you saying that if another company, say AMD, had access to TSMC's 5nm process, then it would easily achieve comparable performance/watt to what Apple has done with the M1 series?
I'm saying that 15.5% of the 17% difference from Apple A14 to Apple A15 is accounted for in the TSMC N5 to TSMC N5p upgrade (Aka: 10% fewer watts at 5% higher clock rates).
The bulk of efficiency gains has been, and for the foreseeable future will be, the efficiency of the underlying manufacturing process itself.
There's still a difference in efficiency above-and-beyond the 15.5% from the A14 to the A15. But it's a small fraction of what the __process__ has given.
---------
Traditionally, AMD never was known for very efficient designs. AMD is more well known for "libraries", and more a plug-and-play style of chip-making. AMD often can switch to different nodes faster and play around with modular parts (see the Zen "chiplets"). I'd expect AMD to come around with some kind of chiplet strategy (or something along those lines) before I expect them in particular to take the efficiency crown.
NVidia probably would be better at getting high-efficiency designs. They're on a weaker 8nm Samsung process yet still have extremely good power/efficiency curves.
I like AMD's chiplet strategy though, as a business and as a customer. It's a bit of a softer benefit, and AMD clearly has made the "Infinity Fabric" more efficient than anyone expected it could get.
Zen4 will be going head-to-head with Apple A16 on N5P next year and it's pretty doubtful we see Zen4 come out ahead on perf/watt let alone IPC.
It's not all node advantage - Apple designed a much wider core than is feasible with x86. They have an insanely wide reorder buffer and many execution units, and can decode more instructions to keep it all fed. Even with the node shrink allowing you to throw more transistors at it, x86 poses obstacles to using the same approaches as Apple, and they've exhausted most of their own approaches.
> Process is nearly everything in performance/watt.
ARM has consistently beat x86 in performance/watt at larger node sizes since the beginning. The first Archimedes had better floating point performance without a dedicated FPU than the then market-leading Compaq 386 WITH an 80387 FPU.
A lot of the extra performance of the M1 family has nothing to do with node, but with the fact the ARM ISA is much more amenable to a lot of optimizations that allow these chips to have a surreally large reordering buffer, which, in turn, keeps more of the execution ports busy at any given time, resulting in a very high IPC. Less silicon used to deal with a complicated ISA also leaves more space for caches, which are easier to manage (remember the more regular instructions), putting less stress on the main memory bus (which is insanely wide here, BTW). On top of that, the M1 family has some instructions that help make JavaScript code faster.
So, expect that Intel and AMD, when they get 5nm designs, will have to use more threads and cores to extract the same level of parallelism that the M1 does with an arm (no pun intended) tied behind its back.
> optimizations that allow these chips to have surreally large reordering buffer
But only Apple's chip has a large reordering buffer. ARM Neoverse V1 / N1 / N2 don't have it, no one else is doing it.
Apple made a bet and went very wide. I'm not 100% sure if that bet is worth the tradeoffs. I'm certain that if other companies thought that a larger reordering buffer was useful, they'd have done it.
I'll give credit to Apple for deciding that width still had places to grow. But it's a very weird design. Despite all that width, Apple CPUs don't have SMT, so I'd expect that a lot of the performance is "wasted" on idle pipelines, and that SMT would really help out the design.
Like, who makes an 8-wide chip that supports only 1 thread? Apple but... no one else. IBM's 8-wide decode is on a SMT4 chip (4-threads per core).
SMT is a good way to extract parallelism when your ISA makes it more difficult to do (with speculative execution/register renaming). ARM, it seems, makes it easier to the point I don't think any ARM CPU has been using multiple threads per core.
I would expect POWER to be more amenable to it, but x86 borrows heavily from the 8085 ISA and was designed at a time the best IPC you could hope to get was 1.
> I don't expect the M1 Pro to have very good double-precision GPU-speeds.
Compared to what? There are no laptops quite like these new Apple laptops. Anything with faster graphics also uses LOADS more power and runs WAY hotter.
> Compared to what? There are no laptops quite like these new Apple laptops. Anything with faster graphics also uses LOADS more power and runs WAY hotter.
Using 2x the power for 2x the bandwidth (on top of significantly more compute power) is a good tradeoff, when the NVidia chip is 8nm Samsung vs Apple's 5nm TSMC.
In any case, the actual video game performance is much much worse on the M1 Pro. The benchmarks show that the chip has potential, but games need to come to the system first before Apple can decidedly claim a victory.
> If the native version doesn't exist then... gamers don't care?
I don't think it's a fair assessment of the machine capabilities. Also, games WILL be ported to the platform AND if you really need your games running at full speed, you can keep the current computer and postpone the purchase of your Mac until the games you need are available.
Next-generation games will be made on the platform. Current-generation and last-generation games no longer have much support / developers, and no sane company will spend precious developer time porting over a year-old or 5-year-old game to a new platform in the hopes of a slim set of sales. (Except maybe Skyrim. Apparently those ports keep making money)
Your typical game studio doesn't work on Skyrim though. They put in a bunch of developer work into a game, then by the time the game is released, all the developers are on a new project.
And that's why gamers are buying the Surface Book instead?
The "gamer" community (or really, community-of-communities) only cares if their particular game runs quickly on a particular platform.
Gamers don't really care about the advanced technology details, aside from the underlying "which system will run my game faster, with higher-quality images" (4k / raytracing / etc. etc.)?
No, that's why having x86 emulation performance be this good is a minor miracle.
Native performance would be expected to be inline with what the benchmarks are showing.
The MacBook Pro Max would beat the 100 watt mobile variant of the 3080, especially if you unplug both laptops from the wall where the 3080 has to throttle down and the MacBook does not.
> No, that's why having x86 emulation performance be this good is a minor miracle.
No gamer is going to pay $3000+ for a laptop with emulation when $2000+ gamer laptops are faster at the task (aka: video games are faster on the $2000 laptop).
------
Look, gamers don't care about all games. They only care about that one or two games that they play. If you want to attract Call of Duty players, you need to port Call-of-Duty over to the Mac, native, so that the game actually runs faster on the system.
It doesn't need to be an all-or-nothing deal. Emulation is probably good enough for casuals / non-gamers who maybe put in 20 hours or less into any particular game. But anyone putting 100-hours or more into a game will probably want the better experience.
> No gamer is going to pay $3000+ for a laptop with emulation
They pay $3000 for a laptop whose fans hit 55 decibels at load and that has to throttle way down, ending up slower than the MacBook, if you use it like a laptop and go somewhere without a power outlet.
The Mac doesn't even do raytracing, does it? So you're already looking at a sizable quality downgrade over AMD, NVidia, PS5, and XBox Series X.
I think the eSports gamers will prefer FPS over graphical fidelity, so maybe that's the target audience for this chip ironically.
But adventure gamers who want to explore raytraced worlds / prettier games will prefer the cards with raytracing, better shadows, etc. etc. (See the Minecraft RTX demo for instance: https://www.youtube.com/watch?v=1bb7wKIHpgY)
Look, my Vega64 raytraces all the time when I hit the "Render" button on Blender.
But video-game raytracing is about hardware-dedicated raytracing units. Software (even GPU-software rendering) is an order of magnitude slower. It's still useful to implement, but what the PS5 / Xbox Series X / AMD / NVidia have implemented are specific raytracing cores (or in AMD's case: raytracing instructions) to traverse a BVH tree and accelerate the raytracing process.
"Can do raytracing" or "has an API for GPU software that does raytracing" is just not the same as "we built a raytracing core into this new GPU". I'm sure Apple is working on their raytracing cores, but I haven't seen anything that suggests they're ready yet.
> the actual video game performance is much much worse on the M1 Pro
This is a workstation. For games one should look for a Playstation ;-)
2x power also means half the battery life. Remember this is a portable computer that's thin and light beyond what would be reasonable considering its performance. Also, remember the GPU has full 400GBps access to all of the RAM, which means models of up to 64GB won't need to pass over the PCIe bus.
This article shows why - actual performance in games isn't great. Yes, it's partially held back by x86->ARM translation, but if you are a gamer, this isn't a particularly compelling system.
This part of the article was particularly cringe. People don't seem to realize how much of a technical feat it is that x86 builds of those two games are even running at an acceptable framerate.
That said, it's been a year since the M1 release, and Apple could have paid a few hundred thousand (or a few million) dollars for a few AAA game ports. They didn't, and that says how much they care about gaming on these devices.
> that says how much they care about gaming on these devices
Apple's bread and butter is iOS gaming (where it takes in more game profit than Sony, Microsoft, Nintendo and Activision combined) rather than Windows PC game ports to macOS.
> Apple's bread and butter is iOS gaming (where it takes in more game profit than Sony, Microsoft, Nintendo and Activision combined) rather than
Any source on this "game profit" metric? It seems plausible, but I was looking for the details of this calculation/estimation. For example, what percentage of App Store revenue is game related?
> Windows PC game ports to macOS.
I specifically said AAA games, not Windows PC games.
>This part of the article was particularly cringe. People don't seem to realize how much of a tech prowess it is that x86 builds of those two games are even running at an acceptable framerate.
Spoken like someone who has never seen how poorly Microsoft's x86 to ARM emulation works.
The reasons for it not being a compelling system aren't inherently obvious to most people:
Games are currently developed to run predominantly on x86 Windows machines. If a game is designed to run on macOS, then you're more likely to see a performant experience. The issue isn't inherently the architecture or Apple's chips, it's the lack of software choices available on the platform. Though you can now argue that Apple's library has grown significantly thanks to App Store compatibility bringing games from mobile to the desktop.
Gamers would like to play the games they've bought (or free to play that they've dedicated time to) on platforms those games support, and most of those games do not support MacOS, or Linux.
However, with Proton, and with Easy Anti-Cheat and BattlEye now working with Valve to improve anti-cheat support on Linux, we may see greater compatibility with the aforementioned systems, enabling cross-platform play.
For a laptop chip, single threaded integer performance is on par.
Multi-threaded integer and floating point performance is not.
>In the aggregate scores – there’s two sides. On the SPECint work suite, the M1 Max lies +37% ahead of the best competition, it’s a very clear win here and given the power levels and TDPs, the performance per watt advantages is clear. The M1 Max is also able to outperform desktop chips such as the 11900K, or AMD’s 5800X.
In the SPECfp suite, the M1 Max is in its own category of silicon with no comparison in the market. It completely demolishes any laptop contender, showcasing 2.2x performance of the second-best laptop chip. The M1 Max even manages to outperform the 16-core 5950X – a chip whose package power is at 142W, with rest of system even quite above that. It’s an absolutely absurd comparison and a situation we haven’t seen the likes of.
>In the SPECfp suite, the M1 Max is in its own category of silicon with no comparison in the market.
How much of that performance is due to the M1 Pro/Max having way more memory bandwidth than the Intel/AMD chips, and also being specifically designed from the ground up to make use of all that bandwidth? AFAIK the RAM used by the M1 Pro/Max is more similar in performance to the GDDR used in graphics cards vs the slow-ish ageing DDR4 used in Intel/AMD systems that are designed to prioritize compatibility with RAM of varying quality, speeds and latencies instead of raw performance at a specific high speed.
So I'm curious to know how an X64 chip would perform if we even the playing field not just in node size but also if Intel and AMD would adapt their X64 designs from the ground up with a memory controller, cache architecture and instruction pipeline tuned to feed the CPU with data from such fast RAM.
I'm asking this since AFAIK, Ryzen is very sensitive to memory bandwidth, the more you give it the better it performs to the point where if you take two identical laptops with the same Ryzen chip but one has 33% faster RAM, then that laptop will perform nearly 33% better in most CPU/GPU intensive benchmarks, all things being equal.
> So I'm curious to know how an X64 chip would perform if we even the playing field not just in node size but also if Intel and AMD would adapt their X64 designs from the ground up with a memory controller, cache architecture and instruction pipeline tuned for that kind of fast RAM.
We can get a pretty good idea about this by looking at Threadripper, which has more memory channels:
(This is Zen 2 Ryzen vs. Zen 2 Threadripper because Zen 3 Threadripper isn't out yet.)
In nearly everything it's about the same, because most workloads aren't memory bandwidth limited. But then you get into the multi-threaded SPEC tests where the M1 Max is doing really well, and Threadripper does really well. And this is still with less memory bandwidth than the M1 Max because it's using DDR4 instead of LPDDR5.
The lesson here is that memory bandwidth limited workloads are limited by memory bandwidth.
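To make that concrete, here's a toy STREAM-style triad in C (not one of the actual SPEC subtests, just an illustration): it does almost no arithmetic per byte moved, so adding memory channels helps and adding cores doesn't.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    /* Toy STREAM-style triad: a[i] = b[i] + s*c[i].
     * Each iteration moves 24 bytes (read b, read c, write a) for a single
     * multiply-add, so throughput is set by memory bandwidth long before
     * the FPUs or extra cores start to matter. */
    static void triad(double *a, const double *b, const double *c,
                      double s, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }

    int main(void)
    {
        size_t n = 1u << 26;                 /* ~64M doubles per array */
        double *a = malloc(n * sizeof *a);
        double *b = malloc(n * sizeof *b);
        double *c = malloc(n * sizeof *c);
        if (!a || !b || !c) return 1;

        for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }
        triad(a, b, c, 3.0, n);              /* bandwidth-bound kernel */

        printf("a[0] = %f\n", a[0]);         /* keep the work observable */
        free(a); free(b); free(c);
        return 0;
    }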
Should be fine to compare to an EPYC 72F3 or 73F3 (Zen3, 8/16 cores, 8 channel DDR4-3200 RDIMM (204.8 GB/s theoretical ceiling), $2500/$3500).
If memory latency is that important, one would AFAIK have to compare to Zen3 Threadripper, because EPYC (and afaik Threadripper Pro, the other 8-channel variant of Zen3) is rather locked down in regards to memory "overclock"s.
(Note that this is more of a latency/efficiency tradeoff, and any moderate OC is trivial to cool.)
Not if you actually understand the technical constraints vs the business applications of each product in their specific market segments.
A Ford F-150 can tow more weight than a Lamborghini despite having less power and being significantly cheaper but each is geared towards a different use case so direct comparisons are just splitting hairs.
That's also a good point. EPYC has 2TB or 4TB of DDR4 RAM support.
That being said: it's amusing to me to see the x86 market move into the "server-like" arguments of the 90s. x86 back then was the "little guy" and all the big server folks talked about how much bigger the DEC Alpha was and how that changed assumptions.
It seems like "standard" servers / systems have grown to outstanding sizes. The big question in my mind is if 64GB RAM is large enough?
Moana scene (https://www.disneyanimation.com/resources/moana-island-scene...) is 131 GBs of RAM for example. It literally wouldn't fit on the M1 Max. And that's a 2016-era movie, more modern movies will use even more space. The amount of RAM modern 3d artists need is ridiculous!!
Instinctively, I feel like 64GB is enough for power-users, but not in fact, the digital artists who have primarily been the "Pro" level customers of Apple.
> Moana scene (https://www.disneyanimation.com/resources/moana-island-scene...) is 131 GBs of RAM for example. It literally wouldn't fit on the M1 Max. And that's a 2016-era movie, more modern movies will use even more space. The amount of RAM modern 3d artists need is ridiculous!!
I doubt there was any laptop available in 2016 that could be loaded with enough RAM to handle those Moana scenes. I doubt such beasts exist in 2021.
It seems the M1s are showing that Apple can just increase the number of cores and memory interfaces to beef up the performance. While there are obviously practical limits to such horizontal scaling, a theoretical M1 Pro Max Plus for a Mac Pro could have another doubling of memory interfaces (over the Max), or add an interconnect to do multi-socket configurations.
That's all just horizontal scaling before new cores or a smaller process node becomes available. A 3nm process could get roughly double the current M1 Max circuitry into the same footprint as today's Max.
> That's all just horizontal scaling before new cores or a smaller process node becomes available. A 3nm process could get roughly double the current M1 Max circuitry into the same footprint as today's Max.
I/O / off-chip SERDES doesn't scale very easily.
If you need more pins, you need to go to advanced packaging like HBM or whatnot. 512-bit means 512 data pins on the CPU; that's a lot of pins. Doubling to 16-channel (a 1024-bit bus) means even more pins.
You'll run out of pins on your chip, at least without the micro-bumps that HBM uses. That's why HBM can be 1024-bit or 4096-bit: it uses advanced packaging / microbumps to communicate across a substrate.
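(For reference, that 512-bit bus is where the ~400GB/s figure comes from. Assuming the reported LPDDR5-6400 speed, the theoretical peak is:

    $512\ \text{bits} \times 6400\ \text{MT/s} \div 8\ \text{bits/byte} \approx 409.6\ \text{GB/s}$

which lines up with the ~400GB/s quoted elsewhere in the thread.)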
I am waiting for the photography pros of YouTube to weigh in on that last bit, but the Disney bit about 131GB of RAM usage is intense. Surely a speedy disk can page, but likely not fast enough to keep the 64GB of RAM from being a bottleneck. Maybe things like Optane or SSDs will get so much quicker that we'll see a further fusion of IO down to disks, and one day we'll really have a chip that treats all its storage as RAM and doesn't really distinguish between the two. Sure, it's unlikely SSDs will get to 400GB/s, but if they could get to 10GB/s or more sustained, that latency could probably be handled by smart software.
I think for that cohort any future iMac Pro or Mac Pro with the future revisions of these chips will surely increase the ram to 128GB maybe even 256GB or more.
I am super curious how Apple will tackle those folks who want to put 1TB of RAM or more into their systems. Will they do an SoC with some RAM plus extra slotted RAM as another tier?
> Sure it's unlikely SSDs will get to 400GB/s in speed but if they could get to 10GB/s or more sustained that latency could be handled by smart software probably.
Yeah, I'm not part of the field, but the 3d guys always point to "Moana" to show off how much RAM they're using on their workstations. Especially since Disney has given away the Moana scene as a free-download, so anyone can analyze it.
The 131GB is the animation data (ie: trees swaying in the wind). 93GB is needed per frame (roughly). So the 131GB can be effectively paged, since it takes several seconds (or much, much longer) to render a single frame. So really, over 220GB is needed for the whole scene.
In practice, a computer would generate the wind and calculate the effects on the geometry. So the 131 GB animation data could very well be procedural and "not even stored".
The 93GB "single frame" data however, is where all the rays are bouncing (!!!) and likely needs to be all in RAM.
That's the thing: that water-wave over there will reflect your rays basically anywhere in the scene. Your rays are, in practice, bouncing around randomly in that 93GB of scene data. Someone managed to make an out-of-core GPU raytracer using 8GB of GPU-VRAM (they were using a very cheap GPU) to cache where the rays are going, but it still required keeping all 93GB of scene data in the CPU-RAM.
> Instinctively, I feel like 64GB is enough for power-users, but not in fact, the digital artists who have primarily been the "Pro" level customers of Apple.
I'd assume the Mac Pro will have _significantly_ more RAM. Artists who need >64GB were _never_ particularly well-served by the MacBook Pro (or just about any other laptop).
EPYC almost certainly has more compute power though.
Honestly, the memory bandwidth is imbalanced. It's mostly there to support the GPU, but the CPU also gains benefits from it. It's hard enough to push an EPYC to use all 200GBps in practice.
EDIT: For workstation tasks however, 64GB is huge for a GPU, while 400GBps is huge for a CPU. Seems like win/win for the CPU and GPU. It's a very intriguing combination. GPU devs usually have to work with much less VRAM, while CPU devs usually have to work with much less bandwidth.
64GB is small for CPU workstation tasks however. It's certainly a strange tradeoff.
Which is why I suggested comparing to the EPYC 73F3, which is a 5950X clocked at 3.5 - 4 GHz, with 4x the L3$, 4x the memory bandwidth (if you don't overclock it), and 5~6x the IO bandwidth.
We know a 5950X is roughly on-par with an M1 Max (at least ignoring the latter's 2 efficiency cores).
If the occasional wins of the M1 Max are due to memory bandwidth, this should more-or-less turn the tables.
HBM's downside is that it requires many, many, many pins. Each channel is 1024-pins of communications (and more pins for power). In practice, the only thing that can make HBM work are substrates. (Typical chips have 4x to 6x HBM stacks, for well over 4096 pins to communicate, plus more pins for power / other purposes)
But HBM is among the lowest power technologies available. Turns out that clocking every pin at like 500MHz (while LPDDR5 is probably a 3200 MHz clock) saves a lot on power. Because DRAM has such high latency, the channel speed is there more for parallelism than anything else. (DDR4 parallelizes RAM into 4 bank groups, each with 4 banks. All 16 can be accessed in parallel across the channel.)
HBM just does this parallel access thing at a lower clock rate, to save on power. But spends way more pins to do so.
For Ryzen, the Infinity Fabric clock speed is linked to the RAM clock speed, and Infinity Fabric speed is important for performance. That's not the same thing as RAM bandwidth itself being the important factor.
The laptop dies are already around 20mm on a side. That likely isn't very good for yield. Going even larger would likely be ruinous in the cost department. Wouldn't putting in multiple M1 Maxes be a better idea?
Gurman at Bloomberg - his article nailed the M1 Pro/Max (in March!), and this is what it says about the next stage:
"Codenamed Jade 2C-Die and Jade 4C-Die, a redesigned Mac Pro is planned to come in 20 or 40 computing core variations, made up of 16 high-performance or 32 high-performance cores and four or eight high-efficiency cores. The chips would also include either 64 core or 128 core options for graphics."
Yes, this year's iPhone chip did get a newer version of the performance core.
>with a score of 7.28 in the integer suite, Apple’s A15 P-core is on equal footing with AMD’s Zen3-based Ryzen 5950X with a score of 7.29, and ahead of M1 with a score of 6.66
You'll have to look in the charts, but on single threaded floating point the scores are 10.15 for the A15, 9.58 for the 11900K, and 9.79 for the 5950X.
Having your phone chip match or beat Intel and AMD's desktop variants on single core performance (with a phone's memory bandwidth) is fairly impressive in itself.
Is it? I thought that gluing memory onto the die will always yield better memory bandwidth? Also, the new phone uses DDR5, which is not possible on the desktop (yet).
Ah yeah, LPDDR4X. Well, it's LPDDR4X memory, which is faster than conventional DDR4 memory. I thought it was DDR5 already, since LPDDR memory usually comes way before next-gen DRAM/SDRAM.
I don't think these efficiencies are just from the node advantage. The fact is that Apple chips follow mobile designs and are highly integrated SoCs where Apple can optimize every aspect of the system in exchange for losing flexibility (no mixing and matching of components).
In contrast to mobile processors, x86 processors live in a world where flexibility is demanded. I need to be able to pick how much RAM I want, which WIFI modem, which graphics, and so on (where I is a combination of the consumer and laptop manufacturer). Sure, laptop processors have gotten more integrated lately, but it's not to the same degree. Competition from Apple might pressure Intel and AMD to integrate much more and sacrifice this flexibility in order to squeeze out better power efficiencies.
As a teenager in 2005/2006, I had an obsession with overclocking AMD Opteron 165s. DFI motherboards allowed you to set the ratios for FSB/HTT, LDT/FSB, CPU/FSB, etc.
I'd hunt like crazy for specific OCZ DDR2 RAM modules from specific batches that had the tolerances I was looking for. At a few points, I had the highest perf overclocks (even among liquid-cooled setups; mine was passive with a polished heatsink) on various leaderboards, and my 4GB DDR2 system could frequently beat out 8GB DDR3 systems (with full stability via MemTest86) on GeekBench-like tests.
WRT Apple Silicon I think about those days a lot - just thinking about the perf AMD, OCZ, and DFI could have repeatedly squeezed out if they all were one company setting the same tolerances on all silicon and power delivery.
I have to imagine a large amount of the perf wins come from having consistent FSB, HTT, and LDT channels whose relay ratios can be optimally configured, instead of buffering for the "lowest common denominator" of silicon manufacturing tolerances.
>Will we get such efficiencies when intel hits 5nm?
Judging from everything we know, even in the most optimistic scenario, the answer is no.
Note: Intel doesn't have a 5nm node; they go to 4nm and then 3nm. But the answer is still the same.
Edit: For those wondering how this conclusion was arrived at: take a look at the Alder Lake SPECint and Geekbench scores and the power usage per core (forget MT benchmarks), then scale by the target IPC improvement and node improvement. You should see that the gap in efficiency is still there.
Isn't AMD still a node behind Apple? They both use TSMC, but my impression was that Apple was the largest customer bankrolling the leading node and therefore got first crack at it.
I am surprised that Apple sources more wafers than AMD from TSMC. Are they really the largest customer in terms of wafers, or are they getting better deals thanks to their enormous cash reserves and financing abilities?
Yep, Apple is the biggest customer. Apple is 25% of TSMC's revenue, AMD is only 10% -- and that's after the recent growth spurt. Just one year ago, AMD was behind Apple, Huawei, Qualcomm, Broadcom, and NVidia.
Apple can probably also afford to spend more. Like, if going from 7nm to 5nm early puts an extra $30 per laptop chip on their chips, that is probably fine for Apple; they're charging the same for a base-model M1 16" MBP as they did for the Intel one, but their costs are almost certainly quite a lot lower (the markups on higher-end Intel laptop chips are enormous, and they also don't have to pay for a GPU).
For AMD, that might be a much larger deal; on the low-end that sort of price increase would give them a lot of trouble. Apple doesn't really have a low-end (even the chips in the Intel MacBook Air were quite expensive), giving them more flexibility here.
Heck, Sony & Microsoft can't even get PS5s and Xboxes in customer hands 1 year after release. I would expect it is a combination of Apple's volume of phones shipped/quarter and cash reserves.
Intel has shown that you can just store the sizes of instructions in a new cache (in their new architecture) for example.
But even then: it shouldn't be much more than O(n) work / O(log(n)) depth to determine the length of an instruction. Once lengths are known, you can perform parallel instruction decoding rather easily.
Ex: given "Instruction at X, X+1, X+5, and X+10", decode the instructions. Well, simple. Just shove a decoder at X, X+1, X+5, and X+10. 4 instructions, decoded in parallel.
Even with "Dynamic" length (ex: X, X+4, X+6, and X+7 afterwards), its clear how to process these 4 instructions in parallel. Really not a problem IMO.
--------
So solving the length thing is a bit harder to see, but clearly could be implemented as a parallel-regex (which is O(n) work and O(log(n)) depth).
I seriously doubt that decoding is really a problem. I'd expect that Apple just has made many small efficiency gains across the chip, especially the uncore.
I'm personally looking at the L1 / L2 cache hierarchies more so than anything on this Apple chip.
Consider Intel has a more or less infinite amount of money and they don't seem to be able to do that. And they have tried (I even had an Atom-based Android phone for a while).
If you want an easy way to build a reorder buffer, you'll need to push every instruction into a structure that fits, IIRC, 15 bytes, which is the longest x86 instruction possible (for now - mwahahaha). This alone will make it twice as large as a similar arm64 one. Now factor in that the dependencies between instructions are defined in bits that can pretty much be all over the place in those 15 bytes, and you end up with a nightmare most engineers would consider suicide before having to work on it.
Or maybe, the problem isn't as hard as you think it is.
Look, I started programming in GPUs a year or two ago. I've begun to "think in parallel", and now I'm beginning to see all sorts of efficient patterns all over the place.
The actual CPU-architects have known about kogge-stone carry-lookahead longer than I have. I'm still a newbie to this mindset of parallel computations... but I enjoy reading the papers on PDEP / PEXT / other parallel structures these CPU designers are doing (and these structures have gross implications to how GPU code should be structured).
But I've had enough practice with Kogge-stone / Carry-lookahead / Prefix-sum / scan pattern (yeah, its __all__ the same parallelism), and this pattern has been well published since the 1970s. I have to assume that engineers know about this stuff.
Instruction length decoding is very clearly a kogge-stone pattern / prefix sum / scan problem to me. Now, I'm not a chip architect and maybe there's some weird fanout / chip level thing going on that my ignorance is keeping me out of... but... based on my understanding of parallel systems + very, very common patterns well known to that community, I'd expect that chip-designers would just Kogge-stone their way out of this decoding problem.
-------
Like, I'm coming in from the reverse here. I suddenly realized that chip-designers have incredibly active minds about the layout and structure of parallel computing mechanisms, and have now taken an interest in studying some CPU-level parallelism techniques to apply to my GPU code.
The CPU-designers are way ahead of us in "parallel thinking". I'm a visitor to their subject, they do this stuff for breakfast every day. They have to see the Kogge-stone solution to the decoding problem. If not, they've thought of something better.
Freaking downvotes. What ever happened to actually discussing the technicals here on Hacker News?
I'll help ya out. Kogge-Stone is the paper from March 1972 that states that ANY recurrence relation "x_i = f(b_i, g(a_i, x_{i-1}))" can be parallelized, as long as f and g satisfy distributive and associative-like properties, "executing in time proportional to log2(n)" (aka: O(log2(n)), but this paper was probably before O-notation was popular).
If the Kogge-Stone algorithm is executed sequentially, time is proportional to n.
As you can see, Kogge-Stone can efficiently parallelize (aka: execute O(log2(N)) depth, O(N) total-work) a wide variety of recurrence relationships that initially seem sequential.
-----------
GPU programmers know this pattern as prefix-sum. In CPU-architecture books, its called Kogge-Stone.
----------
Maybe this will help? Lets say "1" is "this byte is a start-of-instruction" and "0" is "this is not the start-of-instruction".
For example: "1 0 0 1 0 1 1 0" would contain instructions at i_0, i_3, i_5, and i_6.
So let's say we have 4 parallel decoders. Decoder #1 gets bytes 0, 1, 2. Decoder #2 gets bytes 3 and 4. Decoder #3 gets byte 5, and decoder #4 gets bytes 6 and 7.
See how that works? Just Kogge-Stone it up.
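Here's a minimal C sketch of that mapping (the 8-byte window and 4 decoders are just the example's numbers, not any real front-end's sizing):

    #include <stdint.h>
    #include <stdio.h>

    /* "1 0 0 1 0 1 1 0" from the example above: bit i set means byte i
     * starts an instruction, so instructions begin at bytes 0, 3, 5, 6.
     * A real front-end would do this assignment with a parallel
     * popcount/prefix network; the loop just shows the mapping. */
    int main(void)
    {
        const uint8_t starts = 0x69;   /* bits 0, 3, 5, 6 set */
        int slot = 0;                  /* next free decoder */

        for (int byte = 0; byte < 8 && slot < 4; byte++) {
            if (starts & (1u << byte)) {
                printf("decoder %d starts at byte %d\n", slot, byte);
                slot++;
            }
        }
        return 0;
    }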
----------
I assert that "f" itself can also be computed using Kogge Stone, but maybe splitting it up like this is easier? Or will I also have to prove "f" for yall to see it?
The L1 I$ is already read-only / Harvard-style. There are only two states here: size == unknown, and size == known. This is a simple size = 0 (default), and size = X (where X is the known size of the instruction) situation.
The x86 architecture states that if you write to those instructions, you need to flush the L1 cache (ex: JIT'd Java) before the state is updated. L1 instruction lines are non-coherent, so it isn't very hard. Upon the flush, set sizes back to 0 and you're done.
> This complicated decoding makes a pipeline longer. This means that in case of branch misprediction there would be a large penalty.
PDEP / PEXT are single-clock tick instructions and are far more complex than what I'm proposing here. As is AESRound.
I think you're underestimating the number of gates you can put in parallel and execute in a single stage of the pipeline. 64-bit PDEP / PEXT are more complicated than say... a 64-byte parallel adder in terms of depth. (PDEP / PEXT need both a butterfly circuit forward + inverse butterfly back + a decoder in parallel. 64-byte prefix sum is just one butterfly forward).
When you get to instruction X, "size cache" says "size 2", and this allows you to process instruction X+2 in parallel.
You look at instruction X+2, and the "size cache" says "size 4", which allows you to look at instruction X+6 in parallel. X+6 says "size 4", pointing at X+10. Finally, you look at instruction X+10, and it says "size 8", which ends with +18 as where the instruction pointer lands.
This was sequentially described at first, but the parallel version is called Prefix Sum: https://en.wikipedia.org/wiki/Prefix_sum . This allows you to take a set of sizes (say 2, 4, 4, 8) and, in parallel, figure out [2, 6, 10, 18], with 18 being the new location of the instruction pointer, and [0, 2, 6, 10] being the offsets of the 4 instructions you process this clock tick.
A parallel adder across say, 32-bytes would be able to perform this prefix sum very quickly, probably within a clock tick. These sorts of parallel structures (aka: butterfly circuits) are extremely common in practice, your carry-lookahead adders need them, as well as PDEP / PEXT. Intel's single-cycle PDEP/PEXT is way more complicated than what I'm proposing here, I kid you not. (Seriously, the dude who decided to make single-clock cycle PDEP/PEXT or single-clock cycle AESRound would have spent more time than the size-cache that Intel is now using on instruction decoding)
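Rough C sketch of that scan, using the example lengths (2, 4, 4, 8) from above -- not real hardware parameters:

    #include <stdio.h>

    /* Exclusive prefix sum over the cached lengths from the example
     * (2, 4, 4, 8): start offsets come out as 0, 2, 6, 10 and the next
     * instruction pointer lands at +18. Hardware would compute this with
     * a log-depth (Kogge-Stone style) adder network rather than a loop. */
    int main(void)
    {
        const int len[4] = {2, 4, 4, 8};
        int start[4];
        int offset = 0;

        for (int i = 0; i < 4; i++) {
            start[i] = offset;      /* where decoder i begins */
            offset += len[i];       /* running sum of lengths */
        }

        printf("starts: +%d +%d +%d +%d, next IP: +%d\n",
               start[0], start[1], start[2], start[3], offset);
        return 0;
    }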
> The problem is that you don't know the instruction length until you've decoded it so where is the size cache getting the size?
You are aware that x86 _CURRENTLY_ (ie: without size-cache) decodes 4-instructions per clock tick, right? That's in parallel, as currently implemented.
Intel just seems to think the size-cache is a potential solution for going faster. I've given it some thought and it seems like it could very well be worth the 4-bits (or so) per byte it'd cost.
----------
A parallel size-calculator would also be O(n) work and O(log(n)) depth, by using a parallel regex/finite automata on calculating the sizes for arbitrary lengths upwards.
That's only second stage decoding. In the fetch/pre-decode stage there is a specific piece of hardware that partially decodes 16 byte chunks of instruction streams before inserting instructions into the instruction queue for the decode from macro-op to uop. It can only handle 6 instructions per clock and 16 bytes per clock. If you have 7 instructions in a 16 byte block it takes two cycles to process that block. If you have only 2 instructions in that 16 byte block you only get those two instructions. It also only looks for instruction length and branching to feed the branch predictor, spitting the same instructions back out albeit fused and tagged with index numbers for insertion into the instruction queue ready for the second stage decoders.
This is the length that Intel has to go to in order to keep the EUs fed. Apple/ARM? Every 4 bytes there's an instruction.
The 2nd-stage (macro-op to uop) decode can go 6 or 7 per clock tick. But the 1st stage (which executes in practice when the uop cache is thrashing) would still go 4 instructions per clock tick just fine.
> This is the length that Intel has to go to in order to keep the EUs fed.
Yeah. And the task described is O(n) total work and O(log(n)) depth. So... not a big difference? I'd have my doubts that the instruction-length portion of the decoder was taking up a significant amount of power.
Sort of... RISC and CISC are really misnomers. The problem with X86 is not the number of instructions (ARM has a lot too!) but the variable length and difficult to decode instruction format. It's fine to have tons of instructions if they are trivial to decode and decoding can be easily parallelized.
CISC was not a mistake when RAM was hundreds of dollars per kilobyte in the 70's and early 80s. Made sense to get as much out of a byte of memory instruction-wise as possible.
Not exactly a mistake, more of a compounding factor of market forces, who could execute and deliver (and who could not), and the rise of worse is better.
The 80s was a time of research and experimentation, occasionally getting pretty wild. Intel had an «object-oriented» CPU, the iAPX 432, which flopped only slightly slower than immediately. And they also had a hybrid RISC/VLIW design (i860) that could outperform every other design out there – if the stars up in the sky converged in the right space-time sequence and the compiler could bundle instructions up efficiently.
Intel also had a very good i960 RISC design. It was, in fact, so good that the MBAs over at Intel got stumped, not knowing what to do with it, and chickened out (the i960 was considered as an x86 replacement for a time; had that happened, the future would look quite different from today).
Motorola was flirting with CPUs for a long while as well, but never took them seriously enough to see them through to the end as a serious business, since defence contracts and field radio equipment (and later mobile phones) were their two major cash cows.
By the way, Motorola also had cool DSP designs with the true Harvard architecture with multiple data buses, and a single 3 operand instruction, e.g. «ADD A, B, C» could transfer A, B, C using three separate data buses – simultaneously. Data transfers were insanely fast.
INMOS got busy building transputers that could be «grown» infinitely (theoretically) into a gigantic computing «thing», and were programmed in Occam, with no assembly language even being available for the chip. They did not get anywhere.
The Japanese built the Smalltalk VM byte codes into a Katana processor, and were mulling over the idea of doing the same for Prolog. Prolog and AI started getting really big back then until both faded into obscurity for a couple of decades.
There was no shortage of great designs and ideas, but no one had an idea of what the future of personal computing was going to be, therefore everyone was hedging varying bets. Digital had great hardware but they bet on minicomputers and snobbishly continued to ignore PCs until it was too late. They actually missed the server market boat as well: by the time they arrived, the bed had already become crowded with new lovers, and only the ones who had more money could win.
RISC vendors were riding the wave out with servers and workstations, until around the mid-aughts, when Intel finally ramped up the production of cheap and dirty Pentium 4 CPUs that could be used in a simple SMP setup, and Google / Facebook were quick to proceed with building out their own server farms filled with cheap and disposable commodity x86 blades. It quickly became too expensive for vertically integrated RISC vendors to keep their own CPU design teams on the payroll. One by one, the RISC vendors, too, gradually faded into obscurity.
One thing that absolutely sucked back then was that nearly no one could afford any of those amazing toys, unless it was a business with a fat budget. You could read, but you could not touch most of them. The documentation was gorgeous, though, and also prepared by professional technical writers in Adobe FrameMaker. Now, I can spin up a VM in a public cloud with a custom-built TPU (or NPU) within a few moments, use it for as long as I have to, and I won't have to leave my own desk chair; it will be mine for as long as I can afford to pay for it.
All of that has left us with pretty much one living CISC fossil and multiple offshoots of the load-store architecture, of which RISC is one.
Now all ARM and Apple need to do is to persuade governments to ban x86 as not being energy efficient. Just like they want to ban diesel cars etc. Could be a tough battle for Intel.
I'm not a gamer. And even if the CPU can't take advantage of the full 400GB/s, the 250 or so it can use is very good indeed.
This is all low hanging fruit for future revisions of this chip. The A15 based cores will further improve single core IPC and in turn make MT workloads even better. Basically if this is the floor then the sky won’t be high enough to contain where we go next.
Actually, TDP stands for "Thermal Design Power" and is not a range. It means "I, the designer, designed it so that this is the maximum amount of waste heat it can safely produce continuously in normal use". It is mainly limited by its physical package and the maximum temperature at which the internal components can run.
That you can't observe that max power is due to the fact that those various applications stress the CPU in various ways, not always being able to exercise all internal structures to their maximum potential at the same time.
> One should probably assume a 90% efficiency figure in the AC-to-DC conversion chain from 230V wall to 28V USB-C MagSafe to whatever the internal PMIC usage voltage of the device is.
(This was regarding idle power usage)
Highly unlikely. I design AC switching power supplies from first principles (and stacks of books). Efficiencies above 90% are normal for newer designs, but the PSUs are designed to achieve these efficiencies above a significant percentage of their design power. High efficiency at design power is important because it limits worst-case waste heat, which in turn makes it possible to create a smaller PSU. But as a PSU is a bundle of tradeoffs, one of the tradeoffs taken is lower efficiency at lower power, where it doesn't matter as much.
Typically, the lower the load on the PSU as a portion of design power, the lower the efficiency. If the PSU is designed for 90% efficiency at 140 watts, I would expect that at 7 watts it is actually much less efficient, probably somewhere between 70 and 80 percent.
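Toy numbers to show why that matters for idle wall-power figures (the 75% value is just an assumed low-load efficiency in the range I described, not a measurement of any particular power brick):

    #include <stdio.h>

    /* Wall power for a given DC load at an assumed conversion efficiency.
     * At 7 W DC, 90% vs 75% efficiency is roughly 7.8 W vs 9.3 W at the
     * wall -- a meaningful error bar when quoting idle "wall power". */
    static double wall_power(double dc_watts, double efficiency)
    {
        return dc_watts / efficiency;
    }

    int main(void)
    {
        printf("7 W DC at 90%% efficiency: %.1f W at the wall\n",
               wall_power(7.0, 0.90));
        printf("7 W DC at 75%% efficiency: %.1f W at the wall\n",
               wall_power(7.0, 0.75));
        return 0;
    }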
If people are wondering about why some people in the comments are reporting 3080 vs 3060 levels of performance, it's based on the workload. On synthetic (native I assume) benchmarks, the M1 Max reaches 3080 levels, but in gaming benchmarks (using x86) it reaches 3060 levels.
It's interesting especially since it sounds like the reason it doesn't reach "close to 3080" in many games is because it's CPU bound, specifically because it's emulating x86.
Once we get more benchmarks with non-Rosetta apps the picture may be rosier? That said, it's not like Apple was ever the company for gaming machines, so perhaps that will just be the state of things.
TFA also compares games at 4k, where it is very much GPU bound, and it is about half the speed of a laptop 6800. Which is not great. (And I am speaking as someone whose M1 Max arrives tomorrow).
The M1 GPU is vastly different than an AMD or nVidia GPU, and I suspect it will have not-great scores until someone writes a game and optimizes it specifically for the M1. Which is most likely never.
>and I suspect it will have not-great scores until someone writes a game and optimizes it specifically for the M1. Which is most likely never.
Don't forget that these days few demanding games are "optimized for Platform XYZ"; they're generally using one of a small set of middleware/engines they license. So it's more a question of whether Unreal Engine and Unity and so on get optimizations aimed at Apple's M-series chips. That isn't out of the question at all, given that they definitely have optimizations aimed at Apple's A-series chips. Once they do, everything going forward that uses them will be better "for free". Even if they don't hit the performance of something truly hand-tuned to make max use of the arch, it won't be entirely ignored either. It may not even be that much work.
That's another potential non-technical performance advantage in moving the Mac from x86: they get to piggyback in some areas off the enormously higher market share of iDevices. We'll see how it works out, of course.
It's been a while since I bought a "Pro" computer from Apple. I am kind of wondering about the perf-per-$$$ factor. With a starting price of $2000, these are expensive computers. But maybe they are worth it!
The M1 computers seemed like an absolute bargain for the performance.
They are tools. If you have specific workloads that these excel at in your job/hobby/money-making venture, then the price shouldn't be a concern.
Depending on workload, they are comparable to $1000 PC laptops in CPU performance... or $3000 PCs. Or PCs that don't exist yet!
As someone who uses a laptop for gaming, my $1000 laptop is infinitely superior to a $6000 Macbook Pro (for the games that I play). For almost every other use, the Macbook Pro is likely far superior!
If you do Final Cut or Xcode work, these are the best tool available to you.
Yea, I originally ordered an M1 Max model after the presentation, then cancelled it when I realized for what I would use a GPU for (gaming and 3D development) a RTX 3080 laptop would be a much better choice. I also don’t care about performance/watt as much since I use my iPad for non-work stuff.
But the technology nerd in me still wants to buy one for completely irrational reasons.
Me too. I really want an M1* mac but I also realize that I just want to run Linux, so it's kind of pointless right now (I'm not a Linux kernel level developer, so yeah).
Yep, I reached pretty much the same conclusion as Dave: these are machines for very specific people running very specific software. Apple got their wish: they made the computer disappear, and now the Macbook Pro is a tool. For better or worse, this is the best way to experience the Apple ecosystem.
But also, if you're a developer without any interest in Apple (and maybe someone who wants to play games), the case for using Linux for general-purpose computing is stronger than ever. It will be interesting to see how Apple addresses their own issues in MacOS over the next few months, I've really got my fingers crossed for Vulkan support or 32-bit libraries making a return.
These MacBooks are definitely worth the money. They cost a lot but they are not overpriced.
You don’t have to consider just the CPU and GPU but the whole SoC.
The CPU is impressive and the GPU is good, but for standard workloads some PCs may give you slightly better performance (on the GPU side), at the cost of needing the power adapter to show it.
However, for some specific workloads (especially the ones involving ProRes video) the custom modules in it make it perform better not only than a Mac Pro, but than every other machine on the market.
There is also the Neural Engine that could be more important in the future.
You may not need those modules, but it seems like we are forgetting these are laptops with screens, inputs and more.
These machines have one of the best screens, with high DPI, a high refresh rate and, most importantly, miniLED technology, which brings true HDR. And that's something very pricey.
Far from defending Apple: they could sell these laptops for less and we would all be happier, but at the end of the day these machines are worth it in every aspect (specific cases aside).
The display is often an undervalued part of a Mac. Sometimes it's significantly cheaper than anything comparable on the market, or you simply can't buy an equivalent at all. Examples: the 2012 15"/13" Retina MBP, the 2015 5K iMac, the 2021 miniLED MBP.
The problem is that this great display comes bundled even if you don't need it.
- I would never spend $400 on a 24" display, nor another $400 on keyboard + mouse + cooling. I mainly use computers for HPC, so that's totally unnecessary spending for me; you can get all of those for around $200-$300, so your estimate is inflated by at least $500-$600.
- You can't pair a 5950X with DDR5 RAM. If future devices are on the table, consider comparing against Alder Lake S CPUs, which beat the M1 Max in all metrics. The 5950X takes DDR4 only, and 64GB DDR4 kits start at just $215 (more than $500 less than what you quote).
- The GPU and DDR5 prices you are quoting are not normal vendor prices (GPU shortage, plus virtually no RAM vendors have started selling DDR5 and no CPUs or motherboards that can use it are on sale as of today). The MSRP for the RTX 3060 is $329, which is $400 less than the current, unstable street price. Remember that this pricing did not affect OEMs in the same way, so you can buy computers from Dell/IBM/etc. without inflated GPU prices.
- Not to mention the elephant in the room: lacking software support. Why willingly become vendor-locked into ARM Mac OS? Moreover, HPC software relies on Intel CPUs with Intel MKL, and even using an AMD CPU can cripple performance; with an ARM CPU or an x86/amd64 translation layer it's far worse (which might also apply to gaming).
- Apple's offering is still a weaker computer in terms of raw hardware performance (especially on multi-core loads, by a large margin) for $500 more by your estimation (or $1000 more if we cut the luxurious spending on monitors and cooling, plus an additional ~$1000 if we replace your DDR5 guesstimate with actual DDR4 prices and use the GPU's MSRP). The desktop RTX 3060 is also a better GPU.
So, why should I pay $2000 more for a weaker computer? (with broken-at-best HPC software/Linux support, no less!) That's just nonsense.
Given how difficult it's been to get GPUs recently due to supply chain crunch, are you sure you can actually get your order fulfilled without a significant wait period?
Having shopped it recently, the desktop monitor market is abysmal right now. To get a monitor remotely comparable to the new MBP internal displays you're looking at spending $3k on one of a tiny handful of FALD displays which run hot enough to need a fan and have worse DPI, or rearranging your desk setup to accommodate a 48" LG OLED TV which is subject to burnin with serious usage.
Apple can't release those lower-priced 27" displays that've been rumored soon enough.
I hope the next large iMac is at least a little bit above 27", after all, the entry-level one went from 21.5 to 24". A 30" iMac with mini-led and 120Hz would be just too sweet.
That would be pretty great, but unless they restore display passthrough capabilities with the M-series 27"+ iMacs, I'm really hoping for a standalone display that can be used both with a laptop and the rumored upcoming "Mac Pro Mini"/G4 Cube MKII.
Speaking of nostalgia: I hope they bring back the Cube. The design was amazing; the G4 was just too hot, and with Apple Silicon that's no longer an issue. Tons of Apple fans from back in the day, like myself, would buy one just for nostalgia if nothing else.
That would be nice. Some of the rumors about the Mini refresh point to something looking like the cube did.
When all the Mac line is on the M2 generation, I would love Apple to add a Mac Nano based on the M1 to the lineup. Tiny, at the original price point of the Mini, like $499. They could greatly increase the market share of the Mac that way while keeping their other product tiers intact.
You went and found the cheapest versions of components you could find to make a false comparison. This has always been the difference between pc laptops and mac laptops. People say "I can get this cpu with this graphics card in a pc laptop for WAAYY cheaper" ignoring the memory, quality of the chassis, quality of the screen, quality of the speakers, the battery and just about everything else.
The NVMe drive I found is 2600MB/s. It also isn't the bottleneck in general.
Both the Samsung and the Apple SSDs are just a bad fit for a machine like this. For a non-I/O bound workload it's paying money for nothing. For a read-bound workload, the machine has 64GB of RAM, so you'd need a working set larger than that to have to care, and that's pretty rare. For a write-bound workload, at those speeds, you're going to melt a consumer-grade SSD and need an enterprise one. So you don't want the faster one in this machine; either it's not worth the price or it won't survive that amount of write load.
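To put rough numbers on that (a back-of-envelope sketch in Python; the working-set sizes are assumptions picked purely for illustration, and the two speeds are the ones quoted elsewhere in this thread):

    # How long a cold sequential read of a working set takes at the two
    # quoted drive speeds. Working-set sizes are illustrative assumptions.
    def cold_read_seconds(working_set_gb: float, read_mb_per_s: float) -> float:
        return working_set_gb * 1024 / read_mb_per_s

    for ws_gb in (8, 64, 256):  # 64 GB matches the machine's RAM; 256 GB exceeds it
        slow = cold_read_seconds(ws_gb, 2600)   # the mid-range NVMe drive
        fast = cold_read_seconds(ws_gb, 7400)   # the faster drive being compared
        print(f"{ws_gb:>3} GB: {slow:6.1f}s at 2600 MB/s vs {fast:6.1f}s at 7400 MB/s")

Once the data fits in the page cache (up to that 64GB of RAM), repeat reads never touch the drive at all, so the speed difference only shows up on the first pass or on working sets larger than RAM.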
Good. Now you have an NVMe drive but a motherboard that doesn't support NVMe. That cheap-ass cooler is also unlikely to be able to handle the 5950X; these chips usually need to be paired with water cooling because they get stupidly hot.
Also, if you are getting a $15 mouse/keyboard, why are you even gaming/doing graphics work in the first place? Might as well get a Raspberry Pi.
That cheap-ass cooler isn't that bad. It's pretty middle of the road. Personally I'd have the Hyper 212 Evo for $39 because it's not that much more expensive, but that plus a ~$60 case and a ~$20 keyboard and mouse still isn't $400.
The main difference between the plain keyboard and your average "gamer" keyboard is what, RGB LEDs? You can pay the money for that if you like, but there are no RGB LEDs on the Mac.
I fail to find that anywhere in the specs. It says 8 SATA3 slots and 2 PCIe 4 slots. It is not clear if you can boot/configure them from the BIOS.
> That cheap-ass cooler isn't that bad. It's pretty middle of the road. Personally I'd have the Hyper 212 Evo for $39 because it's not that much more expensive
The cooler is critical if you are doing CPU intensive work. A hot CPU will get throttled; so you'd better get a very good one if you are already paying a lot for your CPU.
> The main difference between the plain keyboard and your average "gamer" keyboard is what, RGB LEDs? You can pay the money for that if you like, but there are no RGB LEDs on the Mac.
No. The grip and precision are night and day. Even more so for mice; I can't go back to normal ones (I had the G900 and now have the Razer Viper Ultimate; this thing made carpal tunnel syndrome a thing of the past, and I use the mouse for 10+ hours per day).
> I fail to find that anywhere in the specs. It says 8 SATA3 slots and 2 PCIe 4 slots. It is not clear if you can boot/configure them from the BIOS.
1 x Hyper M.2 Socket (M2_1), supports M Key type 2230/2242/2260/2280/22110 M.2 PCI Express module up to Gen4 x4 (64Gb/s) (with Matisse) or Gen3 x4 (32Gb/s) (with Pinnacle Ridge and Picasso)*
1 x Hyper M.2 Socket (M2_3), supports M Key type 2230/2242/2260/2280/22110 M.2 SATA3 6.0Gb/s module and M.2 PCI Express module up to Gen4 x4 (64Gb/s) (with Matisse) or Gen3 x4 (32Gb/s) (with Pinnacle Ridge and Picasso)*
* Supports NVMe SSD as boot disks
Look under the Specs tab. You can also see them in the pictures.
> No. The grip and precision are night and day. Even more so for mice; I can't go back to normal ones (I had the G900 and now have the Razer Viper Ultimate; this thing made carpal tunnel syndrome a thing of the past, and I use the mouse for 10+ hours per day).
So that's completely reasonable, but now you're adding whatever that costs to the price of the Mac too.
There's definitely cheaper options for the various components, but I would personally choose a nicer motherboard if I were doing a 5950X build. One of the new ASUS boards with no chipset fan and 2x Thunderbolt 4 ports, like the ProArt X570 Creator Wifi, is a likely choice.
I mean this is the other advantage of the PC. If you want to pay more and get the Thunderbolt ports, you can. If you don't need them for anything, you don't have to pay for them.
The specs we're looking at here are pretty general. Most workloads are either going to be CPU bound or GPU bound, not both. Do you need the 5950X? Then you're CPU bound and can save $500 with another GPU. Do you need the RTX 3060? Then you're GPU bound and can save $500 with another CPU.
If you need the fast GPU in the Mac, it comes as one piece in a machine that starts at $3500.
That SSD is 2600MB/s while the MacBook is 7400MB/s. The Samsung Pro is the only SSD in that territory.
Buy a cheap motherboard and you'll pay the price later. I didn't spec a $800 motherboard (though those are amazing). I went quite middle-of-the-road for an x570.
A keyboard with a fingerprint reader will set you back at least $50, with the Surface keyboard costing $100. A comparable trackpad would be over $100; even a midrange mouse is $50.
A non-garbage case will be around $100 plus or minus a little.
A decent air CPU cooler that will keep your CPU from throttling way back is going to run close to $80-120. I also didn't bother to price out all the little things.
I forgot to add a PSU, by the way. A name-brand, modular PSU with midrange internals that is just big enough (around 500-600W) is another $100-120.
> That RAM is DDR4 while the MBP uses DDR5 (low power). MSI has stated that DDR5 will cost at least 60% more. Known prices are actually 3x higher.
Naturally it's DDR4. The 5950X only supports DDR4. If it had DDR5 it would be twice as fast on all the things the M1 Max is doing well on as a result of having more memory bandwidth.
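For rough context on the bandwidth claim (theoretical peak = transfer rate × 8 bytes per transfer × number of channels; the DDR5 speed grade below is just an assumption, since no shipping desktop platform supports it yet):

    # Theoretical peak bandwidth of a dual-channel desktop memory setup.
    # peak GB/s = mega-transfers/s * 8 bytes per transfer * channels / 1000
    def peak_gb_per_s(mega_transfers: int, channels: int = 2) -> float:
        return mega_transfers * 8 * channels / 1000

    print(f"DDR4-3200, 2 channels: {peak_gb_per_s(3200):.1f} GB/s")  # the 5950X build
    print(f"DDR5-6400, 2 channels: {peak_gb_per_s(6400):.1f} GB/s")  # assumed speed grade

Doubling the transfer rate doubles the theoretical peak (51.2 vs 102.4 GB/s here), which is the comparison being made.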
> That SSD is 2600MB/s while the MacBook is 7400MB/s. The Samsung Pro is the only SSD in that territory.
This is kind of fair, but then that's the other problem. For most workloads a 2600MBps read speed is already going to move the bottleneck somewhere else, especially on a machine with 64GB memory to use as cache. If you're the rare exception who actually benefits from it, the Samsung one is available, but for everybody else they get to save $170 by not coupling the fast CPU they actually want with an expensive SSD that wasn't their bottleneck and has a poor cost benefit ratio.
> Buy a cheap motherboard and you'll pay the price later.
How do you mean? At best it won't have some ports you eventually want and then you buy the add-in card later and spend half as much on it because by then the price is lower.
$155 is a fairly high price for a motherboard; $300 is steep. The most common ones are more like $70.
> A keyboard with a fingerprint reader will set you back at least $50, with the Surface keyboard costing $100. A comparable trackpad would be over $100; even a midrange mouse is $50.
The logitech keyboard and mouse are perfectly serviceable and on par with anything you get when you buy a complete PC from the store. I would take them over the chiclet thing that Apple makes.
You can have a $100 keyboard and a trackpad to use with your desktop, but now you have an advantage over the Macbook, because you get to buy it once and use it forever instead of it being permanently attached to a machine that will be obsolete before the keyboard is. So you get to amortize the cost over several hardware generations.
The same goes for the monitor for that matter.
> A non-garbage case will be around $100 plus or minus a little.
What makes the provided case garbage? What am I not getting from it that I actually care about?
> A decent air CPU cooler that will keep your CPU from throttling way back is going to run close to $80-120.
That was a decent air CPU cooler. It has copper heat pipes and a 92mm fan. The crappy ones are like $13:
For the cooler, I wouldn't put that Zalman in anything much more demanding than a commodity office box with a low end APU. The Hyper 212 is more or less entry level for a CPU an enthusiast might actually select (Ryzen 5600X and up) and even for that you're going to be leaving a decent chunk of performance on the table due to Ryzen's temperature-based scaling behavior.
The coolers bundled with some CPUs have similar issues, and are noisy to boot. Unless budget is hyper-restrictive, one is doing themselves a disservice by not bumping cooler budget to at least $50-$60 for something with a 120mm fan, decent number of heatpipes, and decent fin surface area like a Scythe Mugen 5 Rev B, Scythe FUMA 2, Be Quiet! Shadow Rock 3, or Noctua NH-U12S Redux which will all be significantly more quiet and better performing.
> For the cooler, I wouldn't put that Zalman in anything much more demanding than a commodity office box with a low end APU.
I had a Xeon workstation from one of the big name OEMs. Dual socket, each CPU had a TDP of 135W. The fans they shipped it with were rated for 140W. They were essentially their version of that Zalman thing, but with 80mm fans instead of 92mm. The rating was the real limit for what they could handle.
Which worked fine for one 135W CPU, but the sockets were positioned so the hot air from the first socket would blow over the second one. The second CPU would get just over temperature and thermally throttle. So I replaced them both with the Hyper 212 and the temperatures dropped 25 degrees.
What I gather from this is that the smaller one was good for a 135W CPU as long as its inlet temperature wasn't already high and the bigger one was good for it even if it was.
The 5950X has a TDP of 105W and all AM4 boards have one socket.
Do the even more expensive fans cool even better? Probably. And if you're the sort who likes to pay more to have a little slack, do as you like. But it's not what you'll get from an OEM.
The cheaper coolers will work fine yes, but as I said with Ryzen specifically they won't allow the user to take full advantage of their hardware.
Put simply, it has a particular temperature target that it tries to stay at as much as possible and it'll adjust its clockspeed boosting dynamically to achieve that. The more easily it can maintain that temperature, the higher the CPU will boost. With a lesser cooler, this means the chip may barely be boosting at all depending on the cooler's capabilities, especially for hotter chips like the 5800X, 5900X, and 5950X (which interestingly all share the same 105W TDP despite considerably different power consumption characteristics). Depending on how bad the mismatch is between CPU and cooler you may be better off saving money and buying a cheaper CPU with performance closer to what your cooler is capping more expensive CPUs to.
And yes you're right, you're probably not going to get cooling that competent from a major OEM. That's why it's not a great idea for home users to buy from a major Windows PC OEM unless their needs are fairly pedestrian.
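To illustrate the boost behaviour described above (a toy model only, not AMD's actual algorithm; the thermal resistances, power curve and operating points are made-up numbers): the chip raises clocks until the package temperature reaches its target, so a cooler with lower thermal resistance lets it settle at a higher sustained frequency.

    # Toy model of temperature-target boosting (NOT AMD's real algorithm).
    # Assumes power grows roughly with the cube of frequency and a simple
    # steady-state thermal model: temp = ambient + power * R_cooler.
    AMBIENT_C = 25.0
    TEMP_TARGET_C = 90.0
    BASE_GHZ, BASE_W = 3.4, 105.0          # made-up base operating point

    def sustained_clock(r_cooler_c_per_w: float) -> float:
        """Highest clock whose steady-state temperature stays at or below target."""
        ghz, best = BASE_GHZ, BASE_GHZ
        while ghz <= 5.0:
            power = BASE_W * (ghz / BASE_GHZ) ** 3
            if AMBIENT_C + power * r_cooler_c_per_w <= TEMP_TARGET_C:
                best = ghz
            ghz += 0.025
        return best

    for name, r in [("budget tower (assumed 0.45 C/W)", 0.45),
                    ("midrange tower (assumed 0.30 C/W)", 0.30),
                    ("large dual-tower (assumed 0.20 C/W)", 0.20)]:
        print(f"{name}: settles around {sustained_clock(r):.2f} GHz")

The absolute numbers mean nothing; the point is only that the sustained clock keeps climbing as the cooler improves, rather than there being a single "good enough" threshold.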
They're definitely pricey, but CPU performance is out of this world. GPU's pretty good too, though not as impressive, I understand. Alas, it'll probably be a few years before I can get one (I use my work machine for almost everything and I just got this one).
Fair enough! It doesn't really bother me either way, since I don't really make use of the GPU on my laptop. Curious to try out Tensorflow on one of the new machines, though.
They're also compact, well-built (keyboards are fixed now) and have good battery life. Last time I was shopping, I found the direct competition (XPS 13 and the like) to be about the same price.
Oh absolutely, they're great. My last MacBook Pro was around $2500 or $3000, if I remember correctly, and it was a very good price for what you got.
Don't think that I'm being overly negative here. These look like outstanding computers that are worth a premium price. My question here is about how much of a premium and I'm not trying to give an opinion with a question - I really don't know the answer :)
Years ago I read someone express the maxim "the computer you want is always around $5k". This has stuck with me. It's been approximately true throughout my life.
Depends what you need them for. On a music video set my 2010 MBP survived a drop from a second story balcony (three stories in a normal home), onto a marble floor. Got dented as hell, but only functional damage was to the ethernet port. I'm excited to get one of these new ones. Imagine it'll last at least 5 years.
Completely depends on what your use case is. For me personally a desktop + mid-range laptop combo is cheaper, more powerful and an all-round better fit than a single $2K+ laptop (which realistically will become $3K+ after adding a few options).
Seems the CPU cluster saturates at about 240 GB/s and can't utilize the full memory bandwidth. This bodes well for future clusters with double the number of CPU cores at a node shrink (M2 Max?) or for a Mac Pro (Mac Quadra?).
Maybe. This seems like a cluster-wide limitation - the individual CPU cores can utilize enough memory bandwidth that together they should be able to saturate the bus, but there's some kind of bottleneck on the entire CPU section of the SoCs and who knows how easy or difficult it would be to alleviate that.
Sure, but keep in mind that most competing laptops max out at 70GB/s (the never-to-exceed theoretical number), and most desktops are slower than that with 2 channels of DDR4-3200 to 4200 (41-67GB/s).
So while it's "only" 240, that's an excellent number. Keep in mind that you generally never see 100% of theoretical bandwidth.
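If you want to sanity-check numbers like these on your own machine, a crude copy test is easy to sketch (this is not the STREAM benchmark, and a single-threaded copy won't come close to saturating a wide memory system like the M1 Max's; it just gives you a floor to compare against quoted peaks):

    # Crude single-threaded memory-copy bandwidth estimate (needs ~4 GiB free RAM).
    import time
    import numpy as np

    N = 1 << 28                     # 256M float64 elements = 2 GiB per array
    src = np.random.rand(N)
    dst = np.empty_like(src)

    best = 0.0
    for _ in range(5):              # best of a few runs to reduce noise
        t0 = time.perf_counter()
        np.copyto(dst, src)
        elapsed = time.perf_counter() - t0
        best = max(best, 2 * src.nbytes / elapsed / 1e9)  # one read + one write

    print(f"~{best:.0f} GB/s single-threaded copy bandwidth")

Saturating the bus would take many threads or processes pinned across cores, which is presumably closer to how the ~240GB/s figure above was measured.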
Now that Apple has taken the lead in performance/battery life tradeoff, are there any machines which come close to the M1 for software dev? Specifically, compiling Rust, Android development etc. without giving up too much on battery life?
Also, the last time I checked, CPUs were reporting high performance but only under light load. Has the whole throttling situation changed or should I just expect to get 2 hours battery life in exchange for extreme CPU performance?
Edit: I should have specified machines that can run Linux.
Panzarino tested WebKit compiles on the first run of M1 machines last year, and it seems like the battery held up really well on those.
> After a single build of WebKit, the M1 MacBook Pro had a massive 91% of its battery left. I tried multiple tests here and I could have easily run a full build of WebKit 8-9 times on one charge of the M1 MacBook’s battery.
Intel's Alder Lake is moving to a Performance + Efficiency core setup, which should help overall with battery life. But they are still behind on manufacturing process (Alder Lake is "Intel 7", supposedly roughly comparable to TSMC N7), so Apple will quite likely maintain their lead in power consumption.
Alder Lake is getting announced in 2 days, but rumors have it as a desktop-first product launch, so laptops may be another quarter or two out.
> ...any machines which come close to the M1 for software dev?
If you can stand using macOS, that is.
Personally, I'll continue using Linux because that's where all my software gets deployed to and macOS simply can't approach the value of that or the value of open source. On a Mac, you'll be fighting the OS the whole time.
If speed was all that mattered, Mac users would have left Apple a long time ago because this is the first time they're faster than a PC.
Yeah, the Max has exactly the same CPU as the Pro. The only reason I picked it is because I wanted the 32GB RAM, and only the Max has that by default — customised orders take a long time to deliver in India.
Believe the Max also has twice the memory bandwidth of the Pro.
EDIT, upon reading the bottom part of the first page of the linked article: And double the cache, more cores on the Neural Engine, and maybe doubled media processing, assuming I'm reading all of that correctly.
Yep, this is what pushed me to the Max. I think the GPU is going to be overkill for my other needs (though I'm looking forward to trying some gaming) but first and foremost I needed support for my monitors (the only thing that kept me from the first M1s).
3x2K (2560x1440) and 1x1080p. I have my 3 2K monitors in a "tie-fighter" formation which really just means 1 2K in landscape and the other 2 2Ks are portrait mode on either side of the middle 2K. Then I have my 1080p monitor just on top of my middle 2K. The 1080 is really just used for my Home Assistant (home automation software) dashboard and occasionally reference material (Adobe XD/Figma-type stuff).
I split my 2 portrait monitors into thirds and have hotkeys to resize windows to snap them to one of the 3 "positions" (also hotkeys to expand them to 2/3rds for things like my code editor).
This setup works really well for me personally, and at some point I'll upgrade to 4K+, but I just couldn't afford it when I first set this all up.
My normal usage is (`+` separates thirds, `/` represents one of these apps in this "slot"):
Left 2K: Discord + DataGrip/Android Studio/Drafts + Slack
EDIT: Just to note, I know I'm not going to be scratching the surface of what's possible with the M1 Max monitor-wise since it can do 3x6K (Pro Display XDR) + 1x4K
That definitely looks cool - including your LEDs. Not sure I would want to work in that configuration (my neck would get hurt), but it definitely looks great :)
> That definitely looks cool - including your LEDs.
Thank you!
> Not sure I would want to work in that configuration (my neck would get hurt)
Up and down issues or side to side? I tried all 3 2K monitors in landscape when I first got them but that was way too much side to side movement to get to the edges of the far left/right screen so I begrudgingly switched them to portrait. I had always thought portrait mode was silly for a monitor but I absolutely love it now and I'm super happy with the setup. I don't have to move my head very much at all to scan across my setup and normally I am pretty focused on just 2 monitors (center for Chrome and right for IDEA) so there isn't a ton of movement. As for up/down movement I really only ever move my head to look at the top 1080p screen which is fine since I don't use it regularly.
For me it's up and down; I try to limit vertical head movement and wouldn't want to look up high. Using up to 3 screens horizontally is quite fine, though I mostly use the "middle" display and put less-used stuff on the other screens (mail, chats, reference documentation).
Docker uses almost 6GB on my M1 Air. Even with 16GB, I have to be careful to avoid swapping (which trashes your SSD).
I'm looking at buying a machine with more RAM. 64GB is a pretty good deal (look at desktop DDR5 prices). If I spend $4k on a machine, I plan on using it for at least 3-4 years before I upgrade. 64GB seems to make a lot of sense if you can stomach the price of the M1 Max.
> 64GB is a pretty good deal (look at desktop DDR5 prices).
Looking at DDR5 prices isn't valid because no CPU that supports it has been released yet, and the DDR5 vs DDR4 performance difference will be negligible early on unless you use an iGPU. Apple always charges a huge premium for upgrading RAM/SSD.
A shame that thermal/power limitations aren't investigated. That is the most deciding factor for me getting a Pro or Max. And something Apple has historically had a lot of trouble with.
If I'm understanding you correctly, you're thinking of previous issues with thermals and throttling. That has been an issue over the past several years because Intel fell behind AMD and TSMC and had to drive more power through its chips to stay competitive; that generates heat, which ultimately ends up triggering throttling.
If you read about these particular chips, it should be startlingly clear that they are much more efficient than the Intel chips they replace.
In this article:
> Apple doesn’t advertise any TDP for the chips of the devices – it’s our understanding that simply doesn’t exist, and the only limitation to the power draw of the chips and laptops are simply thermals. As long as temperature is kept in check, the silicon will not throttle or not limit itself in terms of power draw.
> The perf/W differences here are 4-6x in favour of the M1 Max, all whilst posting significantly better performance
Read page 3 of this article. They really do cover a lot of this.
> As long as temperature is kept in check, the silicon will not throttle or not limit itself in terms of power draw.
Apple laptops have, for as long as I can remember, had issues with thermals. Sometimes they get so hot you can't even have them in your lap, so they're just "tops" at that point.
This wasn't really a _huge_ problem if you didn't need fast graphics until the 2016 15"es. With the switch to Ice Lake, they had to add a discrete GPU even to the base model of the 15" (due to lack of appropriate GPUs in Intel's bigger mobile chips in that generation), and things got very bad.
I've got a personal 13" 2016 MBP, and a work 15" one. The 13" is _fine_; it doesn't show significant thermal throttling in normal use. The 15"... not so much.
I had every model of the MBP that was released between 2012-2016 and they consistently got too warm to have in my lap, every single one of them. Some of them even when just browsing the web. Fresh install every time I got a new model. After that I simply got tired of not being able to use my laptop as a laptop and jumped ship.
Since the M1 is designed for low power (but still impressive performance), it's unclear whether the M1 Max machines will get very hot like the Intel-era ones did. They use a 145W AC adapter, so I suspect they could get quite hot under extreme utilization.
It is much more efficient. It's also more powerful. You can see in one of their benchmarks they hit 90+W of package power. I doubt it can sustain this. The "turbo-mode" Apple has announced for the 16-inch version also indicates that it will be significantly thermally limited.
In my Lenovo Legion 5, in performance mode, the CPU is configured to draw up to 70W, and the GPU is configured to draw up to 115W. It's able to do this just fine for gaming sessions. Yes, the fans are quite audible while doing this. I think in contrast, having to handle about half that power draw overall should be attainable. For sure, it's now a very large SoC, so the heat might be a bit more concentrated and require some engineering to cool. But it doesn't seem like it should be a showstopping concern. Of course, you can wait for additional reviews and see if anyone addresses longer, more sustained load testing.
I don't doubt it's physically possible. I doubt Apple implemented it. So I'd rather wait until there is some hard data. Apple could also have implemented proper cooling for their Intel laptops, but they didn't.
I haven't seen too many benchmarks yet, but Dave2D ran a Cinebench R23 test both as a single run and in a thirty-minute loop. [1] He saw that the score remained the same after the 30 minutes.
He also reported that the loudest fan noise he could get was 38dB, with typical loads under 30dB. [2]
That's perfectly understandable. It's a major purchase decision.
But also, I'm not aware of anyone that implemented "proper cooling" sufficient to handle the last few generations of Intel chips (at least at the high end.) I read reviews of a variety of machines. All of them had issues with throttling.
I was so happy when the Ryzen 4000 mobile chips were reviewed and did not require elaborate cooling systems just to perform their regular duties. I would be shocked if the 2021 Macbook Pro 14/16" have issues with thermal throttling.
Can you please quote the part that discusses power and thermal limitations, backed up with tests? I don't see any of it on that page.
As far as I can tell, these tests only report package power and wall power under certain loads. They don't say anything about any limitations. No long-term tests or temperature graphs. No information about what temperature throttling kicks in at. Is CPU temperature truly the only limiting factor, or are the VRMs also a pain point? I could go on, but I think this illustrates enough of what I want to see.
Someone saying so without any data to back it up doesn't exactly inspire confidence. Like I said, nothing concrete.
> Any pure CPU or GPU workload doesn't come close to the thermal limits of the machine.
So does that mean it is thermally limited on a CPU + GPU workload? What about a CPU + GPU + Media engine workload. What about using the NPU? Does SSD load have an impact? SSDs nowadays can consume 15+W of power. dozens of questions, unanswered by such a short sentence. Please just test it out and give us the data.
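In the absence of published data, this kind of test is at least easy to run yourself once you have the machine. A minimal sketch (CPU-only: an all-core integer loop logged over time, where a sustained drop in throughput would indicate throttling; stressing the GPU, media engines or SSD at the same time would need separate tooling):

    # Minimal sustained all-core load: log throughput over time.
    # A steady drop in tasks/s over the run suggests thermal throttling.
    import multiprocessing as mp
    import time

    def burn(_):
        acc = 0
        for i in range(2_000_000):          # fixed chunk of integer work
            acc ^= (i * 2654435761) & 0xFFFFFFFF
        return acc

    if __name__ == "__main__":
        cores = mp.cpu_count()
        with mp.Pool(cores) as pool:
            start = time.time()
            while time.time() - start < 30 * 60:      # run for 30 minutes
                t0 = time.time()
                pool.map(burn, range(cores * 4))       # keep every core busy
                rate = (cores * 4) / (time.time() - t0)
                print(f"{time.time() - start:7.0f}s  {rate:6.1f} tasks/s")

It won't answer the VRM or mixed-workload questions, but it would at least show whether a pure CPU load degrades over a long run.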
>Someone saying without any data to back it up doesn't exactly inspire confidence.
That "someone" is the editor in chief for Anandtech.
He says you need a workload that stresses both the CPU and GPU as much as possible, running overnight, before the ability to crank the fans higher than normal becomes handy.
> That "someone" is the editor in chief for Anandtech.
I don't see how that is relevant.
> He says you need a workload that stresses both the CPU and GPU as much as possible, running overnight, before the ability to crank the fans higher than normal becomes handy.
Then why is there no data? This is much more interesting than some of the tests done in the report.
What you are looking for is a laptop review: how the M1 Pro / Max performs under the MacBook Pro 14 / 16 cooling with the maximum possible TDP.
But this isn't a laptop review, it is an SoC review. My guess is that Dave2D will look into this sort of thing, since he cares about it, before anyone else on the internet actually tests it for these laptops. (Possibly due to various PR restrictions.)
>A shame that thermal/power limitations aren't investigated.
It's covered in the comments, along with when the "crank up the fans" mode would be useful.
>Any pure CPU or GPU workload doesn't come close to the thermal limits of the machine. And even a moderate mixed workload like Premiere Pro didn't benefit from High Power mode.
It has a reason to exist, but that reason is close to rendering a video overnight - as in a very hard and very sustained total system workload.
Except for, say, Purism and Framework and a few others, every company sucks with ethics.
Except Purism had the idea to take their own products, which were based on free, open-source code, charge for them, and give no attribution, only apologizing after the blowback.
So even those companies' ethics are suspect.
> Except for, say, Purism and Framework and a few others, every company sucks with ethics.
Basically. If I wanted to stay ethically pure in all my purchases I'd be in trouble, unfortunately. It's not like Apple is ethically "worse" than Google or Samsung or other major phone manufacturers.
Arguably, approaching 100% ethical purity would also involve not being born in the first place.
Related note: I think that leading large numbers of people as well as having very large numbers of customers also reduces your chances of doing well for all of them.
> Except Purism even had the idea to make their own products which were based on free, open-source code, charge for them
And then charging 2000 dollars for a barely functional brick. I think the only ethical hardware companies I know would be System76, Framework, and Fairphone.
Honest question: for someone whose priority is maintaining an ecosystem of general computing devices with freedom to run software of my choice (rather than privacy paranoia), what companies should I get behind and/or avoid? Between custom processors, comprehensive SoC, and secure boot, I'm a little afraid getting caged in over the long term with practically any offerings out there right now.
https://frame.work is getting good reviews and seems very reputable. Of course it's just a laptop, so you have to figure out the rest of your computing world from there.