Hacker News
Benchmarking the Apple M1 Max (tlkh.dev)
203 points by xrayarx 6 months ago | 210 comments



Great detailed benchmarking.

This mirrors my experience with my M1 Max: Absolutely amazing battery life and performance in a laptop. I’m thrilled to have it. Huge step up from last gen Apple laptops.

But at the same time, it feels like some of the rhetoric around the performance claims got a little out of hand in the wake of the launch. It’s fast, but it’s not actually crushing my AMD/nVidia desktop like a lot of news outlets were suggesting it would.

In fact, a lot of the GPU tests here show more or less what I’ve seen: That Apple has matched the power/performance of other leading-edge GPU hardware:

> Pretty much what we would expect, with the M1 Max having about 8x less performance, but at 8x less power, so performance per watt is surprisingly quite comparable between the two.

This is actually an impressive accomplishment out of Apple. I’m just afraid it might get overshadowed by the fact that it doesn’t live up to some of the fairly extreme performance claims that got tossed around in the days following the launch.


But the fact that you're even mentioning your desktop in the same breath is kind of the whole reason it is amazing. Like that's the rhetoric--holy shit your laptop is doing this.

Apple hasn't even released the Pro desktop stuff yet.


> But the fact that you're even mentioning your desktop in the same breath is kind of the whole reason it is amazing. Like that's the rhetoric--holy shit your laptop is doing this.

The problem with this take is this:

https://www.anandtech.com/bench/product/2685?vs=2613

This is mobile vs. desktop Zen 2 with the same number of cores. You give a CPU three times the power budget and it gets marginally faster. Because the last few hundred MHz of clock speed uses a ton of power, and that's it.

But then you have this:

https://www.anandtech.com/bench/product/2613?vs=2666

Nearly double the multi-threaded performance at the same TDP, because it has twice as many cores and it only has to give back the last few hundred MHz of base clock to do it.

A laptop with performance largely equivalent to <= 8-core desktops wasn't a novelty. What a desktop really gets you is the ability to have a lot more cores.


I don't really understand your point. The impressiveness of M1/Pro/Max is what they do holistically for the power draw. The CPU + GPU + other accelerators combined and across the board approach what powerful desktops can do even with discrete power guzzling components. This is not about a single aspect of the silicon.

So sure, you can cherry pick one part of the SoC, match it up against one aspect of Chip X and try to make a point. OK, you win that one. I'm talking about what the SoC as a whole does while running near-silently off a battery.

Yes, a desktop gives you the ability to have more CPU cores. I'm sure that's what we'll get from whatever Apple decides to ship for its true Pro desktops.


The point is that we already had this. Both the CPU and GPU are competitive with existing CPUs and GPUs. Which is a first in a long time for anybody outside of AMD/Intel/Nvidia. But "competitive" and "dramatically better" are not the same thing.

People were primarily comparing it to desktop CPUs because of where we were in the release cycle. How does it compare to Zen 3? Well, Zen 3 laptops were still a few months away when the M1 was released, but Zen 2 desktops were only a single digit percentage faster than Zen 2 laptops, so how does it compare to Zen 3 desktops?

Meanwhile the primary real advantage of the M1 is that it uses less power. Compared to PC laptops, it's somewhat less power. There are already PC laptops with 9+ hours of battery life (and no matter how little the CPU uses, you still have to run the screen and everything else), so the practical impact of this isn't enormous, but it's there. But if you compare the power consumption to PC desktops, OMFG! Except that nobody really cares about the power consumption of PC desktops. That's why they triple the power consumption to eke out 8% more performance.
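A rough sketch of that trade-off in Python, using only the two figures from the paragraph above (about 3x the power for about 8% more performance). The function name is made up for illustration, and the numbers are the commenter's ballpark rather than measurements.

```python
def relative_perf_per_watt(perf_ratio: float, power_ratio: float) -> float:
    """Perf-per-watt of configuration B relative to configuration A.

    perf_ratio  = perf_B / perf_A
    power_ratio = power_B / power_A
    """
    return perf_ratio / power_ratio


# Ballpark figures from the comment: ~3x the power for ~8% more performance.
desktop_vs_laptop = relative_perf_per_watt(perf_ratio=1.08, power_ratio=3.0)
print(f"Desktop perf/W relative to the laptop part: {desktop_vs_laptop:.2f}x")
```

Under those figures the desktop part does roughly a third as much work per watt, which is why a perf-per-watt comparison against a desktop looks so lopsided even when raw performance is close.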

A lot of this is also attributable to how good Apple's marketing is. I don't know if they did this on purpose, but people keep publishing SPEC benchmarks for comparison with PCs. Part of this is that a lot of the ordinary benchmark software either doesn't run on Mac at all or there was no native ARM version available yet. But SPEC was put together by a consortium of server vendors. Their incentive is to show that their POWER and Xeon processors are worth the fat premium over regular desktop CPUs. So the suite skews heavy to benchmarks limited by memory bandwidth, because big servers will have like 16 times more memory channels than a desktop.

Then Apple shows up with this SoC that has a ton of memory bandwidth because it's necessary to feed the GPU and it stomps all over those specific benchmarks. Then it's OMFG again even though those results aren't actually typical because most workloads aren't limited by memory bandwidth.

It's not a bad chip. The level of hype is just extraordinary.


If you don't like SPEC then look at workload benchmarks like video rendering or code compilation. The M series chips acquit themselves quite well.

> There are already PC laptops with 9+ hours of battery life

Not when they're actually doing anything. To compete with what the M series chips bring, you need a discrete GPU and a very hungry CPU. Such a machine does not get 9+ hours of battery life.

> The point is that we already had this. Both the CPU and GPU are competitive with existing CPUs and GPUs. Which is a first in a long time for anybody outside of AMD/Intel/Nvidia. But "competitive" and "dramatically better" are not the same thing.

Right, you already had something like this in multiple discrete components that consume a huge amount of battery life when cobbled together into one system. My entire point has been that the impressiveness of the M series chips comes from their efficiency and what they are able to do holistically in a single SoC.

> A lot of this is also attributable to how good Apple's marketing is

Sigh. Alright I know what I'm dealing with now. Good day to you sir!


> If you don't like SPEC then look at workload benchmarks like video rendering or code compilation. The M series chips acquit themselves quite well.

Then you get results that are equivalent rather than superior.

> Not when they're actually doing anything.

Yes, when they're actually doing anything. The M1 Macbooks get around 16 hours. Under the same kind of load a similarly performing PC laptop might get around 10 hours. Other PC laptops will get 16 hours (or more) by having a lower TDP and then being slower, especially on multi-threaded workloads.

This obviously only matters to people who care about not just long, but very long battery life, and very high performance, and aren't willing to make the weight trade off of getting one with a bigger battery.
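To put the 16-hour versus 10-hour figures in terms of power draw, here is a small Python sketch. The 70 Wh battery capacity is a hypothetical placeholder (not taken from any specific laptop), and both helper names are invented for the example.

```python
def average_draw_watts(battery_wh: float, runtime_hours: float) -> float:
    """Average system power draw implied by a battery capacity and a runtime."""
    return battery_wh / runtime_hours


def battery_needed_wh(draw_watts: float, target_hours: float) -> float:
    """Battery capacity needed to sustain a given draw for target_hours."""
    return draw_watts * target_hours


BATTERY_WH = 70.0  # hypothetical capacity, assumed identical for both machines

draw_16h = average_draw_watts(BATTERY_WH, 16)   # machine that lasts 16 hours
draw_10h = average_draw_watts(BATTERY_WH, 10)   # machine that lasts 10 hours
bigger_battery = battery_needed_wh(draw_10h, 16)

print(f"16 h machine averages {draw_16h:.2f} W")
print(f"10 h machine averages {draw_10h:.2f} W")
print(f"10 h machine would need {bigger_battery:.0f} Wh to reach 16 h")
```

With equal batteries, the gap between 10 and 16 hours is a difference of a few watts of average draw, and closing it with battery alone means carrying 60% more capacity, which is the weight trade-off mentioned above.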

> My entire point has been that the impressiveness of the M series chips comes from their efficiency and what they are able to do holistically in a single SoC.

It has very little to do with being in a single SoC. The CPU being able to use the memory bandwidth which is there for the GPU is a neat parlor trick but it only matters if memory bandwidth is the bottleneck, which it usually isn't.

> Sigh. Alright I know what I'm dealing with now.

You don't think that Apple has excellent marketing? They've done this for decades. They make a competitive product and then convince their customers that alternatives are dramatically inferior by pointing to narrow edge cases.


> It has very little to do with being in a single SoC.

The part where the efficiency is radically better than competing parts has everything to do with it being an integrated SoC.

> Then you get results that are equivalent rather than superior.

Equivalent to what? Your desktop at 5+ times the wattage? A Windows laptop with half the battery life and twice the weight?

> Yes, when they're actually doing anything. The M1 Macbooks get around 16 hours. Under the same kind of load a similarly performing PC laptop might get around 10 hours. Other PC laptops will get 16 hours (or more) by having a lower TDP and then being slower, especially on multi-threaded workloads.

Ok this is so ambiguous as to be nearly meaningless. 10 hours of what? At the least you're conceding that in this undefined workload the MacBook gets 6 more hours, which again, has been my whole point. The holistic efficiency is the whole story.

> This obviously only matters to people who care about not just long, but very long battery life, and very high performance, and aren't willing to make the weight trade off of getting one with a bigger battery.

"Only people who want better performance and better battery life should get an Apple Laptop" ... is that basically what you're saying here? Because we might just finally be in agreement.

If you don't care about performance or battery life or weight there are plenty of Windows options available. This is true.


> The part where the efficiency is radically better than competing parts has everything to do with it being an integrated SoC.

What makes you think that? The CPU is about the same speed as other CPUs, the GPU is about the same speed as other GPUs, and it uses less power in no small part because it's the first thing to use TSMC 5nm.

Being an SoC allows you to save a certain amount of overlap, e.g. you don't need separate memory controllers for the CPU and GPU, but none of that stuff uses a significant amount of power.

> If you don't care about performance or battery life or weight there are plenty of Windows options available.

It's not a matter of not caring. You can get PC laptops with similar performance and similar weight and 10 hours of battery life instead of 16. Ten hours is not exactly oppressive. If you really, really need sixteen, you can trade it against weight or performance at your option -- it's not necessary to do both.

Or you could wait a few months for PC laptops on TSMC 5nm which will have better power efficiency.


> Being an SoC allows you to save a certain amount of overlap, e.g. you don't need separate memory controllers for the CPU and GPU, but none of that stuff uses a significant amount of power.

It absolutely uses more power to have things on separate dies. Why do you think monolithic designs like this are preferred for mobile first products? It's certainly not because it's cheaper or easier. SoCs like this typically have lower yields and higher costs--you do it because in return you can squeeze out better power and efficiency while also saving space on the board.

> It's not a matter of not caring. You can get PC laptops with similar performance and similar weight and 10 hours of battery life instead of 16.

This is just so emphatically not true though. Like I'm going to need to see the workload you're referencing where a device with similar benchmarks across the board (CPU and GPU) gets 10 hours of real world battery life without weighing 5 lbs.

Like I have in my possession as a daily driver a 2020 MacBook Pro that uses an Intel IceLake chip--fairly recent tech! If I'm doing my actual work on it--a bit of Docker, Chrome, and an IDE--I'm lucky to make it much past lunch. A fairly recent Intel chip, that gets positively embarrassed performance-wise by an M1, and can barely turn in 6 hours of real usage.

I want to see this system you have that performs like an M1 and turns in 10 hours of real world battery usage.

> Or you could wait a few months for PC laptops on TSMC 5nm which will have better power efficiency.

You could have already had it for well over a year now with Apple. And by the time this mythical laptop you're speaking of arrives, Apple might have already moved on to the next node. But hey, it's your life not mine. Keep waiting if it pleases you.


I doubt that they can make another giant performance leap. I wager that most of their gains are from being a node ahead of the competition (5nm vs 7nm TSMC) and placing RAM on package for much fatter bandwidth. I am, however, very interested in seeing what the competition does in response.


The rumors are that the M1 supersized version they are putting in the new Mac Pro is 20 CPU cores and 128 GPU cores, which would place it at a little under twice the performance of a 3090. Not saying that Nvidia can't catch up with a hypothetical 4090 next year but it'll be a tall order.


The M1 Max 32 core GPU is ~1/8 the performance of the 3090. 4x the core count should put it half of 3090 performance, not double.


The path forward is simple then. 512 GPU cores.


That's just not true. 3DMark shows it having a little under half the performance of a 3090 with the largest GPU, with it getting a score of 18,000 vs a 3090 having around 42,000. Scaling up linearly by a factor of 4 would lead to a comparison of 72k vs 42k, or a little less than twice as much.
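The two estimates in this exchange differ only in the baseline they start from. A quick Python check of both, using the numbers quoted in the comments and assuming perfectly linear scaling with core count (which is itself a big assumption):

```python
SCALE = 4  # 32 -> 128 GPU cores, per the rumor discussed above

# Baseline A: the article's ML benchmark, where the M1 Max is ~1/8 of a 3090.
ratio_from_article = (1 / 8) * SCALE

# Baseline B: the 3DMark scores quoted in this comment.
M1_MAX_SCORE = 18_000
RTX_3090_SCORE = 42_000
scaled_score = M1_MAX_SCORE * SCALE
ratio_from_3dmark = scaled_score / RTX_3090_SCORE

print(f"Article baseline: {ratio_from_article:.2f}x a 3090")
print(f"3DMark baseline:  {scaled_score} vs {RTX_3090_SCORE} = {ratio_from_3dmark:.2f}x a 3090")
```

So "half a 3090" and "a little under twice a 3090" both follow from linear scaling; the disagreement is about which benchmark represents the GPU.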


It's not going to scale linearly though. The Max has 400 GB/s of memory bandwidth. If you attempt to use as much as possible, a single Max chip uses 220-240 GB/s. The remainder is likely because they planned to use multiple chips in later products. But 2 Maxes will completely saturate it... so that's going to be a bottleneck if they try 4 Maxes. And there would be little point going above 4.

For comparison, the 3090 has almost 1 TB/s of memory bandwidth.
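A toy Python model of that bottleneck argument, treating the quoted figures as GB/s. It assumes the workload is entirely bandwidth-bound and that each chip wants about 230 GB/s; both are simplifications, and the function name is invented for the example. It contrasts a fixed shared pool with bandwidth that grows with the number of chips.

```python
def bandwidth_limited_speedup(chips: int, demand_per_chip: float,
                              total_bandwidth: float) -> float:
    """Speedup over one chip when each chip wants demand_per_chip GB/s
    and all chips draw from total_bandwidth GB/s."""
    usable = min(chips * demand_per_chip, total_bandwidth)
    return usable / demand_per_chip


DEMAND = 230.0  # GB/s, midpoint of the 220-240 range quoted above
POOL = 400.0    # GB/s, the figure quoted above

for n in (1, 2, 4):
    fixed = bandwidth_limited_speedup(n, DEMAND, POOL)
    per_chip = bandwidth_limited_speedup(n, DEMAND, POOL * n)
    print(f"{n} chip(s): fixed pool {fixed:.2f}x, bandwidth per chip {per_chip:.2f}x")
```

In this model the fixed-pool case stops improving after two chips, while the per-chip case scales linearly, so the conclusion hinges on whether a multi-chip part shares one 400 GB/s pool or brings its own memory with each chip.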


Aren’t you assuming they won’t scale the memory bandwidth?


Yes, that's the way that works. To scale memory bandwidth, you would need a new chip. It wouldn't be an M1 Max anymore. If rumors of them using multiple M1 Maxes are true, then they have 400 GB/s of memory bandwidth.


> they have 400 GB/s of memory bandwidth

Per chiplet.


Dumb question here: Can you get more performance from the same number of cores by giving them more power?


Not a dumb question at all. You can to an extent, but the returns diminish quickly. It isn't linear.

This fact, however, makes "performance per watt" comparisons misleading between processors designed for different environments (e.g. M1 vs AMD desktop). It takes a lot more power to get that extra perf. Conversely, and this is the more under-appreciated part IMO, reducing the speed a little can save a ton of power/heat if the chip is currently running at the higher-power portion of the curve.

On a desktop machine most people would want them to tune for performance at quite a substantial power efficiency cost so of course a desktop chip most probably is less power efficient per unit of compute. You don't need to power it with a battery after all and there's heaps more cooling capacity in a bigger form factor so why optimise for that?
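A minimal Python sketch of that curve, assuming the common textbook approximation that dynamic power scales with clock times voltage squared, and that voltage has to rise roughly in step with clock (so power goes roughly with the cube of clock speed). Real chips have voltage floors and leakage, so treat this as illustrative only.

```python
def relative_power(relative_clock: float) -> float:
    """Dynamic power relative to baseline under the assumption P ~ f * V^2
    with V proportional to f, i.e. P ~ f^3."""
    return relative_clock ** 3


for clock in (0.8, 0.9, 1.0, 1.1, 1.2):
    power = relative_power(clock)
    # Treat performance as proportional to clock, so perf/W = clock / power.
    print(f"{clock:.0%} clock -> {power:.0%} power, perf/W {clock / power:.2f}x")
```

Under this assumption, backing off the clock by 10% cuts power by more than a quarter, while pushing it 20% higher costs well over half again as much power, which is the asymmetry described above.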


If they improve the architecture at the same time, perhaps feasible?


I'll believe it when I see it.


Yeah. I'm worried that they may have played their trump cards already, so to speak.

I wonder if future perf gains will come, game console-style, from areas besides general purpose computation -- specialized instructions / cores for specialized tasks.

Imagine an entire core optimized for Safari and its Javascript engine. Their next chip is called the "M1 Marathon Edition" and you get 36 hours of real-world battery life with Safari. And/or the iWork suite. And maybe they have a behind the scenes collab with Slack and select other app makers so that they can be a part of the "Marathon" program too.

I dunno, just spitballing. Not saying that's likely or even what I'm pining for, just one possible avenue once they've plucked all the low-hanging general purpose computing fruit.


The things that take up time and energy in browsers - i.e. things like garbage collection, JIT compilation, and so on - are already things that CPUs are hyper-specialized for. And the code that is ultimately intended to execute is also well-specialized for CPUs.

It's already possible to target GPUs in-browser directly; and most older HTML primitives were recast in terms of GPU operations around the time of the original iPhone. The only thing I can think of that's still been left on the table is rendering vector geometry on-GPU; but most sites don't redraw so much as to make this a huge performance win.

Video decode has been offloaded to hardware for decades as well. The only reason to decode video on-CPU is if you're decoding crazy-old formats[0] that don't have hardware decoder blocks present for them.

The other huge problem is networking - which is also heavily hardware-optimized and has been for a while. Large assets mean keeping your Wi-Fi or LTE baseband on for longer. You could mitigate this with compression; which can be hardware optimized... though I'm not sure how much of a benefit that provides outside of game consoles[1].

[0] In my personal experience: I wrote a Sorenson H.263 decoder for Ruffle. Right now it not only executes on-CPU, but blocks the event loop main thread. However, the video files in question are so low-quality that this isn't a significant problem for most Flash movies and everything works fine (though I do want to try on-GPU video decoding at some point).

[1] Current-gen game consoles (PS5/XSeries) have hardware decompression blocks. However, the intent is to quickly decompress gigabytes worth of data quickly; most websites aren't nearly that bloated.


This is a wonderful and informative reply. Thank you so much - I really appreciate the time you spent writing this.


Sounds like a nightmare to me. Want to change browsers? Buy a new device with a different processor or suffer massive battery or performance penalties. Want to change media players? Buy a new device with a different processor or suffer massive battery or performance penalties.


By that logic, any time anybody optimizes for anything, they're somehow punishing everybody else with "penalties."

What if somebody on the Intel/AMD side of things includes some optimizations for SSE, AVX, etc? Are they punishing everybody else?

At any rate, my idea was less than half baked and again, not exactly something I'm pining for. Was just thinking about things Apple might conceivably be able to try with their unique vertical integration.


SSE/AVX/etc aren't optimizations for a single application, they're optimizations for specific kinds of math instructions. Any application using that processor can use them, and they're overall pretty general instructions.

Hyper-optimizing for a particular application means the way that particular application works gets accelerated, but not things in general. Sure, if a browser behaved identically to Safari, it too would potentially experience the speed up barring any weird microcode/firmware that only allows those extensions to be used by Apple-signed binaries which is a whole 'nother level of nightmare. Stepping outside of that highly optimized path is essentially a penalty though as it by definition wouldn't be so highly optimized. And once that optimized path has been etched into the silicon, there's no real updating it unless we move to CPUs being more like FPGAs which isn't likely to happen.


When they make optimizations for something, and then _artificially limit_ access to those optimizations (for applications that absolutely could utilize them) for purely commercial or marketing reasons, then yes, absolutely they're punishing.

Intel doesn't forbid compilers from generating code for SSE or AVX.

Apple _does_ forbid Chrome from using energy management and other APIs for no other reason than to keep it less efficient than Safari.


I didn't realize Safari made such heavy use of private APIs.

That is unfair and I would not support that behavior whether talking about private software APIs or private hardware instructions. What I was half-assedly imagining wouldn't be anything like that.

(Although, FWIW, Chrome has private APIs too: https://blog.chromium.org/2021/01/limiting-private-api-avail... Different use case, but still.)


That's already the case, though. The battery life difference using Safari versus Firefox/Chrome is staggering. Sure, you don't have an option to buy a FF/Chrome mac atm, but you've definitely bought one where safari is the only decently efficient browser.


Is that really due to the hardware or because of the overall behavior of the two browsers? Isn't Safari better battery-wise than Chrome even on Intel Macs? IIRC Edge has better efficiency in Windows than Chrome, so it's not surprising to me that Chrome is less efficient. It's a bloated shadow of its original self.


macOS has a bunch of energy management and performance-related "private" APIs and libraries that Safari is allowed to call, but Chrome is not.


That's the second time you have made that assertion in this discussion. Can you be more specific please? I have not seen any evidence of that, but I am very interested.


Can you point me to information about this? The Google results are a bit polluted with (hilariously enough) articles about Google's recent restriction of Chrome's private APIs.

I've been having trouble figuring out exactly which private bits and bobs Webkit might be taking advantage of on MacOS.


I would also like to see some documentation on these private APIs. Sounds fascinating. I wonder what they’re doing differently. Are they exposing the same kind of abstractions but the public API is purposely slower? Why would they not want all apps to use less battery life?


Yeah, that's my question.

Other commenters are suggesting that Apple's intent is to keep Chrome and other third-party apps from matching Safari's energy efficiency.

But as you say, I'm not sure that makes strategic sense. If that is true, Apple is essentially crippling most Mac use cases just to benefit Safari. That really would not seem to be in Apple's best interests.

I am a developer, but not an iOS/macOS developer. So I have followed this sort of thing only very loosely over the years.

But whenever I have heard about grumblings about private Apple-only APIs being used by blessed first-party Apple apps on iOS, it seems to me that the explanation was always that the private APIs either:

1. posed some sort of security issue

2. were simply not yet stable enough to expose to third-party apps

As every developer knows, publicly exposing any API represents a maintenance burden: you're committing to support that API and keep it stable for X number of years. In general you see this kind of pattern a lot: a platform developer dogfoods APIs internally for some period of time before they're stable enough to expose publicly.


That already happens though given that different hardware has different support for hardware accelerated playback of video.

E.g., even on the M1 Pro/Max, you're going to get much better performance and battery life if you're using a video editor that supports ProRes decoding.


I get that, but that's a bit more general than saying Safari will run faster than Chrome because Apple made Safari-specific application instructions rather than H.264 will be more power efficient than Theora on a given CPU. Any browser could support any particular hardware's video decoders, but only Safari behaves exactly like Safari. The person I was replying to was writing specifically about CPU instructions to enhance the performance of Safari in particular, not of all browsers in general.


Most of the time the hardware accelerators are available for everyone to use, if they program to it. E.g., ffmpeg and other media encoding engines can all use encoders / decoders written against specific hardware accelerators. You can choose CPU encoders, Nvidia-specific ones, etc.
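As an illustration of choosing between CPU and hardware encoders, here is a small Python sketch that only builds an ffmpeg command line (it does not run it). `libx264`, `h264_nvenc`, and `h264_videotoolbox` are ffmpeg's encoder names for software, NVIDIA, and Apple hardware H.264 encoding respectively; whether each one is available depends on how a given ffmpeg binary was built. The helper name and the `backend` keys are made up for the example.

```python
# Map a hypothetical "backend" label to an ffmpeg H.264 encoder name.
H264_ENCODERS = {
    "cpu": "libx264",              # software encoder
    "nvidia": "h264_nvenc",        # NVIDIA NVENC hardware encoder
    "apple": "h264_videotoolbox",  # Apple VideoToolbox hardware encoder
}


def ffmpeg_h264_command(src: str, dst: str, backend: str = "cpu") -> list[str]:
    """Build (but do not execute) an ffmpeg argv that encodes src to dst
    with the H.264 encoder for the chosen backend."""
    encoder = H264_ENCODERS[backend]
    return ["ffmpeg", "-i", src, "-c:v", encoder, dst]


print(" ".join(ffmpeg_h264_command("input.mov", "output.mp4", backend="apple")))
```

The application-level code is the same in each case; only the encoder name changes, which is the sense in which these accelerators are open to any program that targets them.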


Except that they locked access to the hardware accelerators via their APIs. The Neural Accelerator is only available through CoreML, and even some of the benchmarks show features that are not available to the public (int8).


Sure, that's been the case in the past and maybe that's what JohnBooty meant with his comment, but his comment specifically pointed out a particular browser so I took it to mean CPU extensions targeted for how Safari operates. Like, specific ways Safari handles Javascript or rendering with all of its specific quirks etched into silicon. Other browser manufacturers would have to pretty much behave like Safari or else run at a penalty on legacy instruction sets.


> Imagine an entire core optimized for Safari and its Javascript engine. Their next chip is called the "M1 Marathon Edition" and you get 36 hours of real-world battery life with Safari.

I don't think that would be worth it. What's the point of 36 hours of battery life? How often do you need more than 12 hours of life on battery? Increasing the maximum battery life only makes sense when it also increases battery life for the more power-hungry workloads (which is where you're likely to reach the limit). Why bother building dedicated hardware for a specific workload that's already beyond what you need?


Many parts of the world don't have stable electricity, reducing electricity usage reduces costs and overall emissions, and as you approach much longer periods you massively increase convenience.


There's a gap between no stable electricity and 20+ hours without electricity, so what fraction of Apple customers do you expect to live in this tiny niche, like 0.1%? In such situation you'd use an external battery anyway…

> reducing electricity usage reduces costs and overall emissions

You're going to need a looooooooooooooooong time for the electricity usage reduction to compensate for the enormous energy cost of manufacturing a new laptop. If you care about the climate, don't buy a new laptop if your previous one is still running fine, there's just no way energy efficiency is going to be worth it.


I don’t think regions susceptible to black/brown outs and power outages are the intended market of an M1 Mac, though that could change.


Aren’t many companies normally located in Silicon Valley now migrating to Texas, where the electricity infrastructure is wonky at best?


Texas is pretty average when it comes to electricity reliability in the US even though it experiences some of the most diverse weather in the US. Not too many states experience both tornadoes and hurricanes on a regular basis. The biggest challenge it currently faces is winterization measures due to climate change. Things just weren't designed to get as cold as it did in February, which was an exceptionally rare event historically. I've lived in Texas nearly my whole life, and it has never been that cold for that long since I've been here.


Those parts of the world aren't generally spending multiple thousands of American dollars on a laptop.


You'd be surprised. Additionally, that also includes parts of the US as well.


> I'm worried that they may have played their trump cards already, so to speak.

I'm not. Let them all scramble and pull out the big guns to try to compete. That's capitalism at its finest, which isn't exactly what we've been seeing in the CPU space for the prior decades. We got lucky that AMD caught Intel with their pants down recently (if only because it strengthens AMD and makes them a better competitor), but a duopoly isn't necessarily what I would consider a good market condition for progress (as we've seen with iOS vs Android). Lots of different experiments with feedback from people on what they find good would be much preferable, and while a constrained CPU such as Apple's isn't perfect, it does represent more choice and pressure on other players to evolve in ways they may have been resistant to previously.

> And maybe they have a behind the scenes collab with Slack and select other app makers so that they can be a part of the "Marathon" program too.

RIP the general purpose computer. :(


    > And maybe they have a behind the scenes collab with
    > Slack and select other app makers so that they can
    > be a part of the "Marathon" program too.

    RIP the general purpose computer. :(

Haha. I was truly truly not thinking along any such lines.

Since heterogeneous cores are now a mainstream thing, with a mixture of full-throttle and efficiency-minded cores on a single die, and the performance cores may be hitting a wall until we get to smaller processes, perhaps the next frontier could be efficiency.

Remember how some software proudly displayed those "optimized for MMX" or "optimized for 3DNow!" badges back in the day? What if there was something like that for apps that optimized for those efficiency cores, and what if the efficiency cores met them halfway by implementing some app-friendly stuff in hardware? Sort of like how some common Javascript string ops have CPU hardware dedicated to them now.

Anyway, yeah. I'm probably just re-inventing CISC all over again, badly.


We know (more or less) exactly what the Pro desktop chips will be, in the same way that these laptop chips weren't much of a surprise - they'll be basically exactly this, but bigger. More of it.


They can't get that much bigger. The Max is already over 400mm^2. For comparison, the biggest chips are around 600. Apple can't realistically get much bigger than this since yield already drops off a cliff at 600. Getting bigger than that will be tough for them since it isn't clear that Apple has put R&D into the interconnect tech needed to make effective multi-chip modules.
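For a sense of why yield falls as dies grow, here is a sketch using the simple Poisson yield model, in which the chance that a die has no defects is exp(-D0 * A). The defect density below is a hypothetical placeholder, not a published figure for any foundry or process, and real yield models are more involved.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under a Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-defects_per_cm2 * area_cm2)


D0 = 0.1  # hypothetical defects per cm^2, chosen only for illustration

for area in (100, 400, 600, 800):
    print(f"{area} mm^2 -> {poisson_yield(area, D0):.0%} of dies defect-free")
```

The model is exponential in area, so each extra square millimeter costs proportionally more good dies, on top of fewer candidate dies fitting on a wafer in the first place. Designs can recover some defective dies by disabling cores, which this sketch ignores.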


It looks like they were designed for chiplets from the start. So desktops could simply be the existing chips stacked.


Is this true? I was curious and searched around for sources, but it looks like all of the news outlets screwed up and thought the M1 Pro was going to be a chiplet based device called the M1X.

All of the M1 devices have been monolithic dies so far.


They are probably going to do two M1 Max chips in the iMac, so basically take the existing M1 Max, and double it (at least for multicore perf, who knows how single-threaded will play out).


They almost certainly will not do this and it would be a NUMA nightmare if they did.


Not sure why you were downvoted for this.

Considering that some of the magic in these things is the shared, local memory with a very wide bus, it would seem obvious that trying to go multi-chip would indeed be a massive headache in this regard.


Check out this[0] link to see why people think Apple is doing exactly that.

[0]: https://architosh.com/2021/10/apples-new-m1-pro-is-chop-vers...


That isn't what that article is saying.


There were a lot of people saying things like this after the M1 too. I'm not expecting another huge leap with whatever comes next in the M-series, but I won't be surprised if there is one either.


They didn't really improve the M1 with the Pro and Max chips though. They really just added more cores to the SIP, which doesn't necessarily improve performance so much as it makes more performance available.


I wouldn't say M1 max is another leap. It's mostly scaling for linear performance gains.


When you look at it from a compute perspective, sure. But from a power perspective, it is indeed a leap.


RAM is not on package in the way it is on iPhones. Bandwidth should not be meaningfully affected by the placement of the RAM; there is perhaps a slight latency improvement. It simply has many memory channels.


They got something like a 15% increase in integer performance, up to a 37% increase in some benchmarks, and an 8% overall performance increase.

Zen 2 was a must-have upgrade over Zen+ and it was a 10% IPC increase.


Maybe not next year, but TSMC’s 3nm isn’t that far off.


They're mentioning their desktop because Apple's marketing hype mentions the desktop, not because the comparison is actually valid. As you can see in the benchmark - lower power, proportionally lower performance.

To be clear - Apple's M1 hardware is good, even great. It's just that Apple's marketing hype is so over the top, and so many people buy into it so hard that people (such as me) still feel the need to bring it down to earth.


> lower power, proportionally lower performance

Except it's not. The performance per watt is significantly better on average. Only one of the examples in this post really talked about power usage and that example also says that the 3090 is using a special compiler and gets to use the special tensor cores in the 3090, but for the M1 Max it wasn't able to use the neural cores or the special compiler and still achieves a similar performance per watt.

If you look at more detailed benchmarks from Anandtech:

> In the SPECfp suite, the M1 Max is in its own category of silicon with no comparison in the market. It completely demolishes any laptop contender, showcasing 2.2x performance of the second-best laptop chip. The M1 Max even manages to outperform the 16-core 5950X – a chip whose package power is at 142W, with rest of system even quite above that. It’s an absolutely absurd comparison and a situation we haven’t seen the likes of.

https://www.anandtech.com/show/17024/apple-m1-max-performanc...


For laptop usage, it's an amazing feat.

But it's kind of a strained comparison. Power usage doesn't scale linearly with performance for any CPU (power consumption scales with the square of voltage) so squeezing the last 20% of performance out of a desktop chip could require doubling the power consumption. Which is actually fine for a desktop because I really don't care if my CPU consumes 100-200W for a few minutes while doing a compile. But trying to compare performance-per-watt between two parts tuned for different parts of the power/efficiency curve is always going to be misleading.

If we wanted to compare straight across, comparing to AMD's mobile Ryzen parts would make more sense.

AMD has a 35W mobile part that isn't all that far behind the M1 parts: https://browser.geekbench.com/v5/cpu/11001445
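
A rough illustration of the voltage point above, using the textbook first-order model for dynamic power, P ≈ C·V²·f. The voltages and clocks below are made-up operating points, not measurements of any real chip, and performance is assumed to track clock speed:

```python
def dynamic_power(voltage, freq_ghz, capacitance=1.0):
    """First-order dynamic power model: P ~ C * V^2 * f (arbitrary units)."""
    return capacitance * voltage ** 2 * freq_ghz

# Hypothetical operating points, chosen only for illustration.
efficient = dynamic_power(voltage=0.90, freq_ghz=4.0)
boosted = dynamic_power(voltage=1.15, freq_ghz=4.8)

perf_gain = 4.8 / 4.0 - 1            # assumes performance tracks clock speed
power_gain = boosted / efficient - 1

print(f"performance: +{perf_gain:.0%}, power: +{power_gain:.0%}")
```

With these assumed numbers, a 20% clock bump that needs a higher voltage ends up costing close to double the power, which is the shape of the curve being described.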


The Anandtech review[1] also compares it to the Ryzen 5980HS, which is the one you linked. The M1 Max absolutely crushes that chip as well, in some of the benchmarks it's more than 2x better. I don't understand the argument people are making saying it doesn't make sense to compare it to a desktop chip. The point is that the M1 Max is actually outperforming desktop chips in a mobile form factor. It may eliminate the need for a separate desktop workstation for a lot of people. Other mobile chips are not competitive at all to the M1 Pro/Max currently.

Even using Geekbench as the benchmark, we see that the M1 Pro multicore score is around 12650, which is 58% better than the Ryzen 9 5980HS you linked.

1. https://www.anandtech.com/show/17024/apple-m1-max-performanc...


The 5980HS is a 35W part and will not sustain more than 42W for 5 minutes and then only 35W. The M1 Max's CPU alone will happily draw and sustain 62W.

In other news, processor that draws more power on a more efficient process is faster. More news at seven.


The M1 Max is also not actually faster in general: https://www.notebookcheck.net/M1-Max-vs-R9-5980HS-vs-M1-Pro_...

I assume SPECfp specifically scales really well with memory bandwidth/latency.

Regardless, I was talking about GPU performance in my comment (because GP was talking about GPU performance) which all of the replies seem to be ignoring.


Ah, if you care about GPU performance you have to compare it to a dGPU. The M1 Max GPU by itself can draw 65-70W, which makes it more power hungry than an RTX 3060 Max-Q, which is slightly more powerful (despite 8nm vs 5nm), which is a GPU you'd expect in a $1000 computer.

The one advantage of the M1 Max's GPU is more memory.


Apple's marketing page for the new MacBook Pros with the M1 Pro and M1 Max strictly mentions laptops and laptop GPUs in its comparisons. Apple makes no comparison to desktop systems directly. At most they make general comparisons to the power curves of PC CPUs and GPUs, but mostly when introducing the M1 and explaining why they built it.


> They're mentioning their desktop because Apple's marketing hype mentions the desktop, not because the comparison is actually valid.

The third party testing does show that the comparison is valid.

>The chips here aren’t only able to outclass any competitor laptop design, but also competes against the best desktop systems out there, you’d have to bring out server-class hardware to get ahead of the M1 Max – it’s just generally absurd.

https://www.anandtech.com/show/17024/apple-m1-max-performanc...


There were really similar discussions in mid-2020 when the Ryzen 4000 mobile chips launched, because they were beating Intel desktop chips.

That doesn't diminish how excellent Apple Silicon is for Macbook users (and Mac Mini users).

It's just an exciting time when laptops (with the right silicon) can do more than ever before, reducing the need to have a desktop for a lot of tasks, and also... largely not a great time to be using Intel chips. (Alder Lake is very high performing, but power hungry and desktop only.)


Not just that it's even comparable to a desktop but that it also has excellent battery life. You've been able to get portable desktop replacements before, but they were power hungry even when barely doing anything.


> But the fact that you're even mentioning your desktop in the same breath is kind of the whole reason it is amazing.

Right, and I acknowledged the battery life as amazing.

But currently, I can't buy an M1 Max desktop if I wanted to. So instead I have a 16" MacBook Pro semi-permanently attached to my monitor and keyboard at my desk. It's basically a desktop for me.

I don't expect a laptop to outperform my AMD/nVidia desktop workstation. And sure enough, it doesn't! But it comes surprisingly close for common tasks and I'm very happy to have it.


Laptop Zen and desktop Zen are basically the same in single core. The big difference is that on a laptop you don't get more than 8 fast cores. The M1 Max also has basically 8 fast cores.

It used to be different, but now frequency is so close to the absolute limits that adding more power doesn't help single core performance that much.


I don't understand why people keep talking about performance per watt. Do we know for a fact that a 90w Apple GPU would perform 4x faster if it was running at 500w? Does the performance scale linearly?

The other thing is price: the M1 Max costs $5k here. You know what kind of PC you can get for that price? A 3080's MSRP is $700 and it beats any Mac GPU, right (and that's a GPU from last year)? A 5900X costs $480.


The obvious answer is that performance/watt is literally everything when talking about laptops. Most people favor laptops these days even if they (like me) leave their machines "docked" 90% of the time.

    do we know for a fact that a 90w Apple GPU would 
    perform 4x faster if it was running at 500w
It's somewhat beside the point IMO, but don't GPU workloads tend to scale well simply by adding more compute units? This should be pretty easy to extrapolate by comparing M1 / M1 Pro / M1 Max power consumption and performance relative to the # of GPU cores.

I would certainly like to see Apple go nuts and make a true, cost-and-power-consumption-are-no-object world destroying GPU. If only as a halo product. But I wouldn't hold my breath.


Performance/watt is not everything. It's not really even something. It's a derivative concern.

The only reason I care about performance/watt is because I actually care about noise, heat and battery life. I'm not running a data center off a laptop so the cost benefits of performance/watt are not even a concern.

It's a nice chip but we shouldn't confuse the marketing for the actual benefits.

If a higher performance/watt but much lower absolute performance chip was released, I wouldn't be interested.


    Performance/watt is not everything. It's not really even something. 
    It's a derivative concern.

    The only reason I care about performance/watt is because I actually 
    care about noise, heat and battery life. 
In 10 years this is one of the more confusing comments I've ever seen on HN. You care about noise, heat, and battery life but you don't care about performance/watt? Performance/watt is a "derivative" concern, but somehow those metrics aren't?

Presumably, all you really care about is raw performance then?

Well, OK, fair enough. We can all care about different things. But even then, performance/watt still rules: for all practical purposes the watts part constrains the performance part.

For a given power supply and cooling system, you can only draw and dissipate X number of watts. So you are back to the question of how much performance the system can provide given X watts.

Even no-compromises cooling solutions most would consider highly impractical (involving liquid nitrogen etc) run into this issue at some point. So yeah, you actually do care about performance/watt?


> The only reason I care about performance/watt is because I actually care about noise, heat and battery life.

Yeah, all pretty major concerns?


The point is at best I care about perf/watt indirectly and the things I do care about can be improved in other ways so it's incorrect to say perf/watt is everything.

I admit it's a pedantic comment but I was pulled in by use of "literally everything".


The other ways to mitigate those things have pretty significant downsides. For instance, maybe you can stick a bigger battery in there to improve battery life, sure. But now it's heavier and more expensive.


    The point is at best I care about perf/watt 
    indirectly and the things I do care about can 
    be improved in other ways 
Not necessarily, at least not for single core performance. Even if you're willing to accept unbounded power consumption and waste heat, the die size of the CPU itself won't be increasing at the same rate.

You fairly quickly learn that dissipating kilowatts of heat from something as tiny as a CPU die (or, more accurately, one particular subset of functional units in a CPU core) is awfully difficult.

At some point, the CPU/GPU/etc need to be kept under a certain temp, which sets an upper bound on watts. So, no escaping watts/performance....


> and the things I do care about can be improved in other ways so it's incorrect to say perf/watt is everything.

Okay, noise, _maybe_, but how are you proposing to improve heat and battery life without improving perf/watt? Your options for battery life are either to reduce performance, or to increase battery size (but, well, good luck with that; the 16" MBP is already as big as a laptop battery can practically be, as if it was any bigger you wouldn't be allowed to take it on a plane). Your only option to reduce heat without improving perf/watt would be to reduce performance.


I think increasing cores/power limits just gets you up to some other bottleneck.

At some point you are adding 90% more power use for 3% more performance.


One reason: try to work alongside a 3080 or 3090 running at full power for awhile, and you will understand. Especially if you don't have really good AC in your work area during summer. =)


Why would I run it at full power? I could have M1 Max-beating performance while staying well under 70, and modern Nvidia cards will happily idle around 30c when driving hi-res displays.


But you only need like 12% of the power to match Apple?


You can downclock your GPU as much as you want and get better perf/watt, which is what most crypto miners do


> Does the performance scale linearly?

On GPUs, pretty much. It’s even called embarrassingly parallel for this reason.


At some point you hit bandwidth limitations, right? Unless each core has local, dedicated memory they will need to access the GPU's shared memory for various things right?


>At some point you hit bandwidth limitations

Yes, with HBM3 you could go up to 5TB/s and in the future 6.4TB/s. To give a comparison, an RTX 3080 Ti has ~900 GB/s with GDDR6X.

But as with everything, we hit a TDP or cost ceiling well before any technical barriers.


PCIe 4.0+ is capable of ~31504MB/s on an x16 interface. Besides that, you might run into memory issues, but I honestly haven't seen a memory-bottlenecked workload on GPUs outside of the crypto mining scene.
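
For anyone wondering where a number like that comes from, here's a back-of-the-envelope sketch. It assumes PCIe 4.0's 16 GT/s per-lane signaling rate and 128b/130b line encoding, and ignores packet/protocol overhead, so it's a theoretical ceiling rather than what you'd measure:

```python
def pcie_bandwidth_mb_s(gigatransfers_per_s, lanes, payload_bits=128, encoded_bits=130):
    """Theoretical one-direction throughput in MB/s, ignoring protocol overhead."""
    bits_per_s = gigatransfers_per_s * 1e9 * lanes * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e6

gen4_x16 = pcie_bandwidth_mb_s(16, lanes=16)
print(f"PCIe 4.0 x16: ~{gen4_x16:.0f} MB/s per direction")
```

That lands within a few MB/s of the figure quoted above, i.e. roughly 31.5 GB/s each way.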


>do we know for a fact that a 90w Apple GPU would perform 4x faster if it was running at 500w? Does the performance scale linearly?

The answer is yes. Given unlimited bandwidth, scaling with more cores running at the same frequency is pretty much linear for GPUs.

I guess that is why most of the discussion is so tiring: this fact is not widely known.
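
A toy roofline-style sketch of that caveat: throughput scales linearly with core count until the memory ceiling takes over. The per-core throughput, bandwidth and arithmetic intensity below are invented for illustration, not specs of any real GPU:

```python
def attainable_tflops(cores, tflops_per_core, bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: the lower of the compute and memory ceilings."""
    compute_ceiling = cores * tflops_per_core
    memory_ceiling = bandwidth_gb_s * flops_per_byte / 1000  # GFLOP/s -> TFLOP/s
    return min(compute_ceiling, memory_ceiling)

# Hypothetical GPU and workload; every number here is an assumption.
for cores in (8, 16, 32, 64, 128):
    perf = attainable_tflops(cores, tflops_per_core=0.3,
                             bandwidth_gb_s=400, flops_per_byte=25)
    print(f"{cores:>3} cores -> {perf:.1f} TFLOPS")
```

In this made-up setup the numbers double with the core count up to 32 cores and then flatten, because the memory ceiling stays fixed unless bandwidth grows too.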


Battery life, energy costs, waste heat... take your pick.


I do fully expect to be blown away by the performance of the M1 Max when I finally receive mine as I'm coming from a 2012 MBP. I'm getting tired of the entire system grinding to a halt when I open one of those 25k-line classes in android studio. Probably even the regular M1 would handle this much better.


It's kind of hard to compete with raw power if you don't want a space heater of a device (hint: not wanted in laptop form factor).

However, part of me thinks this is Apple doing iterative rollout best - they could have created a 20-core desktop version of this in 2021 but will likely delay that to next year for the new Mac Pro and iMac Pro.


I wonder if it could perform much better than 1/8 of the 3090 on memory bound applications, given that its 400GB/s bandwidth is 1/2.34 that of the 3090's 936 GB/s...
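
Spelling out that arithmetic, using only the two bandwidth figures quoted above (the "purely bandwidth-bound" workload is a hypothetical best case, not a benchmark result):

```python
m1_max_gb_s = 400      # figure quoted in the comment above
rtx_3090_gb_s = 936    # figure quoted in the comment above

bandwidth_ratio = rtx_3090_gb_s / m1_max_gb_s
print(f"3090 / M1 Max bandwidth: {bandwidth_ratio:.2f}x")

# If a workload were limited only by memory bandwidth, the M1 Max could at best
# reach this fraction of the 3090's throughput, versus ~1/8 in the compute-bound test.
best_case_fraction = 1 / bandwidth_ratio
print(f"bandwidth-bound best case: {best_case_fraction:.0%} of a 3090")
```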


It's a laptop with integrated graphics, not a desktop with high-end high-wattage discrete graphics. Astounding how far they've come and that's version 1.


I was like you at first, seeing that my Chrome speeds don't feel that fast, then I did my first Xcode build. Holy.. it is so much faster than my 2017 MacBook Pro.

Super fast builds, and a huge productivity boost. I think it shows that if software is optimized for it, M1 can be/feel crazy fast.

Unfortunately, anything running with Rosetta will not feel that fast, as its performance will be hobbled by the emulation layer.


Yet even still under Rosetta, my M1 Pro can handle more than twice as much "stuff" in my Ableton projects vs. my 2018 MBP, all while barely heating up and without _any_ fan noise (whereas my 2018 MBP starts sounding like a jet engine almost immediately after opening any sizeable project).

So for me, its performance completely lives up to the hype. I don't think I'll need a new machine for a loooong time.


Imagine how much better it will be with the native build: https://www.ableton.com/en/blog/live-111-apple-silicon-suppo...


The main issue with that is a looot of plugins I rely on day to day still haven't been updated with native support, so they wouldn't be usable while running Live native. I really don't understand the hold up since some of my plugins were updated almost immediately after the original M1 was released.


> so they wouldn't be usable while running Live native

Live still doesn't have built-in bit bridging? (Of course, bit bridging doesn't come free.)


Note that one part of this is that llvm is significantly faster to generate code for arm64 than for x86_64. I'm not saying that accounts for all the difference, but it helps.


Even the M1 port of Chrome is clunky, but then again Chrome has always been clunky with macOS.

Firefox's M1 port actually got me to finally start transitioning out of Chrome after being a fairly devoted Chrome user since it released. It's the only third party browser on M1 Macs that comes close to Safari in speed and handles window size changes (especially when video is playing) leagues better than Chrome.


Chrome completely crashes a few times a week on my M1 Max laptop. Very frustrating. Not even necessarily when I'm doing something actively, or intensively. I can have HN open as my only tab, be reading an article and I look back at the screen and Chrome has gone.


Do you run virtual machines on it?


For M1


Pretty sure the panegyric performance claims are from Mac people comparing it to the rather sad previous generation of Macbooks.


One thing I don't see anyone mention is that the M1 Max is probably the cheapest GPU memory you can buy. The only other way to get >=64GB of GPU memory is with the A100 (?) which is like 20k by itself.

So this would be great specifically for finetuning large transformer models like GPT-J, which requires a lot of memory but not a lot of compute. Just hoping for pytorch support soon..
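
For a sense of why memory is the constraint, here's a rough estimator for weights + gradients + optimizer state. It ignores activations entirely, takes ~6 billion parameters as the approximate size of a GPT-J-class model, and the bytes-per-parameter defaults are assumptions (fp32 everything, Adam with two moment buffers):

```python
def training_memory_gb(n_params, bytes_weights=4, bytes_grads=4, bytes_optimizer=8):
    """Rough memory for weights + gradients + optimizer state, ignoring activations."""
    return n_params * (bytes_weights + bytes_grads + bytes_optimizer) / 1e9

n_params = 6e9  # approximate size of a GPT-J-class model (assumption)

full_adam = training_memory_gb(n_params)                     # fp32 weights, grads, Adam moments
plain_sgd = training_memory_gb(n_params, bytes_optimizer=0)  # no optimizer state
print(f"Adam: ~{full_adam:.0f} GB, SGD without momentum: ~{plain_sgd:.0f} GB")
```

Under these assumptions, whether a run fits in 64GB depends heavily on the optimizer and precision you pick, so treat the output as a sizing aid rather than a guarantee.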


Do you have any evidence one can use >48GB RAM on transformer tuning on the new M1 Max? That would be for me the only reason to buy it as I dislike the notch very much, have a beefy 3080-based notebook already and can use 2x3090 for 48GB transformer tuning.


Would it make sense to mine BTC with it?


BTC mining isn't memory-bound, it's compute-bound. The more sha256 hashes you can compute per second, the better. And I'm highly doubtful that any general-purpose hardware at all could even begin to compete with mining ASICs.


Would you or someone happen to know/guess the hash rate of the M1 Max?


I don't think there's a definite answer to this. There's no existing bitcoin miner software optimized for Apple hardware. So, do you use the CPU? The GPU? Both? If you do use the GPU, which API do you use to program it?


At this point it doesn't make sense to mine BTC on any GPU.


Hmm, technically no, an R7 5700G system and 2x 32GB sticks would be around a fifth of the price. That's if you don't need a lot of compute. It's also right around 1/8th the performance of a 3080 but it doesn't have Tensor cores which is a big downside in ML.

In theory you could even push the 5700G to 128GB if you figured out a way to get ECC to work on it.


But that’s system memory. Not GPU memory. M1 shares that memory, so it’s addressable by both directly, but with ryzen (and almost every other consumer platform) the cpu and gpu memory are separate.


No, you misunderstand. The 5700G is an APU. Memory is shared between its GPU and CPU. Hence the "G".


It's unlikely the 5700G would push 200-400GB/s for GPU tasks, assuming one gets pytorch/tensorflow to use the shared memory and BIOS allows setting such a large shared window, all of them unlikely unfortunately.


Now that's a completely different argument, and still mostly incorrect. The 5700G will access as much memory throughput from its GPU as you can feed it. The limitation is not the GPU, it's how fast you can clock your RAM.

The BIOS doesn't set the shared memory. The BIOS sets the dedicated memory. The shared memory is set by the OS and driver as you need, and the only limit is how much memory you have and how much is used by other processes.

You can force any program to use shared memory by making dedicated memory low. As I said, these programs don't really choose to use it, it's a driver/OS responsibility.

The 5700G's memory controller indeed can't go above 100GB/s. However 200-400GB/s is not what the M1 Max GPU can do, it's combined performance. You'd have to subtract CPU performance. The M1 Max GPU would still be faster of course. But the premise is that GPU performance doesn't really matter.
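
To make "how fast you can clock your RAM" concrete, a small sketch of the usual theoretical-peak formula (bus width in bytes times transfer rate). The two configurations are assumptions for illustration: dual-channel DDR4-3200 on the desktop side, and the commonly reported 512-bit LPDDR5-6400 interface for the M1 Max:

```python
def peak_bandwidth_gb_s(bus_width_bits, megatransfers_per_s):
    """Theoretical peak = bytes per transfer * transfers per second."""
    return bus_width_bits / 8 * megatransfers_per_s / 1000

# Assumed configurations (not taken from the thread):
ddr4_dual_channel = peak_bandwidth_gb_s(bus_width_bits=128, megatransfers_per_s=3200)
lpddr5_512_bit = peak_bandwidth_gb_s(bus_width_bits=512, megatransfers_per_s=6400)

print(f"dual-channel DDR4-3200: {ddr4_dual_channel:.1f} GB/s")
print(f"512-bit LPDDR5-6400:    {lpddr5_512_bit:.1f} GB/s")
```

Faster DIMMs raise the first number, but with a 128-bit bus the gap to a 512-bit interface stays large.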


Excellent investigation.

I have the first M1 MBP pro 13” and have done a lot of data stuff on it. My experience was also that python flew - cpython on the M1 being almost as fast as pypy on my 2019 i7 laptop - and java compilation was much faster too. The CPU is fast and the memory is really fast.

The performance pain points though were anything involving containers, random 10-30 sec stalls in boot and app startup (I think it’s corporate firewall stuff) and a general preference I have for Linux desktop over OSX (yeah I’m a programmer).


Delay in app start-up might also be related to binary signature checking in the cloud. (Built-in anti-virus…)


Would tasks get faster if the internet was turned off?


I also got an M1 Max. The chip is amazing. Compile times are a lot faster than on the 6 core Intel Mac mini I had before.

But at this point it's really held back by Apple's software.

Anything related to Apple ID and iCloud regularly hangs 30-60 seconds, showing a spinner with no progress indicator whatsoever.

Apps randomly take 20 seconds to launch, maybe because of [1]?

The Open/Save dialog taking 30 seconds to show.

ControlCenter using 8GB of RAM to show a few sliders (I hope they fix that bug soon).

The scanning feature in Preview is so unreliable that I started using my Windows machine for scanning something on my HP all-in-one.

Some of those problems may be issues with 3rd party software (drivers), and others are just things that slipped through QA, and will hopefully be fixed in an update.

But some of the issues are structural issues, where Apple has made questionable decisions that means issues can never be fixed.

Eg. designing a security architecture that requires synchronously checking a binary signature during app startup with a web service is bound to cause performance issues.

Or the design of the XPC system, which uses asynchronous message passing between services that are implicitly launched on demand sounds nice in theory, but it has been the source of so many bugs, causing temporary or permanent app hangs that are impossible to debug. The system was introduced in macOS 10.7 (!) and it still doesn't work reliably! At this point I've lost hope it will ever work properly.

[1]: https://sigpipe.macromates.com/2020/macos-catalina-slow-by-d...


You're not alone. My M1 Mac mini is better, but I still see random beach balls, though the duration is less than on the Intel MBP.

For anyone saying "I never see this" or "something's wrong with your system" - I've seen these sorts of problems, in some capacity or another, over ... 12 years, multiple macbooks and imacs, multiple OS versions, multiple internet providers, from various parts of the world. I think the folks saying "never affects me" simply do not notice this stuff. I don't know how/why you can't notice stuff like this, but I've been present where I've noticed people getting beach balls, I've pointed it out, and was told "oh, didn't see that". Not saying every single person is missing every single instance 100%, but I've no doubt this interrupts peoples' flow different from mine.

If I've paid $4k for a laptop and click a button, I don't expect to wait for... 1-2 seconds, then see a beach ball, then wait.... then wait some more... before a button click is recognized. It's better today than last year, and the year before, but... wtf... it's still there.


Hardware used to get faster more quickly than software got slower, so you could make forward progress by upgrading hardware. Then it seemed about the same, so following a hardware upgrade things would be about as fast as they had been when my previous hardware was new several years prior.

Now we seem to be at a point where the software is getting slower more quickly than the hardware is getting faster, so that following a hardware upgrade everything is a bit slower than it was following the prior upgrade.

Software sucks, and hardware is amazing.


Frequent random beachballs made me switch from FF to Chrome ~10 years ago. Somewhat-less-frequent random beachballs made me switch to Safari.

Not that I never see them, but that cut them to under 5% the rate I'd been seeing them. Xcode still does it pretty reliably, but luckily I don't work in that much. Now if I see them it's usually because WhatsApp or Slack have gone insane, which they do every few weeks. Or some web app. Point is, there's usually one thing going nuts that causes it, it's not just a constant feature of my desktop even under light-ish load.

Some heavier stuff, like Android Studio with the Emulator running, might still beachball quite a bit. Dunno, been out of Android development for quite a while. I'm sure there are some workflows that still have the problem. The browser thing kept them from being a common feature of my personal experience, though.


"If I've paid $4k for a laptop and click a button, I don't expect to wait for... 1-2 seconds, then see a beach ball,"

Is this even possible? I don't think a 50k computer can guarantee no wait on software. How would any hardware prevent excessive software utilization?


Apple's entire USP is vertical integration. The fact that Linux is significantly more responsive than Mac on the same hardware is an embarrassment to Apple.


The beachballing in Contacts has been driving me crazy for almost two years now. I have 7,413 cards in Contacts and even went so far as to completely delete my address book and reimport it from VCF. There is very clearly some synchronous dependency on iCloud. Turning Wi-Fi off is the only way I can use Contacts without it beachballing.


Are you running any 3rd party kernel extensions, if so which ones? (E.g. littlesnitch, FUSE, etc....).

I've got an M1 and M1 Pro and I've never seen anything like this, macOS in general has some long standing software bugs around the performance of apps like Music and Preview that I've seen hit on the M1 processor but they seem to make less of an impact to the usability compared to the Intel processors.


I'm not running any kernel extensions on this Mac, the first iCloud related hang happened during the setup process before I installed anything at all.


I avoid iCloud problems by not using iCloud. Seems to be a lot more trouble than it’s worth!


Maybe you have DNS issues? It's always DNS, so they say.

XPC is not asynchronous; that's up to the individual caller. The synchronous methods are easier to debug for sure.


> XPC is not asynchronous

It's been some time since I dug into the internals of XPC, but my assumption was that the underlying protocol is asynchronous, and if you do sync calls the wrappers just do the waiting for you.

The problem is that it has a tendency to get stuck in some rare cases, where services just don't reply for some reason. Then the sync calls are the worst -- the UI of the app is completely frozen and there's nothing you can do except restart the app. If the problem is with an Apple service (like Apple ID) then the only way to fix it is to restart the Mac and hope it doesn't happen again.
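
As a loose analogy in plain Python (this is not XPC's API, and `slow_service` and `call_sync` are hypothetical names), here's what a "sync call as async send plus a wait" looks like, and why the caller freezes when the other side is slow unless the wait has a timeout:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def slow_service(delay_s):
    """Stand-in for a remote service that may be slow to reply."""
    time.sleep(delay_s)
    return "reply"

def call_sync(executor, delay_s, timeout_s=None):
    """'Synchronous' call built from an async submit plus a blocking wait."""
    future = executor.submit(slow_service, delay_s)  # async send
    return future.result(timeout=timeout_s)          # caller blocks here

with ThreadPoolExecutor(max_workers=2) as pool:
    fast = call_sync(pool, delay_s=0.01)
    try:
        call_sync(pool, delay_s=0.5, timeout_s=0.05)
        timed_out = False
    except TimeoutError:
        timed_out = True

print(fast, "timed out:", timed_out)
```

With `timeout_s=None` the second call would simply block until the service answers, which on a UI thread is exactly the frozen-app symptom.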


XPC sync is not in fact async underneath in recent macOS versions. It’s a severe performance pessimization to use async in many cases, because sync propagates thread priority and async often can’t.

You don’t seem to have a full grip on the reasons for the intermittent hangs you’re experiencing. Can I suggest two things?

1 Grab a sysdiagnose during one of the hangs and file a feedback report with Apple

2 Use the `sample` command line tool to see what’s actually hanging a particular process for yourself
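
A minimal sketch of scripting the second step, assuming the usual `sample <pid> <duration> -file <path>` form of the macOS tool. The helper name, the PID and the output path are placeholders:

```python
import shlex

def sample_command(pid, seconds=10, out_path="/tmp/hang-sample.txt"):
    """Build the argv for macOS's `sample` tool: sample <pid> <duration> -file <path>."""
    return ["sample", str(pid), str(seconds), "-file", out_path]

cmd = sample_command(pid=1234)  # 1234 is a placeholder PID
print(shlex.join(cmd))
# On a Mac you could then run it with subprocess.run(cmd, check=True)
# while the app is hung, and attach the resulting file to the report.
```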


Yeah, I've recently done exactly this for one of the hangs. Unfortunately Apple was extremely unhelpful, it took a few back-and-forths with DTS to get a response from someone who even bothered to actually read my bug report. (I just got a few generic replies "please submit an Xcode project" first). But even then they just kept asking me for a way to reproduce the issue, which I couldn't, since it happened on a customers machine, and disappeared after a restart. Even though the issue was reported multiple times over a couple of months by various customers, it took me a lot of time to actually get a usable sample from a customer. But apparently even a sample of the process showing exactly where Apple's frameworks are hanging isn't enough for them to start investigating.

I try to report all of the issues I see to Apple, but at some point there's nothing I can do except complain that Apple's frameworks are buggy.

The biggest problem with these bugs is that customers always seem to think it's an app bug, and there's nothing the app developer can do except hope that Apple fixes the issue. One early sandbox bug took Apple about 3 years to fix. To be honest I don't even know if they fixed it, it was never mentioned in a changelist, it's just that I stopped getting reports of the issue at some point.


> 2 Use the `sample` command line tool to see what’s actually hanging a particular process for yourself

Don't use "sample" if the issue could be multi-process or in the kernel, use "spindump".


> XPC sync is not in fact async underneath

That's interesting! I thought XPC was a wrapper around async mach messages. Do you have any pointers where I can learn more about this?


When you get to kernel land it is "something async plus a wait"; there's no other way for it to be when the remote task is on a different core.

But it's a special wait that the scheduler and other systems know how to benefit from using vouchers/turnstiles/etc. If you look at spindump output you should see it.


My M1 Pro has none of these problems. I would try a fresh install.


Yes, this is the classic solution to problems with Mac (and Windows): reformat and start over from scratch.


If software is the issue, can't you run Linux on it?


I'm a Mac app developer, I'm stuck with Xcode.

But it's not all or nothing. For example I've started using Syncthing instead of iCloud Drive for some use cases and that works surprisingly well (Syncthing isn't without flaws either, but at least it shows exactly what it's doing making it a lot easier to debug).


AFAIR if you have a Mac, you are legally allowed to run macOS in a VM on that computer.


Running macOS in a VM is okay for testing purposes, but in my experience it's not a great experience for productive work. There are a lot of graphics glitches because macOS assumes that you have hardware graphics acceleration, but no VM that I know supports that for macOS guests.


I'm not sure why a VM was supposed to help your situation, but in regards to GPU in a VM, Parallels can provide a paravirtualized GPU to enable acceleration for macOS (Metal), Windows (DX 9/10/11), and Linux (Virgil) guests.


It doesn't help at all on notebooks which of course are the subject of this discussion, or even nearly any Apple hardware at all at this point. But FWIW on systems that can support multiple GPUs using PCIe passthrough to a macOS Guest VM will make it perform very well. Of course legally that means only the now highly mediocre multi-year old price-unchanged Mac Pro. But running a hackintosh virtualized can work quite nicely. While very unlikely, perhaps this will become an officially possible option again someday if Apple ever does another expandable system.


Does anyone really think that running Linux on custom Apple hardware is actually going to make things _better_?


I have an M1 Air and can’t wait to install Linux on it. Backed marcan’s efforts and counting days until I go back to Linux again.


Same here. My M1 Air is good, and I use it for light purposes, but once Asahi becomes viable, OMG.


I don't know how it holds with the current M1s but I was much happier with Linux than macOS under my 2014 MBP, everything was incredibly much snappier.


It certainly rescued my a1502 from the trash bin.


The issue is that Linux isn’t running well enough on Apple silicon to be a viable option (yet).


Was playing around with it just this weekend. It's... not there yet. It will be, but it's a work in progress.

You could run it today if you wanted to, but there are a bunch of things that are missing.


Meh, every OS has its problems. The issues I have had with Linux have been consistently more annoying than those I have had with macOS. I'm sure it's the opposite for some people.


>The Open/Save dialog taking 30 seconds to show.

This is _infuriating_ as a user. There is no possible good reason for this.


I wonder if he has nfs shares or if he’s running a pihole or similar. Apple may be expecting certain things in his configuration that are timing out.

I’ve never experienced any of these issues (multiple macs in various profiles).


I use multiple Macs, plus several iOS devices, behind a pihole, with iCloud. Never seen this. IIRC I do just use the default block lists, or something pretty close, though.


I'm on a slower setup and don't encounter this at all. You may want to get Apple to replace your laptop.


Interesting. For what it's worth, I've never had any of these issues on my Intel-based MacBooks (I've been using them since 2007).


That sounds terrible, but none of that is happening on mine.

The only slowness I’ve noticed has been related to the beta private relay


Chiming in to say I've also got an M1 Max. I was running High Sierra until I got my M1 with Monterey. Holy heck! If you've been updating gradually, maybe you didn't notice. But woweee stability is a train-wreck now.

Issues I've reported via Feedback Assistant:

1. Issue waking external displays. Typically 10+ seconds to wake my HDMI display, Thunderbolt is faster, but still a bit slow.

Slow wake of external displays might sound like a minor issue, and it is. What's not a minor issue is having all my windows slammed on top of each other on the Macbook's built-in display every time I wake from sleep.

2. No scaling for external non-4K monitors. I've got a 1440p 144Hz display; my options are tiny text, pixelated large text (decreased resolution), or installing a third-party workaround (https://github.com/waydabber/BetterDummy). The third-party workaround breaks screen recording. Seems there's another bug in macOS where you can't use display mirroring and record the screen at once - it crashes the screen capture tool.

3. Unresponsive Finder. Sometimes I can't drag and drop files. The rest of the UI works, but you cannot click and drag. No idea why.

4. Black screen (with cursor only) after wake from sleep. Think this is somehow related to clamshell mode, as I believe it enabled briefly when I plugged external displays in before opening the lid. Had to hard reset.

5. Audio popping. System wide. Interestingly there's no category for reporting sound issues in Feedback Assistant. Pretty crazy when you think about modern Apple's origins. Seems this is a pretty widespread issue. Theories I've seen have been related to Rosetta. However, I can confirm that arm64 binaries (e.g. Firefox) do occasionally lead to popping, but it may be that Intel binaries are running at the same time. It comes and goes. It's almost certainly a driver issue; it doesn't sound like bad speakers, it sounds like corrupt buffers.

Issues I haven't reported:

1. Text boxes not being responsive. As in, you can't click in them. I don't know if this is somehow related to the Finder issue. It may well be app specific, hence why I haven't reported it. However, I've observed it in Firefox and Jetbrains products.

2. Garbled rendering, particularly around fonts. Again, might be app specific. Main culprit is Slack. Restarting the app doesn't fix it, rebooting does though.

There's probably more that I'm forgetting. Honestly, this is a bit depressing. The M1 is a beast. If I worked on Apple's hardware team I'd be pretty peeved, because they've achieved something amazing and cruddy software is compromising the entire experience.


>Anything related to Apple ID and iCloud regularly hangs 30-60 seconds, showing a spinner with no progress indicator whatsoever.

On Safari that happens when you have lots of Bookmarks, History, Open Tabs etc.


I have an M1 MacBook Air and the Open/Save dialog opens in less than a second. I think your system has a defect.


My mother-in-law's M1 MacBook Air exhibits the same issue, and she hasn't installed anything but Apple's own software on it.

Not every time, mind you. Only about once in 50 times. Sometimes she goes for weeks without it. Sometimes it happens several times in a day.

My 2016 iMac has some weird issues as well, like not supporting some of my keyboards when I use a nonstandard layout file (Swedish Dvorak), even though they work just fine with Apple's own keyboards.

Apple has some issues with quality control. Things have gotten steadily worse since 10.5, even though I do enjoy the new features.

Not just weird things like the one above, but basic things like standard shortcuts not working when you have a Swedish layout. Now I can't remember which one it was, but it was something that should do a common thing and instead brought up "search in the help files".

I have been bitten by it every time I use a new mac or do a fresh install.


These issues are intermittent and only happen sometimes. My system is not broken; it's a design defect/tradeoff in the macOS sandbox. I've seen it on at least 5 different Macs on a lot of different versions of macOS. It's been an issue ever since they started using a separate process to show the dialog.


I don't know if your system has a defect, but I've used many, many Macs over fifteen years on all released versions of macOS, and I've never once encountered the issues you're raising.


I think it's probably something between Mojave and Big Sur based on my recent upgrades. Sandbox is as good of an explanation as any.


It sounds like you may have an undiagnosed local networking issue. I’ve been using Macs since 1991 and the last time I had the sort of issue you’re talking about was classic macOS, where AppleTalk would literally hang the entire OS when it went into the weeds.


Network issues, diagnosed or not, should not affect the performance of local UI operations. "Don't perform I/O on the UI thread" has been a maxim for decades.
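A minimal sketch of that maxim in Python, under stated assumptions: `slow_lookup` is a hypothetical stand-in for a blocking call (say, listing a directory on an unresponsive network share), and the loop is a toy substitute for a real UI event loop. Nothing here reflects how macOS actually implements its Open/Save dialog.

```python
import queue
import threading
import time


def slow_lookup(path):
    """Hypothetical blocking call, e.g. a network share that is slow to respond."""
    time.sleep(0.2)
    return f"listing for {path}"


def run_ui_loop(path, tick=0.01):
    """Keep 'drawing frames' while a worker thread does the blocking I/O.

    Returns the lookup result and the number of frames the loop got through
    while it was waiting.
    """
    results = queue.Queue()
    worker = threading.Thread(
        target=lambda: results.put(slow_lookup(path)), daemon=True
    )
    worker.start()

    frames = 0
    while True:
        frames += 1  # stand-in for handling input and redrawing the UI
        try:
            # Wait at most one tick for the result, then go around again.
            return results.get(timeout=tick), frames
        except queue.Empty:
            continue


if __name__ == "__main__":
    result, frames = run_ui_loop("/Volumes/example")
    print(result, "after", frames, "frames")
```

The point of the design is that the loop never waits longer than one tick, however long `slow_lookup` takes; calling `slow_lookup` directly from the loop would freeze it for the full duration.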


This is intermittent and has been an issue since Catalina (on x86) and occurs on my (fresh install, no migration, nothing weird) M1 mini on occasion.


For those curious about running their own matmul benchmarks, I wrote a script a while back that works with both Linux and macOS and should make comparison easy.

https://jott.live/code/blas_test.cc

I saw ~1.2tflops on the regular M1


On Linux I had to use `-lcblas` instead of `-lblas`. "6.63171 gflops" with a 24-core AMD EPYC 74F3


I used OpenBLAS on my cheap last-generation AMD Ryzen 7 4700U laptop like so:

git clone https://github.com/xianyi/OpenBLAS && cd OpenBLAS && make PREFIX=/opt/openblas install && curl https://jott.live/code/blas_test.cc | sed -n "/<code>/,/code>/p" | tail -n +2 | head -n -1 > blas_test.cpp

Inspect the blas_test.cpp file, and then...

g++ -I/opt/openblas/include/ blas_test.cpp -lopenblas -std=c++11 -O3 -L/opt/openblas/lib/ -o blas_test && ./blas_test 512 512 512 100 100

and got a peak of about 192 gflops, averaging closer to 180. So yeah, the M1 is > 6x faster in this simple single-precision matrix test.
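For reference, the arithmetic behind figures like these can be sketched as follows. It assumes the usual convention of counting 2*M*N*K floating-point operations for an M x K by K x N matrix product (one multiply and one add per inner-loop step). Whether blas_test.cc counts exactly this way is an assumption, and the function names are made up for illustration.

```python
def gemm_gflops(m, n, k, seconds, iterations=1):
    """GFLOPS for `iterations` matrix products of shape (m x k) @ (k x n)."""
    flops = 2 * m * n * k * iterations  # one multiply + one add per step
    return flops / seconds / 1e9


def speedup(fast_gflops, slow_gflops):
    """How many times faster the first figure is than the second."""
    return fast_gflops / slow_gflops


# Figures quoted in this thread: ~1.2 tflops (regular M1) against a
# 192 gflops peak (Ryzen 7 4700U with OpenBLAS).
print(speedup(1200, 192))  # 6.25
```

Against the 180 gflops average rather than the peak, the same division gives a somewhat larger ratio.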


541 gflops here, following those steps. Well done Apple for making a laptop CPU over 2x faster than a 250W server CPU released this year :)


With my Ryzen 7 5800U laptop I get around 530 gflops, with a peak of 596 if I compile the test against MKL with

g++ -I/opt/intel/mkl/include/ blas_test.cc -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -std=c++11 -O3 -march=native -L/opt/intel/mkl/lib/intel64 -o blas_test_mkl


If you swap out “code” in the url for “raw” you get the raw text itself without a need to use sed

https://jott.live/raw/blas_test.cc
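The same swap, sketched in Python for anyone scripting the download. It assumes the only difference between the two URLs is the `/code/` versus `/raw/` path segment, as in the example above; `raw_url` is a made-up helper name.

```python
def raw_url(code_url):
    """Turn a jott.live 'code' page URL into the matching 'raw' URL."""
    # Replace only the first occurrence so a file name containing
    # "/code/" later in the path is left alone.
    return code_url.replace("/code/", "/raw/", 1)


print(raw_url("https://jott.live/code/blas_test.cc"))
```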


2.1 TFlops on the 8-core (6 performance) M1 Pro


Finally some benchmarks beyond just encoding video (which is admittedly a huge use case for these CPUs). I've been on Windows for decades, and this step change in computing performance is a siren's call for me to switch; I've never wanted an Apple product as much as this before.


Didn't numpy remove Apple Accelerate support recently because it has numerical problems? Their docs are still warning against it.


Yes, but then they brought it back here https://github.com/numpy/numpy/pull/18874 - apparently Apple developers fixed these issues.


Glad to see benchmarks and a writeup for an audience doing something other than video editing. It feels like Youtubers only make review videos for other Youtubers these days. Even if the methodology here isn't very scientific, it's at least useful as anecdotal info.


Did I get the worst M1 Max in the world? In my one month with this computer so far, it's been problematic. It has this fun issue where it freezes up for a couple of seconds randomly when watching YouTube. It's frozen for a few seconds in other situations also. One time it just rebooted completely right in the middle of using it. Add to that, I've never gotten more than 4 hours out of this battery.

Personally I'm kind of regretting the purchase. My 2015 Macbook Pro is faster than it.


Did you use Migration Assistant to move your stuff over? You might be using Intel versions of your apps.
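One way to check this from a script, sketched in Python. It assumes two things about macOS: that `platform.machine()` reports `x86_64` inside a process translated by Rosetta 2, and that the `sysctl.proc_translated` key reads `1` for such a process (the key is absent on Intel Macs). `describe_process` is a made-up helper name, and the script only describes the Python interpreter it runs in, not other apps.

```python
import platform
import subprocess


def running_under_rosetta():
    """Return True if this process appears to be translated by Rosetta 2."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # No sysctl binary, or the key does not exist (Intel Mac, non-macOS).
        return False
    return out == "1"


def describe_process(machine, translated):
    """Summarise the architecture this process was built for."""
    if machine == "arm64":
        return "native Apple silicon build"
    if machine == "x86_64" and translated:
        return "Intel build running under Rosetta 2"
    return f"{machine} build, not translated"


if __name__ == "__main__":
    print(describe_process(platform.machine(), running_under_rosetta()))
```

For other apps, the "Kind" column in Activity Monitor shows whether each process is an Apple or Intel build.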


If you haven't already, speak to Apple about replacement under warranty. Those numbers aren't justified by anything apart from hardcore video editing 100% of the time on battery.


Please don't open sharing options when I highlight text - that's how many people including me keep track of reading position. I had to try and adblock the element which pops up to read this.


> We already know that the M1 Max CPU should have really strong matrix multiplication performance due to Apple's "hidden"/undocumented AMX co-processor embedded in the CPU complex, and that it is leveraged when you use Apple's Accelerate framework

Does this hold for the M1 Pro?


I see no reason why not, considering the core CPU components of the SoCs are very similar.


Good to see a detailed benchmark. I’m pretty impressed by the performance in real-world applications as well - the machine is easily 2.5-3x as fast at running various builds and processing jobs as my 15” from 2018 was, and it’s cool and quiet while doing it.

The performance claims have been a bit overblown in some quarters - it’s not going to replace a 5950X with a big GPU, and some of the rhetoric is a bit silly. But it’s surprisingly close - watching a silent laptop rip through a build faster than the 125W TDP i9-10900K we have in the office is pretty cool!


Regarding training ResNet50, even though img/sec is less than the 3090, could a 64gb m1 max accommodate larger image sizes than the 24gb 3090?


Probably they would be close - M1 still needs to use memory for the OS and other stuff, while 3090 can use fp16/mixed precision, which in many cases almost doubles effective memory. Also if we talk about training, then a more mature CUDA implementation of things like batch normalization and optimizers can also result in lower memory usage compared to a likely less mature TF Metal support.
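The "almost doubles effective memory" point comes down to bytes per element. A rough sketch, assuming a hypothetical batch of 32 RGB images at 224x224 and counting only that one input tensor. A real ResNet50 training footprint is dominated by activations, weights, gradients and optimizer state, and mixed precision typically keeps some of those in FP32, so this is not a prediction of actual usage on either machine.

```python
def tensor_bytes(batch, height, width, channels, bytes_per_element):
    """Memory taken by one dense tensor of the given shape."""
    return batch * height * width * channels * bytes_per_element


fp32 = tensor_bytes(32, 224, 224, 3, 4)  # 4 bytes per FP32 value
fp16 = tensor_bytes(32, 224, 224, 3, 2)  # 2 bytes per FP16 value
print(fp32 / fp16)  # 2.0

# Doubling the image side quadruples the tensor, which is why larger
# images eat into the extra capacity quickly.
bigger = tensor_bytes(32, 448, 448, 3, 4)
print(bigger / fp32)  # 4.0
```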


I wonder if they'll somehow include the AMX 'instruction' (or whatever it is) into BLIS kernels. GEMM isn't everything, but it is a pretty important building block in linear algebra. (I mean that's the big observation of these fancy tile based BLAS implementations).



Good post. One thing to note is that the 3090 being 8 times faster is not quite a correct statement. The author is comparing FP16 on the 3090 with FP32 on the M1. The difference between them is more like 3-4 times for FP32.

Even that is not true FP32 for the 3090, as TensorFlow uses Nvidia's TF32 by default for convolution.


The memory bandwidth result is impressive.


For all the (truly) amazing performance from the M1.*, how much of the benefit is just coming from Mac users not realising how laggy their OS is on non-extreme hardware? I used macOS for two years on a '15 i5 MBP and didn't realise how persistently sluggish it was until I blew everything away and chucked on Xubuntu. (Nothing magic about Linux here; GNOME and KDE were as bad as macOS.)

Is the incredible performance of the M1 just going to enable a whole new generation of inefficient software?


This article literally benchmarks a variety of low-level compute heavy operations and compares them with a Ryzen. It’s about as far away from your take as possible.


You might have missed the point - he means that the more performance we get from the hardware, the less we care about optimising software. And this is already a problem with Electron, for example.


I get that point, thanks. Neither that glittering generality nor the supposition that Mac users are blown away by the M1 simply because macOS was running on creaky Intel hardware has anything to do with the article, though.


Yes.. software gets slower faster than hardware gets faster



