How M1 Macs feel faster than Intel models: it’s about QoS (eclecticlight.co)
512 points by giuliomagnifico on May 17, 2021 | 410 comments



It does not just feel faster, in many cases it is faster.

E.g. compiling Rust code is so much faster that it is not even funny. cargo install -f ripgrep takes 22 seconds on a MacBook Air with M1; the same command on a hexa-core 2020 Dell XPS 17 with 64 GB RAM takes 34 seconds.


I did a side-by-side comparison with my friend's M1 MacBook and my (more expensive, nearly brand new) Ryzen 5800X workstation compiling Rust. The Ryzen was faster - but it was really close. And the MacBook beats my Ryzen chip in single-threaded performance. For reference, the Ryzen compiles ripgrep in 19.7 seconds.

The comparison is way too close given that the Ryzen workstation guzzles power, while the MacBook is cheaper, portable and lasts 22 hours on a single charge. If Apple can keep their momentum going year over year with CPU improvements, they'll be unstoppable. For now it looks like it's not a question of if I'll get one, but when.


I've done a side by side with my Ryzen 3700x compiling a Go project.

6 seconds on the Ryzen, vs 9 seconds on the M1 air.

`time go build -a`, so not very scientific. Could be attributed to the multicore performance of the Ryzen.

Starting applications on the M1 seems to have significant delays too, but I'm not sure if that's a macOS thing. Overall it's very impressive; I just don't see the same lunch-eating performance as everyone else.

The battery life and lack of fans is wonderful.

edit: Updated with the ARM build on macOS. 16s -> 9 seconds.


> `time go build -a`, so not very scientific.

A tool I have enjoyed using to make these measurements more accurate is hyperfine[0].
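For a concrete sketch (the pre-run clean is just so Go's build cache doesn't hide the work between runs; adjust the package path as needed):

  # hedged example: time a cold "go build -a", wiping Go's build cache before each timed run
  hyperfine --prepare 'go clean -cache' 'go build -a'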

In general, the 3700X beating the M1 should be the expected result... it has double the number of high performance cores, several times as much TDP, and access to way more RAM and faster SSDs.

The fact that the M1 is able to be neck and neck with the 3700X is impressive, and the M1 definitely can achieve some unlikely victories. It'll be interesting to see what happens with the M1X (or M2, whatever they label it).

[0]: https://github.com/sharkdp/hyperfine


M1 S Pro Max


Faster SSDs on the Ryzen machine? The SSD in even my 2019 MacBook Pro is ridiculously fast.


I find M1 optimized apps load quickly while others vary. But what really varies is website load times. Sometimes instant, sometimes delayed.

All this said, there really is no comparison. I don’t even think about battery for the M1, I leave the charger at home, and it’s 100% cool and silent. It’s a giant leap for portable creative computing.


> Starting applications on the M1 seems to have significant delays too, but I'm not sure if that's a macOS thing.

That's a macOS thing. It's verifying the apps.

https://appletoolbox.com/why-is-macos-catalina-verifying-app...


If it's Intel-compiled apps, it's also that Rosetta is translating them to run on M1 when they're first run, as I understand it.

Both of these are first-time launch issues, so comparing launch times on the second launch is probably more reasonable.


Good point. I'm really surprised they don't translate during installation (ahead of time).


I believe they do.


In certain cases, such as when they come from the App Store. I believe other apps do not get this.


I think Installer.app triggers the AOT translation, too?


I'd like to point out that compiling stuff is usually disk/IO-intensive. Could this not just be that the Apple machine has a faster drive/memory?


It doesn't. It has the same four-lane PCIe 3.0 flash as everyone else.


They have an extremely heavily integrated flash controller. Part of it is what they bought from Anobit.


I know, but the flash controller isn't the slow part (usually ;) ).


It does matter for real-world tasks, rather than benchmarks. The FTL plays a big part in real-world, small-access latency.


We still don't know how good the FTL in the Apple controller is; all the devices are still too new and haven't been dragged over the coals like all the other controllers. It is still in the "easy job" part of its lifecycle, with brand-new flash cells.

However, to quote @Dylan16807 from a similar discussion a few weeks ago (https://news.ycombinator.com/item?id=26118415):

> The analog parts are the slow parts.


They've been using custom controllers for over a decade.

And the Anobit IP includes the analog parts as a large piece of their value add.


We are getting too deep into irrelevant things. We don't know how much of the Anobit IP was used in M1 Macs; they may own it, but they might not use it all. They purchase their NAND, and it may not be compatible with the current gen, just like when it was not compatible with Samsung's V-NAND/3D TLC.

In practice, the I/O performance of M1-based Macs is comparable to a typical PCIe 3.0 NVMe drive. (I'm typing this comment on an M1 MBP; I'm well aware of how it performs.)


PCIe 4.0 is the current standard.


update:

time GOARCH=amd64 GOOS=linux go build -a

6s for the M1 too!


Do keep in mind that Go heavily caches for "go build":

https://golang.org/cmd/go/#hdr-Build_and_test_caching


the "-a" directive in "go build -a" should cause a clean rebuild, which is what they were using


Did both machines compile to the same target architecture? If you did native compiles, then perhaps LLVM's ARM backend is simply faster than its x86 backend...


I think this is potentially a huge part of it - I'd do the benchmark by doing a native compile and a cross compile of each, and also do the same on a RAM disk instead of SSD (a large compile can end up just being a disk speed test if the CPU is waiting around for the disk to find files).
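On the Mac side, a rough sketch of that native-vs-cross comparison might look like this (target triples assume current rustup naming; the x86_64 target has to be added first):

  # native arm64 build, then a cross build to x86_64, both from a clean state
  rustup target add x86_64-apple-darwin
  cargo clean && time cargo build --release
  cargo clean && time cargo build --release --target x86_64-apple-darwin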


Maybe that would have been the case if only compilation times were reported to be good. But no, this is across many different kinds of workloads. Even Blender running in Rosetta can beat native x86 processors, which is bonkers.


I think these performance metrics are somewhat limited in their usefulness. A Ryzen workstation might not have the same single-core performance or energy efficiency—however, a ryzen workstation can have gobs of memory for massive data-intensive workloads for the same cost as a baseline M1 device.

In addition: let’s talk upgradability or repairability. Oh wait, Apple doesn’t play that game. You’ll get more mileage on the workstation hands-down.

The only win for those chips, I think, is battery efficiency for a laptop. But then why not just VNC into a beastmode machine from a netbook and compile remotely? After all, that's what a CI/CD pipeline is for.


Granted, they charge exorbitant prices for their hardware, but I can't believe how my 2010 MacBook Pro is still functioning perfectly fine... except for them making it unsupported. I can't say that about any other PC/laptop I have had. Not even desktops.


I don't know, I feel other laptops at the same price point as Apple MacBooks do this too, sometimes even better. I bought an HP 8530w in 2009 or so and it still works. Replacing the DVD drive with an SSD required just a common Phillips screwdriver, and battery replacements are sold by HP themselves or many others.


Exactly. Too many people compare a $400 cheap Windows laptop to a $1200 Macbook. Compare like for like, and the thing is likely to last until it's absolutely obsolete. And, while I don't really support this, some might find it an advantage to replace the computer three times for the same price. But people should be comparing to a well built, upgradable laptop (especially those that support not just RAM and disk but also display upgrades and adequate ports), running an operating system that has no arbitrary end of life.


Lenovo or Dell displays are much worse than Apple displays even though the machine costs the same.


Sure, but my MacBook Pro has cooked its display /twice/ now. It didn't go to sleep properly, and it overheated in my laptop bag.

No good way to check for it, because there are no LEDs on the outside. The only way to check is to see if the fans have switched on after five minutes in the bag.


You don't want to know what the Dells are capable of doing. My XPS 15 2020 literally caught fire somewhere on the motherboard - not even a battery thing. Then I decided to go Apple only.


I would believe that, because Dell XPS laptops are bad. (I'm in a group full of laptop nerds who check these things. Anecdotally and "scientifically".)

You are better off with a business or workstation laptop.


I’d love to see an actual serious comparison between an M1 Mac and a $400 laptop. That would be hilarious. Since there are so many of them, can you direct me to one, or even a few?


The point is that with mid-2010s apple laptops, >5 year lifespans are the norm. With the majority of other, even comparably priced laptops, that is the exception.

There are other laptops of similar or superior build quality to those from Apple (N.B. - older MacBooks, not the newer ones), but those are also easy to spot. They'll usually be ThinkPads or some XPS models from Dell.


> With the majority of other, even comparably priced laptops, that is the exception.

Consumer grade PC hardware has terrible build quality, and regardless of the price of your unit, the consumer build spec is just inferior to the business/professional lines. Asus, MSI, Sony, Acer, etc laptops all have consumer grade build quality and they just aren't designed to last a decade.

> They'll usually be ThinkPads or some XPS models from Dell.

Precision/XPS and Thinkpad models (with the exception of the L and E series) are almost always in the same price range as a MacBook. Any business-class machine (Thinkpad, Precision/Latitude, Elitebook) should easily last >5 years. These are vendors which will sell you 3-5 year on-site warranties for their laptops.

This is why you can find so many off-lease corporate laptops on eBay from any model year in the last 10 years or so. The hardware doesn't break, it just becomes obsolete.


For Dell, at least the business-class desktops, they're trash, and are barely usable after 2-3 years, and usually have some kind of problem long before that. I'm pretty sure Dell expects most businesses to buy new ones in that time frame.


I really want to like Dell's XPS line. I really do. But their technical support is atrocious. My XPS trackpad stopped working months after purchase, and getting them to repair it was an utter nightmare. Their tech support seemingly hasn't improved at all in the past decade (which is when I last vowed to never buy a Dell again due to their horrible tech support). They may fool me twice, but never again.

(I do hear that their business support is pretty good though)


> and getting them to repair it was an utter nightmare

~8 years ago, within 48h of the laptop breaking, I had a Dell repair tech sitting at my kitchen table replacing the mainboard on an XPS laptop. Has turnaround when you have the proper support contracts gotten that much worse?

(admittedly, we did pay for the top support tier for a personal device as it was expensed for work. I wouldn't do anything else from any manufacturer though unless I had on-site tech support/replacement.)


Not sure about consumer side, but as a business we have 24h turnaround service with Dell.


Oh, that's baloney. There's nothing special about Apple laptops besides the metal case. Arguably they have worse cooling than most PC laptops. My 2018 MBP runs like it's trying to cook an egg and has since day one. My brother's 2012 MBP suffered complete logic board failure after 4 or 5 years.

If it wasn't for the replacement keyboard warranty offered by Apple, a good chunk of butterfly-keyboard Macs would be useless junk due to the fact that it's so hard to replace them. Frayed MagSafe adapters were a regular occurrence. And swollen batteries pushing up the top case weren't that rare either.

I think maybe people keep MacBooks longer, but it probably has more to do with the fact they spent so much on them that they feel it's worthwhile to repair/pay for AppleCare than them actually being magically more durable.


Except that Apple considers these devices 'vintage' and will not provide OS updates or repairs.

https://support.apple.com/en-ca/HT201624


I was using my Dad's old ThinkPad 385XD from 1998 in 2009. Battery was unsurprisingly dead but every other piece was stock and worked although at some point I swapped the worn down trackpoint nub with one of the included spares we still had.


My "writing desk" PC is a Thinkpad X201 tablet from 2010, with the same SSD upgrade I put in my own 2010 Macbook Pro (a dedicated Logic Pro machine these days). There have always been manufacturers for whom that's the case on the PC side of things--you just kinda had to pay for it up front.


My two main PCs are a Phenom II-based desktop and a Thinkpad X220i (with the lowly Core i3, even!). Both are perfectly functional and usable today, with a few minor upgrades here and there, the usual SSDs, more RAM and a Radeon RX560 for the desktop.

The Thinkpad is obviously no powerhouse, but still works great for general desktop use, ie. browsing, email, document editing, music, video (1080p h264 is no problem). The desktop plays GTA V at around 40-50 FPS at 1080p with maximum settings. And this isn't some premium build, it's a pretty standard Asrock motherboard with Kingston ValueRAM and a Samsung SSD.

Decade-old hardware is still perfectly viable today.


I just had storage fail on my first-gen Touch Bar MacBook. It's a PITA; the storage is soldered onto a board. They replace the board, they can't recover the data (didn't expect them to). I'd pay for the extra mm or two it would require for them to just use a standard like M.2. SSD storage just fails after a while, especially if you do lots of things that thrash the disk.


I'm using a 2011 Sandy Bridge motherboard with a Xeon 1230 I bought in 2012. I had to replace 2 HDDs and started using an SSD for the OS partition. It's working great; I need to replace my Nvidia GPU that is EOL, but it's still working great.


I have an old gaming ASUS laptop from 2010. Still works like a charm after the hard drive was switched to an SSD. I have an even older Asus netbook (a 15-year-old Eee PC, I think) that still works. The netbook is too slow for modern software and I do not really use it, but it works.


> But then why not just VNC into a beastmode machine from a netbook and compile remotely? After all, that's what a CI/CD pipeline is for.

Is this how you work?


> Is this how you work?

This is exactly how I've worked for a number of years now, for my home/personal/freelance work. Usually using a Chromebook netbook ssh'ing into my high spec home server. I'd do the same for work, but work usually requires using a work laptop (MacBook).


I've worked that way for 10 years. My current desktop is a 5 year old Intel i3 NUC with a paltry 8G of memory. Granted, it uses all that memory (and a bit more) for a browser and slack, and the fan spins up any time a video plays. But usually it's silent, can drive a 4k monitor, and most of the time I'm just using mosh and a terminal, which require nearly nothing.

OTOH, the machine that I'm connecting to has 32c/64t, half a terabyte of RAM and dozens of TB of storage.


> the machine that I'm connecting to has 32c/64t, half a terabyte of RAM

Ok I'll bite, what do you do? Do you think halving the number of cores / RAM would impact your productivity?


A lot of what I do is compiling, so for that I'd still be fine with fewer cores and a lot less RAM. But I also do backtesting of trading strategies, and for that I can use all the cores I can get. The memory is needed to cache the massive amount of data that is being read from a pair of 2T NVME SSDs. Without adequate caching, I/O can easily become the bottleneck, even though the SSDs are pretty fast.


My work takes place at a beefy desktop machine. I wouldn't want it any other way... I get to plug in as many displays as I need, I get all the memory I need, I can add internal drives, there's no shortage of USB ports or expansion - and I get them cheap. For meetings or any kind of work away from my desk I'll remote in from one of my laptops.

All that and my preferred OS (Manjaro/XFCE), which runs on anything, has been more stable than any Mac I've ever owned. Every update to macOS has broken something or changed the UI drastically and in a way I have no control over...

If I ever switch away from desktops, it will be for a Framework laptop or something similar.


This is interesting - in the sense that you are someone who doesn’t want the UI to change, but it’s really not clear what this has to do with the question or the article.


I'm not the guy above, but I concur with the sentiments. After a while, adjusting to trivial UI changes becomes a huge chore and unnecessary cognitive overhead. It's relevant, because in order to use the M1, you have to buy into Apple's caprice.


Caprice seems like a weird way to characterize an aspect of the Mac a lot of people like.

I think it’s valid to want not to have to deal with the cognitive overhead of UI evolution.

It’s equally valid not to want to deal with the cognitive overhead of various attributes of Linux.

What’s not obvious is why people sneer about it.


Well, actually I have a beastmode mobile workstation that gets maybe 3 hours of battery life on high intensity. And when the battery is depleted I find a table with an outlet and I plug it in.

Everything in the machine can be upgraded/fixed so it should be good for a while.

I'm not saying this to be snarky. I just want to emphasize that while the M1 is a great innovation, I put repairability/maintainability and longevity on a higher pedestal than other things. I also highly value many things a computer has to offer: disk, memory, CPU, GPU, etc. I want to be able to interchange those pieces, and I want to have a lot of each category at my disposal. Given this, battery life is not as important as the potential functionality a given machine can provide.


Which 2021 laptop has replaceable CPU?


That's 90% how I've worked in ~15 years as an SWE.


> The only win for those chips

I suspect the number of people, even developers, for whom 16GB of memory is plenty probably greatly exceeds the number who need a beast-mode Ryzen. But even then, a large proportion of the devs who might need a build farm on the back end would be doing that anyway, so they might as well have an M1 Mac laptop regardless.

Anyway Mac Pro models will come.


Power consumption is a good point. I wonder what the M1's power consumption is during those 19.7 seconds of compiling ripgrep compared to other platforms.


Low.

For me, SoC power draw when compiling apps is usually around 15 to 18W on a M1 mac mini.

My example app (deadbeef) compiles in about 60 seconds on that mac mini.

On a 2019 i7 16” MBP (six core), it takes about 90 seconds, and draws ~65W for that period.

So… radically more power efficient.

edit: this is the same version of macOS (Big Sur) and Xcode, and both are building a universal app (Intel and ARM).
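(For anyone wanting to reproduce these power numbers: macOS ships a powermetrics tool that can report CPU package power while a build runs in another terminal. Sampler names can vary a little between macOS versions, so treat this as a sketch:)

  # sample SoC/CPU power once per second while the compile runs elsewhere
  sudo powermetrics --samplers cpu_power -i 1000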


I'm seeing a big difference, with the M1 MacBook Pro beating the iMac Pro when building our projects (23 min vs 38 min).

Is it all CPU though, or is building ARM artefacts less resource-intensive than building ones for Intel?


The iMac Pro spans a large range of performance, going from 8 to 18 cores, and it uses a 4+ year old Xeon CPU. Unsurprisingly, the 18-core handily beats the M1 in multi-core benchmarks:

18 core iMac Pro 13339: https://browser.geekbench.com/macs/imac-pro-late-2017-intel-...

M1 Mac mini 7408: https://browser.geekbench.com/macs/mac-mini-late-2020


> For now it looks like it's not a question of if I'll get one, but when.

My exact sentiments. I've been looking for a gateway into Apple and the M1 Air seems like it. It has now become a matter of time and not just a fleeting thought.


> If Apple can keep their momentum going year over year with CPU improvements

I’ve been blown away with games running on M1. If Apple could up their GPU game as well, that’d be really cool.


Are you sure Rust compilation can use multiple cores to the fullest?

I don't use Rust, but a quick search returned for example this issue:

https://github.com/rust-lang/rust/issues/64913


My CPU usage graph shows all cores at 100% for most of the compilation. But near the end it uses fewer and fewer cores. Then linking is a single-core affair. It seems like a reasonable all-round benchmark because it leans on both single- and multi-core performance.

And I really care about rust compile times because I spend a lot of small moments waiting for the compiler to run.
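If you want to see exactly where a build collapses to a single core, cargo can chart per-crate concurrency over time; at the time of writing this was still a nightly-only -Z flag, so consider this a sketch:

  # writes an HTML report (cargo-timing.html) showing how many crates were building in parallel
  cargo +nightly build --release -Z timings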


How long is the battery life on both if compiling non-stop? Assuming both keep similar compile times from a full battery to empty, it would be interesting to see if the Ryzen is truly guzzling battery.


I'm guessing the 5800x work station is a desktop, not battery powered...


You are right. Missed that when I was angrily responding.


I have a Ryzen 7 4700g. I wanted to compare this to the GPU side of the M1. On Geekbench OpenGL test, it was slightly faster than the M1. I would like to find a better test.


> MacBook … and lasts 22 hours on a single charge

I keep hearing this but have not experienced this in person. I usually get about 5 hours of life out of it. My usual open programs are CLion, WebStorm, Firefox (and Little Snitch running in the background).

However, even with not having IDEs open all the time, and switching over from Firefox to Safari, I’m only seeing about 8 hours of battery life (which is still nice compared with my 2013 MBP that has about 30 minutes of battery life).


>However, even with not having IDEs open all the time, and switching over from Firefox to Safari, I’m only seeing about 8 hours of battery life (which is still nice compared with my 2013 MBP that has about 30 minutes of battery life).

I would consider getting a warranty replacement. Something is wrong.

For reference, my M1 Air averages exactly 12 hours of screen-on time (yes, I've been keeping track), and the absolute worst battery life I've experienced is 8.5 hours, when I was doing some more intense dev workflows.


The Macbook Air does not get 22 hours but the Macbook Pro gets kinda close under the conditions specified in the test.


I'm still unconvinced it's the M1's design and not TSMC's fab process.


When you move to a smaller process node, you have a choice between improving performance or cutting power. (or some mix of both)

Apple seems to have taken the power reduction with the A14 and M1 on TSMC 5nm, not the performance increase.

>The one explanation and theory I have is that Apple might have finally pulled back on their excessive peak power draw at the maximum performance states of the CPUs and GPUs, and thus peak performance wouldn’t have seen such a large jump this generation, but favour more sustainable thermal figures.

https://www.anandtech.com/show/16088/apple-announces-5nm-a14...


I think the latest Ryzen 5800x CPUs kind of prove it's the TSMC fab process. You've now got M1s, Graviton2s, and Ryzens all crushing it to similar levels.


I dunno... the Ryzen 5800x laptops seem to be able to stay ahead of the M1s for most tasks.


Huh? 5800x laptop? The 5800x is a desktop chip.


The 'X' was meant to be "various letter extensions on the 5800 series", such as 5800U, 5800H, 5800HS. I probably should have used different terminology from the model number, as there are other Zen 3 mobile processors like the 5900HX, 5980HS and 5980HX that if anything make the point stronger.


After buying my M1 and benchmarking it against the top of the line i9 I considered shorting Intel's stock, alas they're so large it'll take a while for the decline to catch up with them.


In previous discussions about this it was pointed out that LLVM may be significantly faster at producing ARM code than x86. The comparison may still be the one that actually matters to you, but it may at least in part be an advantage of ARM in general and not just the M1. Rust is very good at cross-compiling, so compiling to ARM on the Dell and to x86 on the M1 may add some interesting comparisons.


So then the next question is, is this really an apples-to-apples comparison? It wouldn't surprise me at all if the x86 back-end takes more time to run because it implements more optimizations.


Are you compiling for ARM in both cases (or x86 in both cases)? Otherwise you are building 2 completely different programs, one is ripgrep for ARM and the other is ripgrep for Intel.


The programs would be pretty much identical until they are lowered, which is generally not a bottleneck for Rust compiles…


In a native TypeScript compile of a very large Angular app I see an even more dramatic 1:20 to 40s compared to a desktop i9. I feel as if the M1's designers took a detailed and careful look at how computers and compilers work and then designed a CPU around that, rather than the other way around.

It's like buying a car and modifying it to take it racing vs buying a race car; the race car was designed to do this.


>> I feel as if the M1's designers took a detailed and careful look at how computers and compilers work and then designed a CPU around that, rather than the other way around.

This is the position that Apple have set up for themselves with their philosophy and process. It would seem that Intel and AMD have to play a very conservative game with compatibility and building a product that increments support for x86 and x64. They can't make some sweeping change because they have to think about Linux and Windows.

Apple own their ecosystem and can move everything at once (to a large degree.) This also gives an opportunity to design how the components should interact. Incompatibility won't be heavily penalized unless really important apps get left behind. The improvements also incentivize app makers to be there since their developer experience will improve.


> It would seem that Intel and AMD have to play a very conservative game with compatibility and building a product that increments support for x86 and x64.

Talk to people who design chips. The compatibility barely impacts the chip transistor budget these days, and since the underlying CPU isn't running x86 or x64 instructions, it really doesn't impact the CPU design. There may be some intrinsic overhead coming from limitations of the ISA itself, but even there they keep adding new instructions for specialized operations when opportunities allow.


Exactly this. Apple has spent decades drilling a message into third-party developers: Update your apps regularly or get left behind.

Everyone who develops for one of their platforms is just used to running that treadmill. An ARM transition is just another thing to update your apps for.


App developers are incentivized in another way: software that takes advantage of new features or performance are often what Apple chooses to promote in a keynote or in an App Store’s featured section.


TypeScript compilation doesn't use all the cores available on the CPU; I think it maxes out at 4 as of now [1]. This might be working well for the M1, which has 4 high-performance cores and 4 efficiency cores.

[1]: https://github.com/microsoft/TypeScript/issues/30235


What OS are you running on the Dell though? Windows is notoriously slower for filesystem operations even without AV, and the default there is to prioritize foreground apps over something like compile tasks.


Yep, see below. With WSL and an ext4 virtual disk it takes only 29 seconds. Still quite a bit more than the 22 seconds on the Macbook.


WSL2 right? Given the virtualization overhead for IO I would expect native Linux to be faster.


WSL2, correct.


Hard to compare without mentioning the actual CPU in it. FWIW my Ryzen 3900X completes that compile in 15s.

To be fair, that is a relatively high-end desktop CPU, but it's also a massive margin for a last-gen model.


We also have to consider the i7 and M1 run at about 1/7 the TDP of the Ryzen. Just underscores the good design behind the QoS and judicious use of the Performance cores vs Efficiency cores.


Yeah, but that's previous generation. Get a 5850U or even a 5980HX...


My water-cooled 3900XT is locked to 4.4 GHz on all cores with relatively fast dual-banked 2x16 GB C15 3600 MHz memory. (It never drops under 4.4 on any of the 12/24 cores.)

Also has a Samsung 980 1TB drive, and a 3070 GPU; the system truly is extremely quick. (Win10+WSL2).

(I'll have to try that compile in the next day or so; will reply to myself here).


OS matters for this comparison. If your Dell XPS was running Windows, that might explain the discrepancy.

For instance Windows on NTFS etc. is notoriously slow at operations that involve a lot of small files compared to Linux or whatever on ext4.

Unless your Dell was also running macOS, you're probably not comparing hardware here.


For me, it takes 34 seconds on M1 (MBP13):

    Finished release [optimized + debuginfo] target(s) in 34.50s
    ...
    cargo install -f ripgrep  137,66s user 13,00s system 435% cpu 34,632 total
For comparison, a TR 1900X (yeah, desktop, power-guzzler, but also several years old), Fedora 34:

    Finished release [optimized + debuginfo] target(s) in 25.15s
    ...
    real        0m25,271s
    user        4m41,553s
    sys         0m7,216s


Thank you for the perspective.

Here's a Ryzen 7 4800HS (35 W) on Linux:

  Finished release [optimized + debuginfo] target(s) in 24.98s
  real 0m25.151s
  user 4m54.987s
  sys 0m6.764s


I just did a test on my old Lenovo laptop... Intel i7-8565U CPU @ 1.80GHz:

  # git clone https://github.com/BurntSushi/ripgrep && cd ripgrep
  # cargo clean
  # time cargo build

  real 0m23,805s
  user 1m11,260s
  sys 0m3,806s

How is my crappy laptop on par with your M1? :)


You are not doing a release build.
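For an apples-to-apples number against the cargo install timings upthread, something like this should do it (same ripgrep checkout as in the parent comment):

  cargo clean
  time cargo build --release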


thanks, problem solved - now it's 1 minute...


Wow. The fact that my Sandy Bridge laptop from 2011 does it in only 1:41 is pretty indicative of how badly processor improvements stalled out last decade. My processor (like yours, 4c8t): https://ark.intel.com/content/www/us/en/ark/products/52219/i...


You're comparing a 45 W TDP CPU vs a 15 W one. I think this is a great improvement. (But the latter also has a 25 W cTDP and turbo boost provides more power; many factors depend on how the laptop is implemented.)


I think the point is that even a laptop from 2011 is acceptable for development. I wouldn't mind the power draw since it's plugged in, nor the compile time; a cold compile of 20 s to 2 min is much the same to me, and recompiles, where it counts, are much faster.


Are you counting download time as well? It took 8.30 seconds on the hexacore just now when I tried it.


Nope, pure compile time on a Core i7-10750H with Windows and Rust 1.52.1, antivirus disabled. WSL did not make much of a difference though (29 seconds).


The Windows filesystem is extremely slow, even without WSL. It can be mitigated to some extent by using completion ports, but I doubt the Rust compiler is architected that way.

You should benchmark against Linux as well.


cargo install on WSL2 uses the non-mapped home directory, which is on an ext4 virtual disk. That is probably one reason why it is six seconds faster.
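(Easy to confirm from inside WSL2 - the home directory should show up as ext4, while the Windows drives are mounted over 9p, e.g.:)

  # filesystem type of the Linux home directory vs the Windows C: mount
  df -T ~ /mnt/c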


> Windows

There's your problem.

Even Microsoft concedes that Windows is legacy technology now.


You must be joking? If not, provide us with a source


And here I thought that having a desktop CPU like a Ryzen 5 2400G with loads of RAM would take me somewhere. It took the machine around 71 seconds.

EDIT: could you measure C compilation? For example:

  git clone --depth 1 --branch v5.12.4 "git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git"
  cd linux
  make defconfig
  time make -j$(getconf _NPROCESSORS_ONLN)
For me it's slightly below 5 minutes.


> It took the machine around 71 seconds.

Are you including download time? On my Ryzen 7 3700X, it's done in 17 seconds.


Specifically, I did it two times so as not to count the download/syncing time, and into a fresh home. The 3700X has 16 threads, the 2400G has 8; that probably makes up most of the difference. But the M1 has 8 cores (4× high-performance + 4× high-efficiency). Maybe the Rust version also counts? I'm on Rust 1.48.

EDIT: I checked on 1.52.1, which is the latest stable, and it went down to 54 seconds. So that also makes a significant difference.


It's been amazing for me.... EXCEPT for when resuming from sleep, for some reason it would be very slow for a minute.

I have since FIXED THIS! The solution is to just not let it sleep. I'm using Amphetamine and since then it has been amazingly fast all the time.


> I'm using Amphetamine and since then it has been amazingly fast all the time.

I think these app makers might need to start focus grouping some of these app names. Not only is it going to be harder to search for this app, but it'll inevitably also lead to some humorous misunderstandings.


They already got pulled from the App Store (although Apple reversed course after significant badgering): https://news.ycombinator.com/item?id=25605880


Is 22 seconds instead of 34 seconds so much faster? The difference does not feel huge to me.


Yes, it compiled the project in 35% less time. That doesn't really matter much for such a small project, but imagine if on a larger project the Intel part took an hour and the M1 did it in 39 minutes. That's a substantial speedup. Of course, we might see different results after leaving the turbo window.


You're assuming that the scaling is linear though. What if there's a fixed 10 second ding on x86? That 39-minute compile on the M1 would only be 39:10 on the x86.


> What if there's a fixed 10 second ding on x86?

There isn't.


And yet it is extremely unlikely that it's linear or anywhere near it.


What is "linear" in this context? If the Y-axis is performance, what is the X-axis? That statement doesn't seem to make much sense without any additional explanation.

If I run a benchmark for a program on the M1, that's a single data point. It's hard to call a single data point "linear", "quadratic", or anything else... and you can't really put multiple benchmarks on the X-axis, because they're measuring different things.

What would even make it (whatever "it" is) "extremely unlikely"? People have had over 6 months to run benchmarks on the M1. Surely you can find a concrete answer to your question that doesn't involve random speculation on internet forums?

Based on my own experiences, I have no reason to believe that the M1's performance starts tanking if you run it for longer than a few seconds, in case you're implying that the other processors will "catch up" if they have longer to get up to speed... why would they? That makes no sense either. Longer compilations are faster on M1 too, relative to my Intel hexacore MBP, from what I've seen in the past. I mean, obviously, right? Why wouldn't they be? Intel's processors change frequencies in milliseconds... it doesn't take them minutes to warm up.

The M1 isn't a silver bullet. AMD makes laptop processors that are more powerful. But the M1 is still really good for what it is, and those AMD processors consume notably more power than the M1 to achieve their performance.


> I have no reason to believe that the M1's performance starts tanking if you run it for longer than a few seconds

This statement speaks volumes about your bias. You're immediately on the defensive and assuming we're attacking the M1 rather than the assumptions made in a test and its methodology.

No one was implying that the M1's performance would tank. I specifically conjectured that perhaps x86, not M1, might have a penalty.

The original assertion made was that all x86 compiles were ~30% slower than the M1, but the benchmark was a ~20-second compile. What happens if the compile is hours?

If the penalty is ~30% linear then a 60 minute compile on M1 is 80 minutes on x86. But if there's just a 10 second warmup penalty on x86 the compile time might only be 60 minutes + 10 seconds.

The truth is probably somewhere in between.


> But if there's just a 10 second warmup penalty on x86 the compile time might only be 60 minutes + 10 seconds.

But that doesn't make any sense, as I previously asserted. That's just not how any of this works. x86 is not suffering a "penalty". It doesn't need time to "warmup". I already discussed my views (based on actual real world experience) on all of this, but you conveniently bypassed those statements.

> This statement speaks volumes about your bias. You're immediately on the defensive and assuming we're attacking the M1 rather than the assumptions made in a test and it's methodology.

It really doesn't speak volumes. The default position I've seen is for people to assume that new technology is just a gimmick, and that results are only being cherry picked. And that's exactly what you just said you were assuming ("the truth is probably somewhere in between"), so I was right to reach that conclusion. No one here (that I've seen) is claiming the M1 is the unequivocal performance champion in all things, but it does really well in some use cases that impress me as a developer.

If it's really just about test methodology, you would just look up other tests, or run your own. Putting out completely unfounded statements like "what if there's a fixed 10 second ding on x86?" doesn't move the conversation forward when the hardware and results are so readily available. It's just a way of attempting to cast doubt on results. There's little need for speculation at this point, but you continue to speculate in defense of your previous comment without apparently having any hands on experience with M1.


> But that doesn't make any sense, as I previously asserted.

Correct. It is an extreme used as an illustration to point out that you can not simply take the time savings on one compile and assume it will save the same percentage on a longer compile. I am not sure why that is so difficult for you to comprehend, particularly since cptskippy has specifically said "The truth is probably somewhere in between."


> But that doesn't make any sense, as I previously asserted.

Where did you assert that in our thread?

> It really doesn't speak volumes.

It does, you immediately assume someone is attacking the M1 because we question someone's assertion that the M1 is 30 something percent faster based on one small compilation benchmark.


> Where did you assert that in our thread?

https://news.ycombinator.com/item?id=27189764

"[...] in case you're implying that the other processors will "catch up" if they have longer to get up to speed... why would they? That makes no sense either. Longer compilations are faster on M1 too, relative to my Intel hexacore MBP, from what I've seen in the past. I mean, obviously, right? Why wouldn't they be? Intel's processors change frequencies in milliseconds... it doesn't take them minutes to warm up."

This is where I asserted that.

The only way there could be a "10 second penalty" is if the Intel processor just takes 10 seconds to warm up, and then it's on even footing after that point.

Intel processors takes milliseconds to change frequencies, so... clearly this is not a logical discussion point.

> It does, you immediately assume [...]

You actually misinterpreted my comment, possibly because you didn't read the rest of the paragraph that I copied above in this comment.

I didn't assume in that comment that anyone was "attacking M1" in the way you keep implying, based on the sentence fragment that you had cherry picked in your previous response. When I said that the M1 performance wouldn't tank, I clarified in that same paragraph that I meant this to be a relative measure compared to the Intel processor, since the Intel processor had its time to warm up and suddenly be super fast. If you read the whole paragraph, you should be able to see what I actually said.

If anything, I was defending Intel's Turbo Boost technology... which is the opposite of the conclusion you apparently jumped to.


> in case you're implying that the other processors will "catch up" if they have longer to get up to speed... why would they?

I never implied that x86 would catch up or even surpass the M1. I was suggesting that x86 might have some sort of delayed start but that actual compilation might occur at a similar pace to the M1.

The OP had a single data point based on a sub 60 second execution of a small project and extrapolated it out to a 30+ minute compilation of a larger project for comparison. He suggested that on a graph of Compile Time based on Project Size the divergence in performance might be linear.

I was merely suggesting there might be something else at play causing a delayed start of compilation on the x86 side. You came in and authoritatively shut me down and then tried to suggest that we were being vague with our use of the term linear.

Imagine two runners that can keep similar pace but one always starts the race with his shoes untied and has to lace up as part of the race. Once he's laced up he can start running and will keep pace with the other runner, who has a head start, but will always be behind. That's what I was suggesting. In this analogy the length of the race is irrelevant.


The X-axis is complexity. Please go back and re-read the context of my comment.


I'd gladly spend 14 seconds if it helps me avoid Apple products!


How long does it take on a Ryzen 9 5950X?


Note that's a 16-core CPU with a 105 W TDP, compared to a quad-core, <10 W M1.


The 5850U is an 8-core CPU with a TDP that is adjustable between 10 and 25 W. If you want to keep it to 4 cores just so it is a fair fight, there's the 5400U/5450U. AMD does pretty well...


Yeah I'd say the M1 and latest AMD CPUs are virtually tied for fastest single threaded performance, edging out the best by Intel.

But for something like compilation which is multithreaded obviously the higher core count will win.


Yup, and the new MacBook Pro will probably be twice as fast.


Source?


Gut feeling ;-)


Ran a few cycles on my 5950x hackintosh:

  invaderfizz@FIZZ-5950X:~$ hyperfine 'cargo -q install -f ripgrep'
  Benchmark #1: cargo -q install -f ripgrep
    Time (mean ± σ):     19.190 s ±  0.392 s    [User: 294.890 s, System: 17.144 s]
    Range (min … max):   18.352 s … 19.803 s    10 runs
The CPU was averaging about 50% load; the dependencies go really fast when they can all compile in parallel, but then the larger portions are stuck with single-thread performance.


My 5950X came in faster running in WSL2, but then I do have a Samsung 980 PRO running in a PCIe 4.0 slot.

  Benchmark #1: cargo -q install -f ripgrep
    Time (mean ± σ):     11.585 s ±  0.474 s    [User: 176.733 s, System: 5.677 s]
    Range (min … max):   11.271 s … 12.867 s    10 runs
Which I suspect points towards an I/O bottleneck.


Just for reference (5950X too): I ran it 3 times, just measuring with `time`, on Void Linux using cargo from xbps. The mean of the last two runs (the first was syncing and downloading) was ~13.8 s.


Okay, compiling is faster. But you compile once and run many times. What about running it?


Well, they don't just "feel" faster. They also complete lots of real-world tasks in shorter times, often significantly shorter than the last Intel Macbook Pro.


It's funny how many articles just won't say the obvious - it's a faster computer.

I have two laptops side by side; one is an M1, the other an HP at a slightly higher price point (more memory, bigger SSD). The challenges Intel has in this form factor are obvious — it's a 1.2 GHz chip that turbos to almost 2x as heat allows.

In any dimension that it can do what you need, the Apple wins. Cheaper, faster, cooler, longer battery endurance. The detriments are the things it cannot do — run Windows/Linux or VMs or tasks that need more memory than is currently available.


It can't run Intel-based OSs or VMs, of course, but the latest version of Parallels Desktop runs ARM-based Linux guests on M1 Macs, as well as the Windows 10 ARM preview. (VMware has implied that the next release of Fusion will support Apple silicon as well.)


QEMU already supports x86_64 on Apple silicon. It was just slow last time I checked; not sure of the performance now.


As far as I know, besides generating slow code, QEMU also does not support multi-threading when emulating strong-memory-ordering architectures (x86) on weakly ordered hosts (ARM).


Linux is coming.


I was an unbeliever, but we got some M1 Mac Minis and this is the result. They even beat the dedicated workstations many of our people have set up. I get it when people say it won't quite match a new Ryzen for compute power, but in all of our tests, the M1 beat out all our workstations.


What workloads do you run? I've got a 16 GiB M1 MBP and a maxed-out 2019 Intel, and when compiling Java/sbt projects the Intel is significantly quicker, albeit also much louder and more power-hungry.


As the sibling comment mentions, if you're running Intel JDK on M1 it will be slow. You can find M1 native JDK builds here: https://www.azul.com/downloads/?version=java-11-lts&os=macos...


Just for fun, Oracle has also started providing natively compiled EA builds of JDK 17: https://jdk.java.net/17/


Wow, based on the comments here I decided to try out the native M1 builds from Azul.

I see a 4.1x speedup in compiling (including tests), excluding dependency resolution:

  Azul Native M1 java JDK11: 
   ./sbt "clean;compile;test:compile"  
     254.86s user 
     11.84s system 
     519% cpu 
     51.375 total
  Azul java JDK11: 
    ./sbt "clean;compile;test:compile"  
     490.04s user 
     59.48s system 
     269% cpu 
     3:23.81 total


That compiler is most probably not native to the M1 ARM processor.


Indeed, I was using the GraalVM JDK 11 build, which wasn't available in a native version.


> when compiling Java/sbt projects

Are you comparing a binary being run under dynamic binary translation with a native binary?

Not really an honest comparison, if that's the case.


No idea if that's the case, but I wouldn't have expected Java of all things to be run under binary translation.


> I wouldn't have expected Java of all things to be run under binary translation

Why? The Java community has only just been working on ARM64 support at all over the last few years, and it's still a little limited; macOS ARM64 support has only been out a couple of weeks, and only from non-Oracle third-party builders, I believe.


You have to install a Java VM compiled for ARM such as the one made by Azul. If you just get the openjdk from the main website it is compiled for Intel and will be much slower.


> I wouldn't have expected Java of all things to be run under binary translation

Depends which version you have installed. It's a taken a while for the native versions to reach the mainstream channels, so unless you've specifically installed an M1 build you probably have an x86 build being run under translation.


Native support in the regular builds will arrive in September:

https://openjdk.java.net/projects/jdk/17/


The author’s analysis sounds less like an explanation of how the M1 is faster, and more like an explanation of how it gets such amazing battery life.

If someone could figure out a way to get all MacOS apps —including the system — to use the performance cores, perhaps the battery life would be back down to Intel levels?
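(You can already nudge things the other way from the command line - macOS's taskpolicy tool demotes a process to background QoS, which on the M1 confines it to the efficiency cores - but there's no supported way to force arbitrary apps onto the P cores; the QoS class has to come from the app itself. A rough sketch, assuming the stock taskpolicy that ships with macOS:)

  # run a job demoted to background QoS (it lands on the M1's efficiency cores)
  taskpolicy -b make -j8
  # and lift the demotion from a running process (<pid> is a placeholder)
  taskpolicy -B -p <pid>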


>If someone could figure out a way to get all MacOS apps —including the system — to use the performance cores, perhaps the battery life would be back down to Intel levels?

Other than to prove an utterly useless point - why would you want to even remotely do that?!?

If my time machine backup takes four times as long but my battery lasts longer still why would I care? The overall experience is a HUGE net improvement.

That's the point being glossed over by the majority of commenters and the point the original author is making - benchmarks are interesting, but nothing beats real world experience and in real world experience there are a suite of factors contributing to the M1 and Apples SOC approach spanking the crap out of their competitors.

There is more to life than rAw PoWeR ;)


curiosity? proving the thesis? knowledge that could improve competing laptops?

Yeah, never mind, I don't care why these computers are impressive; let's just play macOS Chess and browse Facebook. JFC, if you're not interested, why did you even join this conversation?


I don't care? "That's the point being glossed over by the majority of commenters and the point the original author is making - benchmarks are interesting, but nothing beats real world experience and in real world experience there are a suite of factors contributing to the M1 and Apples SOC approach spanking the crap out of their competitors."

I don't know what point would be proved by maxing out the cores other than confirming what seems to be pretty obvious - a hybrid approach has multiple benefits - not just in power efficiency but the user experience as well.

It's not just one aspect of the design choices - but all of them in concert.


I'm sure someone has run the battery dead using benchmarks that exercise the performance cores - I suspect that Apple designed a fast chip and an efficient chip and then married them together, and that an M1 Max with only performance cores wouldn't be that interesting power-budget-wise (though likely still an improvement).


Anecdotes of very heavy users suggest 4-6 hours if CPU use remains high.


Sounds like the consensus here is that the M1 feels faster, and is actually faster, for a lot of tasks. So, for a lot of engineers, the M1 is the better choice over Intels. (Faster compile times are a good thing for devs, among other things)

In that regard, the moment feels similar to the mid 2000s, when you suddenly saw a huge uptick in Macbooks at technical conferences: software developers seemed to gravitate in droves to the Mac platform (over windows or linux). Over the last 5+ years, that's cooled a bit. But I wonder if the M1 will solidify or grow Apple's (perceived) dominance among developers?


I switched away from Macs to Thinkpads when the butterfly keyboards came out. Just happily switched back to Macs and loving my M1 Air.


It’s amazing how much of a dumpster fire the butterfly keyboard was, but perhaps more bewilderingly how long they persisted with it. That keyboard might well have done more than any other product design decision in the last decade to push people away from the Mac platform.


If the M1 MBA (or early 2020 Intel MBA) didn't exist, when my 2013 MBP started dying last month I would have dropped Apple for an Ubuntu XPS.

I don't understand how a company of that size kept up a blunder of that magnitude for half a decade, but a half-broken keyboard with an entire missing row of keys was just a nonstarter.


I think it did, and the touch bar was a close second. Thankfully, whoever was in charge of those decisions seems to have lost decision-making power and Apple has walked back both.


> whoever was in charge of those decisions seems to have lost decision-making power

I guess we're both speculating here, but even if the same people were in charge, I think they breathed a sigh of relief when they realized that the personal computing segment had spiced up again. From about 2014 to 2019, Mac revenue basically plateaued in line with the entire laptop market. People were crazy about phones but laptops had hit a wall.

When you have to sell laptops to people whose yesteryear laptops do everything they need, you start adding random bullshit to the product because you have to capture the market's attention somehow. I think this is how we ended up with the touch bar. It's a step backward, but it's flashy and made the product look fresh(er) despite the form factor being identical to what they were selling in 2013.


the touch bar still ships with every current-gen MBP though; the Air never had one


It's going away though


That really confused me until I discovered that Macs had something called a "butterfly keyboard" as well. At first I thought you switched to Thinkpads in the mid 90s:

https://en.wikipedia.org/wiki/IBM_ThinkPad_701


> So, for a lot of engineers, the M1 is the better choice over Intels.

I don't know which engineers you hang out with, but the only "engineer" I know who uses an M1 Mac does web design. Besides that, the M1 doesn't support most of the software that most engineers are using (unless you're working in a field already focused on ARM), and the fragility of a Macbook isn't really suited for a workshop environment. Among them, Thinkpads still reign supreme (though one guy has a Hackintosh, so maybe you were right all along?)

> But I wonder if the M1 will solidify or grow Apple's (perceived) dominance among developers?

Developers are going to be extremely split on this one. ARM is still in its infancy right now, especially for desktop applications, so unless your business exclusively relies on MacOS customers, there's not much of a value proposition in compiling a special version of your program for 7% of your users. Because of that, MacOS on ARM has one of the worst package databases in recent memory. Besides, there's a lightning round full of other reasons why MacOS is a pretty poor fit for high-performance computing:

- BSD-styled memory management causes frequent page faults and thrashes memory/swap

- Quartz and other necessary kernel processes consume inordinate amounts of compute

- MacPorts and Brew are both pretty miserable package managers compared to the industry standard options.

- Macs have notoriously terrible hardware compatibility

- Abstracting execution makes it harder to package software, harder to run it, and harder to debug it when something goes wrong

...and many more!

In other words, I doubt the M1 will do much to corroborate Apple's perceived superiority among the tech-y crowd. Unless people were really that worried about Twitter pulling up a few dozen milliseconds faster, I fail to see how it's any better of an "engineering" laptop than its alternatives. If anything, its GPU is significantly weaker than that of most other laptops on the market.


This is a lot of really well expressed information that completely fails to grapple with the fact that most developers write code for the web and apple computers are extremely popular among web developers.

All of the things you mentioned are also true of intel macs, which again are wildly popular among web devs of all kinds.

If you can't explain the popularity of those machines in spite of those limitations, I don't see why I should accept those as reasons why apple arm computers won't be popular.


And developers will often be willing to take a grab at a new thing - and even if you're not doing web dev lots of your code likely runs on build farms anyway - the chance to play around with a workable ARM machine might be attractive.


> This is a lot of really well expressed information

Well, no, it's all totally made up.


I was trying to be nice.


I develop machine learning data pipelines. My dev environment of choice is a MacBook running the JetBrains suite, Alfred, BetterTouchTool, and an iTerm pane or 3 with Mosh and Tmux running on an Ubuntu server. HID is a tricked-out Ergodox EZ, right-hand mouse, left-hand Magic Trackpad. Prior to that I developed directly on Ubuntu 18/20, and Windows well before that. The experience is unparalleled. The OS/windowed desktop is smoother and less buggy than Gnome 3, and more responsive than Win 7 or 10.

I'm talking a 16 GB Intel i7 8-thread MacBook vs a 32-thread, twin-socket, 72 GB, RTX 2080 beast running Ubuntu 20. The Mac crushes it in terms of feel, fit and finish. I haven't tried the M1 yet but I bet it'll one-up my current Intel MacBook. I'm quite eager to get one.

> Besides that, the M1 doesn't support most of the software that most engineers are using...

??? Other than CUDA, the MacBook meets my needs 95% of the time. I only miss having a native Docker networking experience a small minority of the time. I need a responsive GUI head to act as my development environment. All the heavy-lift compute is done on servers/on-prem/in-cloud.

> - BSD-styled memory management causes frequent page faults and thrashes memory/swap

Only under super heavy memory demand. I close some browser tabs or IDE panes (I normally have dozens of both).

> - Abstracting execution makes it harder to package software, harder to run it, and harder to debug it when something goes wrong

Almost everything I do is containerized anyways, so this is moot.

I was squarely one of those "why would anyone use mac? It's overpriced, lock in, $typical_nerd_complaints_about_mac" until COVID happened and it became my daily driver. Now I can't go back.

> - MacPorts and Brew are both pretty miserable package managers compared to the industry standard options.

No snark intended - Like what? I'm not exactly blown away by Brew, but it's been generally on par with Apt(get). Aptitude is marginally better. There's not a single package manager that doesn't aggravate me in some way.


This is pretty disconnected from reality.

> BSD-styled memory management causes frequent page faults and thrashes memory/swap

You're basically pulling this out of nowhere. Not once in six years of using MacOS has this ever happened to me.

> Quartz and other necessary kernel processes consume inordinate amounts of compute

Yes, because Windows is so much better. This is sarcasm. Just pull up `services.msc` and take a look at everything that's running.

> MacPorts and Brew are both pretty miserable package managers compared to the industry standard options.

In your opinion, what is industry standard? `apt-get` and `yum`? I have yet to come across a better package manager than Brew. Brew just works. Additionally, most binaries that are installed in Brew don't require elevation. Which is fantastic because almost every program installation requires elevation in Windows.

> Macs have notoriously terrible hardware compatability

Hardware compatibility in what sense? As in, plug-and-play devices on a MacOS powered machine?

I'd argue the inverse; I often just plug in devices to my MacBook without having to install a single driver. Imagine my shock years ago when I plugged in a printer to my MacBook and I was able to immediately start printing without installing a single driver. Same with webcams, mice, etc.

Do you mean hardware compatibility in terms of build targets? I think here you might be correct, but even then you can compile for different operating systems from within MacOS... so again, I'm not entirely sure what you mean here.

I guess if you're talking about legacy devices where the hardware manufacturer hasn't bothered to create drivers for anything other than Windows, then your point might be valid, but how often does this happen...?

> Abstracting execution makes it harder to package software, harder to run it, and harder to debug it when something goes wrong

...more disingenuous statements. What do you mean by this? Under the hood, MacOS is Unix. Everything that runs on a MacOS machine is a process. You can attach to processes just as you would on a Windows machine. Similarly, if you have the debugging information for a binary you can inspect the running code as well.

MacOS is not a perfect operating system; for one, I do wish that it was better for game development. But I'm really struggling to understand your points here. Every single one is either not applicable or just straight up wrong.


> - BSD-styled memory management causes frequent page faults and thrashes memory/swap

macOS doesn't use the BSD VM.

> If anything, it's GPU is significantly weaker than most other laptops on the market.

You meant to say "stronger" at the task of not burning your lap.


My company seems intent on switching to M1 as soon as we can. I can't see it being very long until it's possible to develop and run our backend services locally (Go, Docker, IntelliJ, maybe a couple other small things).


I'm a VP of a small software shop focused on IoT. We do embedded, cloud, mobile, web and ML. My entire department is 100% mac.


This certainly makes a lot of sense, and it's a brilliant strategy to improve overall performance. But it doesn't account for the full picture either. My M1 Macbook Pro feels about as responsive as my desktop system with an AMD Ryzen 3900X, which is impressive to say the least. It doesn't quite have the raw computing power of the 3900X, but given that it's a portable device that's something else.


Well, they did it first in iOS which is more responsive than Android even when compared to devices that win benchmarks vs the iPhone.

It's just doable when you care enough to optimize the OS for responsiveness.


Much like the M1, I don't think anything actually wins benchmarks vs the iPhone. There's display refresh rate and a few multicore scores that are closer, but in any real-world test, single-core dominance wins out. (Plus the big/little design in the A-series chips is better than the three-tier approach seen in Qualcomm chips, since Qualcomm backfills the single large core with a ton of little cores to make the multicore numbers look competitive, when in reality any multithreaded task spread over two big cores is much faster and you don't end up with mismatched race conditions.)


Are you sure? iOS has been more responsive from day 1 (I mean the first 2-3 iphone versions), when they were using an off the shelf CPU that everyone else had.


I think the commenter is saying that iOS provides no discernible performance advancements over Android. If you ran Android on an M1, it likely would score just as well as an iOS/MacOS machine.


It would be an interesting comparison. Non-apple silicon has definitely not been great and lags behind significantly. Would be very interesting to see what Android would be like on more modern CPU tech.


Non-Apple has been perfectly fine. I mean, AMD was putting higher-performing APUs in laptops half the price of the M1 Macbook Air, and they did that 18 months before the M1 even hit the market. If you wanted an 8-core, 4GHz CPU with the GPU power of a 1050 Ti, the Ryzen 7 4800U was prevalent in notebooks as cheap as $400, and was even lauded for a short period as one of the best budget gaming options on the laptop market.

If you're exclusively referring to the handheld market though, then you're mostly right. Apple's big lead in this generation is mostly because of how fast they were able to buy out the world's 5nm node. The only real "Android" manufacturer doing the same thing is Samsung, which is only because they have the means to fab their own chips.


> AMD was putting higher-performing APUs in laptops half the price of the M1 Macbook Air, and they did that 18 months before the M1 even hit the market

APUs with Zen and Zen 2 cores were absolutely not outperforming the M1's Firestorm cores.

>In SPECint2006, we’re now seeing the M1 close the gap to AMD’s Zen3, beating it in several workloads now, which increasing the gap to Intel’s new Tiger Lake design as well as their top-performing desktop CPU, which the M1 now beats in the majority of workloads.

>In the fp2006 workloads, we’re seeing the M1 post very large performance boosts relative to the A14, meaning that it now is able to claim the best performance out of all CPUs being compared here.

https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...


How does this surprise anyone at all? You're comparing a 5nm-based core to a 7nm-based core, of course there will be a direct disparity in performance. If anything, I'm surprised they weren't able to eke more performance out of it. The GPU is frankly pathetic, and the memory controller and IO subsystems are both obviously gimped too. As a matter of fact, the current M1 Mac Mini quite literally could not connect to all of my hardware with all the dongles in the world. There's just not enough bandwidth to run it all. Even if I splurged for the "Pro" device, it will still have the same bottlenecks as the consumer product.

You can't spend a decade pretending that performance doesn't matter, only to rear your head again when Moore's law is sputtering out and gasping for air. Apple's power over the industry is unchecked right now, meaning they can play all sorts of petty tricks to gaslight mainstream manufacturers into playing a different game. ARM is the first step in that direction, and I for one couldn't be more happy with the hardware I own. Well, maybe if I sold my Macbook Air...


> How does this surprise anyone at all? You're comparing a 5nm-based core to a 7nm-based core, of course there will be a direct disparity in performance

Given that Apple took the power/heat drop with the transition from 7nm to 5nm instead of a performance increase, that's just not as relevant as it could have been if they had gone the other way.

However, that M1 is still running with a significant frequency and power draw deficit vs the Ryzen as well.

>During average single-threaded workloads on the 3.2GHz Firestorm cores, such as GCC code compilation, we’re seeing device power go up to 10.5W with active power at around 6.3W.

https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...

However, you compete with the performance you have today, not some theoretical performance you might have if your chip was made differently.


> Given that Apple took the power/heat drop with the transition from 7nm to 5nm instead of a performance increase, that's just not as relevant as it could have been if they had gone the other way.

I still don't understand your argument here though. It's impressive because Apple was able to get first dibs on next generation technology? Should I be applauding their engineers or fearing their supply chain managers?

> However, you compete with the performance you have today, not some theoretical performance you might have if your chip was made differently.

Sure, but there's not much fun in bragging about how your V8 demolished a 4-stroke. Plus, as the market for silicon fabs continues to heat up, Apple's lead on logistics isn't going to be so noticeable. They've also painted themselves into a bit of a scaling issue, too: the M1 occupies a bit of an ARM sweet spot right now. As far as I can tell, Apple's only choice for making a more powerful machine is to use better silicon: which won't be available until 4nm/3nm starts to hit production in Q3 2022, at the earliest.


>I still don't understand your argument here though. It's impressive because Apple was able to get first dibs on next generation technology?

The M1's performance lead wasn't created by a move from TSMC 7nm to TSMC 5nm.

See, here: https://news.ycombinator.com/item?id=27184236

The Firestorm cores in the M1 get their performance from (among other things) having a very wide core that performs more instructions in a single clock cycle, despite running the cores much more slowly than AMD does.

There's a good break down of that and other design advantages here:

https://www.anandtech.com/show/16226/apple-silicon-m1-a14-de...


Yes, it's a bummer that Apple has a deathgrip on 5nm and after that 3nm.

Zen2 would be incredible on 5nm. IMHO the improvements we saw with the GeForce 30XX series and Zen 2 were 80% due to the switch to TSMC 7nm. I'd fully expect another jump if Nvidia and AMD were able to get their hands on 5nm, but it's Apple-exclusive for now :(.

Well, at least we get huge investments in chip manufacturing thanks to apple throwing money at tsmc.


Is that AMD example running at similar wattage?


It's pretty close. Its lowest TDP (before tinkering) rests at around 8-10W, and it can get to 20-25W if you're running at full load (the upper bounds normally come into play if the GPU is also running full blast).

So no, it's not the same wattage, but it's similar enough to be a valid point of comparison, especially for a chip that appears in laptops half the price of the base Macbook Air. Plus, as time goes on, AMD seems to be getting better about power management. If you take a look at the latest reviews for the Surface Laptop 4 with Ryzen, many reviewers were surprised to find that it lasted nearly 17 hours of video playback.


Try a low latency kernel on your desktop if you use Linux, that should improve responsiveness some more. Apparently the increase in power consumption is minimal - I use (K)Ubuntu's low-latency kernel even on laptops these days.


You can also do some "nice"-level management. My XMonad environment is simpler than most, so it's easier for me than someone running a full desktop environment with dozens of processes running around, but I got a lot of responsiveness improvements by nice-ing and ionice-ing my browser down in priority. It does mean that occasionally I switch back to it and it hiccups, but on the flip side it has stopped consuming so many resources and everything else feels much more responsive.

I'm actually not sure why it has the effect it does, entirely, because at least in terms of raw resource use the browser doesn't seem to be eating that much stuff, but the effect in practice has been well beyond what can be explained by perceptual placebo. (Maybe there's too many things competing to get the processor at the VBlank interval or something? Numerically, long before the CPUs are at 100% they act contended, even in my rather simple setup here.)

Or, to perhaps put it another way, Linux already has this sort of prioritization built in, and it works (though it's obviously not identical since we don't have split cores like that), but it seems underutilized. It's split into at least two parts, the CPU nice and the IO nice. CPU nice-ing can happen somewhat automatically with some heuristics on a process over time, but doing some management of ionice can help too, and in my experience, ionice is very effective. You can do something like a full-text index on the "idle" nice level and the rest of the system acts like it's hardly even happening, even on a spinning-rust hard drive (which is pretty impressive).
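If anyone wants to do the same from code rather than the command line, here's a rough Linux sketch of what nice/ionice boil down to. The PID is a placeholder, and the ioprio constants are inlined here rather than pulled from linux/ioprio.h, so treat it as illustrative only:

  #include <stdio.h>
  #include <sys/resource.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /* Inlined from linux/ioprio.h for the sketch. */
  #define IOPRIO_CLASS_IDLE  3
  #define IOPRIO_WHO_PROCESS 1
  #define IOPRIO_PRIO_VALUE(class, data) (((class) << 13) | (data))

  int main(void) {
      pid_t browser = 12345;  /* hypothetical browser PID */

      /* CPU: nice 19 is the lowest priority a normal task can have. */
      if (setpriority(PRIO_PROCESS, browser, 19) != 0)
          perror("setpriority");

      /* I/O: the idle class only gets disk time when nothing else wants it. */
      if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, browser,
                  IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) != 0)
          perror("ioprio_set");

      return 0;
  }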


I feel it's kind of sad this doesn't happen automatically in distros specifically geared toward workstations. You sacrifice a tiny bit of throughput to make the computer continually smooth even under heavy load.

This, automatically turning on battery optimizations when an internal battery is detected, and automatically applying a 'small speaker' EQ to internal speakers are a few changes that would make Linux feel so much better on a laptop.


I run KUbuntu, how can I enable this?

EDIT: I searched a bit and, although I haven't found instructions anywhere, I did find people saying that the low-latency kernel decreases throughput as a tradeoff to lower latency. Have you found this to be the case?

Also, another source says that the preempt kernel might be better for some workloads: https://itectec.com/ubuntu/ubuntu-choose-a-low-latency-kerne...

Can anyone comment on these?


I have not tried the preempt kernel. I have tried the lowlatency kernel and found that it improved overall performance, not just latency.

Turns out that the cost of NUMA balancing (https://www.kernel.org/doc/Documentation/sysctl/kernel.txt) was outweighing any benefit we might get from it. The problem is that NUMA balancing is implemented by periodically unmapping pages, triggering a page fault if the page is accessed later. This lets the kernel move the page to the NUMA node that is accessing the memory. The page fault is not free, however; it has a measurable cost, and happens even for processes that have never migrated to another core. The lowlatency kernel turns off NUMA balancing by default.

Instead of switching to the lowlatency kernel, we set `kernel.numa_balancing = 0` in `sysctl.conf` and accepted the consequences.


There's an interesting Stack Overflow question/answers: https://askubuntu.com/q/126664.

In general terms, one either just installs the specifically recompiled package (-preempt, -lowlatency etc.), or recompiles their own. The related parameters can't be enabled, because they're chosen as kernel configuration, and compiled in.

I'd take such changes with a big grain of salt, because they're very much subject to perceptive bias.


If there were one simple trick to make your computer feel faster without downsides, defaults would change.


True in general, but there is seriously little downside to using a low-latency kernel. Sometimes it even improves throughput, usually the effect is tiny either way.

I think most people just don't care or don't know that they care about responsiveness.


Basically you install linux-lowlatency and reboot. Grub will prefer it over the generic kernel. Make sure that you also get linux-modules-extra in the lowlatency version or you might boot into a system with missing drivers. Happened on my laptop when I switched to lowlatency.


Oh very cool! I had no idea about this. I'm traveling for awhile but I'll definitely check that out once I'm back and working with my desktop machine again!


Yes - I sold my Ryzen desktop because I no longer feel the need to go to the 'more powerful' machine. The Air is just as fast.


That's true, although I do miss the memory! And realistically, when it comes to heavy compute or multi-tasking the 3900X really shows its power. I'm honestly very interested to see AMD tech at 5nm. I think the next few years are going to be fairly mind-blowing in terms of CPU technology.


This is a pretty smart strategy. I'd say the majority of the time when something is slow on my machine, it's because there's a resource intensive background process (that I don't really need to complete quickly) eating up all my system resources.

It seems like the same strategy would also make sense on Intel processors, although it probably requires at least 4 cores to make sense?


> It seems like the same strategy would also make sense on Intel processors

Intel agrees!

https://en.wikipedia.org/wiki/Alder_Lake_(microprocessor)


You really want those "efficiency" cores in addition to your regular ones, otherwise dividing your processor costs you a lot of throughput.

You also want them to be low-power even when saturated, otherwise you gain responsiveness from the "performance" cores but your "efficiency" cores aren't actually efficient.

It seems at least Linux's x86_energy_perf_policy tool lets you set multiplier ranges and some performance-vs-power values per-core, which means such a setup doesn't seem impossible on current Intel hardware.


You could clock down the background cores and make them run more efficiently even if they're identical.


You can do it on one processor with a process/thread scheduler. This is what makes multitasking possible in general.


This is extraordinarily basic stuff. We knew how to do this kind of multitasking, with priorities, back in the 80s (or earlier). Yet people still don't understand it.

As an example, a while back I ran a multi-threaded process on a shared work server with 24 cores. It used zero I/O and almost no RAM, but had to spend a couple of days with 24 threads to get the result. I ran it inside "chrt -i", which makes Linux only run it when there is absolutely nothing else it could do. I had someone email and complain about how I was hogging the server, because something like 90% of the CPU time was being spent on my process. That's because their processes spent 90% of their time waiting for disc/network. My process had zero impact on theirs, but it took some explaining.
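For anyone curious, a rough sketch of what `chrt -i 0` amounts to when done from inside the program itself, using the Linux-specific SCHED_IDLE policy (placeholder comments mark where the real work would go):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void) {
      /* SCHED_IDLE: only run when no other task wants the CPU. */
      struct sched_param p = { .sched_priority = 0 };  /* must be 0 for SCHED_IDLE */
      if (sched_setscheduler(0, SCHED_IDLE, &p) != 0)  /* pid 0 = this process */
          perror("sched_setscheduler");

      /* ... start the 24 worker threads here; they typically inherit the policy ... */
      return 0;
  }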


That's a useful command to know, thanks! However, shouldn't `nice` be handling that sort of stuff? Do you know why there are two commands for the same thing (as far as I can see)?


I would love to know why nice doesn't do this, and why top/ps doesn't show something more sensible when you do.


Well yes, but presumably this impacts responsiveness. The whole point of this strategy is to keep cores free for interactive tasks.


Interesting, enjoyable read.

The conclusions seem a bit off though

- Low QoS tasks are limited to certain cores. This doesn't necessitate efficiency cores (though it makes sense if you want power efficiency in a mobile configuration) and they could as easily be performance cores. The core facet is that the OS has core affinity for low priority tasks and quarantines them to a subset of cores. And it has properly configured low priority tasks as such.

- It also has nothing to do with ARM (as the original title surmised). It's all in the operating system and, again, core affinity. Windows can do this on Intel. macOS/iOS has heavily pushed priorities as meaningful and important, so now with the inclusion of efficiency cores they have established their ecosystem to be able to use it widely.


If it has nothing to do with ARM, why doesn't Mac do it at similar speed on Intel?


The focus of the linked article is responsiveness/user interaction, attributing that to efficiency cores. Apple can achieve exactly the same responsiveness benefit on their Intel devices by, as mentioned, limiting low priority tasks to a subset of cores. Say 2 of 6 real cores on a MBP with an i7.

Apple actually pushes low priority tasks to the efficiency cores for power efficiency reasons (per the name), not user responsiveness. So that MBP runs for longer, cooler on a battery because things that can take longer run on more efficient, slower cores. They do the same with the efficiency cores on modern iOS devices.

The "feeling faster" is a side effect. I am talking about that side effect which isn't ARM specific.

And FWIW, Intel is adding efficiency/"little" cores to their upcoming architectures. Again for efficiency reasons.


There's no point in shuffling background processes to a subset of cores if it doesn't provide power-savings.

It sounds like upcoming Intel chips will have the ability to run efficiency-mode on some cores, at which case OS-level code to do shuffling makes sense.


Because there are no Intel Macs with heterogeneous cores. Lakefield is the first Intel CPU with LITTLE cores, but it's not available for the Mac.


Interesting, the article isn't clear how it's implemented and whether they dynamically adjust the QoS/Core, so I wonder if they solved the inversion problem. Basically you don't want a high-priority process to be dependent on a lower-priority process that's locked on a LITTLE core.

Windows solves this by placing basically everything onto the LITTLE cores, and user interactions serve as a priority bump up to the big cores. The key (which is difficult to implement in practice) is that they pass that priority along to every other called thread in the critical path so that every path necessary to respond to the user can be placed on the big cores. This means that if you are waiting for a response from the computer, every thread necessary to provide that response, including any normally low-priority background services, is able to run on the big cores.

I'd expect all the other OS's do a similar optimization for the big.LITTLE architectures. It seems to be the natural solution if you've worked in that space long enough.


As of recently, macOS does support priority inheritance for a bunch of synchronization primitives [1], ranging from pthread mutexes to Mach IPC. But it's not perfect. For instance, read-write locks currently don't support priority inheritance, though they could. More problematically, when userland processes use any kind of custom synchronization scheme based on condition variables or similar, there's no way for the kernel to know which thread is expected to signal the condition variable and thus should inherit priority. That includes when libraries implement their own mutexes on top of that [2]...

[1] https://opensource.apple.com/source/xnu/xnu-4903.241.1/osfmk...

[2] https://github.com/Amanieu/parking_lot/issues/205


You should look into libdispatch, in my understanding, essentially everyone is expected to use libdispatch, and it may have something special to somehow help with priority inheritance.


Where available, libdispatch uses the kernel primitives that support priority inheritance. However, there is nothing that can help you if you are blocked on a task that macOS cannot discern where it might wake up from (this is why waiting on a semaphore is usually not advised).


On macOS and iOS, when you want to talk to another process, you use libxpc. I believe that libxpc will automatically send a "dispatch token" (? don't remember exact name, you can see it on lists of mach syscalls) which boosts the priority of the process you're calling (I believe it also generates an OS transaction so the process isn't killed for memory reasons). I believe this priority will fan out as you described. Pierre Habouzit implemented this stuff I believe, you can see a lot of it in libdispatch.


Interesting solution to the problem of app and OS developers wasting CPU power at greater rates than Moore's law can deliver it: just shunt all the stuff the user isn't looking at (and therefore doesn't care about the timeliness of) onto a slower processor.


What's impressive is that Apple's first "high-end" custom silicon is about equal in performance to an AMD Ryzen 5 5600X.

That's not bad!

However, the M1 has only about 20-25% of the power consumption of the Ryzen 5 5600X.

That's crazy!

It'll be very interesting to see what the M3 or M4 look like (especially in X or other higher end variants) several years from now. We already have tastes of it from Marvell, Altera, Amazon and others, though it looks as if we will see ARM make huge headway in the data center (it already is) and even on desktops before 2025.


I have a 2020 Intel Macbook from work and a M1 Macbook for personal use.

Granted I put my work laptop under more load (compiling, video chats at work) but it just feels like a normal laptop. I feel it take its time loading large programs (IDEs) and the fan clicks on when I'm doing big compiles.

I use my personal laptop for occasionally coding, playing games, and also video chats. But the M1 feels amazing. It's fast, snappy, with long battery life, and I've had it for several weeks and I never hear the fan. Even playing Magic Arena AND a Parallels VM runs silent. Arena alone turns my wife's Macbook Air into a jet turbine. It makes video chats so much nicer because it's dead silent.

I've run into occasional compatibility issues with the M1. sbt not working was the latest bummer.

So while my two laptops are technically a year apart, they feel like 5 years apart. The M1 laptop lives up to the hype and I'm glad to just have a laptop that's fast, quiet, has great battery, and is reliable.

Edit: More context, I had bought a maxed-out Macbook Air last year hoping it would be my "forever-laptop" but it was just not giving me the speed I wanted, and the noise was just too much. I couldn't play any game or do any real coding in the living room without disrupting my wife enjoying her own games or shows. I'm so glad I traded up to the M1


"I never hear the fan. Even playing Magic Arena AND a Parallels VM runs silent."

That is because there is no fan.


The regular M1 MacBook does have a fan. The M1 Air doesn't, but throttles the CPU to avoid overheating


While we're being technical, there is no M1 MacBook; there is an M1 MacBook Pro and an M1 MacBook Air, as well as an M1 Mac Mini.


You're right, I meant "M1 MacBook Pro", not "M1 MacBook". I cannot seem to edit after a certain period.


In the Macbook Air there isn't. In the MBP there is.


Sorry if there was confusion.

It's an M1 Macbook Pro, so there is a fan, but I've yet to hear it.

My previous laptop was an Intel Macbook Air (2019?).


I've had an M1 Macbook Pro for almost a month now as my work laptop and I've only heard the fan once so far, while doing a multi-file search/replace in VSCode. For whatever reason that maxed out the performance cores for a few minutes.

Other than that it's been entirely silent and blazingly fast and I had no major issues with it.


FWIW Android (and surely iOS, although I don't know anything about that in particular) have had systems like this for a long time. Aside from all the task scheduler smarts in the kernel itself, there are all kinds of things that can be marked as low priority so they run on the LITTLE (what Apple calls "efficiency") cores, while e.g. a UI rendering thread might go straight to a "big" ("performance") core even if its historical load doesn't show it using much CPU b/w, because it's known (and marked as such by Android platform devs) to be a latency-sensitive element of the workload.

Been a few years since I worked on this so I'm a bit fuzzy on the details but it's interesting stuff! Without this kind of tuning, modern flagship devices would run like absolute garbage and have terrible battery life.


While yes, the hardware had been there for a long time, Android hasn't been taking much of an advantage of it and I'm not quite sure it does now.

Google Play likes to do shit in the background. A lot of it. All the time. Even if you have <s>deliberate RCE</s> auto-update disabled in settings, it will still update itself and Google services, silently, in the background, without any way to disable that. And while it's installing anything, let alone something as clumsy as Google services, your device grinds to a halt. It sometimes literally takes 10 seconds to respond to input, it's this bad. It's especially bad when you turn on a device that you haven't used in a long time. So it definitely wasn't scheduling background tasks on low-power cores, despite knowing which tasks are background and which are not (that's what ActivityManager is for). My understanding from looking at logcat was that this was caused by way too many apps having broadcast receivers that get triggered when something gets installed.

Now, in Android 11 (on a Pixel 4a), this was partly alleviated by limiting how apps see other apps, so those broadcasts don't trigger much of anything, and apps install freakishly fast. That, and maybe they've finally started separating tasks between cores like this. Or maybe they started doing that long ago but only in then-current kernel versions, and my previous phone was stuck with the one it shipped with despite system updates.


These kinds of optimizations often harm benchmarks. Benchmarks are batch jobs, even the ones that try and mimic interactive sessions. Apple has made a latency/throughput optimization that often harms throughput slightly. This is the right call, but benchmarks are the reason why this low hanging fruit isn't being picked on the Intel side.


This should not harm benchmarks unless they mark their threads as background.

As for a particular strategy, I presume Apple could, for example, use 6 high-performance cores instead of a 4+4 hybrid. But then thermal management would be an issue. So the choice was about throughput/latency/energy efficiency. Using simple low-performance cores for tasks that can wait is a very good strategy, as one can add more of those cores as opposed to running a performance core at a low frequency.


I was thinking about things like the HZ setting in Linux. Lowering the setting will get you a slight throughput advantage but interactive use will be not as smooth.


"Because Macs with Intel processors can’t segregate their tasks onto different cores in the same way, when macOS starts to choke on something it affects user processes too."

Why is this? I understand that Intel chips don't use BIG/little but couldn't you assign all OS/background tasks to one core and leave the other cores for user tasks? Shouldn't the scheduler be able to do this?


Because they didn't design the OS to do that. You very well could lock a core or two at low freq/power and assign background tasks to that. It's not really a new or exotic idea.

macOS doing this is a side effect of them needing to do so, with the added benefit of it actually making things nicer all around. They could very well do this on intel/amd chips to the same effect.

The bummer is, we really don't know how good the silicon is, because nothing but macOS runs on it natively, so you can't really get an apples to apples comparison.


Probably a fight with CPU frequency ramping. You could put all the tasks on one core, but then you risk the CPU frequency spiking when it shouldn't causing more power to be drawn than necessary.

With slow, low-power cores, that's less of an issue.


The OS also controls the frequency so it could lock the background core(s) at low frequency.


My 16 inch pro Intel is plenty fast - for the first 45 seconds before thermal throttling kicks in. I have a feeling the M1 is faster because it can actually do work for more than just the briefest bursts


The problem I ran into the other day: an app crashed and was stuck taking 50% of the CPU. I was on the couch for hours on battery. I only noticed when my battery life was at 60% when normally it would be at 85%. My Intel machine would have alerted me by burning my legs and trying to create lift with the fans.


Ha, speaking of burning legs, I was doing some video editing in FCPX last night, and I had to keep changing positions trying to optimize for not burning my legs too badly and not blocking the air flow through the vents on the bottom/sides causing even more throttling.


I was sitting cross legged during a long civ session and using my left hand to balance part of my MBP up.

It wasn't until the next day that I discovered a painful red spot on my hand. That was a few months ago. While the spot is no longer painful, it is still there today. The macbook slowly cooked a portion of my hand.

The thermals on those old machines are just terrible, and I have evidence of it :-)


Bamboo cutting boards make great, cheap lap desks.


The heat and noise was what led me to sell my 16" MBP in favour of a 13" Air. I was working from home the whole time I owned it and my partner could often hear it from a different room.

No idea how anyone works in an office with one of those things without irritating the hell out of their coworkers.


Most likely because everyone already wears headphones to isolate themselves. There is outside traffic noise and aircon to compete with, as well as general drone of conversations. It's only really an issue in environments that are already quiet.


I can assure you the 16-inch Intel MBP is loud enough to draw attention in any open-air office space. Pre-COVID WFH life, I would have to close out my IDE, browser tabs, Photoshop, everything when I had to go into a meeting room because it was embarrassing to hear that laptop whirr its fans. Now I have an M1 for personal use, no fan, heat issues gone. Can't wait to ditch the 16-inch Intel for work, I will accept the smaller screen size.


Apple's Intel MBP thermals are pretty abysmal wrt noise. My off-brand 7th-gen i7 sounds about the same as my colleague's similarly specced MBP, but never throttles.

There were even people saying Apple deliberately sabotaged its Intel thermals to make the upcoming M1 look better.


Those conspiracy theories were never confirmed. Linus thought the MacBook Air Early 2020 model’s cooling was clearly designed for an M1, but then the M1 launched without a fan at all.


I don't believe the conspiracy, but as a result, the bad thermal design works as a great advertisement for the M1.


So, like nice[1], except the developer sets the level of importance ahead of time? Makes sense, as long as I can reprioritize if needed.

1. https://en.wikipedia.org/wiki/Nice_(Unix)


It's nice but with additional limitations - on a normal Unix system even if you nice everything to the lowest level possible they'll still take over all cores. This limits things below a certain nice level to the efficiency cores, meaning that the performance cores are always available for foreground action.
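For reference, a minimal sketch of what that looks like at the thread level on macOS, using the public pthread QoS API (the `indexer` function is just an illustrative stand-in for background work):

  #include <pthread.h>
  #include <pthread/qos.h>
  #include <stdio.h>

  static void *indexer(void *arg) {
      (void)arg;
      /* Tag this thread as background work; on Apple silicon it will be
         confined to the efficiency cores. */
      pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);
      printf("crunching away without touching the performance cores\n");
      return NULL;
  }

  int main(void) {
      pthread_t t;
      pthread_create(&t, NULL, indexer, NULL);
      pthread_join(t, NULL);
      return 0;
  }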


So nice + taskset[1]?

1: https://linux.die.net/man/1/taskset


Seems so. Just that it works "automagically" and reserves some cores for certain priority groups. Maybe one can write a script/daemon that `taskset`s all tasks with a certain niceness range (or by additional rules) to a few "background cores" and everything else to the "interactive cores"?

On Linux you might also get some problems with kworker threads landing in the wrong group, but I'm not that much into Kernel internals I've to admit.
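The per-process half of such a script could look roughly like this in C (the core numbers and PID are placeholders; it's the equivalent of `taskset -cp 0,1 <pid>`):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <sys/types.h>

  int main(void) {
      pid_t pid = 12345;      /* hypothetical low-priority background task */
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(0, &set);       /* cores 0 and 1 play the role of "efficiency" cores */
      CPU_SET(1, &set);
      if (sched_setaffinity(pid, sizeof(set), &set) != 0)
          perror("sched_setaffinity");
      return 0;
  }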


These days you'd probably want to use cpuset cgroups so you don't have to worry about individual processes.


Could do it with SystemD AllowedCPUs functionality, but that's not dynamic


Of course the applications (programmers of the applications) need to declare their "nice" levels correctly - they can't be greedy


Nothing has ever stopped Apple from pinning indexing work to specific cores. They just didn't give a shit.


I suspect they did it less for performance and more for energy savings - this prevents the scheduler from moving a background process to a performance core (which a naive one would do if it saw the performance core underutilized).


It's always done this but the effect wasn't visible on x86.


Maybe like the big.LITTLE cores in many ARM CPUs in Android phones?

I'm not familiar with Android app development - maybe there are similar APIs in Android?


It's the same big.LITTLE concept, slightly tweaked to allow this QoS work. I believe most of it was then used on Android, but I doubt that Android uses the APIs well, if they exist.


Finally, setting all these QoS priorities on DispatchQueues is paying off big time.


A universally used, opinionated API with enough flexibility to let developers express their ideas about priority really helps.
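For those who haven't used it, a minimal sketch of that API from plain C via libdispatch/GCD (the messages are purely illustrative):

  #include <dispatch/dispatch.h>
  #include <stdio.h>

  int main(void) {
      dispatch_group_t g = dispatch_group_create();

      /* Background QoS: the work macOS steers toward the efficiency cores. */
      dispatch_group_async(g, dispatch_get_global_queue(QOS_CLASS_BACKGROUND, 0), ^{
          printf("indexing, backups, analytics...\n");
      });

      /* User-interactive QoS: kept on the performance cores. */
      dispatch_group_async(g, dispatch_get_global_queue(QOS_CLASS_USER_INTERACTIVE, 0), ^{
          printf("respond to the click the user just made\n");
      });

      dispatch_group_wait(g, DISPATCH_TIME_FOREVER);
      return 0;
  }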


Most people set way too many of those. Especially "background", where you're in for a surprise if you hit the actual background performance guarantees, since it may not run for minutes.


I wonder how much of this nice result could be achieved on a homogenous CPU if everyone used an API that expressed priorities on a fine grained task level. Process priority and CPU affinity is already a thing. But I don’t think I’ve ever set the ‘nice’ level for a single thread inside my code.

My point is, the snappiness is a result of decent system-wide adherence to a priority API more than the heterogeneous chip design. The low power consumption of course relies on both.


I'm curious if there is some architectural limit that prevents Apple from doing something similar on Intel Macs, at least ones with 8+ cores. The scheduler could arbitrarily nominate 4 cores as "efficiency" cores and keep them for background tasks, leaving the rest free for "performance".


I think the issue is that thermals, and thus overall efficiency, is optimized on the Intel chips when long-running work is split evenly across the cores.


Without knowing the inner workings of Apple's scheduler: I don't think there is any reasons this wouldn't work on x86 hardware. All you need is some API to tell the scheduler whether a task should be done "asap" or "whenever".

But you then get a huge tradeoff: If you reserve over 50% of the computational power for "asap" stuff, then all the "whenever" stuff will take twice as long as usual (assuming the tasks to be limited by compute power). On a 8C Intel that means that, for a lot of stuff, you're now essentially running on a 4C Intel. Given the price of these chips that MIGHT be a difficult sell, even for Apple. On the M1 they don't seem to care.


>All you need is some API to tell the scheduler whether a task should be done "asap" or "whenever".

We could call it ::SetThreadPriority()
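Which does exist, of course; a rough sketch of the closest Windows analogue, including background mode, which also drops the thread's I/O and memory priority:

  #include <windows.h>

  int main(void) {
      /* Enter background mode for the current thread (CPU, I/O and memory
         priority all lowered). Only valid for the calling thread. */
      SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN);

      /* ... long-running, don't-care-when work goes here ... */

      SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_END);
      return 0;
  }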


For intel that would cause some serious thermal issues. The Four E cores in the M1 use about the same amount of power as one big core (5w total). In an Intel chip that would mean the four 'E' cores would use probably 10 watts each, that's 40 watts cooking a chip.

Intel does have a chip for you though; they have a version with mixed ULV cores and big cores.


Afaik, that isn't launching until 2021. https://en.wikipedia.org/wiki/Alder_Lake_(microprocessor) Or do you mean another CPU?


Lakefield launched in Q2 2020 with the i5 L16G7 and the i3 L13G4.


Would it be possible to achieve something like this in Windows by pinning certain processes to specific cores? I've googled a bit and came across the article below, which shows how to set the affinity for a process to specific cores, and it looks like a fun experiment to do this with the system processes to see whether the performance of other applications will improve. :)

It would be a bit of a hassle to figure out which processes would all need to be changed, and whether these changes persist after rebooting, and there's probably a lot more that I didn't think of though..

https://www.windowscentral.com/assign-specific-processor-cor...
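A rough sketch of that experiment via the Windows API rather than Task Manager (the PID and the mask are placeholders; bits 0 and 1 mean "only logical processors 0 and 1"):

  #include <windows.h>
  #include <stdio.h>

  int main(void) {
      DWORD pid = 12345;  /* hypothetical background service */
      HANDLE h = OpenProcess(PROCESS_SET_INFORMATION | PROCESS_QUERY_INFORMATION,
                             FALSE, pid);
      if (h == NULL) {
          printf("OpenProcess failed: %lu\n", GetLastError());
          return 1;
      }
      if (!SetProcessAffinityMask(h, 0x3))  /* confine it to cores 0-1 */
          printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
      CloseHandle(h);
      return 0;
  }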


Does having the actual best single-thread performance available on the market have anything to do with it? https://www.cpubenchmark.net/singleThread.html


Worth keeping in mind all mac users are going from underpowered Intel processors running in a terrible thermal environment for them too.

My 2018 USB-C MBP honestly feels like trash at times, just feels like every part of it is choking to hold the thing up.


  The Time Machine backup pictured above ran ridiculously slowly, taking over 15 minutes to back up less than 1 GB of files. Had I not been watching it in Activity Monitor, I would have been completely unaware of its poor performance. Because Macs with Intel processors can’t segregate their tasks onto different cores in the same way, when macOS starts to choke on something it affects user processes too.
this is really smart, and makes one wonder why Intel didn't have the insight to do something like that already... is this the result of their "missing the boat" on smartphone processors (a.k.a. giving up XScale)??


Question for the peanut gallery: In a *nix OS, let's say a single-threaded process is running on core 0. It makes a syscall. Does core 0 have to context-switch in order to handle the syscall, or could the kernel code run on core 1? This would allow a process to run longer without context switching and cache dumping, which would improve performance (as well as potentially security).

Corollary question: could one make a kernel-specific core and would there be benefit to it? Handle all the I/O, interrupts and such, all with a dedicated, cloistered root-level cache.


Cross-core interrupts and syscalls are slow. The caches for any memory to be transferred have to be flushed on the calling core and its APIC would have to interrupt the destination core to start executing the syscall, which means entering kernel mode on the calling core to get access to the APIC in the first place.

If the goal is low latency then staying on a single core is almost always better. To effectively use multiple cores requires explicit synchronization such as io_uring, which uses a lock-free ring buffer to transfer opcodes and results using shared buffers visible from userspace and the kernel. io_uring has an option to dedicate a kernel thread to servicing a particular ring buffer, and this can also be limited/expanded to a set of cores. I have zero experience with io_uring in practice and so I don't know what a good tradeoff is between servicing a ring buffer from multiple or single cores. The entries are lightweight and so cache coherency probably isn't too expensive, so for a high-CPU workload that also needs high throughput, allowing other cores to service IO probably makes sense.

I think newer x86_64 chips also allow assigning interrupts from specific hardware to specific cores to effectively run kernel drivers mostly on a subset of cores, or to spread it to all cores under heavy I/O.
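A hedged sketch of the io_uring setup described above, using liburing (the queue depth and idle timeout are arbitrary):

  #include <liburing.h>
  #include <stdio.h>

  int main(void) {
      struct io_uring ring;
      struct io_uring_params params = {0};

      params.flags = IORING_SETUP_SQPOLL;  /* dedicate a kernel thread to submissions */
      params.sq_thread_idle = 2000;        /* ms of idleness before that thread sleeps */
      /* IORING_SETUP_SQ_AFF + params.sq_thread_cpu would pin it to a chosen core. */

      if (io_uring_queue_init_params(64, &ring, &params) < 0) {
          fprintf(stderr, "io_uring_queue_init_params failed\n");
          return 1;
      }

      /* ... fill SQEs with io_uring_get_sqe()/io_uring_prep_read(), then call
         io_uring_submit(); with SQPOLL that submit is usually just writes to the
         shared ring, with no syscall on the submitting core. */

      io_uring_queue_exit(&ring);
      return 0;
  }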


"A peanut gallery was, in the days of vaudeville, a nickname for the cheapest and ostensibly rowdiest seats in the theater, the occupants of which were often known to heckle the performers.[1] The least expensive snack served at the theatre would often be peanuts, which the patrons would sometimes throw at the performers on stage to convey their disapproval. Phrases such as "no comments from the peanut gallery" or "quiet in the peanut gallery" are extensions of the name.[1]"

https://en.wikipedia.org/wiki/Peanut_gallery


Posts like these are the gems that I come to HN for, as much as I've used the phrase, I never considered the source of it. I'd imagine I'd be in the cheap seats as well. Thanks!


If the system call is being handled by core 1, what does core 0 do? It must somehow wait for the core 1 system call to return. I'm not sure the waste created by waiting is outweighed by cache benefits.


Simple: It gets rescheduled to another process.


What about priority inversion? If a high QoS thread waits on a low QoS one? It seems like it would be a bigger issue here as the low QoS threads are only ever run on half the cores.


Grand Central Dispatch (the OS-level framework that handles these QoS classes) elevates the priority of a queue if there is high priority work waiting on low priority work to finish.

https://developer.apple.com/library/archive/documentation/Pe...
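A minimal sketch of the documented case, assuming a serial queue created at utility QoS and a higher-QoS thread blocking on it with dispatch_sync (the queue label is just illustrative):

  #include <dispatch/dispatch.h>
  #include <stdio.h>

  int main(void) {
      /* Serial queue created at a low (utility) QoS. */
      dispatch_queue_t worker = dispatch_queue_create(
          "com.example.worker",
          dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL,
                                                  QOS_CLASS_UTILITY, 0));

      /* A higher-QoS thread (here, the main thread) blocks on it; GCD raises
         the queue's effective QoS to the waiter's until this block completes. */
      dispatch_sync(worker, ^{
          printf("low-QoS queue, temporarily boosted by the waiting caller\n");
      });
      return 0;
  }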


Importantly it doesn't do this for dispatch_async/dispatch_semaphore/dispatch_block_wait.


I love the fact that Apple prioritizes the human over the machine. I always felt that iPhones and iPads felt faster than desktops. I reckon the QoS strategy is the reason.


This strategy has had a strange negative effect for me a couple of times.

It used to be that I would notice rogue background tasks eating up my processor because I could feel the UI acting slow and the fans would kick on.

Now the only indication I get is my battery depleting more rapidly than expected. The UI is completely responsive still, and there are no fans. But I’m never checking my battery level anymore, so I don’t catch these processes for much longer.


I have Pluto running on a side monitor and sometimes catch the processors going insane when I don't expect it.

https://news.ycombinator.com/item?id=25624718


I wonder if shunting background processes to the slower efficiency cores has a salutary effect on other elements of system load. I’m thinking of tasks like Time Machine, which presumably are throwing out IOPS at a commensurately lower rate when effectively throttled by the efficiency cores.


The other half of the story for the efficiency cores seems to be the battery life. Most of these non-urgent background tasks are put on efficiency cores not only to separate them from my applications, but also to enable this insanely good battery life.


There is also a thermal benefit. Using the high efficiency cores puts a lower load on the cooling system. So even if the computer is plugged in and the performance cores are idle it may be best not to use them and let the cooling system gain some headroom for when the performance cores are needed.


Great point. Overall it seems like a very well designed system.


All this reminds me of when the HP PA machines were first introduced. Loads of posts about how compilation was now so fast that the screen turned red due to time dilation, etc. Of course now nobody remembers PA.


You're talking about HP's old Precision Architecture, right?

https://en.wikipedia.org/wiki/PA-RISC

That's an interesting comparison because it's another case where a big player created their own chips for a competitive advantage.

There are surely some lessons in there for Apple silicon, but I don't think there's any particular reason to think it will follow the same path as HP PA. PA started in the '80s. It was aimed at high-end workstations and servers. My impression was that the value proposition was narrow: it was sold to customers who needed speed and had a pretty big budget. Until checking now, I was unaware, but it seems to have carved out a niche and was able to live there for a while, so it wasn't exactly a failure.

Time will tell how apple silicon fares. Today the M1 is a fast, power-efficient, inexpensive chip, with a reasonable (and rapidly improving) software compatibility story, and is available in some nice form factors... so basically a towering home run. But we'll just have to see how Apple sustains it over the long run. Frankly, they can drop a few balls and still have a strong story, but it's hard to project out 5, 10, 20 years and understand where this might end up.

(Personally, I'm almost surely going to get an Apple silicon Mac, as soon as they release the one for non-low-end machines. The risk looks small, the benefits look big.)


The real key for Apple is they already had a huge customer for their new chip - the iPhones and iPads have been using it for years.

This means they do not have to be dependent on Mac sales to support M1 development - they can amortize it over the largest phone manufacturer in the world.

And every time they do a successful architecture switch they make it easier to do it again in the future. If the IntAMD386 becomes the best chip of 2030 they can just start using it.


While I have my doubts whether this is the main reason for the perceived performance, it reminds me of how BeOS was able to also show incredible "apparent" performance on even m68k-level hardware.


Even windows prioritizes programs that have focus over those that don't to increase interactivity.


Yeah cool, it's fast. But why so little RAM? Virtualisation and other professional software must already be a pain to run due to the ARM architecture, so why add a memory handicap on top of it?


It would be interesting to see if there are also optimisations to keep the main (UI) thread clear as much as possible. I mean if I just clicked a button then probably that is the most interesting thing for me to happen, so responding to that click should not be blocked by other high priority tasks.

I often have the feeling on my i9 machine that it's almost idle, but the specific core running my UI is working on something else that I don't particularly care about at the moment I'm interacting with the computer.


I would kill to pin Dropbox to efficiency cores like this. Half the time when my fans are whirring I discover it's just Dropbox frantically trying to sync things.


There are plenty of self-hosted dropbox alternatives that don't do that.

I get that not everyone is a "take matters into their own hands" kinda person, but it's worthwhile to at least look into it and see if it's something worth pursuing for you.


Maestral (https://github.com/SamSchott/maestral) is a pretty light-weight Dropbox client that works well for mac


> These [QOS levels] turn out to be integer values spread evenly between 9 and 33, respectively.

Anyone know why this is? seems fairly arbitrary...why not have 1,2,3,4,5?


They're a bit mask, with the LSB always set. The values are 100001 (33), 011001 (25), 010001 (17), and 001001 (9). And they're probably ORed with other values so multiple parameters can be set with a single word.
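Those raw values also line up with the public constants in <pthread/qos.h>, which you can print yourself; a quick sketch (note there's also a default class at 21):

  #include <pthread/qos.h>
  #include <stdio.h>

  int main(void) {
      printf("user-interactive: %d\n", QOS_CLASS_USER_INTERACTIVE);  /* 0x21 = 33 */
      printf("user-initiated:   %d\n", QOS_CLASS_USER_INITIATED);    /* 0x19 = 25 */
      printf("default:          %d\n", QOS_CLASS_DEFAULT);           /* 0x15 = 21 */
      printf("utility:          %d\n", QOS_CLASS_UTILITY);           /* 0x11 = 17 */
      printf("background:       %d\n", QOS_CLASS_BACKGROUND);        /* 0x09 =  9 */
      return 0;
  }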


Best guess: either (1) it maps to internal values that macOS uses, or (2) they wanted to build in the potential to have more granular values later.


Typo in title. Doesn't parse as a result.


Thankfully my human brain is robust enough to understand by the context.


The only people who think human brains run on strict grammar rules are K-12 language teachers.


"is" should be "it's" probably.


Ok, corrected, sorry.


?! Sorry but I haven’t understood.


How it's scheduled is interesting, but I wonder whether the feeling is really related to the core scheduling.


I'm still assuming it's faster due to the unified memory architecture and latest-generation lithography, and less due to behavior tailored for their OS.

An AMD/Intel using same soldered RAM next to CPU and same process node would give Apple a run for its money.

Still, the optimizations on OS side are interesting here.


I thought the "unified memory" meme had been debunked multiple times here.


Memory is unified, it's just not on-die.

The CPU, GPU, and AI hardware share unified memory, which means zero-copy transfers between them.


I laughed when they called it “unified memory.” Amazing what some marketing can do. In previous years, that was called “shared graphics memory” and it was only for low end systems.


Shared graphics-memory is not the same as unified memory.

In a shared graphics-memory system, a part of the system RAM is reserved for the GPU. The amount of RAM reported to the OS is the total RAM minus the chunk reserved for the GPU. The OS can only use its own part, and cannot access the GPU memory (and vice versa). If you want to make something available to the GPU, it still has to be copied to the reserved GPU part.

In unified memory both the OS and the GPU can access the entire range of memory, no need for a (slow) copy. Consoles use the same strategy (using fast GDDR for both system and GPU), and it's one of the reasons consoles punch above their weight graphically.

The main reason that high-end GPUs use discrete memory is because they use high-bandwidth memory that has to live very close to the GPU. The user-replaceable RAM modules in a typical PC are much too far away from the GPU to use the same kind of high bandwidth memory. If you drop the 'user replaceable' requirement and place everything close together, you can have the benefits of both high bandwidth and unified memory.


> If you drop the 'user replaceable' requirement and place everything close together, you can have the benefits of both high bandwidth and unified memory

Rather, if you drop the "big GPU" requirement then you can place everything close together. So called APUs have been unified memory for years & years now (so more or less Intel's entire lineup, and AMD's entire laptop SKUs & some desktop ones).

It still ends up associated with low-end because there's only so much die space you can spend on an iGPU, and the M1 is no exception there. It doesn't come close to a mid-range discrete GPU and it likely won't ever unless Apple goes chiplets so that multiple dies can share a memory controller.

With "normal" (LP)DDR you still run into severe memory bottlenecks on the GPU side as things get faster, so that becomes another issue with unified memory. Do you sacrifice CPU performance to feed the GPU by using higher-latency GDDR? Or do you sacrifice GPU performance with lower-bandwidth DDR?


From the Anand article you linked further up:

"The first Apple-built GPU for a Mac is significantly faster than any integrated GPU we’ve been able to get our hands on, and will no doubt set a new high bar for GPU performance in a laptop. Based on Apple’s own die shots, it’s clear that they spent a sizable portion of the M1’s die on the GPU and associated hardware, and the payoff is a GPU that can rival even low-end discrete GPUs."

This is their first low-end offering, and they seem to be taking fuller advantage of UMA than anyone has to this point. It will be interesting to see if they continue this with a higher "pro" offering or stick with a discrete GPU to stay competitive.

My guess is Apple will be the one to make UMA integrated graphics rival discrete GPU's, it will be interesting to see if that happens.


None of those games were designed for UMA nor do they benefit as the API design for graphics forces copies to happen anyway.

The M1's GPU is good (for integrated), but UMA isn't the reason why. I don't know why people seem so determined to reduce Apple's substantial engineering in this space to a mostly inconsequential minor architecture tweak that happened a decade ago.


From the article: "Meanwhile, unlike the CPU side of this transition to Apple Silicon, the higher-level nature of graphics programming means that Apple isn’t nearly as reliant on devs to immediately prepare universal applications to take advantage of Apple’s GPU. To be sure, native CPU code is still going to produce better results since a workload that’s purely GPU-limited is almost unheard of, but the fact that existing Metal (and even OpenGL) code can be run on top of Apple’s GPU today means that it immediately benefits all games and other GPU-bound workloads."

https://developer.apple.com/documentation/metal/setting_reso...

https://developer.apple.com/documentation/metal/synchronizin...

"Note In a discrete memory model, synchronization speed is constrained by PCIe bandwidth. In a unified memory model, Metal may ignore synchronization calls completely because it only creates a single memory allocation for the resource. For more information about macOS memory models and managed resources, see Choosing a Resource Storage Mode in macOS."

I am not trying to minimize the other engineering improvements, however I do believe there may be less credit being given to the UMA than deserved due to past lackluster UMA offerings. As I said it will be interesting to see how far Apple can scale UMA I am not sure they can catch discrete graphics but I am starting to think they are going to try.


To leverage shared memory in Metal you have to target Metal. Otherwise take for example glTexImage2D: https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/gl...

Apple can't just hang onto that void* that's passed in as the developer is free to re-use for something else after the call. It must copy, even on a UMA system. And even if it was adjusted such that glTexImage2D took ownership of the pointer instead, there'd still be an internal copy anyway to swizzle it as linear RGBA buffers are not friendly to typical GPU workloads. This is why for example per Apple's docs above when it gets to the texture section it's like "yeah just copy & use private." So even though in theory Metal's UMA exposure would be great for games that stream textures, it still isn't because you still do a copy anyway to convert it to the GPU's internal optimal layout.

Similarly, the benefits of UMA only help if transferring data is actually a significant part of the workload, which is not true for the vast majority of games. For things like gfxbench it may help speed up the load time, but during the benchmark loop all the big objects are only used on the GPU (like textures & models).


I believe most of the benchmarks in the Anand article were Metal-based. Also, PBOs have been around for quite a while in OpenGL:

https://developer.apple.com/library/archive/documentation/Gr...

Any back-and-forth between the CPU and GPU will be faster with unified memory, especially with a coherent on-die cache.

This is the same model as on iOS, so just about anyone doing Metal will already be optimizing for it, the same as with any other mobile development.

It doesn't seem like a minor architectural difference to me:

"Comparing the two GPU architectures, TBDR has the following advantages:

It drastically saves on memory bandwidth because of the unified memory architecture. Blending happens in-register facilitated by tile processing. Color, depth and stencil buffers don’t need to be re-fetched."

https://metalkit.org/2020/07/03/wwdc20-whats-new-in-metal.ht...


> I believe most of the benchmarks in the Anand article were Metal-based

But that doesn't tell you anything. Being Metal-based doesn't mean they were designed for, or benefit from, UMA.

Especially since, again, Apple's own recommendation on big data (read: textures) is to copy it.

> Any back-and-forth between the CPU and GPU will be faster with unified memory, especially with a coherent on-die cache.

Yes, but games & GFXBench don't do this, which is what I keep trying to get across. There are workloads out there that will benefit from this, but the games & benchmarks that were run & being discussed aren't among them. It's like claiming the SunSpider results come from Wi-Fi 6 improvements. There are web experiences that will benefit from faster Wi-Fi, but SunSpider ain't one of them.

Things like GPGPU compute can benefit tremendously here, for example.

> Also, PBOs have been around for quite a while in OpenGL:

PBOs reduce the number of copies from 2 to 1 in some cases, not from 1 to 0. You still copy from the PBO to your texture target, but it can potentially avoid a CPU-to-CPU copy first. When you call glTexImage2D it doesn't necessarily do the transfer right then; it may instead copy to a different CPU buffer to be copied to the GPU later.
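
Roughly, the classic PBO upload path looks like this (my own sketch, not from Apple's doc; assumes a current GL 2.1+ context and a loader for the buffer-object entry points). The PBO is only a transfer buffer: the pixels still end up copied into the texture's own, GPU-optimal storage.

    #include <GL/gl.h>
    #include <string.h>

    void upload_via_pbo(GLuint tex, const unsigned char *pixels, int w, int h)
    {
        size_t size = (size_t)w * h * 4;

        GLuint pbo;
        glGenBuffers(1, &pbo);
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
        glBufferData(GL_PIXEL_UNPACK_BUFFER, (GLsizeiptr)size, NULL, GL_STREAM_DRAW);

        /* Copy #1: CPU data into the driver-owned PBO storage. */
        void *dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
        memcpy(dst, pixels, size);
        glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

        /* Copy #2: PBO into the texture. With a PBO bound, the "data"
           argument is an offset into the PBO, not a CPU pointer. */
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                        GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);

        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
        glDeleteBuffers(1, &pbo);
    }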

> "Comparing the two GPU architectures, TBDR has the following advantages:

> It drastically saves on memory bandwidth because of the unified memory architecture. Blending happens in-register facilitated by tile processing. Color, depth and stencil buffers don’t need to be re-fetched."

> https://metalkit.org/2020/07/03/wwdc20-whats-new-in-metal.ht...

Uh, that blogger seems rather confused. TBDR has nothing to do with UMA, nor are Nvidia and AMD purely immediate-mode anymore.

Heck, Mali was doing TBDR long before it was ever used on a UMA SoC.


First it's "APIs don't support it, you can't pin memory (which is what a PBO does)." Then it's "oh well, they're not taking advantage of it." Move the goalposts much?

TBDR came to prominence in UMA mobile architectures; it's a big part of what allows them to perform so well with limited memory bandwidth. The M1 is just an evolution of Apple's mobile designs, and of PowerVR before that.

Mali GPUs are UMA and always have been, AFAIK:

https://community.arm.com/developer/tools-software/graphics/...


> First it's "APIs don't support it, you can't pin memory (which is what a PBO does)." Then it's "oh well, they're not taking advantage of it." Move the goalposts much?

No, they don't, so no, I didn't move the goalposts at all. PBOs are a transfer object. You cannot sample from them on the GPU. The only thing you can do with PBOs is copy them to something you can use on the GPU.

As such, PBOs do not let you take advantage of UMA. In fact, their primary benefit is for non-UMA in the first place. UMA systems have no issues blocking glTexImage2D until the copy to GPU memory is done, but non-UMA ones do. And non-UMA ones are what gave us PBOs.

> TBDR came to prominence in UMA mobile architectures; it's a big part of what allows them to perform so well with limited memory bandwidth.

Support that with a theory or evidence of literally any kind. There's nothing at all in TBDR's sequence of events that has any apparent benefit from UMA.

Here: https://developer.arm.com/solutions/graphics-and-gaming/deve...

Look at the sequence of steps. ARM doesn't even bother including a CPU in there, so which step would UMA be helping with?

What UMA can do here is improve power efficiency by reducing the cost of sending the command buffers to the GPU, but that's not going to get you a performance improvement, as those command buffers are not very big. If sending data from the CPU to the GPU were such a severe bottleneck, then you'd see the impact of things like reducing PCIe bandwidth on discrete GPUs, but you don't.


The modern approach to textures is to precompile them, so you can hand the data straight over. It's not as common to have to convert a linear to swizzled texture, though it can happen.

Also, the Apple advice for OpenGL textures was always focused on avoiding unnecessary copies. (for instance, there's another one that could happen CPU side if your data wasn't aligned enough to get DMA'd)

One reason M1 textures use less memory is the prior systems had AMD/Intel graphic switching and so you needed to keep another copy of everything in case you switched GPUs.


As SigmundA points out, a huge advantage Apple has is control of the APIs (Metal, etc.) and the ability to have structured them years ago so that the API can simply skip entire operations (even when asked to perform them) when it knows they're not needed. An analogy would be a copy-on-write filesystem (or RAM!) that doesn't actually copy when asked to: it returns immediately with a pointer, and only copies when something writes to it.
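
For anyone unfamiliar with the pattern, here's a toy copy-on-write sketch in C (purely illustrative, nothing to do with how Metal or any filesystem actually implements it; freeing omitted for brevity): "copying" just shares the allocation, and the real copy is deferred until a write.

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        unsigned char *data;
        size_t len;
        int *refcount;          /* shared by all logical "copies" */
    } CowBuf;

    CowBuf cow_new(size_t len) {
        CowBuf b = { calloc(len, 1), len, malloc(sizeof(int)) };
        *b.refcount = 1;
        return b;
    }

    /* "Copying" is O(1): no data moves, we just share the allocation. */
    CowBuf cow_copy(CowBuf *src) {
        (*src->refcount)++;
        return *src;
    }

    /* Only a write pays for a real copy, and only if the data is shared. */
    void cow_write(CowBuf *b, size_t off, unsigned char value) {
        if (*b->refcount > 1) {
            unsigned char *priv = malloc(b->len);
            memcpy(priv, b->data, b->len);
            (*b->refcount)--;
            b->refcount = malloc(sizeof(int));
            *b->refcount = 1;
            b->data = priv;
        }
        b->data[off] = value;
    }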


Yeah, I believe the M1 GPU (2.6 TFLOPS) falls between a PS4 (1.8 TFLOPS) and a PS4 Pro (4.2 TFLOPS). Yes, the original PS4 came out in 2013, but I still find it impressive that a mobile integrated GPU has that much computational power with no fan and at that power budget.

I do wonder what they are going to do with the higher-end MBP, iMacs, and Mac Pro (if they make one). Will they have an "M1X" with more GPU cores, or will they offer a discrete option with AMD GPUs? I think we could see an answer at WWDC; I wouldn't be surprised if eGPU support for ARM Macs were announced there.


Part of the issue appears to be the number of PCIe lanes the M1 can support (which is why it's limited to two monitors).

I'm not sure whether they'll address that by making an M1X or by gluing multiple M1s together.


> If you want to make something available to the GPU, it still has to be copied to the reserved GPU part.

Or allocate on the GPU side and get direct access to it from the CPU, achieving zero-copy:

https://software.intel.com/content/www/us/en/develop/article...
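
In OpenGL terms, the closest analogue is a persistently mapped buffer (GL 4.4's glBufferStorage), which on an iGPU can be genuinely zero-copy. A rough sketch of the idea, not necessarily the exact technique the Intel article describes (assumes a GL 4.4 context and a function loader):

    #include <GL/gl.h>

    typedef struct {
        GLuint buf;
        void  *cpu_ptr;   /* CPU writes land directly in the GPU-visible allocation */
    } SharedBuf;

    SharedBuf make_shared_buffer(GLsizeiptr size)
    {
        SharedBuf s;
        GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

        glGenBuffers(1, &s.buf);
        glBindBuffer(GL_ARRAY_BUFFER, s.buf);

        /* Immutable storage the driver can place in memory both sides can see. */
        glBufferStorage(GL_ARRAY_BUFFER, size, NULL, flags);

        /* Map once and keep the pointer for the lifetime of the buffer;
           no glBufferSubData / staging copies needed afterwards. */
        s.cpu_ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags);
        return s;
    }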


It's basically the same thing. It's the same address space. If you want to get technical about it, the Amiga had "unified memory" in 1985.


Yes, the Amiga had a form of UMA, as did many other systems. The term UMA seems more widely used than "shared memory"; it's definitely not just a marketing term.

I don't believe Apple claimed to invent unified memory, only that they are taking advantage of the architecture more than anyone has to this point.

Federighi:

"We not only got the great advantage of just the raw performance of our GPU, but just as important was the fact that with the unified memory architecture, we weren't moving data constantly back and forth and changing formats that slowed it down. And we got a huge increase in performance."

This seems to be talking about the 16MB SLC on-die cache that the CPU, GPU, and other IP cores share:

"Where old-school GPUs would basically operate on the entire frame at once, we operate on tiles that we can move into extremely fast on-chip memory, and then perform a huge sequence of operations with all the different execution units on that tile. It's incredibly bandwidth-efficient in a way that these discrete GPUs are not. And then you just combine that with the massive width of our pipeline to RAM and the other efficiencies of the chip, and it’s a better architecture."


It's not a new term; copying data between CPU and GPU memory has always been expensive:

https://docs.microsoft.com/en-us/windows/win32/direct3d11/un...

https://patents.justia.com/patent/9373182


Well, any iGPU systems. Interestingly, game consoles too, where GDDR is the only memory, so it's kinda "inverted" in a sense from the laptop setup.


As far as I understand, it has a lot to do with the actual design of the processor[1], and not so much to do with the on-chip memory or the software integration.

[1]: https://news.ycombinator.com/item?id=25257932


> I'm still assuming it's faster due to unified memory architecture

AMD, Intel, ARM, & Qualcomm have all been shipping unified memory for 5+ years. I'd assume all the A* SoCs have been unified memory for that matter too unless Apple made the weirdest of cost cuts.

Moreover literally none of the benchmarks out there include anything at all that involves copying/moving data between the CPU, GPU, and AI units. They are almost always strictly-CPU benchmarks (which the M1 does great in), or strictly-GPU benchmarks (where the M1 is good for integrated but that's about it)

> An AMD/Intel using same soldered RAM next to CPU and same process node would give Apple a run for its money.

AMD's memory latency is already better than the M1's. Apple's soldered RAM isn't a performance choice:

"In terms of memory latency, we’re seeing a (rather expected) reduction compared to the A14, measuring 96ns at 128MB full random test depth, compared to 102ns on the A14." source: https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...

"In the DRAM region, we’re measuring 78.8ns on the 5950X versus 86.0ns on the 3950X." https://www.anandtech.com/show/16214/amd-zen-3-ryzen-deep-di...


> AMD's memory latency is already better than the M1's. Apple's soldered RAM isn't a performance choice:

Careful what you are comparing; in your examples the other CPU is also faster.

The 3950X is a desktop CPU and is faster than the M1 -> https://gadgetversus.com/processor/apple-m1-vs-amd-ryzen-9-3...

The 5950X is even faster -> https://gadgetversus.com/processor/apple-m1-vs-amd-ryzen-9-5...

Lower latencies are likely due to higher clock.

For an equivalent laptop-specific CPU, you will get a speedup from on-package RAM vs. user-replaceable RAM placed further away; even desktops would benefit, but it would not be a welcome change there.


> Lower latencies are likely due to higher clock.

That's not really how DRAM latency works. In basically all CPUs the memory controller runs at a different clock than the CPU cores do, typically at the same clock as the DRAM itself, but not always.

If you meant the DRAM was running faster on the AMD system, then also no. The M1 is using 4266MHz modules while the AMD system was running 3200MHz RAM.

> For an equivalent laptop-specific CPU, you will get a speedup from on-package RAM vs. user-replaceable RAM placed further away; even desktops would benefit, but it would not be a welcome change there.

Huge citation needed. There's currently no real-world product that matches that claim, nor a theoretical basis for it, as the physical trace length makes only a minimal difference in latency and is far from the major factor.


Will M1 silicon make the jump to iPhone/iPad in the future?


The latest iPad Pro does use M1.

It's more accurate to say that M1 made the jump from the A14 chips already in iPhones, as it's an enhanced variation of that design.


Zoom! Zoom! Look at my first world computation speeds!


It's much faster even if you max out all cores. This may be part of it but the chip is objectively faster.

Lots of Intel damage control these days. AMD is kicking their butt from one side and ARM from the other.


Docker doesn't feel "faster than Intel" on Mac, does it? And it certainly lags behind on features.


I was wondering how macOS scheduled the efficiency cores but was too lazy to dig into the docs. Thanks!


Launching apps takes ages on M1 mini. Reminded me of the good Windows Phone times.


Strange. With the exception of Electron apps (which are slow everywhere), I've found my M1 mini to be extremely responsive. Much more so than my 16" MBP.


Cold starts of Electron, MS Office, and Affinity (all ARM versions) take me north of 15 seconds.

8GB M1 Mini, so maybe that makes the situation worse.


Affinity Photo starts in 15 seconds on my M1, which IMO is not okay. It's just as slow as on my Haswell Intel Mac. Everything else launches in a second or so. Spotify (Electron) draws a window in three seconds, but takes another four to display all content.


For comparison, I have a maxed out current gen 16" Intel MBP and Affinity takes 25 seconds to launch.


Not sure what's causing the difference but it launches in 4 seconds for me. I have the base model 16" MBP.


I have the 8GB M1 as my daily driver. The only apps that take a long time for me (but NOWHERE NEAR 15 seconds - maybe closer to 5, but I'm not counting) are Visual Studio for Mac (which is probably Rosetta), Unity (which IS Rosetta), and Blender (also Rosetta).


VS Code launches instantly on my i7-4770 under Linux with XFCE. Same under Windows.

So do Slack and Spotify...

That's an 8-year-old CPU! If you're having issues, the problem lies elsewhere, not with Electron.


I don't know why this was downvoted. You see what you see. Maybe my experience with Electron is tainted :D

I don't have any measurements, and I'm not really going to take any. It's just always felt slow. So I'll admit I should've perhaps been less prescriptive.


I think half the people in this thread are posting what they want to see and don't reflect reality.


It's a bit disingenuous to compare the blazingly fast M1 to Windows Phones. What are you launching?

It's well established that the M1 is super fast at launching apps compared to Intel MacBooks - https://www.youtube.com/watch?v=vKRDlkyILNY

There's tons of video comparisons on YouTube.


Ah, Windows Phones were very responsive.


They were and they weren't. UI was pretty responsive, but starting programs was often slow. In 8.1 and above, starting the home screen was sometimes slow. So many dots crossing the screen.


Are you sure they aren't being run through Rosetta? If I remember correctly, x86 apps run through a translation process on first launch which obviously takes time. An acceptable trade off given the alternative is to not have the app at all on Apple Silicon.


Launching apps is lightning fast on the M1 mini, and so much faster than on Intel that it's not even funny.

Perhaps you have in mind some specific problematic app?

Or are you including Intel apps that are translated to ARM on the fly on their first launch?


Weird, are they Rosetta apps? I haven't seen that on my M1.



That was my first Mac, so maybe it is a macOS issue rather than an M1 one.

I just described my bad experience with the M1 on Big Sur. Not sure which is to blame.


Ha - from another HN thread: https://fabiensanglard.net/lte/index.html


People say that Ryzen is faster - but if it has a fan, then that's a deal breaker for me. I have become so annoyed by fans that I often just stop work when they start whirring, go make a coffee, etc. It seems like the M1 is now the only laptop that lets you work in silence and stay productive.


username checks out :)


Wow I am fluttered


I guess only the tape heads would understand that...


If fans are keeping you from working it might be time for water cooling or ear plugs.


I could appreciate this take if it wasn't for constantly getting side-eyed in a meeting room while my 16-inch MBP is blasting off because I forgot to close out a twitch.tv browser tab.


M1 will soon have a fan due to the overheating issue.

Although I imagine given how slow the response was on the butterfly keyboard issue it might be a few years out.


What overheating issue?

There isn't one in M1.


There aren't any thermal slowdowns on the M1 devices with a fan, no. The only M1 device without a fan does show thermal limitations, though.

Not an overheating issue, no, but the M1 still benefits from a fan, and obviously whatever Apple does in a Mac Pro class machine will also have a fan. They aren't going to keep at ~25W in a full size desktop, that'd be nonsense.


Only if you run it at 100% for something like half an hour, and even then the throttling is so minor that most reviewers said not to buy the MBP just for the fan.


When was the last time Apple made anything with good thermal design?

The M1 is designed for this. Wide cores that do a lot per clock but can't be clocked high are perfect, because Apple would rather not clock high. Intel and AMD are still fighting the MHz wars and can't release a core design that doesn't approach 5 GHz, even if the lower-power chips won't reach it.


What overheating issue? There isn’t one.


That is all well and good. Great, actually. And it makes sense. The highly efficient reduced-instruction-set computer has a highly optimized architecture to improve efficiency. But we all knew that deep down. It's like the Ford Tempo: a mere 80 horsepower mated to a well-designed and highly optimized transmission gave levels of performance that were higher than other cars in the same market.

What I can't stand are all the people saying the M1 is capable of replacing high end x86 computers. It's like owning a Tempo and mistakenly thinking your highly specialized transmission means you can walk up and challenge a 6.7L Dodge Challenger to a race. It's completely ridiculous and demonstrates a stunning lack of self awareness.


...This is partially true in that the M1 is overhyped by some...

But the M1 is also literally, by the numbers, the fastest machine in the world for some workloads. For us, my M1 MacBook Air compiles our large TypeScript Angular app in half the time of a current i9 with 64GB of RAM on Node v15. These are real-world, tested, and validated results. Nothing comes close, because the chip is designed to run JS as fast as physically possible. It's insanely fast at some workloads. Not all of them, of course, but it's no Tempo. I would say it's more like a Tesla: faster than it should be considering the HP (a Model S is much faster than cars with way more HP), but faster due to how it makes the power rather than how much.

That said, I replaced a $2000 iMac 4K with a base model Macbook Air and it was a huge upgrade for my day to day work. It really is perfectly fine to replace some workstations.


> That said, I replaced a $2000 iMac 4K with a base model Macbook Air and it was a huge upgrade for my day to day work.

Essentially the same here, I just happened to replace my $2-3k Win10/Ubuntu desktop with my 8GB(!) M1 Air and it truly has been a huge upgrade for my day to day work. It feels like I'm working in the future.

That said - I primarily use this Air to ssh to all of my servers that do my heavy lifting for me. But that doesn't stop me from driving this thing hard - dozens of open tabs, multiple heavyweight apps left open (e.g. Office apps), multiple instances of VS Code, Slack, Teams, all running - zero slowdown. Zero fan sound.

It's black magic good.


What environment are you compiling your large TypeScript app in? Windows with WSL on a 7400 RPM disk drive?

I think you have other things going on that are slowing down the compilation on the i9.

Maybe if you run Linux and install a fast SSD, your i9 will be faster than your M1. Then, by your logic, you should ditch the M1.


Ironically, it was a Samsung PCIe drive in non-WSL Windows. A 16-inch MacBook Pro i9 also put up roughly similar times in macOS. Node compilation is a single-core workload with a very strong focus on small, fast translated instructions; the M1 actually is that fast at that specific task.


https://doesitarm.com/tv/m1-macbook-vs-intel-i9-in-continuou...

That dude says both of his Intel machines are faster.


I think you're right on this. I've also been irritated at the "eGPUs / dGPUs are no longer necessary" crowd. If an SoC GPU with the performance / size of an RTX could fit into a notebook we'd actually have them in notebooks... but we don't. The fact of the matter is that this technology doesn't exist. So the allegations that GPUs are no longer necessary because SoCs have already wholesale replaced them are comical.


>What I can't stand are all the people saying the M1 is capable of replacing high end x86 computers.

Whether or not this is possible depends completely on the workload. The M1 could probably obviate any x86 chip for the average programmer. But that wouldn't be the case for gamer workloads, for example.


Is it really about workload though?

A high-end gaming PC will compile TypeScript, play games with high-quality graphics, and run your 100 browser tabs. Let's be honest, an Intel Core 2 Duo would handle most programmer workloads just fine.

By your logic the Apple M1 is comparable to high-end x86 chips, but you go on in the same breath with the disclaimer that it is only for "certain workloads."

That is my point. A high-end PC doesn't care about your workload. It is objectively fast. Period. You are the Tempo driver claiming that you would have beaten that Challenger if the temperature of the drag strip were just a little bit hotter....

But that Challenger doesn't care about operating conditions or workload or whatever. It will win today, tomorrow, and the next day while you're claiming that the Tempo is "comparable in the snow on Tuesdays."


>Let's be honest, an Intel Core 2 Duo would handle most programmer workloads just fine.

That's absolutely not the case, at least for my workflows. My work-issued five-year-old Intel i5 XPS would get incredibly slow, and incredibly hot, when running unit tests or compiling code in Angular. It goes without saying that a Core 2 Duo would be simply unusable.

My personal machine was a 2018 i5 MacBook Pro. On that machine I could open just one iOS Simulator or Android Emulator and expect merely decent performance. And by decent, I mean the system would be running hot, the fans would be loud, and everything would be noticeably slower. Having multiple simulators or emulators open is simply not a thing this Intel notebook was willing to do. Meanwhile, on my M1, I've run six iOS Simulators / Android Emulators at once, and the system did not slow down at all (for reasons discussed in this article).

In a futile attempt to push my M1 to its limits in a dev workflow, I even trained TensorFlow models on it. Surprise, surprise: not only was the model trained in a decent time (for a notebook without a dGPU; I wasn't expecting miracles), but it did all this without any perceptible slowdown in system performance. The absolute worst thing I could say about the M1's performance here was that the MacBook Air's chassis got a little hot. And keep in mind that when I say a little hot, I mean it still produced less heat than my Intel notebooks often produce when watching a 4K YouTube video (and it does this all without a fan!).

So in short, no, the Intel Core 2 Duo would absolutely not be fine for my programmer workloads. And, yes, the M1 has pretty much obviated Intel's lineup for my work. Dare I say, the M1 MacBook Air has been the single biggest quality-of-life upgrade I've had as a developer. It's the first notebook I've used where I've never been bottlenecked by system performance. It's a fantastic machine through and through.


Is this an honest assessment though?

I'm usually running 5+ Electron programs (IDE, Slack, Discord, etc.), multiple browser tabs, test runners (Jest, Detox), IDE plugins (ESLint, auto-import, GitLens), an emulated terminal (iTerm2), multiple VMs (Android Emulator, iOS Simulator), Xcode, Android Studio, and that's just my base setup. From there, add more casual programs like Calendar, memos, OmniFocus, Spotify, and so on. An Intel Core 2 Duo would thrash at just 10% of that workload.


I was thinking hard about getting an M1 mini, but the upgrade pricing for the SSD/RAM is offensive and you can't upgrade them later. I will likely end up getting a newer Ryzen system. It will cost much more for the CPU + motherboard, but it will have access to dedicated graphics, and I can just move my current 32GB of DDR4, video card, and SSDs over.


I'm curious what you think of all the benchmarks that say otherwise. In a race between a motorized lawn mower and a race car - if the lawn mower wins, which one is ultimately the better "race car"?



